We introduce UFO, an innovative UI-Focused agent that fulfills user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.
QA pairs are generated by prompting a large language model (LLM) with a text-formatted table as input.
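A minimal sketch of this step, assuming the pipeline serializes a table to text and asks an LLM to produce QA pairs; `call_llm` and the prompt wording are hypothetical stand-ins, not the authors' actual implementation:

```python
# Hedged sketch: turn a text-formatted table into a QA-generation prompt.
# The prompt template and `call_llm` are illustrative assumptions.

def table_to_qa_prompt(table_text: str, n_pairs: int = 3) -> str:
    """Build a prompt asking an LLM for QA pairs grounded in the table."""
    return (
        f"Given the following table, generate {n_pairs} question-answer "
        f"pairs grounded strictly in its contents.\n\n"
        f"Table:\n{table_text}"
    )

table = "Method | Year\nUFO | 2024\nVAR | 2024"
prompt = table_to_qa_prompt(table, n_pairs=2)
print(prompt)
# The prompt would then be sent to an LLM, e.g.:
# qa_pairs = call_llm(prompt)  # hypothetical API call
```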
Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets.
In response to these challenges, we propose MMBench, a novel multi-modality benchmark.
We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".
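A hedged sketch of the contrast between the two generation orders, using illustrative helper names (not the authors' code): raster-scan autoregression emits one token per step, while next-scale prediction emits an entire token map per step at increasing resolutions.

```python
# Hedged sketch: sequential-step counts under the two paradigms.
# Function names and the scale schedule are illustrative assumptions.

def raster_scan_order(h: int, w: int):
    """Standard 'next-token prediction': one token per step, row by row."""
    return [(r, c) for r in range(h) for c in range(w)]

def next_scale_order(scales):
    """VAR-style 'next-scale prediction': each step jointly predicts a
    whole s x s token map, conditioned on all coarser maps so far."""
    return [(s, s) for s in scales]

# A 16x16 token grid: 256 sequential steps under raster scan...
print(len(raster_scan_order(16, 16)))        # 256
# ...versus 5 sequential steps under a coarse-to-fine scale schedule.
print(len(next_scale_order([1, 2, 4, 8, 16])))  # 5
```

The point of the sketch is only the ordering: next-scale prediction trades many per-token steps for a few per-resolution steps, which is where the claimed efficiency of the paradigm comes from.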
Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.
We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it.
Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning and are widely used for training and inference of transformers. Transformers achieve state-of-the-art performance in many areas of machine learning and underpin most modern Large Language Models (LLMs).
The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life.
As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.