We introduce UFO, an innovative UI-Focused agent that fulfills user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.
QA pairs are generated by prompting a large language model (LLM) with a text-formatted table as input.
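A minimal sketch of this step, assuming the pipeline serializes a table to text and asks an LLM to produce QA pairs; `call_llm` and the prompt wording are hypothetical stand-ins, not the authors' actual implementation:

```python
# Hedged sketch: turn a text-formatted table into a QA-generation prompt.
# The prompt template and `call_llm` are illustrative assumptions.

def table_to_qa_prompt(table_text: str, n_pairs: int = 3) -> str:
    """Build a prompt asking an LLM for QA pairs grounded in the table."""
    return (
        f"Given the following table, generate {n_pairs} question-answer "
        f"pairs grounded strictly in its contents.\n\n"
        f"Table:\n{table_text}"
    )

table = "Method | Year\nUFO | 2024\nVAR | 2024"
prompt = table_to_qa_prompt(table, n_pairs=2)
print(prompt)
# The prompt would then be sent to an LLM, e.g.:
# qa_pairs = call_llm(prompt)  # hypothetical API call
```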
Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets.
In response to these challenges, we propose MMBench, a novel multi-modality benchmark.
We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".
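A hedged sketch of the contrast between the two generation orders, using illustrative helper names (not the authors' code): raster-scan autoregression emits one token per step, while next-scale prediction emits an entire token map per step at increasing resolutions.

```python
# Hedged sketch: sequential-step counts under the two paradigms.
# Function names and the scale schedule are illustrative assumptions.

def raster_scan_order(h: int, w: int):
    """Standard 'next-token prediction': one token per step, row by row."""
    return [(r, c) for r in range(h) for c in range(w)]

def next_scale_order(scales):
    """VAR-style 'next-scale prediction': each step jointly predicts a
    whole s x s token map, conditioned on all coarser maps so far."""
    return [(s, s) for s in scales]

# A 16x16 token grid: 256 sequential steps under raster scan...
print(len(raster_scan_order(16, 16)))        # 256
# ...versus 5 sequential steps under a coarse-to-fine scale schedule.
print(len(next_scale_order([1, 2, 4, 8, 16])))  # 5
```

The point of the sketch is only the ordering: next-scale prediction trades many per-token steps for a few per-resolution steps, which is where the claimed efficiency of the paradigm comes from.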
Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.
We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it.
Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning and are widely used for training and inference of transformers. Transformers achieve state-of-the-art performance in many areas of machine learning and underpin most modern Large Language Models (LLMs).
The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life.
As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.