UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,591
0.26 stars / hour

TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

naver-ai/tablevqabench 30 Apr 2024

QA pairs are generated by exploiting a large language model (LLM), where the input is a text-formatted table.

Language Modelling, Large Language Model, +2

16
0.25 stars / hour

TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

decisionintelligence/tfb 29 Mar 2024

Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets.

Benchmarking, Multivariate Time Series Forecasting, +2

162
0.25 stars / hour

MMBench: Is Your Multi-modal Model an All-around Player?

InternLM/opencompass 12 Jul 2023

In response to these challenges, we propose MMBench, a novel multi-modality benchmark.

Visual Question Answering

2,675
0.25 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Image Generation, Language Modelling, +2

3,485
0.24 stars / hour

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

modelscope/swift 18 Dec 2023

Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.

Decoder, Text-to-Image Generation

1,391
0.24 stars / hour

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context

microsoft/monitors4codegen NeurIPS 2023

We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it.

131
0.24 stars / hour

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

hlstransform/submission 29 Apr 2024

Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs).

Edge-computing

25
0.23 stars / hour

State-specific protein-ligand complex structure prediction with a multi-scale deep generative model

zrqiao/NeuralPLexer 30 Sep 2022

The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life.

Benchmarking, Blind Docking, +3

180
0.23 stars / hour

WorldGPT: Empowering LLM as Multimodal World Model

dcdmllm/worldgpt 28 Apr 2024

As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.

Language Modelling, Large Language Model

98
0.23 stars / hour