OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

lingyvkong/onechart 15 Apr 2024

To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information.

62
0.43 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,030
0.42 stars / hour

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

airi-institute/hairfastgan 1 Apr 2024

Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on.

234
0.41 stars / hour

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

microsoft/mechanistic-error-probe 26 Sep 2023

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.

27
0.41 stars / hour

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

opengvlab/internvl 21 Dec 2023

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval, Image-to-Text Retrieval, +10

794
0.40 stars / hour

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

modelscope/swift 18 Dec 2023

Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.

Text-to-Image Generation

1,162
0.38 stars / hour

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

showlab/show-1 27 Sep 2023

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Text-to-Video Generation, Video Alignment, +1

903
0.38 stars / hour

LLoCO: Learning Long Contexts Offline

jeffreysijuntan/lloco 11 Apr 2024

We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.

4k, In-Context Learning, +1

76
0.38 stars / hour

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

siyan-zhao/prepacking 15 Apr 2024

In this work, we highlight the following pitfall of prefilling: for batches containing highly varying prompt lengths, significant computation is wasted by the standard practice of padding sequences to the maximum length.

39
0.36 stars / hour

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning

279
0.35 stars / hour