In this work, we highlight the following pitfall of prefilling: for batches with highly varying prompt lengths, significant computation is wasted by the standard practice of padding every sequence to the maximum length in the batch.
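As a rough illustration of this waste (our sketch, not taken from the cited work), the snippet below estimates the fraction of prefill compute spent on padding tokens, assuming cost grows linearly with the padded length; attention's quadratic term would make the waste even larger.

```python
# Estimate the fraction of prefill compute wasted on padding tokens
# when a batch is padded to its longest prompt. Illustrative only:
# assumes a constant per-token cost (ignores attention's quadratic term).
def padding_waste(prompt_lengths: list[int]) -> float:
    max_len = max(prompt_lengths)
    useful = sum(prompt_lengths)            # tokens carrying real content
    padded = max_len * len(prompt_lengths)  # tokens actually processed
    return 1.0 - useful / padded

# A batch with highly varying prompt lengths wastes most of its compute.
print(padding_waste([32, 64, 128, 2048]))   # ~0.72
```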
To this end, we propose a novel filter-based VINS framework named SchurVINS, which guarantees both high accuracy, by building a complete residual model, and low computational complexity, by exploiting the Schur complement.
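For reference, the standard Schur complement identity (a textbook fact, not the paper's specific residual model) shows how one block of unknowns can be eliminated so that only a reduced system needs to be solved:

```latex
% Eliminating the second block of unknowns y from
%   M [x; y] = [u; v],  with M = [A B; C D],
% leaves a reduced system in x alone, whose matrix is the
% Schur complement M/D = A - B D^{-1} C.
\[
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},
\qquad
M/D = A - B D^{-1} C,
\qquad
\left(A - B D^{-1} C\right) x = u - B D^{-1} v .
\]
```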
We outperform encoder-only models by a large margin on word-level tasks and set a new unsupervised state of the art on the Massive Text Embedding Benchmark (MTEB).
After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.
For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.
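As a rough sketch of what such a schedule looks like (the warmup length, plateau fraction, and linear decay shape below are our assumptions, not necessarily the authors' exact choices):

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.05, decay_frac: float = 0.1,
           min_lr: float = 0.0) -> float:
    """Warmup-Stable-Decay schedule: linear warmup, constant plateau,
    then a final decay phase. Phase fractions and the linear decay
    are illustrative assumptions."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:            # warmup: ramp up linearly
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:             # stable: hold at the peak rate
        return peak_lr
    # decay: anneal toward min_lr over the final phase
    progress = (step - decay_start) / max(decay_steps, 1)
    return peak_lr + (min_lr - peak_lr) * progress
```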
The Q-Formers are trained on paired images rather than identical targets, where the reference image and the ground-truth image share the same style or semantics.
To distinguish language models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on.
We introduce UFO, an innovative UI-focused agent that harnesses the capabilities of GPT-Vision to fulfill user requests tailored to applications on Windows OS.