Trending Research

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

xuezhemax/megalodon • 12 Apr 2024

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

179

1.60 stars / hour

Paper
Code

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

alibabaresearch/advancedliteratemachinery • 8 Apr 2024

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

document understanding

894

1.25 stars / hour

Paper
Code

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Leeroo-AI/mergoo • 12 Mar 2024

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

Ranked #30 on Question Answering on TriviaQA

Arithmetic Reasoning Code Generation +6

176

0.87 stars / hour

Paper
Code

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

siyan-zhao/prepacking • 15 Apr 2024

In this work, we highlight the following pitfall of prefilling: for batches containing highly varying prompt lengths, significant computation is wasted by the standard practice of padding sequences to the maximum length.

0.86 stars / hour

Paper
Code
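
The padding pitfall named in the Prepacking summary above can be made concrete with a little arithmetic. The sketch below is not the paper's method; it only estimates how many token slots are padding when a batch is padded to its longest prompt. The function name `padding_waste` and the example prompt lengths are hypothetical, chosen for illustration.

```python
def padding_waste(prompt_lengths):
    """Return the fraction of token slots that are padding when every
    prompt in the batch is padded to the longest prompt's length."""
    if not prompt_lengths:
        return 0.0
    max_len = max(prompt_lengths)
    total_slots = max_len * len(prompt_lengths)  # slots after padding
    if total_slots == 0:
        return 0.0
    real_tokens = sum(prompt_lengths)            # slots holding real tokens
    return (total_slots - real_tokens) / total_slots


# Hypothetical batch: three short prompts and one long one.
lengths = [12, 40, 7, 512]
print(f"{padding_waste(lengths):.1%} of the padded batch is padding")
```

With one long prompt among several short ones, most of the padded batch is padding, which is the wasted computation the summary refers to. A batch of equal-length prompts wastes nothing.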

Arc2Face: A Foundation Model of Human Faces

Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition • 18 Mar 2024

This paper presents Arc2Face, an identity-conditioned face foundation model, which, given the ArcFace embedding of a person, can generate diverse photo-realistic images with a higher degree of face similarity than existing models.

Ranked #1 on Diffusion Personalization Tuning Free on AgeDB

Diffusion Personalization Tuning Free Face Generation +1

106

0.84 stars / hour

Paper
Code

Rho-1: Not All Tokens Are What You Need

microsoft/rho • 11 Apr 2024

After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.

Continual Pretraining Language Modelling +1

198

0.82 stars / hour

Paper
Code

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

fudan-generative-vision/champ • 21 Mar 2024

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

2,746

0.82 stars / hour

Paper
Code

LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

Recognito-Vision/Face-SDK-Linux-Demos • 13 Mar 2024

This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition.

Face Recognition Self-Supervised Learning

0.81 stars / hour

Paper
Code

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Beomi/InfiniTransformer • 10 Apr 2024

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.

Book summarization Language Modelling +1

108

0.79 stars / hour

Paper
Code

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v • 21 Mar 2024

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation Video Generation

768

0.76 stars / hour

Paper
Code