Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

xuezhemax/megalodon 12 Apr 2024

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.
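
To make the complexity contrast concrete, here is a minimal sketch (not Megalodon's own architecture) of why softmax attention costs O(n^2) in sequence length while a kernelized linear attention keeps only a fixed-size d-by-d state; all names are illustrative.

```python
# Minimal sketch: softmax attention materializes an (n, n) score matrix,
# while a kernelized "linear attention" summarizes keys/values in (d, d).
import numpy as np

def softmax_attention(Q, K, V):
    # Scores form an (n, n) matrix -> quadratic time and memory in length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernel trick: phi(Q) (phi(K)^T V) needs only (d, d) state -> O(n d^2).
    KV = phi(K).T @ V            # (d, d) summary of all key-value pairs
    Z = phi(K).sum(axis=0)       # (d,) normalizer
    return (phi(Q) @ KV) / (phi(Q) @ Z)[:, None]

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out_quadratic = softmax_attention(Q, K, V)  # builds a 1024 x 1024 matrix
out_linear = linear_attention(Q, K, V)      # never forms the n x n matrix
```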

179
1.60 stars / hour

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

alibabaresearch/advancedliteratemachinery 8 Apr 2024

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
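
As a rough illustration of what a layout instruction sample could look like, the sketch below serializes OCR text spans with their bounding boxes into a prompt; the schema and tags are assumptions for illustration, not LayoutLLM's exact format.

```python
# Hypothetical layout-aware instruction sample: serialize OCR text spans with
# bounding boxes so an LLM can reason over page layout. The field names and
# <box> tag format are illustrative, not LayoutLLM's exact schema.

def serialize_layout(spans):
    """spans: list of (text, (x0, y0, x1, y1)) in page coordinates."""
    return "\n".join(f"<box>{x0},{y0},{x1},{y1}</box> {text}"
                     for text, (x0, y0, x1, y1) in spans)

spans = [
    ("INVOICE", (40, 30, 180, 60)),
    ("Total: $1,250.00", (40, 520, 220, 545)),
]
sample = {
    "instruction": "What is the total amount on this invoice?",
    "input": serialize_layout(spans),
    "output": "$1,250.00",
}
```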

document understanding

887
1.25 stars / hour

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Leeroo-AI/mergoo 12 Mar 2024

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.
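
The title's recipe, mixing separately trained expert LLMs into a mixture-of-experts model, can be sketched as below; module names and the top-k routing details are illustrative assumptions, and the paper additionally handles the experts' non-FFN parameters (e.g. by averaging).

```python
# Hedged sketch of the Branch-Train-MiX idea: take the feed-forward (FFN)
# sublayers of independently trained domain experts and combine them into one
# mixture-of-experts layer with a learned router trained after merging.
import torch
import torch.nn as nn

class MixedFFN(nn.Module):
    def __init__(self, expert_ffns, d_model, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)            # FFNs from each expert LLM
        self.router = nn.Linear(d_model, len(expert_ffns))   # learned after merging
        self.top_k = top_k

    def forward(self, x):                                    # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

d = 16
experts = [nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
           for _ in range(4)]
y = MixedFFN(experts, d_model=d)(torch.randn(8, d))          # (8, 16)
```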

Arithmetic Reasoning Code Generation +6

176
0.87 stars / hour

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

siyan-zhao/prepacking 15 Apr 2024

In this work, we highlight the following pitfall of prefilling: for batches containing high-varying prompt lengths, significant computation is wasted by the standard practice of padding sequences to the maximum length.
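
A minimal sketch of the packing idea (not the authors' exact algorithm): greedily pack several short prompts into one batch row instead of padding every prompt to the maximum length; a block-diagonal attention mask, omitted here, keeps the packed prompts independent during prefill.

```python
# Greedy first-fit packing of variable-length prompts into rows of width
# max_len, so far fewer pad tokens go through the prefill forward pass.

def prepack(prompt_lengths, max_len):
    rows = []    # each row is a list of prompt indices
    loads = []   # current token count of each row
    for i, length in sorted(enumerate(prompt_lengths), key=lambda p: -p[1]):
        for r, load in enumerate(loads):
            if load + length <= max_len:
                rows[r].append(i)
                loads[r] += length
                break
        else:
            rows.append([i])
            loads.append(length)
    return rows

lengths = [512, 37, 480, 64, 12, 900]
print(prepack(lengths, max_len=1024))
# [[5, 3, 1, 4], [0, 2]] -- two packed rows instead of six rows padded to 900
```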

31
0.86 stars / hour

Arc2Face: A Foundation Model of Human Faces

Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition 18 Mar 2024

This paper presents Arc2Face, an identity-conditioned face foundation model, which, given the ArcFace embedding of a person, can generate diverse photo-realistic images with a higher degree of face similarity than existing models.
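
One plausible way to condition generation on an ArcFace embedding, sketched below, is to project it into pseudo text-token embeddings for a diffusion model's cross-attention; dimensions and module names are assumptions, not the paper's exact architecture.

```python
# Hedged sketch of identity conditioning in the spirit of Arc2Face: map an
# ArcFace identity embedding into pseudo-token embeddings that serve as
# cross-attention context for a diffusion model. All sizes are illustrative.
import torch
import torch.nn as nn

class IdentityConditioner(nn.Module):
    def __init__(self, arcface_dim=512, token_dim=768, n_tokens=4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(arcface_dim, token_dim * n_tokens), nn.GELU())
        self.n_tokens, self.token_dim = n_tokens, token_dim

    def forward(self, id_embedding):                 # (batch, arcface_dim)
        tokens = self.proj(id_embedding)
        return tokens.view(-1, self.n_tokens, self.token_dim)

cond = IdentityConditioner()(torch.randn(2, 512))    # (2, 4, 768) context
```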

Diffusion Personalization Tuning Free Face Generation +1

106
0.84 stars / hour

Rho-1: Not All Tokens Are What You Need

microsoft/rho 11 Apr 2024

After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively - matching DeepSeekMath with only 3% of the pretraining tokens.
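
The selective training behind these numbers can be sketched as scoring each token's excess loss against a reference model and backpropagating only through the highest-scoring fraction; shapes and the keep ratio below are illustrative, not the paper's exact settings.

```python
# Hedged sketch of selective language modeling in the spirit of Rho-1: keep
# only tokens whose training-model loss most exceeds a reference model's loss.
import torch
import torch.nn.functional as F

def selective_lm_loss(train_logits, ref_logits, targets, keep_ratio=0.6):
    """train_logits, ref_logits: (batch, seq, vocab); targets: (batch, seq)."""
    ce = lambda logits: F.cross_entropy(                 # per-token loss
        logits.flatten(0, 1), targets.flatten(), reduction="none")
    train_loss, ref_loss = ce(train_logits), ce(ref_logits.detach())
    excess = train_loss.detach() - ref_loss              # high = useful token
    k = max(1, int(keep_ratio * excess.numel()))
    keep = excess.topk(k).indices                        # train only on these
    return train_loss[keep].mean()
```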

Continual Pretraining Language Modelling +1

198
0.82 stars / hour

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

fudan-generative-vision/champ 21 Mar 2024

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.
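
A hedged sketch of the guidance side: renderings derived from the 3D parametric body model (e.g. depth, normal, and semantic maps) are encoded and fused into a single conditioning feature for the latent diffusion denoiser; the encoder design and fusion-by-sum below are assumptions, not the paper's exact architecture.

```python
# Illustrative multi-condition guidance fusion: one lightweight encoder per
# rendered condition map, outputs summed into a single guidance feature.
import torch
import torch.nn as nn

class GuidanceFusion(nn.Module):
    def __init__(self, n_conditions=3, channels=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Conv2d(3, channels, kernel_size=3, padding=1)
            for _ in range(n_conditions))

    def forward(self, condition_maps):        # list of (B, 3, H, W) renderings
        return sum(enc(c) for enc, c in zip(self.encoders, condition_maps))

maps = [torch.randn(1, 3, 64, 64) for _ in range(3)]  # e.g. depth/normal/semantic
feat = GuidanceFusion()(maps)                          # (1, 64, 64, 64)
```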

Animated GIF Generation Image Animation +1

2,746
0.82 stars / hour

LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

Recognito-Vision/Face-SDK-Linux-Demos 13 Mar 2024

This enables our method - namely LAndmark-based Facial Self-supervised learning (LAFS) - to learn key representations that are more critical for face recognition.
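
The landmark-based patching idea can be sketched as cropping the patches fed to a self-supervised encoder around detected facial landmarks rather than on a uniform grid; the patch size and landmark source below are illustrative.

```python
# Illustrative landmark-centered patch extraction: patches around facial
# landmarks replace the uniform grid, focusing learning on identity-critical
# regions such as eyes, nose, and mouth.
import numpy as np

def landmark_patches(image, landmarks, patch=16):
    """image: (H, W, C); landmarks: (N, 2) array of (x, y) points."""
    h, w = image.shape[:2]
    half = patch // 2
    crops = []
    for x, y in landmarks.astype(int):
        x = np.clip(x, half, w - half)       # keep the crop inside the image
        y = np.clip(y, half, h - half)
        crops.append(image[y - half:y + half, x - half:x + half])
    return np.stack(crops)                   # (N, patch, patch, C) tokens
```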

Face Recognition Self-Supervised Learning

99
0.81 stars / hour

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Beomi/InfiniTransformer 10 Apr 2024

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.
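
The paper's Infini-attention combines local attention within each segment with a compressive memory carried across segments; the per-head recurrence can be sketched as below (causal masking and multi-head batching omitted for brevity).

```python
# Sketch of the Infini-attention recurrence: standard attention inside each
# segment, plus a compressive memory (a d x d matrix and a normalizer) that
# accumulates key-value associations across segments and is read with the
# current queries; a learned gate mixes the two outputs.
import torch
import torch.nn.functional as F

def sigma(x):
    return F.elu(x) + 1.0   # positive feature map, as in linear attention

def infini_attention_segment(Q, K, V, M, z, gate):
    """Q, K, V: (seg_len, d); M: (d, d); z: (d,); gate: learnable scalar."""
    # Local attention within the segment (causal mask omitted).
    scores = (Q @ K.transpose(-2, -1)) / Q.shape[-1] ** 0.5
    local = scores.softmax(dim=-1) @ V
    # Read from the compressive memory accumulated over earlier segments.
    mem = (sigma(Q) @ M) / (sigma(Q) @ z).clamp_min(1e-6).unsqueeze(-1)
    # Write this segment's key-value associations into the bounded state.
    M = M + sigma(K).transpose(-2, -1) @ V
    z = z + sigma(K).sum(dim=-2)
    g = torch.sigmoid(gate)
    return g * mem + (1 - g) * local, M, z

d = 64
M, z, gate = torch.zeros(d, d), torch.zeros(d), torch.tensor(0.0)
for seg in torch.randn(4, 128, d):            # four 128-token segments
    out, M, z = infini_attention_segment(seg, seg, seg, M, z, gate)
```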

Book summarization Language Modelling +1

108
0.79 stars / hour

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v 21 Mar 2024

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.
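
The autoregressive pattern can be sketched as generating the video in fixed-size chunks, each conditioned on the tail frames of the previous chunk; `generate_chunk` below stands in for the actual conditioned video model, and its signature is hypothetical.

```python
# Illustrative chunk-wise autoregressive long-video loop: each new chunk is
# conditioned on the previous chunk's last frames so transitions stay smooth.

def generate_long_video(prompt, generate_chunk, n_chunks, chunk_len=16, overlap=8):
    """generate_chunk(prompt, cond_frames, length) -> list of frames (hypothetical)."""
    video = list(generate_chunk(prompt, cond_frames=None, length=chunk_len))
    for _ in range(n_chunks - 1):
        cond = video[-overlap:]               # anchor frames from the last chunk
        video += generate_chunk(prompt, cond_frames=cond, length=chunk_len)
    return video
```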

Text-to-Video Generation Video Generation

768
0.76 stars / hour