However, the key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck because it grows linearly with sequence length.
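The arithmetic behind this growth is easy to make concrete. Below is a minimal back-of-the-envelope sketch (not taken from any particular paper; the roughly 7B-scale model dimensions and the function name `kv_cache_bytes` are illustrative assumptions): the cache stores one key and one value vector per token, per head, per layer, so its footprint is a linear function of sequence length.

```python
# Back-of-the-envelope sketch of KV cache growth; the dimensions below are
# hypothetical (roughly 7B-parameter scale), not from any specific paper.
n_layers, n_heads, head_dim = 32, 32, 128
bytes_per_elem = 2  # fp16

def kv_cache_bytes(seq_len: int) -> int:
    # Two tensors (K and V), each of shape [n_layers, n_heads, seq_len, head_dim]:
    # every decoded token appends one more key/value vector per head per layer.
    return 2 * n_layers * n_heads * seq_len * head_dim * bytes_per_elem

for seq_len in (1_024, 8_192, 65_536):
    print(f"{seq_len:>6} tokens -> {kv_cache_bytes(seq_len) / 2**30:.1f} GiB")
```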
However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.
Despite the growing demand for accurate surface normal estimation models, existing methods rely on general-purpose dense prediction models that adopt the same inductive biases as other tasks.
Multimodal Large Language Models (MLLMs) have demonstrated notable capabilities in general visual understanding and reasoning tasks.
The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.
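To make the contrast concrete, here is a toy NumPy sketch of the two generic attention families (my own illustration, not the specific architectures any one paper evaluates; the function names and the feature map `phi` are assumptions): standard attention materializes an [n, n] score matrix, while linear attention folds the sum over keys into a fixed-size running state, so its cost grows linearly in n.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention materializes an [n, n] score matrix, so time and
    # memory scale quadratically with sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Linear attention swaps softmax for a positive feature map phi, so the
    # causal sum over keys folds into a running [d, d] state: O(n * d^2)
    # time and O(d^2) memory, independent of n.
    S = np.zeros((K.shape[1], V.shape[1]))  # running sum of phi(k) v^T
    z = np.zeros(K.shape[1])                # running normalizer
    out = np.empty_like(V)
    for t in range(len(Q)):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        out[t] = phi(Q[t]) @ S / (phi(Q[t]) @ z + 1e-6)
    return out
```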
This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver.
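A hedged sketch of that idea (my own illustration of generic block coordinate descent with Adam as the inner solver, not BAdam's actual implementation; the function name `block_coordinate_adam` and the hyperparameters are assumptions): only one block of parameters is trained at a time, so optimizer state is held for just that block.

```python
import torch

def block_coordinate_adam(model, loss_fn, data_iter, inner_steps=50, lr=1e-3):
    # Treat each top-level submodule as one coordinate block (an assumption
    # made for illustration; block partitioning is a design choice).
    blocks = [ps for m in model.children() if (ps := list(m.parameters()))]
    for block in blocks:
        # Freeze all parameters, then unfreeze only the active block.
        for p in model.parameters():
            p.requires_grad_(False)
        for p in block:
            p.requires_grad_(True)
        # A fresh Adam instance keeps first/second-moment state only for
        # the active block, which is what bounds optimizer memory.
        opt = torch.optim.Adam(block, lr=lr)
        for _ in range(inner_steps):
            opt.zero_grad()
            loss_fn(model, next(data_iter)).backward()
            opt.step()
```

Cycling through blocks this way trades some wall-clock time for a much smaller optimizer-state footprint than running Adam over all parameters at once.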
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.
In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSMs.
We present LongQLoRA, an efficient and effective method to extend the context length of large language models with fewer training resources.