Streaming Deep Reinforcement Learning Finally Works

mohmdelsayed/streaming-drl 18 Oct 2024

This paper introduces the stream-x algorithms, the first class of deep RL algorithms to overcome the stream barrier for both prediction and control and to match the sample efficiency of batch RL.

Atari Games Deep Reinforcement Learning +3

133
0.70 stars / hour

MARS: Unleashing the Power of Variance Reduction for Training Large Models

AGI-Arena/MARS 15 Nov 2024

Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models.

Stochastic Optimization

266
0.68 stars / hour

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

modelscope/ClearerVoice-Studio 23 Feb 2023

To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.

Ranked #1 on Speech Separation on WSJ0-2mix-16k (using extra training data)

Speech Separation

72
0.60 stars / hour
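
The MossFormer entry above describes pairing full self-attention inside local chunks with a linearised, low-cost attention over the whole sequence. The toy sketch below illustrates that general idea only. It is not the paper's implementation: the softmax inside chunks, the `elu(x) + 1` feature map, the plain sum used to combine the two branches, and every function name are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(q, k, v, chunk):
    """Full softmax attention restricted to non-overlapping chunks."""
    out = []
    for start in range(0, len(q), chunk):
        end = min(start + chunk, len(q))
        for i in range(start, end):
            w = softmax([dot(q[i], k[j]) for j in range(start, end)])
            out.append([sum(w[j - start] * v[j][d] for j in range(start, end))
                        for d in range(len(v[0]))])
    return out


def feature_map(x):
    # elu(x) + 1, a positive feature map (an assumption, not from the paper)
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_global_attention(q, k, v):
    """Linearised attention over the full sequence (cost linear in length)."""
    dk, dv = len(k[0]), len(v[0])
    kv = [[0.0] * dv for _ in range(dk)]  # running sum of phi(k) v^T
    ksum = [0.0] * dk                     # running sum of phi(k)
    for kj, vj in zip(k, v):
        fk = feature_map(kj)
        for a in range(dk):
            ksum[a] += fk[a]
            for d in range(dv):
                kv[a][d] += fk[a] * vj[d]
    out = []
    for qi in q:
        fq = feature_map(qi)
        norm = dot(fq, ksum)
        out.append([sum(fq[a] * kv[a][d] for a in range(dk)) / norm
                    for d in range(dv)])
    return out


def joint_attention(q, k, v, chunk):
    """Combine both branches by addition (a simplifying assumption)."""
    loc = local_attention(q, k, v, chunk)
    glo = linear_global_attention(q, k, v)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(loc, glo)]
```

The local branch costs on the order of sequence length times chunk size, while the global branch summarises all keys and values once and reuses that summary for every query, which is what keeps it cheap on long sequences.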

Multi-Programming Language Sandbox for LLMs

Ablustrund/MPLSandbox 30 Oct 2024

We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs).

208
0.60 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval RAG +1

10,202
0.59 stars / hour

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

PKU-YuanGroup/LLaVA-CoT 15 Nov 2024

Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1.

Logical Reasoning Multimodal Reasoning +2

1,472
0.58 stars / hour

JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation

jdh-algo/JoyVASA 14 Nov 2024

Specifically, in the first stage, we introduce a decoupled facial representation framework that separates dynamic facial expressions from static 3D facial representations.

Image Animation Motion Generation +1

468
0.56 stars / hour

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

alibaba/Tora 31 Jul 2024

The Trajectory Extractor (TE) encodes arbitrary trajectories into hierarchical spacetime motion patches with a 3D video compression network.

Video Compression Video Generation

1,015
0.54 stars / hour

Multimodal Autoregressive Pre-training of Large Vision Encoders

apple/ml-aim 21 Nov 2024

We introduce a novel method for pre-training of large-scale vision encoders.

Decoder Image Classification

1,031
0.53 stars / hour

MagicPIG: LSH Sampling for Efficient LLM Generation

infini-ai-lab/magicpig 21 Oct 2024

MagicPIG stores the LSH hash tables and runs the attention computation on the CPU, which allows it to serve longer contexts and larger batch sizes with high approximation accuracy.

122
0.53 stars / hour
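
The MagicPIG entry above mentions storing LSH hash tables and computing attention over the retrieved keys. As a rough illustration of how hash tables can narrow attention to a few candidate keys, the sketch below uses random-hyperplane (SimHash) signatures. It is a simplified stand-in, not MagicPIG's sampling estimator: the hash family, the table layout, the plain softmax over candidates, and all names are assumptions.

```python
import math
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def signature(vec, hyperplanes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(1 if dot(h, vec) >= 0 else 0 for h in hyperplanes)


def build_tables(keys, num_tables, num_bits, seed=0):
    """Hash every key into num_tables independent SimHash tables."""
    rng = random.Random(seed)
    dim = len(keys[0])
    tables = []
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_bits)]
        buckets = defaultdict(list)
        for idx, key in enumerate(keys):
            buckets[signature(key, planes)].append(idx)
        tables.append((planes, buckets))
    return tables


def candidates(query, tables):
    """Indices of keys sharing a bucket with the query in any table."""
    found = set()
    for planes, buckets in tables:
        found.update(buckets.get(signature(query, planes), []))
    return sorted(found)


def sparse_attention(query, keys, values, tables):
    """Softmax attention over the retrieved candidates only."""
    idxs = candidates(query, tables)
    if not idxs:
        return None  # no collision in any table; a caller would need a fallback
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * dot(query, keys[i]) for i in idxs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * values[i][d] for w, i in zip(weights, idxs)) / total
            for d in range(len(values[0]))]
```

Keys that point in a similar direction to the query are more likely to share a signature, so the candidate set tends to contain the keys with the largest attention scores. More tables raise recall, and more bits per table make each bucket more selective.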