Simple and Effective Masked Diffusion Language Models

kuleshov-group/mdlm 11 Jun 2024

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling.

Language Modelling, Masked Language Modeling

Fast Timing-Conditioned Latent Audio Diffusion

stability-ai/stable-audio-tools 7 Feb 2024

Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding.

Audio Generation

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

woooodyy/agentgym 6 Jun 2024

Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community.

Language Modelling, Large Language Model

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

WUHU-G/RCC_Transformer 10 Jun 2024

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity.

Long-Context Understanding, Question Answering, +2

Image and Video Tokenization with Binary Spherical Quantization

zhaoyue-zephyrus/bsq-vit 11 Jun 2024

The resulting BSQ-ViT achieves state-of-the-art visual reconstruction quality on image and video reconstruction benchmarks with 2.4$\times$ throughput compared to the best prior methods.

Decoder, Image Generation, +3

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

spcl/mrag 7 Jun 2024

Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses.

Benchmarking, Decoder, +1

CodeR: Issue Resolving with Multi-Agent and Task Graphs

nl2code/coder 3 Jun 2024

GitHub issue resolving has recently attracted significant attention from academia and industry.

Bug fixing

Vision-LSTM: xLSTM as Generic Vision Backbone

NX-AI/vision-lstm 6 Jun 2024

Transformers are widely used as generic backbones in computer vision, despite initially being introduced for natural language processing.

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

opengvlab/lcl 11 Jun 2024

Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data.

Contrastive Learning

Blind Image Restoration via Fast Diffusion Inversion

hamadichihaoui/BIRD 29 May 2024

This is ultimately equivalent to casting the image restoration (IR) task as an optimization problem in the space of the input noise.

Image Restoration

