The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT).
To address these challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and at high resolution.
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs).
Transformers are widely used as generic backbones in computer vision, despite having been initially introduced for natural language processing.
Generating long-form 44.1 kHz stereo audio from text prompts can be computationally demanding.
Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication.
The approach is ranked #1 on the de-en direction of the CVSS benchmark.
Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.
Our experiments show that our proposed MatMul-free models achieve performance on par with state-of-the-art Transformers that require far more memory during inference, at scales up to at least 2.7B parameters.
Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community.
In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion.