Matching Anything by Segmenting Anything

siyuanliii/masa 6 Jun 2024

The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT).

Domain Generalization Multiple Object Tracking +2

260
2.38 stars / hour

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/omnilmm 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

6,849
2.13 stars / hour

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

yangling0818/buffer-of-thought-llm 6 Jun 2024

We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs).

Arithmetic Reasoning Code Generation +2

167
2.05 stars / hour

Vision-LSTM: xLSTM as Generic Vision Backbone

NX-AI/vision-lstm 6 Jun 2024

Transformers are widely used as generic backbones in computer vision, despite initially being introduced for natural language processing.

203
1.89 stars / hour

Fast Timing-Conditioned Latent Audio Diffusion

stability-ai/stable-audio-tools 7 Feb 2024

Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding.

Audio Generation

1,862
1.63 stars / hour

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

ictnlp/streamspeech 5 Jun 2024

Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication.

Ranked #1 on de-en on CVSS

Automatic Speech Recognition (ASR) de-en +11

222
1.57 stars / hour

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

potamides/detikzify 24 May 2024

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.

Language Modelling

379
1.56 stars / hour

Scalable MatMul-free Language Modeling

ridgerchu/matmulfreellm 4 Jun 2024

Our experiments show that our proposed MatMul-free models achieve performance on-par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters.

Language Modelling

733
1.54 stars / hour

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

woooodyy/agentgym 6 Jun 2024

Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community.

Language Modelling Large Language Model

92
1.26 stars / hour

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

gojasper/flash-diffusion 4 Jun 2024

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion.

Face Swapping Image Inpainting +1

144
1.24 stars / hour