TransMLA: Multi-Head Latent Attention Is All You Need

fxmeng/transmla 11 Feb 2025

In this paper, we show that GQA can always be represented by MLA while maintaining the same KV cache overhead, but the converse does not hold.

106
0.48 stars / hour

On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices

eai-lab/on-device-sora 5 Feb 2025

We present On-device Sora, a pioneering solution for diffusion-based on-device text-to-video generation that operates efficiently on smartphone-grade devices.

Denoising, Text-to-Video Generation, +1

96
0.48 stars / hour

Drone Data Analytics for Measuring Traffic Metrics at Intersections in High-Density Areas

qpu523/high-density-intersection-dataset 4 Nov 2024

This study employed over 100 hours of high-altitude drone video data from eight intersections in Hohhot to generate a unique and extensive dataset encompassing high-density urban road intersections in China.

69
0.48 stars / hour

LLM4Decompile: Decompiling Binary Code with Large Language Models

albertan017/LLM4Decompile 8 Mar 2024

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute.

HumanEval

5,021
0.47 stars / hour

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

internlm/oreal 10 Feb 2025

To alleviate the long-standing difficulties caused by sparse rewards in RL, which are further exacerbated by the partial correctness of long chains of thought in reasoning tasks, we further apply a token-level reward model to sample important tokens in reasoning trajectories for learning.

Math, Mathematical Reasoning, +1

80
0.46 stars / hour

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

fireredteam/fireredasr 24 Jan 2025

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements for superior performance and optimal efficiency across various applications.

Automatic Speech Recognition (ASR), +4

536
0.46 stars / hour

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

antgroup/echomimic_v2 15 Nov 2024

Recent work on human animation usually involves audio, pose, or movement map conditions, and thereby achieves vivid animation quality.

Audio-Driven Body Animation, Human Animation, +1

2,772
0.44 stars / hour

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

qwenlm/qwen2-vl 1 Jun 2023

We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach for LLM low-bit weight-only quantization.

Autonomous Driving, Cloud Computing, +5

7,470
0.42 stars / hour

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

deepseek-ai/deepseek-vl2 13 Dec 2024

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades.

Chart Understanding, Optical Character Recognition, +4

3,685
0.42 stars / hour

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

hjyao00/mulberry 24 Dec 2024

Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question.

593
0.39 stars / hour