Trending Research

MambaOut: Do We Really Need Mamba for Vision?

yuweihao/mambaout • 13 May 2024

For vision tasks, since image classification has neither characteristic (long sequences or autoregressive generation), we hypothesize that Mamba is not necessary for this task. Detection and segmentation tasks are also not autoregressive, yet they do involve long sequences, so we believe it is still worthwhile to explore Mamba's potential for these tasks.

Image Classification, Instance Segmentation, +2 more

1,226 stars (13.30 stars / hour)

A decoder-only foundation model for time-series forecasting

google-research/timesfm • 14 Oct 2023

Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset.

Decoder, Time Series, +1 more

1,888 stars (3.94 stars / hour)

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

x-lance/anitalker • 6 May 2024

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait.

Metric Learning, Self-Supervised Learning

729 stars (3.11 stars / hour)

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope • 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

Multi-agent Integration

2,222 stars (2.91 stars / hour)

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

alpha-vllm/lumina-t2x • 9 May 2024

Sora unveils the potential of scaling Diffusion Transformers to generate photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet its implementation details remain largely undisclosed.

979 stars (2.90 stars / hour)

Autonomous LLM-driven research from data to human-verifiable research papers

technion-kishony-lab/data-to-paper • 24 Apr 2024

As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability.

253 stars (1.81 stars / hour)

Sakuga-42M Dataset: Scaling Up Cartoon Research

zhenglinpan/SakugaDataset • 13 May 2024

Can we harness the success of the scaling paradigm to benefit cartoon research?

Ranked #1 on Video to Text Retrieval on Sakuga-42M

Text to Video Retrieval, Video to Text Retrieval

107 stars (1.63 stars / hour)

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

deepseek-ai/deepseek-v2 • 7 May 2024

MLA (Multi-head Latent Attention) guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling, Reinforcement Learning (RL)

2,040 stars (1.46 stars / hour)
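
The DeepSeek-V2 summary describes caching a compressed latent vector instead of full keys and values. Below is a minimal NumPy sketch of that general idea, not DeepSeek-V2's implementation: all dimensions and weight names (`W_down`, `W_up_k`, `W_up_v`, `W_q`) are hypothetical, and it omits other parts of MLA described in the paper, such as how positional information is handled. Each token's hidden state is projected down to a small latent that is cached; keys and values are rebuilt from the latents with up-projections at attention time.

```python
import numpy as np

# Hypothetical toy dimensions (NOT DeepSeek-V2's real configuration).
n_heads, d_head = 8, 64
d_model = n_heads * d_head   # hidden size = 512
d_latent = 64                # cached latent size, much smaller than 2 * d_model

rng = np.random.default_rng(0)
# Hypothetical projection weights, randomly initialized for illustration only.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

def cache_latents(hidden):
    """Compress each token's hidden state into one latent vector; only this is cached."""
    return hidden @ W_down  # (seq, d_latent)

def attend(query_hidden, latents):
    """Rebuild per-head K and V from cached latents, then apply softmax attention for one query token."""
    seq = latents.shape[0]
    q = (query_hidden @ W_q).reshape(n_heads, d_head)
    k = (latents @ W_up_k).reshape(seq, n_heads, d_head)
    v = (latents @ W_up_v).reshape(seq, n_heads, d_head)
    scores = np.einsum("hd,shd->hs", q, k) / np.sqrt(d_head)  # (heads, seq)
    scores -= scores.max(axis=-1, keepdims=True)               # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hs,shd->hd", weights, v)                  # (heads, d_head)
    return out.reshape(d_model)

seq_len = 1024
hidden = rng.standard_normal((seq_len, d_model))
latents = cache_latents(hidden)
output = attend(hidden[-1], latents)

standard_floats = seq_len * 2 * d_model  # floats cached when storing full K and V
latent_floats = latents.size             # floats cached when storing only latents
print(output.shape, standard_floats, latent_floats, standard_floats // latent_floats)
```

With these toy sizes the latent cache holds 16 times fewer floats than caching full keys and values (64 versus 1,024 per token); the trade-off is the extra up-projection work at attention time, and the actual saving depends on the latent size chosen.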

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

hvision-nku/storydiffusion • 2 May 2024

StoryDiffusion's motion prediction module converts the generated sequence of images into videos with smooth transitions and consistent subjects, which are significantly more stable than those produced by modules based only on latent spaces, especially for long video generation.

Motion Prediction, Story Generation, +1 more

4,527 stars (1.43 stars / hour)

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

OS-Copilot/FRIDAY • 12 Feb 2024

Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.

1,290 stars (1.43 stars / hour)
