OMG-Seg: Is One Model Good Enough For All Segmentation?

lxtgh/omg-seg 18 Jan 2024

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.

Decoder, Interactive Segmentation, +4

776
0.27 stars / hour

Demystify Mamba in Vision: A Linear Attention Perspective

LeapLabTHU/MLLA 26 May 2024

By exploring the similarities and disparities between the effective Mamba and subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success.

Image Classification

69
0.27 stars / hour

Efficient Guided Generation for Large Language Models

normal-computing/outlines 19 Jul 2023

In this article we show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine.

Language Modelling, Text Generation

6,333
0.26 stars / hour
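
As a rough illustration of the finite-state-machine reformulation mentioned in the entry above, the sketch below constrains a toy greedy decoder so that it can only emit tokens that keep a character-level automaton alive. Everything here is an assumption made for illustration: the pattern `[0-9]+`, the five-token vocabulary, and the helper names (`dfa_step`, `build_index`, `greedy_decode`) are hypothetical and are not the API of the outlines library.

```python
import math

DIGITS = "0123456789"

def dfa_step(state, ch):
    """Toy DFA for the pattern [0-9]+ (assumed for illustration).
    State 0 = start, state 1 = one or more digits read. None = dead."""
    return 1 if ch in DIGITS else None

def walk(state, token):
    """Feed every character of a token through the DFA."""
    for ch in token:
        state = dfa_step(state, ch)
        if state is None:
            return None
    return state

def build_index(vocab, states):
    """Precompute, per FSM state, which token ids are allowed
    and which state each allowed token leads to."""
    index = {}
    for s in states:
        index[s] = {}
        for tid, tok in enumerate(vocab):
            nxt = walk(s, tok)
            if nxt is not None:
                index[s][tid] = nxt
    return index

def mask_logits(logits, allowed):
    """Set the logit of every disallowed token to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy_decode(vocab, index, logits_fn, start=0, steps=3):
    """Greedy decoding where each step is an FSM state transition."""
    state, out = start, []
    for _ in range(steps):
        masked = mask_logits(logits_fn(out), index[state])
        tid = max(range(len(masked)), key=masked.__getitem__)
        out.append(vocab[tid])
        state = index[state][tid]
    return "".join(out)

vocab = ["a", "1", "23", "b4", "7"]          # hypothetical vocabulary
index = build_index(vocab, states=[0, 1])
fake_logits = lambda prefix: [5.0, 1.0, 2.0, 4.0, 0.5]  # stand-in for a model
result = greedy_decode(vocab, index, fake_logits)
```

The stand-in model prefers the tokens "a" and "b4", but both are masked out because they would leave the automaton, so the decoder falls back to the best-scoring digit-only token. The per-state index is built once, so each decoding step only needs a dictionary lookup.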

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

alibaba-damo-academy/FunASR 28 Nov 2021

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set.

Action Detection, Activity Detection, +2

4,027
0.20 stars / hour
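
To make the power-set idea in the entry above concrete, here is a minimal sketch of how a per-frame multi-speaker activity vector can be turned into a single class label and back. The bit ordering (speaker 0 is the least significant bit) and the function names are assumptions chosen for illustration; the paper and the FunASR code may order or restrict the classes differently.

```python
from itertools import product

def powerset_encode(frame_labels):
    """Map a binary activity vector (one 0/1 entry per speaker)
    to a single integer class in range(2 ** num_speakers)."""
    index = 0
    for speaker, active in enumerate(frame_labels):
        if active not in (0, 1):
            raise ValueError("activity labels must be 0 or 1")
        index |= active << speaker  # speaker 0 -> least significant bit
    return index

def powerset_decode(index, num_speakers):
    """Inverse mapping: recover the per-speaker activity vector."""
    if not 0 <= index < 2 ** num_speakers:
        raise ValueError("class index out of range")
    return [(index >> speaker) & 1 for speaker in range(num_speakers)]

# Speakers 0 and 2 talk at the same time in this frame.
label = powerset_encode([1, 0, 1])
decoded = powerset_decode(label, num_speakers=3)
```

With this encoding, overlapping speech becomes an ordinary class in a single-label problem with 2^N classes for N speakers, instead of N independent binary decisions.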

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning, Sentiment Analysis, +1

4,027
0.26 stars / hour

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

deepseek-ai/deepseek-v2 7 May 2024

MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling, Reinforcement Learning (RL)

2,385
0.26 stars / hour

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

ldzhangyx/instruct-MusicGen 28 May 2024

Recent advances in text-to-music editing, which employ text queries to modify music (e.g., by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation.

31
0.25 stars / hour

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

hustvl/vig 28 May 2024

Recently, linear complexity sequence modeling networks have achieved modeling capabilities similar to Vision Transformers on a variety of computer vision tasks, while using fewer FLOPs and less memory.

Representation Learning

52
0.25 stars / hour

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

epfml/schedules-and-scaling 28 May 2024

Scale has become a main ingredient in obtaining strong machine learning models.

24
0.24 stars / hour

Improving Diffusion Models for Virtual Try-on

yisol/IDM-VTON 8 Mar 2024

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

Virtual Try-on

2,680
0.24 stars / hour