SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models

xid32/soundmind 15 Jun 2025

While large language models have shown reasoning capabilities, their application to the audio modality, particularly in large audio-language models (ALMs), remains significantly underdeveloped.

Logical Reasoning, Reinforcement Learning (RL)

706
0.39 stars / hour

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

zcablii/sm3det 30 Dec 2024

To address these, we establish a benchmark dataset and propose a unified model, SM3Det (Single Model for Multi-Modal datasets and Multi-Task object Detection).

Object Detection

211
0.37 stars / hour

TaskCraft: Automated Generation of Agentic Tasks

oppo-personalai/taskcraft 11 Jun 2025

Agentic tasks, which require multi-step problem solving with autonomy, tool use, and adaptive reasoning, are becoming increasingly central to the advancement of NLP and AI.

112
0.37 stars / hour

Craftium: An Extensible Framework for Creating Reinforcement Learning Environments

mikelma/craftium 4 Jul 2024

Most Reinforcement Learning (RL) environments are created by adapting existing physics simulators or video games.

Benchmarking, Minecraft +3

109
0.34 stars / hour

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

luo-junyu/awesome-agent-papers 27 Mar 2025

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models.

Language Modelling +1

1,139
0.34 stars / hour

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing

yueming6568/deltaedit 12 Oct 2023

Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase.

text-guided-image-editing

205
0.34 stars / hour

Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation

qiukunpeng/siamese-diffusion CVPR 2025

Deep learning has revolutionized medical image segmentation, yet its full potential remains constrained by the paucity of annotated datasets.

Image Generation, Image Segmentation +2

50
0.33 stars / hour

Deep Industrial Image Anomaly Detection: A Survey

m-3lab/awesome-industrial-anomaly-detection 27 Jan 2023

In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques, from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets.

Anomaly Detection, Deep Learning +1

2,466
0.32 stars / hour

FLUX that Plays Music

black-forest-labs/flux 1 Sep 2024

This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic.

Music Generation, Text-to-Music Generation

23,431
0.32 stars / hour

Multi-head Temporal Latent Attention

d-keqi/mlta 19 May 2025

While Transformer self-attention offers strong parallelism, the Key-Value (KV) cache grows linearly with sequence length and becomes a bottleneck for inference efficiency.

Speech Recognition +1

558
0.32 stars / hour