R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

dle666/r-cot 23 Oct 2024

Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity.

Diversity

109 stars (0.47 stars / hour)

ZIM: Zero-Shot Image Matting for Anything

naver-ai/zim 1 Nov 2024

To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations.

Image Inpainting Image Matting +2

36 stars (0.41 stars / hour)

Domain-Controlled Prompt Learning

caoql98/dcpl 30 Sep 2023

Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms, which leads to suboptimal performance because images from specific domains are misinterpreted through natural-image patterns.

56 stars (0.41 stars / hour)

Melody Is All You Need For Music Generation

shaopengw/Awesome-Music-Generation 30 Sep 2024

We present the Melody Guided Music Generation (MMGen) model, the first approach that uses melody to guide music generation. Despite its simple method and extremely limited resources, it achieves excellent performance.

cross-modal alignment Music Generation +1

84 stars (0.40 stars / hour)

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

open-mmlab/amphion 1 Sep 2024

Recent large-scale text-to-speech (TTS) systems are usually grouped into autoregressive and non-autoregressive systems.

Self-Supervised Learning Text to Speech

7,301 stars (0.39 stars / hour)

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

open-mmlab/Amphion 7 Jul 2024

To facilitate the scale-up of Emilia, we also present Emilia-Pipe, the first open-source preprocessing pipeline designed to efficiently transform raw, in-the-wild speech data into high-quality training data with speech annotations.

7,288 stars (0.39 stars / hour)

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

openspg/kag 10 Sep 2024

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.

Knowledge Graphs Question Answering +2

485 stars (0.38 stars / hour)

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

bytedance/ShadowKV 28 Oct 2024

By evaluating ShadowKV on a broad range of benchmarks, including RULER, LongBench, and Needle In A Haystack, and on models such as Llama-3.1-8B, Llama-3-8B-1M, GLM-4-9B-1M, Yi-9B-200K, Phi-3-Mini-128K, and Qwen2-7B-128K, we demonstrate that it can support up to 6× larger batch sizes and boost throughput by up to 3.04× on an A100 GPU without sacrificing accuracy, even surpassing the performance achievable with infinite batch size under the assumption of infinite GPU memory.

104 stars (0.38 stars / hour)

XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM

openxrlab/xrdslam 31 Oct 2024

In this paper, we propose a flexible SLAM framework, XRDSLAM.

Benchmarking Management

164 stars (0.38 stars / hour)

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

opendevin/opendevin 23 Jul 2024

We introduce OpenHands (formerly OpenDevin), a platform for developing powerful and flexible AI agents that interact with the world in ways similar to a human developer: by writing code, interacting with a command line, and browsing the web.

34,218 stars (0.37 stars / hour)