Search Results for author: Zhen Ye

Found 18 papers, 11 papers with code

Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method

no code implementations • 20 May 2025 • Xinshen Zhang, Zhen Ye, Xu Zheng

Extensive experiments on our OmniVQA demonstrate the superiority of our proposed method in omnidirectional space (+6% improvement).

Hallucination, Object Localization, +2

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge

no code implementations • 17 May 2025 • Chi-Min Chan, Chunpu Xu, Jiaming Ji, Zhen Ye, Pengcheng Wen, Chunyang Jiang, Yaodong Yang, Wei Xue, Sirui Han, Yike Guo

The current focus of AI research is shifting from emphasizing model training towards enhancing evaluation quality, a transition that is crucial for driving further advancements in AI systems.

Reinforcement Learning (RL)

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

1 code implementation • 6 Feb 2025 • Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, Jiahe Lei, Yi Peng, Haohe Liu, Yizhu Jin, Zheqi Dai, Hongzhan Lin, Jianyi Chen, Xingjian Du, Liumeng Xue, Yunlin Chen, Zhifei Li, Lei Xie, Qiuqiang Kong, Yike Guo, Wei Xue

Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time compute.

Speech Synthesis

ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges

1 code implementation • 28 Nov 2024 • Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma

By integrating visual elements and embedded programming logic, ScratchEval requires the model to process both visual information and code structure, thereby comprehensively evaluating its programming intent understanding ability.

Code Generation

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

no code implementations • 14 Oct 2024 • Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo

However, when it comes to stereo audio generation, the soundscapes often have a complex scene of multiple objects and directions.

Audio Generation, Multimodal Generation

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

1 code implementation • 30 Aug 2024 • Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

By enhancing the semantic ability of the codec, X-Codec significantly reduces WER in speech synthesis tasks and extends these benefits to non-speech applications, including music and sound generation.

Audio Compression, Audio Generation, +6

MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models

1 code implementation • 17 Jun 2024 • Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma

Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning.

Benchmarking, Fact Checking, +5

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

no code implementations • 13 May 2024 • Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo

Through intensive experimental studies, we demonstrate that the proposed method can generate better samples than SingSong and accelerate generation by at least 30 times.

Rhythm

FlashSpeech: Efficient Zero-Shot Speech Synthesis

1 code implementation • 23 Apr 2024 • Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Wei Xue, Qifeng Liu, Yike Guo

The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation.

Rhythm, Speech Synthesis, +1

CoMoSVC: Consistency Model-based Singing Voice Conversion

1 code implementation • 3 Jan 2024 • Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

Diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performance, producing natural audio with high similarity to the target timbre.

Model, Voice Conversion

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation

no code implementations • 22 May 2023 • Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

Since expert knowledge is hard to acquire, it hinders the flexibility to quickly design and tune digital synthesizers for diverse sounds.

Neural Architecture Search

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

1 code implementation • 11 May 2023 • Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo

In this paper, we propose a "Co"nsistency "Mo"del-based "Speech" synthesis method, CoMoSpeech, which achieves speech synthesis through a single diffusion sampling step while maintaining high audio quality.

Denoising, Singing Voice Synthesis, +3

Pairwise Point Cloud Registration using Graph Matching and Rotation-invariant Features

no code implementations • 5 May 2021 • Rong Huang, Wei Yao, Yusheng Xu, Zhen Ye, Uwe Stilla

Registration is a fundamental and critical task in point cloud processing, which usually depends on finding element correspondences between two point clouds.

Graph Matching, Point Cloud Registration, +1

BLVD: Building A Large-scale 5D Semantics Benchmark for Autonomous Driving

1 code implementation • 15 Mar 2019 • Jianru Xue, Jianwu Fang, Tao Li, Bohua Zhang, Pu Zhang, Zhen Ye, Jian Dou

Instead, BLVD aims to provide a platform for the tasks of dynamic 4D (3D+temporal) tracking, 5D (4D+interactive) event recognition, and intention prediction.

Autonomous Driving, Instance Segmentation, +5
