Search Results for author: Ziyang Ma

Found 26 papers, 11 papers with code

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

no code implementations • 9 Apr 2024 • Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, HUI ZHANG, Xie Chen, Kai Yu

Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

MuPT: A Generative Symbolic Music Pretrained Transformer

no code implementations • 9 Apr 2024 • Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang

In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music.

Music Generation Music Modeling

Paper
Add Code

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

no code implementations • 5 Apr 2024 • Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs.

Language Modelling Large Language Model

Paper
Add Code

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

1 code implementation • 29 Mar 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.

Self-Supervised Learning speaker-diarization +3

711

Paper
Code

ChatMusician: Understanding and Generating Music Intrinsically with LLM

1 code implementation • 25 Feb 2024 • Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo

It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language.

Text Generation

148

Paper
Code

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

BAT: Learning to Reason about Spatial Sounds with Large Language Models

no code implementations • 2 Feb 2024 • Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

By integrating Spatial-AST with LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment.

Event Detection Language Modelling +5

Paper
Add Code

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code implementations • 14 Jan 2024 • Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Audio Generation Language Modelling

Paper
Add Code

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

1 code implementation • 7 Jan 2024 • Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress.

Self-Supervised Learning

Paper
Code

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

2 code implementations • 23 Dec 2023 • Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

3,321

Paper
Code

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

Audio captioning Automatic Speech Recognition +11

276

Paper
Code

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

1 code implementation • 25 Sep 2023 • Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks.

Representation Learning Self-Supervised Learning +2

Paper
Code

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

no code implementations • 19 Sep 2023 • Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS.

Data Augmentation Language Modelling +5

Paper
Add Code

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

1 code implementation • 14 Sep 2023 • Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen

Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques.

Self-Supervised Learning speech-recognition +2

775

Paper
Code

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

no code implementations • 10 Sep 2023 • Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu

Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency.

Paper
Add Code

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

no code implementations • 28 Aug 2023 • Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR).

Active Learning Automatic Speech Recognition +3

Paper
Add Code

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

1 code implementation • 15 Jun 2023 • Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

Our models outperform other SSL models significantly on the LibriSpeech benchmark without the need for iterative re-clustering and re-training.

Ranked #1 on Automatic Speech Recognition on LibriSpeech test-other

Automatic Speech Recognition Clustering +4

Paper
Code

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

no code implementations • 14 Jun 2023 • Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Add Code

LTCR: Long-Text Chinese Rumor Detection Dataset

1 code implementation • 12 Jun 2023 • Ziyang Ma, Mengsha Liu, Guian Fang, Ying Shen

False information can spread quickly on social media, negatively influencing the citizens' behaviors and responses to social events.

Fake News Detection Misinformation

Paper
Code

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

no code implementations • 18 Feb 2023 • Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

However, the training of SSL models is computationally expensive and a common practice is to fine-tune a released SSL model on the specific task.

Self-Supervised Learning speech-recognition +1

Paper
Add Code

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

1 code implementation • 14 Nov 2022 • Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen

In this paper, we provide a new perspective on self-supervised speech models from how the training targets are obtained.

Ranked #40 on Speech Recognition on LibriSpeech test-other

Automatic Speech Recognition Multi-Task Learning +3

Paper
Code

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

no code implementations • 27 Oct 2022 • Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang

Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Hierarchical Deep Residual Reasoning for Temporal Moment Localization

1 code implementation • 31 Oct 2021 • Ziyang Ma, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie

Temporal Moment Localization (TML) in untrimmed videos is a challenging task in the field of multimedia, which aims at localizing the start and end points of the activity in the video, described by a sentence query.

Language-Based Temporal Localization Sentence

Paper
Code

Video Super-Resolution via Deep Draft-Ensemble Learning

no code implementations • ICCV 2015 • Renjie Liao, Xin Tao, Ruiyu Li, Ziyang Ma, Jiaya Jia

We propose a new direction for fast video super-resolution (VideoSR) via a SR draft ensemble, which is defined as the set of high-resolution patch candidates before final image deconvolution.

Ensemble Learning Image Deconvolution +1

Paper
Add Code

Handling Motion Blur in Multi-Frame Super-Resolution

no code implementations • CVPR 2015 • Ziyang Ma, Renjie Liao, Xin Tao, Li Xu, Jiaya Jia, Enhua Wu

Ubiquitous motion blur easily fails multi-frame super-resolution (MFSR).

Image Reconstruction Multi-Frame Super-Resolution

Paper
Add Code

Bounded-Distortion Metric Learning

no code implementations • 10 May 2015 • Renjie Liao, Jianping Shi, Ziyang Ma, Jun Zhu, Jiaya Jia

Metric learning aims to embed one metric space into another to benefit tasks like classification and clustering.

Clustering General Classification +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.