Search Results for author: Ziyang Ma

Found 26 papers, 11 papers with code

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

no code implementations 9 Apr 2024 Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, HUI ZHANG, Xie Chen, Kai Yu

Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

no code implementations 5 Apr 2024 Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs.

Language Modelling, Large Language Model

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations 13 Feb 2024 Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

BAT: Learning to Reason about Spatial Sounds with Large Language Models

no code implementations 2 Feb 2024 Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

By integrating Spatial-AST with LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment.

Event Detection, Language Modelling, +5

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code implementations 14 Jan 2024 Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Audio Generation, Language Modelling

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

1 code implementation 7 Jan 2024 Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress.

Self-Supervised Learning

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

2 code implementations 23 Dec 2023 Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning, Sentiment Analysis, +1

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

no code implementations 19 Sep 2023 Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS.

Data Augmentation, Language Modelling, +5

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

1 code implementation 14 Sep 2023 Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen

Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques.

Self-Supervised Learning, speech-recognition, +2

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

no code implementations 10 Sep 2023 Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu

Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency.

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

no code implementations 28 Aug 2023 Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR).

Active Learning, Automatic Speech Recognition, +3

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

1 code implementation 15 Jun 2023 Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

Our models outperform other SSL models significantly on the LibriSpeech benchmark without the need for iterative re-clustering and re-training.

Automatic Speech Recognition, Clustering, +4

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

no code implementations 14 Jun 2023 Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +5

LTCR: Long-Text Chinese Rumor Detection Dataset

1 code implementation 12 Jun 2023 Ziyang Ma, Mengsha Liu, Guian Fang, Ying Shen

False information can spread quickly on social media, negatively influencing the citizens' behaviors and responses to social events.

Fake News Detection, Misinformation

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

no code implementations 18 Feb 2023 Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

However, the training of SSL models is computationally expensive and a common practice is to fine-tune a released SSL model on the specific task.

Self-Supervised Learning, speech-recognition, +1

Hierarchical Deep Residual Reasoning for Temporal Moment Localization

1 code implementation 31 Oct 2021 Ziyang Ma, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie

Temporal Moment Localization (TML) in untrimmed videos is a challenging task in the field of multimedia, which aims at localizing the start and end points of the activity in the video, described by a sentence query.

Language-Based Temporal Localization, Sentence

Video Super-Resolution via Deep Draft-Ensemble Learning

no code implementations ICCV 2015 Renjie Liao, Xin Tao, Ruiyu Li, Ziyang Ma, Jiaya Jia

We propose a new direction for fast video super-resolution (VideoSR) via a SR draft ensemble, which is defined as the set of high-resolution patch candidates before final image deconvolution.

Ensemble Learning, Image Deconvolution, +1

Bounded-Distortion Metric Learning

no code implementations 10 May 2015 Renjie Liao, Jianping Shi, Ziyang Ma, Jun Zhu, Jiaya Jia

Metric learning aims to embed one metric space into another to benefit tasks like classification and clustering.

Clustering, General Classification, +1
