Search Results for author: Peidong Wang

Found 22 papers, 5 papers with code

Incorporating Language Level Information into Acoustic Models

no code implementations • 14 Dec 2016 • Peidong Wang, DeLiang Wang

This paper proposed a class of novel Deep Recurrent Neural Networks which can incorporate language-level information into acoustic models.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Recurrent Deep Stacking Networks for Speech Recognition

no code implementations • 14 Dec 2016 • Peidong Wang, Zhongqiu Wang, DeLiang Wang

This paper presented our work on applying Recurrent Deep Stacking Networks (RDSNs) to Robust Automatic Speech Recognition (ASR) tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling

no code implementations • 11 Mar 2019 • Peidong Wang, Ke Tan, DeLiang Wang

In this study, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation

2 code implementations • 4 Oct 2020 • Zhong-Qiu Wang, Peidong Wang, DeLiang Wang

Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry.

Speaker Separation, Speech Separation

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations • 20 Oct 2020 • Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation, Speech Extraction, +2

Multitask Training with Text Data for End-to-End Speech Recognition

no code implementations • 27 Oct 2020 • Peidong Wang, Tara N. Sainath, Ron J. Weiss

We propose a multitask training method for attention-based end-to-end speech recognition models.

Language Modelling, speech-recognition, +1

Efficient End-to-End Speech Recognition Using Performers in Conformers

no code implementations • 9 Nov 2020 • Peidong Wang, DeLiang Wang

On-device end-to-end speech recognition poses a high requirement on model efficiency.

speech-recognition, Speech Recognition

FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters

1 code implementation • ICCV 2021 • Yuwei Cheng, Jiannan Zhu, Mengxin Jiang, Jie Fu, Changsong Pang, Peidong Wang, Kris Sankaran, Olawale Onabola, Yimin Liu, Dianbo Liu, Yoshua Bengio

To promote the practical application for autonomous floating wastes cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.

object-detection, Robust Object Detection

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations • 28 Oct 2021 • Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

speech-recognition, Speech Recognition, +1

Predicting Atlantic Multidecadal Variability

no code implementations • 29 Oct 2021 • Glenn Liu, Peidong Wang, Matthew Beveridge, Young-Oh Kwon, Iddo Drori

Atlantic Multidecadal Variability (AMV) describes variations of North Atlantic sea surface temperature with a typical cycle of between 60 and 70 years.

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

no code implementations • 1 Mar 2022 • Yufeng Yang, Peidong Wang, DeLiang Wang

The proposed model builds on the wide residual bi-directional long short-term memory network (WRBN) with utterance-wise dropout and iterative speaker adaptation, but employs a Conformer encoder instead of the recurrent network.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

1 code implementation • 11 Apr 2022 • Jian Xue, Peidong Wang, Jinyu Li, Matt Post, Yashesh Gaur

Neural transducers have been widely used in automatic speech recognition (ASR).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning, Speaker Recognition, +3

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

no code implementations • 4 Nov 2022 • Jian Xue, Peidong Wang, Jinyu Li, Eric Sun

In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language.

Machine Translation, speech-recognition, +2

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

no code implementations • 5 Nov 2022 • Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

no code implementations • 10 Nov 2022 • Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion.

Representation Learning, Self-Supervised Learning, +2

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

no code implementations • 7 Jul 2023 • Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

DiariST: Streaming Speech Translation with Speaker Diarization

1 code implementation • 14 Sep 2023 • Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

speaker-diarization, Speaker Diarization, +3

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

no code implementations • 6 Oct 2023 • Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li

Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication.

Simultaneous Speech-to-Text Translation, Translation

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

no code implementations • 23 Oct 2023 • Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

STICKERCONV: Generating Multimodal Empathetic Responses from Scratch

1 code implementation • 20 Jan 2024 • Yiqun Zhang, Fanheng Kong, Peidong Wang, Shuang Sun, Lingshuai Wang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song

Stickers, while widely recognized for enhancing empathetic communication in online interactions, remain underexplored in current empathetic dialogue research, notably due to the challenge of a lack of comprehensive datasets.

2k, Empathetic Response Generation, +1
