Search Results for author: Peidong Wang

Found 32 papers, 8 papers with code

PHRASED: Phrase Dictionary Biasing for Speech Translation

no code implementations | 10 Jun 2025 | Peidong Wang, Jian Xue, Rui Zhao, Junkun Chen, Aswin Shanmugam Subramanian, Jinyu Li

We apply the phrase dictionary biasing method to two types of widely adopted models, a transducer-based streaming speech translation model and a multimodal large language model.

Language Modeling, Language Modelling, +3 more

AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation

1 code implementation | 31 May 2025 | Ming Wang, Peidong Wang, Lin Wu, Xiaocui Yang, Daling Wang, Shi Feng, Yuxin Chen, Bixuan Wang, Yifei Zhang

Constrained by the cost and ethical concerns of involving real seekers in AI-driven mental health, researchers develop LLM-based conversational agents (CAs) with tailored configurations, such as profiles, symptoms, and scenarios, to simulate seekers.

TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection

1 code implementation | 31 Mar 2025 | ZhiMing Ma, Peidong Wang, Minhua Huang, Jingpeng Wang, Kai Wu, Xiangzhao Lv, Yachun Pang, Yin Yang, Wenjie Tang, Yuchen Kang

The detection of telecom fraud faces significant challenges due to the lack of high-quality multimodal training data that integrates audio signals with reasoning-oriented textual analysis.

Fraud Detection, Large Language Model, +4 more

SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

1 code implementation | 12 Feb 2025 | ZhiMing Ma, Xiayang Xiao, Sihao Dong, Peidong Wang, Haipeng Wang, Qingyun Pan

As a powerful all-weather Earth observation tool, synthetic aperture radar (SAR) remote sensing enables critical military reconnaissance, maritime surveillance, and infrastructure monitoring.

Earth Observation, object-detection, +1 more

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

no code implementations | 4 Feb 2025 | Peidong Wang, Naoyuki Kanda, Jian Xue, Jinyu Li, Xiaofei Wang, Aswin Shanmugam Subramanian, Junkun Chen, Sunit Sivasankaran, Xiong Xiao, Yong Zhao

We propose to tackle streaming speaker change detection and gender classification by incorporating speaker embeddings into a transducer-based streaming end-to-end speech translation model.

Change Detection, Gender Classification, +3 more

Language Models as Continuous Self-Evolving Data Engineers

no code implementations | 19 Dec 2024 | Peidong Wang, Ming Wang, ZhiMing Ma, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang

Large Language Models (LLMs) have demonstrated remarkable capabilities on various tasks, but their further evolution is limited by the lack of high-quality training data.

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

no code implementations | 17 Oct 2024 | Sreyan Ghosh, Mohammad Sadegh Rasooli, Michael Levit, Peidong Wang, Jian Xue, Dinesh Manocha, Jinyu Li

To address these issues, we propose DARAG (Data- and Retrieval-Augmented Generative Error Correction), a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4 more

Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers

no code implementations | 21 Aug 2024 | Prashant Serai, Peidong Wang, Eric Fosler-Lussier

In this work, we extend a prior phonetic confusion based model for predicting speech recognition errors in two ways: first, we introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model.

Language Modeling, Language Modelling, +2 more

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

no code implementations | 12 Jun 2024 | Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian

Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language.

Language Identification

STICKERCONV: Generating Multimodal Empathetic Responses from Scratch

1 code implementation | 20 Jan 2024 | Yiqun Zhang, Fanheng Kong, Peidong Wang, Shuang Sun, Lingshuai Wang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song

Stickers, while widely recognized for enhancing empathetic communication in online interactions, remain underexplored in current empathetic dialogue research, notably due to the challenge of a lack of comprehensive datasets.

2k, Empathetic Response Generation, +1 more

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

no code implementations | 23 Oct 2023 | Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4 more

DiariST: Streaming Speech Translation with Speaker Diarization

1 code implementation | 14 Sep 2023 | Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

speaker-diarization, Speaker Diarization, +3 more

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

no code implementations | 7 Jul 2023 | Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations | 1 Mar 2023 | Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

no code implementations | 4 Nov 2022 | Jian Xue, Peidong Wang, Jinyu Li, Eric Sun

In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language.

Machine Translation, speech-recognition, +2 more

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations | 27 Apr 2022 | Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning, Speaker Recognition, +3 more

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

no code implementations | 1 Mar 2022 | Yufeng Yang, Peidong Wang, DeLiang Wang

The proposed model builds on the wide residual bi-directional long short-term memory network (WRBN) with utterance-wise dropout and iterative speaker adaptation, but employs a Conformer encoder instead of the recurrent network.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

Predicting Atlantic Multidecadal Variability

no code implementations | 29 Oct 2021 | Glenn Liu, Peidong Wang, Matthew Beveridge, Young-Oh Kwon, Iddo Drori

Atlantic Multidecadal Variability (AMV) describes variations of North Atlantic sea surface temperature with a typical cycle of between 60 and 70 years.

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations | 28 Oct 2021 | Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

speech-recognition, Speech Recognition, +1 more

FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters

1 code implementation | ICCV 2021 | Yuwei Cheng, Jiannan Zhu, Mengxin Jiang, Jie Fu, Changsong Pang, Peidong Wang, Kris Sankaran, Olawale Onabola, Yimin Liu, Dianbo Liu, Yoshua Bengio

To promote the practical application for autonomous floating wastes cleaning, we present FloW, the first dataset for floating waste detection in inland water areas.

object-detection, Robust Object Detection

Multitask Training with Text Data for End-to-End Speech Recognition

no code implementations | 27 Oct 2020 | Peidong Wang, Tara N. Sainath, Ron J. Weiss

We propose a multitask training method for attention-based end-to-end speech recognition models.

Decoder, Language Modeling, +3 more

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations | 20 Oct 2020 | Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation, Speech Extraction, +2 more

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation

2 code implementations | 4 Oct 2020 | Zhong-Qiu Wang, Peidong Wang, DeLiang Wang

Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry.

Speaker Separation, Speech Separation

Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling

no code implementations | 11 Mar 2019 | Peidong Wang, Ke Tan, DeLiang Wang

In this study, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Recurrent Deep Stacking Networks for Speech Recognition

no code implementations | 14 Dec 2016 | Peidong Wang, Zhongqiu Wang, DeLiang Wang

This paper presented our work on applying Recurrent Deep Stacking Networks (RDSNs) to Robust Automatic Speech Recognition (ASR) tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

Incorporating Language Level Information into Acoustic Models

no code implementations | 14 Dec 2016 | Peidong Wang, DeLiang Wang

This paper proposed a class of novel Deep Recurrent Neural Networks which can incorporate language-level information into acoustic models.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more
