Search Results for author: Yuguang Yang

Found 16 papers, 5 papers with code

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

no code implementations • 28 Sep 2023 • Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.

Action Detection, Activity Detection, +3

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

no code implementations • 17 Sep 2023 • Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie

In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.

Voice Conversion

MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

no code implementations • 8 Aug 2023 • Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, JingJing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao

Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in the wild.

Attribute, Cross-corpus, +2

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition

no code implementations • 13 Jun 2023 • Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao

Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, whereas there is limited research on its merits in speech emotion recognition (SER).

Attribute, Contrastive Learning, +3

Self-Enhancement Improves Text-Image Retrieval in Foundation Visual-Language Models

1 code implementation • 11 Jun 2023 • Yuguang Yang, Yiming Wang, Shupeng Geng, Runqi Wang, Yimi Wang, Sheng Wu, Baochang Zhang

The emergence of cross-modal foundation models has introduced numerous approaches grounded in text-image retrieval.

Attribute, Image Retrieval, +2

Long term 5G network traffic forecasting via modeling non-stationarity with deep learning

1 code implementation • journal 2023 • Yuguang Yang, Shupeng Geng, Baochang Zhang, Juan Zhang, Zheng Wang, Yong Zhang, David Doermann

However, a long-term prediction horizon exposes the non-stationarity of the series data, which deteriorates the performance of existing approaches.

Time Series, Time Series Forecasting, +1

Decom-CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map

no code implementations • 27 May 2023 • Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Juan Zhang, Xuan Gong, Baochang Zhang

Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions.

Decision Making

HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism

1 code implementation • 15 Mar 2023 • Yuguang Yang, Yu Pan, JingJing Yin, Jiangyu Han, Lei Ma, Heng Lu

SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition

1 code implementation • 5 Dec 2022 • Yuguang Yang, Yu Pan, JingJing Yin, Heng Lu

This paper proposes a Learnable Multiplicative absolute position Embedding based Conformer (LMEC).

Position, speech-recognition, +1

Improving fairness in speaker verification via Group-adapted Fusion Network

1 code implementation • 23 Feb 2022 • Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke

This is observed especially with underrepresented demographic groups sharing similar voice characteristics.

Fairness, Speaker Recognition, +1

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

no code implementations • 7 Feb 2022 • Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow

Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication.

Contrastive Learning, Speaker Recognition

ASR-Aware End-to-end Neural Diarization

no code implementations • 2 Feb 2022 • Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

Fast query-by-example speech search using separable model

no code implementations • 18 Sep 2021 • Yuguang Yang, Yu Pan, Xin Dong, Minqiang Xu

Second, we design a novel model inference scheme based on RepVGG which can efficiently improve the QbE search quality.

Word Embeddings

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

no code implementations • 6 Sep 2021 • Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke

Second, a scoring function is applied between a runtime utterance and each speaker profile.

Speaker Identification

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control

no code implementations • 25 Nov 2019 • Yuguang Yang

SDQL exploits the linear stage structure by approximating the Q function via a collection of deep Q sub-networks stacking along an axis marking the stage-wise progress of the whole task.

Q-Learning, reinforcement-learning, +1

Efficient Navigation of Colloidal Robots in an Unknown Environment via Deep Reinforcement Learning

no code implementations • 26 Jun 2019 • Yuguang Yang, Michael A. Bevan, Bo Li

Equipping active colloidal robots with intelligence such that they can efficiently navigate in unknown complex environments could dramatically impact their use in emerging applications like precision surgery and targeted drug delivery.

Navigate, Reinforcement Learning (RL)
