Search Results for author: Xiaofei Wang

Found 38 papers, 6 papers with code

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

no code implementations • 28 Oct 2021 • Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference.

Auxiliary Learning • Contrastive Learning +5
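
A minimal sketch of the auxiliary-reconstruction idea summarized above: a reconstruction decoder is trained alongside the contrastive objective and simply dropped at inference. The wrapper, loss weighting, and decoder shape are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class NoiseRobustWrapper(nn.Module):
    """Hypothetical wrapper: contrastive encoder plus auxiliary reconstruction decoder.
    The decoder only shapes training; inference uses the encoder alone."""
    def __init__(self, encoder: nn.Module, feat_dim: int = 512, n_spec_bins: int = 257):
        super().__init__()
        self.encoder = encoder                               # any contrastive speech encoder
        self.decoder = nn.Linear(feat_dim, n_spec_bins)      # predicts clean spectrogram frames

    def training_loss(self, noisy_feats, clean_spec, contrastive_loss_fn, alpha: float = 0.1):
        reps = self.encoder(noisy_feats)                     # (B, T, feat_dim)
        loss_contrastive = contrastive_loss_fn(reps)
        loss_recon = nn.functional.mse_loss(self.decoder(reps), clean_spec)
        return loss_contrastive + alpha * loss_recon         # auxiliary term weighted by alpha

    @torch.no_grad()
    def extract(self, noisy_feats):
        return self.encoder(noisy_feats)                     # decoder is not required at inference
```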

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement • Speech Quality +1

All-neural beamformer for continuous speech separation

no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Speech Extraction • Speech Recognition
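
For context, the classical MVDR beamformer underlying this line of work computes its weights from a noise spatial covariance matrix and a steering vector. Below is a generic NumPy sketch of that textbook formula, not the ADL-MVDR model itself.

```python
import numpy as np

def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """Textbook MVDR for one frequency bin: w = R_n^{-1} d / (d^H R_n^{-1} d).
    noise_cov: (M, M) noise spatial covariance; steering: (M,) steering vector."""
    rn_inv_d = np.linalg.solve(noise_cov, steering)
    return rn_inv_d / (steering.conj() @ rn_inv_d)

def apply_beamformer(weights: np.ndarray, mixture_stft: np.ndarray) -> np.ndarray:
    """mixture_stft: (M, T) multichannel STFT frames for that bin -> (T,) enhanced frames."""
    return weights.conj() @ mixture_stft
```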

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

no code implementations • 7 Oct 2021 • Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate the speech activity of each speaker, while it has the advantages of (i) handling an unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection • Activity Detection +3
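
To make the "transcribe-to-diarize" direction concrete: once speaker-attributed words carry time stamps, diarization segments follow from merging consecutive same-speaker words. This is a hypothetical post-processing sketch, not the paper's estimator.

```python
from typing import List, Tuple

Word = Tuple[str, str, float, float]          # (token, speaker, start_sec, end_sec)

def words_to_segments(words: List[Word], max_gap: float = 0.5):
    """Merge consecutive same-speaker words into (speaker, start, end) segments."""
    segments = []
    for token, spk, start, end in sorted(words, key=lambda w: w[2]):
        if segments and segments[-1][0] == spk and start - segments[-1][2] <= max_gap:
            segments[-1] = (spk, segments[-1][1], end)   # extend the current segment
        else:
            segments.append((spk, start, end))
    return segments

print(words_to_segments([("hi", "A", 0.0, 0.3), ("there", "A", 0.35, 0.6),
                         ("hello", "B", 0.7, 1.0)]))
# [('A', 0.0, 0.6), ('B', 0.7, 1.0)]
```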

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

no code implementations • 11 Aug 2021 • Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.
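
A minimal sketch of the behavior-prior idea described above: fit a generative model (here, a small VAE over fixed-length action chunks) to offline demonstrations so that latent "skills" can later be sampled and decoded. The dimensions and architecture are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class SkillVAE(nn.Module):
    """Illustrative skill prior: encode an action chunk into a latent z, decode it back."""
    def __init__(self, action_dim=7, horizon=10, latent_dim=8, hidden=128):
        super().__init__()
        flat = action_dim * horizon
        self.enc = nn.Sequential(nn.Linear(flat, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, flat))

    def loss(self, actions):                                  # actions: (B, horizon, action_dim)
        h = self.enc(actions.flatten(1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.dec(z).view_as(actions)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        return ((recon - actions) ** 2).mean() + 1e-3 * kl    # ELBO-style objective
```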

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small amount of real data, the joint system performs 8.9--29.9% better in accuracy than the best modular system, while the modular system performs better before such fine-tuning.

Representation Learning • Speaker Diarization +2

Saliency-Guided Image Translation

no code implementations • CVPR 2021 • Lai Jiang, Mai Xu, Xiaofei Wang, Leonid Sigal

In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user-specified saliency map.

Image-to-Image Translation • Translation

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement

no code implementations • 5 Jun 2021 • Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Performance analysis is also carried out by changing the ASR model, the data used for the ASR step, and the schedule of the two update steps.

Speech Enhancement • Speech Recognition
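
The "two update steps" mentioned above suggest an alternating multi-task schedule between an enhancement loss and an ASR-based loss. A hypothetical training loop in that spirit (the loss functions and schedule are assumptions, not the paper's recipe):

```python
def train_multitask(enhancer, batches, se_loss_fn, asr_loss_fn, optimizer, asr_every=2):
    """Alternate between an enhancement-loss step and an ASR-loss step."""
    for step, (noisy, clean, transcript) in enumerate(batches):
        enhanced = enhancer(noisy)
        if step % asr_every == 0:
            loss = asr_loss_fn(enhanced, transcript)   # keep the output recognizable (captioning)
        else:
            loss = se_loss_fn(enhanced, clean)         # keep the output pleasant to hear (listening)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```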

End-to-End Speaker-Attributed ASR with Transformer

no code implementations • 5 Apr 2021 • Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Speaker Identification • Speech Recognition

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Speech Recognition

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.

Speech Recognition • Speech Separation

Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

no code implementations • 17 Jan 2021 • Yiwen Han, Shihao Shen, Xiaofei Wang, Shiqiang Wang, Victor C. M. Leung

In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems to improve the long-term throughput rate of request processing.

EC-SAGINs: Edge Computing-enhanced Space-Air-Ground Integrated Networks for Internet of Vehicles

no code implementations • 15 Jan 2021 • Shuai Yu, Xiaowen Gong, Qian Shi, Xiaofei Wang, Xu Chen

After discussing several existing orbital and aerial edge computing architectures, we propose a framework of edge computing-enabled space-air-ground integrated networks (EC-SAGINs) to support various IoV services for the vehicles in remote areas.

Decision Making • Edge-computing +1

A Systematic Review of the Efforts and Hindrances of Modeling and Simulation of CAR T-cell Therapy

no code implementations • 13 Jan 2021 • Ujwani Nukala, Marisabel Rodriguez Messan, Osman N. Yogurtcu, Xiaofei Wang, Hong Yang

Chimeric Antigen Receptor (CAR) T-cell therapy is an immunotherapy that has recently become highly instrumental in the fight against life-threatening diseases.

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Speaker Identification • Speech Recognition
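
For reference, minimum Bayes risk training minimizes the expected error over an n-best list, weighting each hypothesis by its renormalized posterior. A generic sketch of that objective; the paper applies it to speaker-attributed errors, which is not reproduced here.

```python
import torch

def mbr_loss(log_probs: torch.Tensor, errors: torch.Tensor) -> torch.Tensor:
    """Expected risk over an n-best list.
    log_probs: (N,) model log-probabilities of the hypotheses (differentiable).
    errors:    (N,) error counts of each hypothesis against the reference (constants)."""
    posteriors = torch.softmax(log_probs, dim=0)   # renormalize over the n-best list
    return (posteriors * errors).sum()
```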

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

1 code implementation • 11 Aug 2020 • Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Speaker Identification • Speech Recognition

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

no code implementations • 19 Jun 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Speaker Identification • Speech Recognition

Federated Learning for 6G Communications: Challenges, Methods, and Future Directions

no code implementations • 4 Jun 2020 • Yi Liu, Xingliang Yuan, Zehui Xiong, Jiawen Kang, Xiaofei Wang, Dusit Niyato

As the 5G communication networks are being widely deployed worldwide, both industry and academia have started to move beyond 5G and explore 6G communications.

Federated Learning

Global Adaptive Generative Adjustment

no code implementations • 2 Nov 2019 • Bin Wang, Xiaofei Wang, Jianhua Guo

Many traditional signal recovery approaches can perform well based on the penalized likelihood.

Model Selection
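
The penalized likelihood referred to above has the standard form below, where p_lambda is a penalty such as the lasso's lambda*|beta_j|; this is the generic template rather than GAGA's specific adjustment.

```latex
\hat{\beta} = \arg\min_{\beta} \Big\{ -\log L(\beta \mid \text{data}) + \sum_{j} p_{\lambda}(\beta_j) \Big\}
```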

A practical two-stage training strategy for multi-stream end-to-end speech recognition

no code implementations • 23 Oct 2019 • Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky

The multi-stream paradigm of audio processing, in which several sources are simultaneously considered, has been an active research area for information fusion.

Speech Recognition

DeshadowGAN: A Deep Learning Approach to Remove Shadows from Optical Coherence Tomography Images

no code implementations • 7 Oct 2019 • Haris Cheong, Sripad Krishna Devalla, Tan Hung Pham, Zhang Liang, Tin Aung Tun, Xiaofei Wang, Shamira Perera, Leopold Schmetterer, Aung Tin, Craig Boote, Alexandre H. Thiery, Michael J. A. Girard

Image quality was assessed qualitatively (for artifacts) and quantitatively using the intralayer contrast, a measure of shadow visibility ranging from 0 (shadow-free) to 1 (strong shadow), and compared to that of compensated images.

Denoising • Semantic Segmentation +1
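
A normalized contrast of the kind described (bounded in [0, 1]) can be computed in a Michelson-like form; the exact layer and region definitions used in the paper are not reproduced here, so the sketch below is an assumption, not the paper's metric.

```python
import numpy as np

def normalized_contrast(inside_shadow: np.ndarray, outside_shadow: np.ndarray) -> float:
    """Michelson-style contrast in [0, 1]: near 0 when the two regions match (shadow-free),
    approaching 1 when the shadowed region is much darker. The region choice is assumed."""
    a, b = float(outside_shadow.mean()), float(inside_shadow.mean())
    return abs(a - b) / (a + b + 1e-12)
```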

Convergence of Edge Computing and Deep Learning: A Comprehensive Survey

1 code implementation • 19 Jul 2019 • Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen

Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network.

Edge-computing • Face Recognition

Multi-Stream End-to-End Speech Recognition

no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework, respectively.

Speech Recognition

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita

This report investigates how well E2E ASR extends from standard close-talk to far-field applications by encompassing the entire multichannel speech enhancement and ASR components within the S2S model.

Denoising • Noisy Speech Recognition +1

Attention Based Glaucoma Detection: A Large-scale Database and CNN Model

1 code implementation • CVPR 2019 • Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu

The attention maps of the ophthalmologists are also collected in LAG database through a simulated eye-tracking experiment.

Multi-encoder multi-resolution framework for end-to-end speech recognition

no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky

In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.

Speech Recognition
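
The joint CTC/attention objective that the MEMR framework builds on is the standard interpolation of the two losses, with lambda the CTC weight:

```latex
\mathcal{L}_{\mathrm{MTL}} = \lambda\, \mathcal{L}_{\mathrm{CTC}} + (1 - \lambda)\, \mathcal{L}_{\mathrm{attention}}, \qquad 0 \le \lambda \le 1
```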

Stream attention-based multi-array end-to-end speech recognition

no code implementations • 12 Nov 2018 • Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky

Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in far-field robustness.

Speech Recognition

In-Edge AI: Intelligentizing Mobile Edge Computing, Caching and Communication by Federated Learning

no code implementations • 19 Sep 2018 • Xiaofei Wang, Yiwen Han, Chenyang Wang, Qiyang Zhao, Xu Chen, Min Chen

To bring more intelligence to edge systems than traditional optimization methodologies allow, and driven by current deep learning techniques, we propose to integrate Deep Reinforcement Learning techniques and the Federated Learning framework with mobile edge systems in order to optimize mobile edge computing, caching and communication.

Edge-computing • Federated Learning
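
The Federated Learning component mentioned above typically aggregates client updates with a data-weighted average (FedAvg). A generic sketch of that aggregation step, not the In-Edge AI system's exact rule:

```python
from typing import Dict, List
import torch

def fedavg(client_states: List[Dict[str, torch.Tensor]],
           client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Weighted average of client model parameters, weights proportional to local data size."""
    total = float(sum(client_sizes))
    return {name: sum(state[name].float() * (n / total)
                      for state, n in zip(client_states, client_sizes))
            for name in client_states[0]}
```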

ABtree: An Algorithm for Subgroup-Based Treatment Assignment

no code implementations • 13 May 2016 • Derek Feng, Xiaofei Wang

Given two possible treatments, there may exist subgroups who benefit more from one treatment than the other.

Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

no code implementations • 21 Sep 2015 • Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan

This paper presents our contribution to the third 'CHiME' speech separation and recognition challenge, including both front-end signal processing and back-end speech recognition.

Speech Recognition • Speech Separation
