Search Results for author: Xiaofei Wang

Found 67 papers, 13 papers with code

Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems

no code implementations • 20 Apr 2024 • Xiaofei Wang, Yunfeng Zhao, Chao Qiu, Qinghua Hu, Victor C. M. Leung

In response to these issues, this paper introduces socialized learning (SL) as a promising solution, further propelling the advancement of EI.


Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

no code implementations • 19 Apr 2024 • Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression.


CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers.

Dialogue Generation


Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection

no code implementations • 12 Mar 2024 • Jiahui Fu, Chen Gao, Zitian Wang, Lirong Yang, Xiaofei Wang, Beipeng Mu, Si Liu

Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird's-eye view (BEV) representation space.

3D Object Detection object-detection


Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

no code implementations • 12 Feb 2024 • Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression.


NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

no code implementations • 16 Jan 2024 • Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.

Automatic Speech Recognition Benchmarking +4


ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code implementations • 14 Jan 2024 • Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Audio Generation Language Modelling


Convolutional Neural Networks for Segmentation of Malignant Pleural Mesothelioma: Analysis of Probability Map Thresholds (CALGB 30901, Alliance)

no code implementations • 30 Nov 2023 • Mena Shenouda, Eyjólfur Gudmundsson, Feng Li, Christopher M. Straus, Hedy L. Kindler, Arkadiusz Z. Dudek, Thomas Stinchcombe, Xiaofei Wang, Adam Starkey, Samuel G. Armato III

Percent difference of tumor volume and overlap using the Dice Similarity Coefficient (DSC) were compared between the standard reference provided by the radiologist and CNN outputs for thresholds ranging from 0.001 to 0.9.

Computed Tomography (CT)


Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion

1 code implementation • 25 Oct 2023 • Fan Yang, Xiaofei Wang

To address this issue, we proposed a method for extending the spatio-temporal behavior dataset in Student Classroom Scenarios (SCB-ST-Dataset4) through image dataset.


DiariST: Streaming Speech Translation with Speaker Diarization

1 code implementation • 14 Sep 2023 • Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

speaker-diarization Speaker Diarization +3


SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations • 14 Aug 2023 • Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2


A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking

no code implementations • 21 Jun 2023 • Shaohui Mei, Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Lap-Pui Chau

Surprisingly, there has been a lack of comprehensive studies on the robustness of RS tasks, prompting us to undertake a thorough survey and benchmark on the robustness of image classification and object detection in RS.

Adversarial Robustness Benchmarking +3


One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms

1 code implementation • 2 Jun 2023 • Shaoyuan Huang, Zheng Wang, Heng Zhang, Xiaofei Wang, Cheng Zhang, Wenyu Wang

In this paper, we propose an end-to-end framework with global pooling and static content awareness, DynEformer, to provide a unified workload prediction scheme for dynamic MT-ECP.

Time Series Time Series Prediction


When Computing Power Network Meets Distributed Machine Learning: An Efficient Federated Split Learning Framework

no code implementations • 22 May 2023 • Xinjing Yuan, Lingjun Pu, Lei Jiao, Xiaofei Wang, Meijuan Yang, Jingdong Xu

In this paper, we advocate CPN-FedSL, a novel and flexible Federated Split Learning (FedSL) framework over Computing Power Network (CPN).

Scheduling


Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion

1 code implementation • 13 May 2023 • Fan Yang, Tao Wang, Xiaofei Wang

We constructed a dataset, which contained 11,248 labels and 4,001 images, with an emphasis on the common behavior of raising hands in a classroom setting (Student Classroom Behavior dataset, SCB-Dataset).


Multi-task Learning of Histology and Molecular Markers for Classifying Diffuse Glioma

no code implementations • 26 Mar 2023 • Xiaofei Wang, Stephen Price, Chao Li

This paper presents a first attempt to jointly predict molecular markers and histology features and model their interactions for classifying diffuse glioma based on whole slide images.

Multi-Task Learning whole slide images


CBA: Contextual Background Attack against Optical Aerial Detection in the Physical World

1 code implementation • 27 Feb 2023 • Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei

To further strengthen the attack performance, the adversarial patches are forced to be outside targets during training, by which the detected objects of interest, both on and outside patches, benefit the accumulation of attack efficacy.

Adversarial Robustness


Contextual adversarial attack against aerial detection in the physical world

no code implementations • 27 Feb 2023 • Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei

We propose an innovative contextual attack method against aerial detection in real scenarios, which achieves powerful attack performance and transfers well between various aerial object detectors without smearing or blocking the interested objects to hide.

Adversarial Attack Blocking


When Quantum Information Technologies Meet Blockchain in Web 3.0

no code implementations • 29 Nov 2022 • Minrui Xu, Xiaoxu Ren, Dusit Niyato, Jiawen Kang, Chao Qiu, Zehui Xiong, Xiaofei Wang, Victor C. M. Leung

Therefore, in this paper, we introduce a quantum blockchain-driven Web 3.0 framework that provides information-theoretic security for decentralized data transferring and payment transactions.

Cloud Computing


Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

no code implementations • 11 Nov 2022 • Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

no code implementations • 10 Nov 2022 • Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion.

Representation Learning Self-Supervised Learning +2


Speech separation with large-scale self-supervised learning

no code implementations • 9 Nov 2022 • Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K hours trained WavLM, our proposed model obtains 15.9% and 11.2% of relative word error rate (WER) reductions, respectively, for a simulated far-field speech mixture test set.

Self-Supervised Learning Speech Separation


Simulating realistic speech overlaps improves multi-talker ASR

no code implementations • 27 Oct 2022 • Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

no code implementations • 12 Sep 2022 • Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


DLMP of Competitive Markets in Active Distribution Networks: Models, Solutions, Applications, and Visions

no code implementations • 27 May 2022 • Xiaofei Wang, Fangxing Li, Linquan Bai, Xin Fang

The DLMP provides a solution that can be essential for competitive market operation in future distribution systems.


Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

no code implementations • 7 Apr 2022 • Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka

In this paper, we propose a three-stage training scheme for the CSS model that can leverage both supervised data and extra large-scale unsupervised real-world conversational data.

Speech Separation


Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

1 code implementation • 30 Mar 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4


Streaming Multi-Talker ASR with Token-Level Serialized Output Training

1 code implementation • 2 Feb 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes a token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

no code implementations • 24 Jan 2022 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang

Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions.

speech-recognition Speech Recognition


Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

no code implementations • 28 Oct 2021 • Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8


Personalized Speech Enhancement: New Models and Comprehensive Evaluation

no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement speech-recognition +1


All-neural beamformer for continuous speech separation

no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation


Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

no code implementations • 7 Oct 2021 • Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate speech activity of each speaker while it has the advantages of (i) handling unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection Activity Detection +6


Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

no code implementations • 11 Aug 2021 • Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.


A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small real data, the joint system performs 8.9--29.9% better in accuracy compared to the best modular system while the modular system performs better before such fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5


Saliency-Guided Image Translation

no code implementations • CVPR 2021 • Lai Jiang, Mai Xu, Xiaofei Wang, Leonid Sigal

In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user specified saliency map.

Generative Adversarial Network Image-to-Image Translation +1


Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement

no code implementations • 5 Jun 2021 • Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Performance analysis is also carried out by changing the ASR model, the data used for the ASR-step, and the schedule of the two update steps.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


End-to-End Speaker-Attributed ASR with Transformer

no code implementations • 5 Apr 2021 • Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.

speech-recognition Speech Recognition +1


Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

no code implementations • 17 Jan 2021 • Yiwen Han, Shihao Shen, Xiaofei Wang, Shiqiang Wang, Victor C. M. Leung

In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems to improve the long-term throughput rate of request processing.

Scheduling


EC-SAGINs: Edge Computing-enhanced Space-Air-Ground Integrated Networks for Internet of Vehicles

no code implementations • 15 Jan 2021 • Shuai Yu, Xiaowen Gong, Qian Shi, Xiaofei Wang, Xu Chen

After discussing several existing orbital and aerial edge computing architectures, we propose a framework of edge computing-enabled space-air-ground integrated networks (EC-SAGINs) to support various IoV services for the vehicles in remote areas.

Edge-computing Imitation Learning +1


A Systematic Review of the Efforts and Hindrances of Modeling and Simulation of CAR T-cell Therapy

no code implementations • 13 Jan 2021 • Ujwani Nukala, Marisabel Rodriguez Messan, Osman N. Yogurtcu, Xiaofei Wang, Hong Yang

Chimeric Antigen Receptor (CAR) T-cell therapy is an immunotherapy that has recently become highly instrumental in the fight against life-threatening diseases.


Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings

no code implementations • 6 Jan 2021 • Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka

Then, we propose a novel method using a sequence-to-sequence model, called hypothesis stitcher.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Reinforcement Learning with Latent Flow

2 code implementations • NeurIPS 2021 • Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin

Temporal information is essential to learning effective policies with Reinforcement Learning (RL).

Ranked #1 on Montezuma's Revenge on Atari 2600 Montezuma's Revenge

Continuous Control Montezuma's Revenge +4


Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


OCT-GAN: Single Step Shadow and Noise Removal from Optical Coherence Tomography Images of the Human Optic Nerve Head

no code implementations • 6 Oct 2020 • Haris Cheong, Sripad Krishna Devalla, Thanadet Chuangsuwanich, Tin A. Tun, Xiaofei Wang, Tin Aung, Leopold Schmetterer, Martin L. Buist, Craig Boote, Alexandre H. Thiéry, Michaël J. A. Girard

Speckle noise and retinal shadows within OCT B-scans occlude important edges, fine textures and deep tissues, preventing accurate and robust diagnosis by algorithms and clinicians.

SSIM


Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

1 code implementation • 11 Aug 2020 • Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

no code implementations • 19 Jun 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Federated Learning for 6G Communications: Challenges, Methods, and Future Directions

no code implementations • 4 Jun 2020 • Yi Liu, Xingliang Yuan, Zehui Xiong, Jiawen Kang, Xiaofei Wang, Dusit Niyato

As the 5G communication networks are being widely deployed worldwide, both industry and academia have started to move beyond 5G and explore 6G communications.

Federated Learning


Serialized Output Training for End-to-End Overlapped Speech Recognition

no code implementations • 28 Mar 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka

We also show that the SOT models can accurately count the number of speakers in the input audio.

speech-recognition Speech Recognition


Global Adaptive Generative Adjustment

no code implementations • 2 Nov 2019 • Bin Wang, Xiaofei Wang, Jianhua Guo

Many traditional signal recovery approaches can behave well based on the penalized likelihood.

Computational Efficiency Model Selection


A practical two-stage training strategy for multi-stream end-to-end speech recognition

no code implementations • 23 Oct 2019 • Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky

The multi-stream paradigm of audio processing, in which several sources are simultaneously considered, has been an active research area for information fusion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


DeshadowGAN: A Deep Learning Approach to Remove Shadows from Optical Coherence Tomography Images

no code implementations • 7 Oct 2019 • Haris Cheong, Sripad Krishna Devalla, Tan Hung Pham, Zhang Liang, Tin Aung Tun, Xiaofei Wang, Shamira Perera, Leopold Schmetterer, Aung Tin, Craig Boote, Alexandre H. Thiery, Michael J. A. Girard

Image quality was assessed qualitatively (for artifacts) and quantitatively using the intralayer contrast: a measure of shadow visibility ranging from 0 (shadow-free) to 1 (strong shadow) and compared to compensated images.

Denoising Generative Adversarial Network +3


A Comparative Study on Transformer vs RNN in Speech Applications

1 code implementation • 13 Sep 2019 • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang

Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).

Ranked #12 on Speech Recognition on AISHELL-1

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


Convergence of Edge Computing and Deep Learning: A Comprehensive Survey

1 code implementation • 19 Jul 2019 • Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen

Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network.

Cloud Computing Edge-computing +2


Multi-Stream End-to-End Speech Recognition

no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita

This report investigates the ability of E2E ASR from standard close-talk to far-field applications by encompassing entire multichannel speech enhancement and ASR components within the S2S model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4


Exploring Methods for the Automatic Detection of Errors in Manual Transcription

no code implementations • 8 Apr 2019 • Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky

Quality of data plays an important role in most deep learning tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Attention Based Glaucoma Detection: A Large-scale Database and CNN Model

1 code implementation • CVPR 2019 • Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu

The attention maps of the ophthalmologists are also collected in LAG database through a simulated eye-tracking experiment.


Stream attention-based multi-array end-to-end speech recognition

no code implementations • 12 Nov 2018 • Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky

Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in the far-field robustness.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Multi-encoder multi-resolution framework for end-to-end speech recognition

no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky

In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


A Deep Learning Approach to Denoise Optical Coherence Tomography Images of the Optic Nerve Head

no code implementations • 27 Sep 2018 • Sripad Krishna Devalla, Giridhar Subramanian, Tan Hung Pham, Xiaofei Wang, Shamira Perera, Tin A. Tun, Tin Aung, Leopold Schmetterer, Alexandre H. Thiery, Michael J. A. Girard

For all the ONH tissues, the mean CNR increased from $3.50 \pm 0.56$ (single-frame) to $7.63 \pm 1.81$ (denoised).


In-Edge AI: Intelligentizing Mobile Edge Computing, Caching and Communication by Federated Learning

no code implementations • 19 Sep 2018 • Xiaofei Wang, Yiwen Han, Chenyang Wang, Qiyang Zhao, Xu Chen, Min Chen

In order to bring more intelligence to the edge systems, compared to traditional optimization methodology, and driven by the current deep learning techniques, we propose to integrate the Deep Reinforcement Learning techniques and Federated Learning framework with the mobile edge systems, for optimizing the mobile edge computing, caching and communication.

Edge-computing Federated Learning


ABtree: An Algorithm for Subgroup-Based Treatment Assignment

no code implementations • 13 May 2016 • Derek Feng, Xiaofei Wang

Given two possible treatments, there may exist subgroups who benefit greater from one treatment than the other.

Marketing


Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

no code implementations • 21 Sep 2015 • Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan

This paper presents the contribution to the third 'CHiME' speech separation and recognition challenge including both front-end signal processing and back-end speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

