no code implementations • 22 May 2023 • Xinjing Yuan, Lingjun Pu, Lei Jiao, Xiaofei Wang, Meijuan Yang, Jingdong Xu
In this paper, we advocate CPN-FedSL, a novel and flexible Federated Split Learning (FedSL) framework over Computing Power Network (CPN).
1 code implementation • 13 May 2023 • Fan Yang, Tao Wang, Xiaofei Wang
We constructed a dataset, the Student Classroom Behavior dataset (SCB-Dataset), containing 11,248 labels and 4,001 images, with an emphasis on the common classroom behavior of raising hands.
no code implementations • 26 Mar 2023 • Xiaofei Wang, Stephen Price, Chao Li
This paper presents a first attempt to jointly predict molecular markers and histology features and model their interactions for classifying diffuse glioma based on whole slide images.
no code implementations • 27 Feb 2023 • Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei
We propose an innovative contextual attack method against aerial detection in real scenarios, which achieves powerful attack performance and transfers well between various aerial object detectors without smearing or blocking the interested objects to hide.
1 code implementation • 27 Feb 2023 • Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei
To further strengthen the attack performance, the adversarial patches are forced to be outside targets during training, by which the detected objects of interest, both on and outside patches, benefit the accumulation of attack efficacy.
no code implementations • 29 Nov 2022 • Minrui Xu, Xiaoxu Ren, Dusit Niyato, Jiawen Kang, Chao Qiu, Zehui Xiong, Xiaofei Wang, Victor C. M. Leung
Therefore, in this paper, we introduce a quantum blockchain-driven Web 3.0 framework that provides information-theoretic security for decentralized data transferring and payment transactions.
no code implementations • 11 Nov 2022 • Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka
Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.
Automatic Speech Recognition (ASR)
no code implementations • 10 Nov 2022 • Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion.
no code implementations • 9 Nov 2022 • Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez
Compared with a supervised baseline and the WavLM-based SS model using feature embeddings from the previously released WavLM trained on 94K hours of data, our proposed model obtains relative word error rate (WER) reductions of 15.9% and 11.2%, respectively, on a simulated far-field speech mixture test set.
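Relative WER reduction, the metric quoted here, is simple to compute; a minimal sketch (standard definition, not code from the paper):

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative word error rate (WER) reduction, as commonly reported:
    the fraction of the baseline's errors eliminated by the new model."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, improving from 20.0% to 16.8% absolute WER is a 16% relative reduction.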
no code implementations • 27 Oct 2022 • Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.
Automatic Speech Recognition (ASR)
no code implementations • 12 Sep 2022 • Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka
To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.
Automatic Speech Recognition (ASR)
no code implementations • 27 May 2022 • Xiaofei Wang, Fangxing Li, Linquan Bai, Xin Fang
The DLMP provides a solution that can be essential for competitive market operation in future distribution systems.
no code implementations • 7 Apr 2022 • Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka
In this paper, we propose a three-stage training scheme for the CSS model that can leverage both supervised data and extra large-scale unsupervised real-world conversational data.
no code implementations • 30 Mar 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.
Automatic Speech Recognition (ASR)
no code implementations • 2 Feb 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
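The core serialization idea behind t-SOT can be illustrated with a toy function (a simplification assumed here, not the authors' exact recipe): tokens from overlapping speakers are merged in emission-time order, with a special channel-change token inserted whenever the active speaker switches.

```python
def serialize_tsot(token_streams, cc_token="<cc>"):
    """Toy sketch of t-SOT-style serialization.

    token_streams: dict mapping speaker -> list of (emission_time, token).
    Returns one token sequence sorted by emission time, with a
    channel-change token inserted whenever the active speaker changes.
    """
    events = sorted(
        (t, spk, tok)
        for spk, toks in token_streams.items()
        for t, tok in toks
    )
    out, prev = [], None
    for _, spk, tok in events:
        if prev is not None and spk != prev:
            out.append(cc_token)
        out.append(tok)
        prev = spk
    return out
```

A single-stream decoder can then be trained on such merged sequences, which is what makes the approach streaming-friendly.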
no code implementations • 24 Jan 2022 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang
Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions.
no code implementations • 28 Oct 2021 • Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang
The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference.
Automatic Speech Recognition (ASR)
no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang
Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.
no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez
Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.
Automatic Speech Recognition (ASR)
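The classic MVDR beamformer that ADL-MVDR replaces with neural components has a closed form; a minimal NumPy sketch of those weights (the textbook formula, not the paper's model):

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Classic MVDR beamformer weights: w = R^{-1} d / (d^H R^{-1} d),
    where R is the noise spatial covariance and d the steering vector.
    The result satisfies the distortionless constraint w^H d = 1."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)
```

With identity noise covariance and a unit-norm steering vector, the weights reduce to the steering vector itself, which is a quick sanity check.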
no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda
Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.
no code implementations • 7 Oct 2021 • Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate the speech activity of each speaker, while it has the advantages of (i) handling an unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.
no code implementations • 11 Aug 2021 • Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin
A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.
no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small amount of real data, the joint system performs 8.9-29.9% better in accuracy than the best modular system, while the modular system performs better before such fine-tuning.
Automatic Speech Recognition (ASR)
no code implementations • CVPR 2021 • Lai Jiang, Mai Xu, Xiaofei Wang, Leonid Sigal
In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user specified saliency map.
no code implementations • 5 Jun 2021 • Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka
Performance analysis is also carried out by changing the ASR model, the data used for the ASR-step, and the schedule of the two update steps.
Automatic Speech Recognition (ASR)
no code implementations • 5 Apr 2021 • Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.
Automatic Speech Recognition (ASR)
no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng
Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.
no code implementations • 17 Jan 2021 • Yiwen Han, Shihao Shen, Xiaofei Wang, Shiqiang Wang, Victor C. M. Leung
In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems to improve the long-term throughput rate of request processing.
no code implementations • 15 Jan 2021 • Shuai Yu, Xiaowen Gong, Qian Shi, Xiaofei Wang, Xu Chen
After discussing several existing orbital and aerial edge computing architectures, we propose a framework of edge computing-enabled space-air-ground integrated networks (EC-SAGINs) to support various IoV services for the vehicles in remote areas.
no code implementations • 13 Jan 2021 • Ujwani Nukala, Marisabel Rodriguez Messan, Osman N. Yogurtcu, Xiaofei Wang, Hong Yang
Chimeric Antigen Receptor (CAR) T-cell therapy is an immunotherapy that has recently become highly instrumental in the fight against life-threatening diseases.
2 code implementations • NeurIPS 2021 • Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin
Temporal information is essential to learning effective policies with Reinforcement Learning (RL).
no code implementations • 6 Jan 2021 • Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
Then, we propose a novel method using a sequence-to-sequence model, called hypothesis stitcher.
Automatic Speech Recognition (ASR)
1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka
Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.
Automatic Speech Recognition (ASR)
no code implementations • 6 Oct 2020 • Haris Cheong, Sripad Krishna Devalla, Thanadet Chuangsuwanich, Tin A. Tun, Xiaofei Wang, Tin Aung, Leopold Schmetterer, Martin L. Buist, Craig Boote, Alexandre H. Thiéry, Michaël J. A. Girard
Speckle noise and retinal shadows within OCT B-scans occlude important edges, fine textures and deep tissues, preventing accurate and robust diagnosis by algorithms and clinicians.
1 code implementation • 11 Aug 2020 • Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.
Automatic Speech Recognition (ASR)
no code implementations • 19 Jun 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka
We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.
Automatic Speech Recognition (ASR)
no code implementations • 4 Jun 2020 • Yi Liu, Xingliang Yuan, Zehui Xiong, Jiawen Kang, Xiaofei Wang, Dusit Niyato
As the 5G communication networks are being widely deployed worldwide, both industry and academia have started to move beyond 5G and explore 6G communications.
no code implementations • 28 Mar 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
We also show that the SOT models can accurately count the number of speakers in the input audio.
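Counting speakers with an SOT model falls out of its output format: the transcriptions of the individual speakers are concatenated into one sequence separated by a speaker-change token, so the count can be read off the output. A rough sketch (the token name is an assumption, not the paper's exact inventory):

```python
def count_sot_utterances(tokens, sc_token="<sc>"):
    """In serialized output training (SOT), per-speaker transcriptions
    are concatenated with a speaker-change token between them, so the
    number of output utterances is the change-token count plus one."""
    if not tokens:
        return 0
    return tokens.count(sc_token) + 1
```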
no code implementations • 2 Nov 2019 • Bin Wang, Xiaofei Wang, Jianhua Guo
Many traditional signal recovery approaches can perform well based on the penalized likelihood.
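A canonical example of such penalized-likelihood recovery is L1-penalized least squares (LASSO), whose proximal step is the soft-thresholding operator; a minimal sketch of that building block (an illustration of the general approach, not this paper's method):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: shrink each coefficient
    toward zero by lam, zeroing anything smaller in magnitude. This is
    the core step of LASSO-style recovery algorithms such as ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```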
no code implementations • 23 Oct 2019 • Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky
The multi-stream paradigm of audio processing, in which several sources are simultaneously considered, has been an active research area for information fusion.
Automatic Speech Recognition (ASR)
no code implementations • 7 Oct 2019 • Haris Cheong, Sripad Krishna Devalla, Tan Hung Pham, Zhang Liang, Tin Aung Tun, Xiaofei Wang, Shamira Perera, Leopold Schmetterer, Aung Tin, Craig Boote, Alexandre H. Thiery, Michael J. A. Girard
Image quality was assessed qualitatively (for artifacts) and quantitatively using the intralayer contrast, a measure of shadow visibility ranging from 0 (shadow-free) to 1 (strong shadow), and compared to compensated images.
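A Michelson-style contrast of this kind can be sketched as follows (an assumed form matching the stated 0-1 range; the paper's exact definition may differ):

```python
import numpy as np

def intralayer_contrast(inside_shadow, outside_shadow):
    """Michelson-style contrast between mean intensities of the same
    retinal layer inside and outside a vessel shadow. Equal intensities
    give 0 (shadow-free); a fully darkened region gives 1 (strong shadow)."""
    a = float(np.mean(outside_shadow))
    b = float(np.mean(inside_shadow))
    return abs(a - b) / (a + b)
```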
1 code implementation • 13 Sep 2019 • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang
Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
Ranked #7 on Speech Recognition on AISHELL-1
Automatic Speech Recognition (ASR)
1 code implementation • 19 Jul 2019 • Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network.
no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky
Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.
Automatic Speech Recognition (ASR)
no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita
This report investigates the ability of E2E ASR to extend from standard close-talk to far-field applications by encompassing the entire multichannel speech enhancement and ASR components within the S2S model.
Automatic Speech Recognition (ASR)
no code implementations • 8 Apr 2019 • Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky
Quality of data plays an important role in most deep learning tasks.
Automatic Speech Recognition (ASR)
1 code implementation • CVPR 2019 • Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu
The attention maps of the ophthalmologists are also collected in LAG database through a simulated eye-tracking experiment.
no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.
Automatic Speech Recognition (ASR)
no code implementations • 12 Nov 2018 • Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in the far-field robustness.
Automatic Speech Recognition (ASR)
no code implementations • 27 Sep 2018 • Sripad Krishna Devalla, Giridhar Subramanian, Tan Hung Pham, Xiaofei Wang, Shamira Perera, Tin A. Tun, Tin Aung, Leopold Schmetterer, Alexandre H. Thiery, Michael J. A. Girard
For all the ONH tissues, the mean CNR increased from $3.50 \pm 0.56$ (single-frame) to $7.63 \pm 1.81$ (denoised).
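The contrast-to-noise ratio (CNR) reported here can be illustrated with one common definition (an assumption; the paper's exact formula may differ): the absolute difference of mean intensities between a tissue region and a background region, normalized by the background's standard deviation.

```python
import numpy as np

def cnr(tissue, background):
    """Contrast-to-noise ratio in a common form: difference of mean
    intensities over the background standard deviation. Higher values
    mean the tissue stands out more clearly from the noise floor."""
    return abs(np.mean(tissue) - np.mean(background)) / np.std(background)
```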
no code implementations • 19 Sep 2018 • Xiaofei Wang, Yiwen Han, Chenyang Wang, Qiyang Zhao, Xu Chen, Min Chen
To bring more intelligence to edge systems than traditional optimization methodologies allow, and driven by current deep learning techniques, we propose to integrate Deep Reinforcement Learning techniques and the Federated Learning framework with mobile edge systems, in order to optimize mobile edge computing, caching, and communication.
no code implementations • 13 May 2016 • Derek Feng, Xiaofei Wang
Given two possible treatments, there may exist subgroups who benefit more from one treatment than the other.
no code implementations • 21 Sep 2015 • Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan
This paper presents the contribution to the third 'CHiME' speech separation and recognition challenge including both front-end signal processing and back-end speech recognition.
Automatic Speech Recognition (ASR)