Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

1 code implementation • 12 Jul 2024 • Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task, which contains a local branch to generate object-level detection results and a global branch to obtain scene-level global features.

Collaborative Inference Language Modelling +3

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

no code implementations • 26 Jun 2024 • Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility.

Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

1 code implementation • 11 Jun 2024 • Huahui Yi, Xiaofei Wang, Kang Li, Chao Li

Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels.

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

no code implementations • 9 Jun 2024 • Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements.

Denoising Speech Denoising +2

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

no code implementations • 6 Jun 2024 • Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

We also show that the proposed MaskGIT-based model can generate phoneme durations with higher quality and diversity compared to its regression or flow-matching counterparts.


Domain Game: Disentangle Anatomical Feature for Single Domain Generalized Segmentation

no code implementations • 4 Jun 2024 • Hao Chen, Hongrun Zhang, U Wang Chan, Rui Yin, Xiaofei Wang, Chao Li

Single domain generalization aims to address the challenge of out-of-distribution generalization with only one source domain available.

Brain Tumor Segmentation Disentanglement +5

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

no code implementations • 28 May 2024 • Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation.

Machine Translation speech-recognition +3

Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems

no code implementations • 20 Apr 2024 • Xiaofei Wang, Yunfeng Zhao, Chao Qiu, Qinghua Hu, Victor C. M. Leung

In response to these issues, this paper introduces socialized learning (SL) as a promising solution, further propelling the advancement of EI.


Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

no code implementations • 19 Apr 2024 • Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression.


CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation.

Dialogue Generation

Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection

no code implementations • 12 Mar 2024 • Jiahui Fu, Chen Gao, Zitian Wang, Lirong Yang, Xiaofei Wang, Beipeng Mu, Si Liu

Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird's-eye view (BEV) representation space.

3D Object Detection object-detection

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

no code implementations • 12 Feb 2024 • Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression.

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

no code implementations • 16 Jan 2024 • Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.

Automatic Speech Recognition Benchmarking +4

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code implementations • 14 Jan 2024 • Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Audio Generation Language Modelling

Convolutional Neural Networks for Segmentation of Malignant Pleural Mesothelioma: Analysis of Probability Map Thresholds (CALGB 30901, Alliance)

no code implementations • 30 Nov 2023 • Mena Shenouda, Eyjólfur Gudmundsson, Feng Li, Christopher M. Straus, Hedy L. Kindler, Arkadiusz Z. Dudek, Thomas Stinchcombe, Xiaofei Wang, Adam Starkey, Samuel G. Armato III

Percent difference of tumor volume and overlap using the Dice Similarity Coefficient (DSC) were compared between the standard reference provided by the radiologist and CNN outputs for thresholds ranging from 0.001 to 0.9.

Computed Tomography (CT)

Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion

1 code implementation • 25 Oct 2023 • Fan Yang, Xiaofei Wang

To address this issue, we propose a method for extending the spatio-temporal behavior dataset in Student Classroom Scenarios (SCB-ST-Dataset4) using an image dataset.

DiariST: Streaming Speech Translation with Speaker Diarization

1 code implementation • 14 Sep 2023 • Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

speaker-diarization Speaker Diarization +3

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations • 14 Aug 2023 • Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2

A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking

no code implementations • 21 Jun 2023 • Shaohui Mei, Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Lap-Pui Chau

Surprisingly, there has been a lack of comprehensive studies on the robustness of RS tasks, prompting us to undertake a thorough survey and benchmark on the robustness of image classification and object detection in RS.

Adversarial Robustness Benchmarking +3

One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms

1 code implementation • 2 Jun 2023 • Shaoyuan Huang, Zheng Wang, Heng Zhang, Xiaofei Wang, Cheng Zhang, Wenyu Wang

In this paper, we propose an end-to-end framework with global pooling and static content awareness, DynEformer, to provide a unified workload prediction scheme for dynamic MT-ECP.

Time Series Time Series Prediction

When Computing Power Network Meets Distributed Machine Learning: An Efficient Federated Split Learning Framework

no code implementations • 22 May 2023 • Xinjing Yuan, Lingjun Pu, Lei Jiao, Xiaofei Wang, Meijuan Yang, Jingdong Xu

In this paper, we advocate CPN-FedSL, a novel and flexible Federated Split Learning (FedSL) framework over Computing Power Network (CPN).


Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion

1 code implementation • 13 May 2023 • Fan Yang, Tao Wang, Xiaofei Wang

We constructed a dataset, which contained 11,248 labels and 4,001 images, with an emphasis on the common behavior of raising hands in a classroom setting (Student Classroom Behavior dataset, SCB-Dataset).

Multi-task Learning of Histology and Molecular Markers for Classifying Diffuse Glioma

no code implementations • 26 Mar 2023 • Xiaofei Wang, Stephen Price, Chao Li

This paper presents a first attempt to jointly predict molecular markers and histology features and model their interactions for classifying diffuse glioma based on whole slide images.

Multi-Task Learning whole slide images

CBA: Contextual Background Attack against Optical Aerial Detection in the Physical World

1 code implementation • 27 Feb 2023 • Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei

To further strengthen the attack performance, the adversarial patches are forced to be outside targets during training, by which the detected objects of interest, both on and outside patches, benefit the accumulation of attack efficacy.

Adversarial Robustness

Contextual adversarial attack against aerial detection in the physical world

no code implementations • 27 Feb 2023 • Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei

We propose an innovative contextual attack method against aerial detection in real scenarios, which achieves powerful attack performance and transfers well between various aerial object detectors without smearing or blocking the objects of interest in order to hide them.

Adversarial Attack Blocking

When Quantum Information Technologies Meet Blockchain in Web 3.0

no code implementations • 29 Nov 2022 • Minrui Xu, Xiaoxu Ren, Dusit Niyato, Jiawen Kang, Chao Qiu, Zehui Xiong, Xiaofei Wang, Victor C. M. Leung

Therefore, in this paper, we introduce a quantum blockchain-driven Web 3.0 framework that provides information-theoretic security for decentralized data transferring and payment transactions.

Cloud Computing

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

no code implementations • 11 Nov 2022 • Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Speech separation with large-scale self-supervised learning

no code implementations • 9 Nov 2022 • Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K hours trained WavLM, our proposed model obtains 15.9% and 11.2% of relative word error rate (WER) reductions, respectively, for a simulated far-field speech mixture test set.

Self-Supervised Learning Speech Separation

Simulating realistic speech overlaps improves multi-talker ASR

no code implementations • 27 Oct 2022 • Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

no code implementations • 12 Sep 2022 • Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

DLMP of Competitive Markets in Active Distribution Networks: Models, Solutions, Applications, and Visions

no code implementations • 27 May 2022 • Xiaofei Wang, Fangxing Li, Linquan Bai, Xin Fang

The DLMP provides a solution that can be essential for competitive market operation in future distribution systems.

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

no code implementations • 7 Apr 2022 • Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka

In this paper, we propose a three-stage training scheme for the CSS model that can leverage both supervised data and extra large-scale unsupervised real-world conversational data.

Speech Separation

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

1 code implementation • 30 Mar 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

1 code implementation • 2 Feb 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes a token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

no code implementations • 24 Jan 2022 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang

Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions.

speech-recognition Speech Recognition

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement speech-recognition +1

All-neural beamformer for continuous speech separation

no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

no code implementations • 7 Oct 2021 • Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate the speech activity of each speaker, while it has the advantages of (i) handling an unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection Activity Detection +6

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

no code implementations • 11 Aug 2021 • Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small amount of real data, the joint system performs 8.9--29.9% better in accuracy compared to the best modular system, while the modular system performs better before such fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Saliency-Guided Image Translation

no code implementations • CVPR 2021 • Lai Jiang, Mai Xu, Xiaofei Wang, Leonid Sigal

In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user-specified saliency map.

Generative Adversarial Network Image-to-Image Translation +1

End-to-End Speaker-Attributed ASR with Transformer

no code implementations • 5 Apr 2021 • Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.

speech-recognition Speech Recognition +1

Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

no code implementations • 17 Jan 2021 • Yiwen Han, Shihao Shen, Xiaofei Wang, Shiqiang Wang, Victor C. M. Leung

In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems to improve the long-term throughput rate of request processing.


EC-SAGINs: Edge Computing-enhanced Space-Air-Ground Integrated Networks for Internet of Vehicles

no code implementations • 15 Jan 2021 • Shuai Yu, Xiaowen Gong, Qian Shi, Xiaofei Wang, Xu Chen

After discussing several existing orbital and aerial edge computing architectures, we propose a framework of edge computing-enabled space-air-ground integrated networks (EC-SAGINs) to support various IoV services for the vehicles in remote areas.

Edge-computing Imitation Learning +1

A Systematic Review of the Efforts and Hindrances of Modeling and Simulation of CAR T-cell Therapy

no code implementations • 13 Jan 2021 • Ujwani Nukala, Marisabel Rodriguez Messan, Osman N. Yogurtcu, Xiaofei Wang, Hong Yang

Chimeric Antigen Receptor (CAR) T-cell therapy is an immunotherapy that has recently become highly instrumental in the fight against life-threatening diseases.

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

1 code implementation • 11 Aug 2020 • Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

no code implementations • 19 Jun 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Federated Learning for 6G Communications: Challenges, Methods, and Future Directions

no code implementations • 4 Jun 2020 • Yi Liu, Xingliang Yuan, Zehui Xiong, Jiawen Kang, Xiaofei Wang, Dusit Niyato

As the 5G communication networks are being widely deployed worldwide, both industry and academia have started to move beyond 5G and explore 6G communications.

Federated Learning

Global Adaptive Generative Adjustment

no code implementations • 2 Nov 2019 • Bin Wang, Xiaofei Wang, Jianhua Guo

Many traditional signal recovery approaches can perform well based on the penalized likelihood.

Computational Efficiency Model Selection

A practical two-stage training strategy for multi-stream end-to-end speech recognition

no code implementations • 23 Oct 2019 • Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky

The multi-stream paradigm of audio processing, in which several sources are simultaneously considered, has been an active research area for information fusion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

DeshadowGAN: A Deep Learning Approach to Remove Shadows from Optical Coherence Tomography Images

no code implementations • 7 Oct 2019 • Haris Cheong, Sripad Krishna Devalla, Tan Hung Pham, Zhang Liang, Tin Aung Tun, Xiaofei Wang, Shamira Perera, Leopold Schmetterer, Aung Tin, Craig Boote, Alexandre H. Thiery, Michael J. A. Girard

Image quality was assessed qualitatively (for artifacts) and quantitatively using the intralayer contrast: a measure of shadow visibility ranging from 0 (shadow-free) to 1 (strong shadow) and compared to compensated images.

Denoising Generative Adversarial Network +3

Convergence of Edge Computing and Deep Learning: A Comprehensive Survey

1 code implementation • 19 Jul 2019 • Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen

Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network.

Cloud Computing Edge-computing +2

Multi-Stream End-to-End Speech Recognition

no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita

This report investigates the ability of E2E ASR from standard close-talk to far-field applications by encompassing entire multichannel speech enhancement and ASR components within the S2S model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Attention Based Glaucoma Detection: A Large-scale Database and CNN Model

1 code implementation • CVPR 2019 • Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu

The attention maps of the ophthalmologists are also collected in the LAG database through a simulated eye-tracking experiment.

In-Edge AI: Intelligentizing Mobile Edge Computing, Caching and Communication by Federated Learning

no code implementations • 19 Sep 2018 • Xiaofei Wang, Yiwen Han, Chenyang Wang, Qiyang Zhao, Xu Chen, Min Chen

In order to bring more intelligence to the edge systems, compared to traditional optimization methodology, and driven by the current deep learning techniques, we propose to integrate the Deep Reinforcement Learning techniques and Federated Learning framework with the mobile edge systems, for optimizing the mobile edge computing, caching and communication.

Edge-computing Federated Learning

ABtree: An Algorithm for Subgroup-Based Treatment Assignment

no code implementations • 13 May 2016 • Derek Feng, Xiaofei Wang

Given two possible treatments, there may exist subgroups who benefit more from one treatment than the other.


Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

no code implementations • 21 Sep 2015 • Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan

This paper presents the contribution to the third 'CHiME' speech separation and recognition challenge including both front-end signal processing and back-end speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

