Search Results for author: Wei Han

Found 46 papers, 13 papers with code

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

no code implementations 16 May 2022 Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

However, accented speech datasets remain scarce, and for those already available, more training approaches need to be explored to improve the quality of accented speech recognition.

Accented Speech Recognition

Unsupervised Data Selection via Discrete Speech Representation for ASR

no code implementations 5 Apr 2022 Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR).

Automatic Speech Recognition Self-Supervised Learning

A Tensor-BTD-based Modulation for Massive Unsourced Random Access

no code implementations 5 Dec 2021 Zhenting Luan, Yuchi Wu, Shansuo Liang, Liping Zhang, Wei Han, Bo Bai

In this letter, we propose a novel tensor-based modulation scheme for massive unsourced random access.

Tensor Decomposition

TAPL: Dynamic Part-based Visual Tracking via Attention-guided Part Localization

no code implementations 25 Oct 2021 Wei Han, Hantao Huang, Xiaoxi Yu

Holistic object representation-based trackers suffer from performance drops under large appearance changes such as deformation and occlusion.

Visual Tracking

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

no code implementations 9 Oct 2021 Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech.

Multi-trends Enhanced Dynamic Micro-video Recommendation

no code implementations 8 Oct 2021 Yujie Lu, Yingxuan Huang, Shengyu Zhang, Wei Han, Hui Chen, Zhou Zhao, Fei Wu

In this paper, we propose the DMR framework to explicitly model dynamic multi-trends of users' current preference and make predictions based on both the history and future potential trends.

Recommendation Systems

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis

2 code implementations EMNLP 2021 Wei Han, Hui Chen, Soujanya Poria

In this work, we propose a framework named MultiModal InfoMax (MMIM), which hierarchically maximizes the mutual information (MI) in unimodal input pairs (inter-modality) and between the multimodal fusion result and unimodal inputs, in order to maintain task-related information through multimodal fusion.

Multimodal Sentiment Analysis
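Mutual information maximization of this kind is, in practice, implemented through a tractable lower bound. Below is a minimal sketch of one common choice, an InfoNCE-style bound between two batches of paired modality embeddings; the function name, temperature, and shapes are illustrative assumptions, not MMIM's actual code:

```python
import numpy as np

def infonce_lower_bound(z_a, z_b, temperature=0.1):
    """InfoNCE lower bound on MI between paired embeddings (illustrative).

    z_a, z_b: arrays of shape (batch, dim); row i of each forms a positive pair.
    Returns log(batch) minus the cross-entropy of matching each row of z_a
    to its partner in z_b, which lower-bounds MI(z_a; z_b).
    """
    # Cosine-similarity score matrix between every pair in the batch.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    scores = z_a @ z_b.T / temperature           # (batch, batch)
    # Log-softmax over each row, with the diagonal as the positive targets.
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    batch = z_a.shape[0]
    return np.log(batch) + np.mean(np.diag(log_probs))
```

The bound is at most log(batch), which is why MI estimators of this family need reasonably large batches to detect high MI.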

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

no code implementations 7 Aug 2021 Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.

 Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Contrastive Learning Masked Language Modeling +2

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

2 code implementations 28 Jul 2021 Wei Han, Hui Chen, Alexander Gelbukh, Amir Zadeh, Louis-Philippe Morency, Soujanya Poria

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data.

Multimodal Deep Learning Multimodal Sentiment Analysis

Supervised Contrastive Learning for Accented Speech Recognition

no code implementations 2 Jul 2021 Tao Han, Hantao Huang, Ziang Yang, Wei Han

Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents.

Accented Speech Recognition Contrastive Learning +1

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

no code implementations 25 Apr 2021 Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

To improve streaming models, a recent study [1] proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teacher's predictions.

Automatic Speech Recognition

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

no code implementations 6 Apr 2021 Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao

To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.

Automatic Speech Recognition
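The contrast between the two attack shapes mentioned above can be sketched in a few lines of waveform manipulation; `additive_attack` and `prepend_attack` are hypothetical helpers for illustration, not the paper's code:

```python
import numpy as np

def additive_attack(waveform, delta):
    """Additive perturbation: perturb the utterance sample-wise.

    The perturbation must be adapted (here: truncated) to each
    utterance's length, which limits its universality.
    """
    return waveform + delta[: len(waveform)]

def prepend_attack(waveform, delta):
    """Prepended perturbation: splice a fixed audio snippet in front.

    One fixed delta works unchanged for utterances of arbitrary length,
    matching the paper's observation that prepending is the more
    effective shape for universal attacks on RNN-T.
    """
    return np.concatenate([delta, waveform])
```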

A Better and Faster End-to-End Model for Streaming ASR

no code implementations 21 Nov 2020 Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing Sound

Superconductor-metal quantum transition at the EuO-KTaO3 interface

no code implementations 23 Oct 2020 Yang Ma, Jiasen Niu, Wenyu Xing, Yunyan Yao, Ranran Cai, Jirong Sun, X. C. Xie, Xi Lin, Wei Han

Superconductivity has been one of the most fascinating quantum states of matter for over several decades.

Superconductivity Mesoscale and Nanoscale Physics Materials Science

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

no code implementations 22 Oct 2020 Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.

Automatic Speech Recognition
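At its core, the proposed learning method is a pseudo-labeling loop: the teacher transcribes unlabeled audio, and the resulting pairs supervise the student. A minimal sketch, with `teacher_transcribe` and `student_train_step` as stand-ins for the actual (much larger) models:

```python
def distill_pseudo_labels(teacher_transcribe, student_train_step, unlabeled_audio):
    """Distill a non-streaming teacher into a streaming student via pseudo-labels.

    teacher_transcribe: callable mapping one utterance to a transcript.
    student_train_step: callable consuming one (utterance, transcript) pair.
    All names here are illustrative stand-ins, not the paper's pipeline.
    """
    # 1. Teacher generates transcripts on an arbitrarily large unlabeled set.
    pseudo_labeled = [(utt, teacher_transcribe(utt)) for utt in unlabeled_audio]
    # 2. Student trains on the teacher's transcripts as if they were ground truth.
    for utt, transcript in pseudo_labeled:
        student_train_step(utt, transcript)
    return pseudo_labeled
```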

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation 21 Oct 2020 Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition Word Alignment
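The WER figures quoted throughout these entries are word-level edit distance (substitutions + insertions + deletions) normalized by reference length. A minimal reference implementation of that metric:

```python
def word_error_rate(reference, hypothesis):
    """WER = Levenshtein distance over words / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions.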

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

no code implementations 20 Oct 2020 Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.

 Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Automatic Speech Recognition

Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering

no code implementations 17 Oct 2020 Hantao Huang, Tao Han, Wei Han, Deep Yap, Cheng-Ming Chiang

From the human perspective, to answer a visual question, one needs to read the question and then refer to the image to generate an answer.

Question Answering Visual Question Answering +1

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

no code implementations ICLR 2021 Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses.

Automatic Speech Recognition Knowledge Distillation

Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks

1 code implementation 10 Sep 2020 Hui Chen, Pengfei Hong, Wei Han, Navonil Majumder, Soujanya Poria

This graph is fed to a graph attention network for context propagation among relevant nodes, which effectively captures the dialogue context.

Ranked #5 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction Graph Attention +1
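Context propagation in a graph attention network amounts to each node aggregating its neighbors, weighted by a softmax over attention scores. A minimal single-layer illustration with plain dot-product scores and no learned projections (the paper's architecture is considerably richer; self-loops are assumed in the adjacency so every row has at least one neighbor):

```python
import numpy as np

def graph_attention_layer(node_feats, adjacency):
    """One attention-weighted propagation step over a graph (illustrative).

    node_feats: (num_nodes, dim) feature matrix.
    adjacency:  (num_nodes, num_nodes) 0/1 matrix, self-loops included.
    """
    scores = node_feats @ node_feats.T                 # dot-product attention scores
    scores = np.where(adjacency > 0, scores, -np.inf)  # mask out non-neighbours
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)      # row-wise softmax
    return weights @ node_feats                        # attention-weighted mixture
```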

Improved Noisy Student Training for Automatic Speech Recognition

no code implementations 19 May 2020 Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le

Noisy student training is an iterative self-training method that leverages augmentation to improve network performance.

Ranked #4 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Automatic Speech Recognition Image Classification

Conformer: Convolution-augmented Transformer for Speech Recognition

16 code implementations 16 May 2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).

Automatic Speech Recognition

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

4 code implementations 7 May 2020 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Automatic Speech Recognition

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations 7 May 2020 Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition

Streaming Object Detection for 3-D Point Clouds

no code implementations ECCV 2020 Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

This built-in data capture latency is artificial, and based on treating the point cloud as a camera image in order to leverage camera-inspired architectures.

Action Recognition Autonomous Vehicles +2

FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides

1 code implementation 3 Jan 2020 Xiebo Geng, Sibo Liu, Wei Han, Xu Li, Jiabo Ma, Jingya Yu, Xiuli Liu, Shaoqun Zeng, Li Chen, Shenghua Cheng

However, while existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they require multiple images with different focus depths in the same field of view.

Semantic Segmentation whole slide images

Statistical Inference in Mean-Field Variational Bayes

no code implementations 4 Nov 2019 Wei Han, Yun Yang

We conduct non-asymptotic analysis on the mean-field variational inference for approximating posterior distributions in complex Bayesian models that may involve latent variables.

Variational Inference

StarNet: Targeted Computation for Object Detection in Point Clouds

no code implementations 29 Aug 2019 Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan

We show how our redesign (namely, using only local information and using sampling instead of learned proposals) leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics.

3D Object Detection Pedestrian Detection +1
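The "sampling instead of learned proposals" idea can be illustrated by drawing proposal centers directly from the point cloud itself, since objects can only exist where there are points. A hedged sketch (`sample_proposal_centers` is an illustrative stand-in, not StarNet's actual sampler):

```python
import numpy as np

def sample_proposal_centers(points, num_proposals, rng=None):
    """Pick proposal centers by sampling points from the cloud (illustrative).

    points: (N, 3) array of x, y, z coordinates.
    Because centers come from the data, no proposal network needs training,
    and num_proposals can be varied freely at inference time.
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(points), size=num_proposals, replace=False)
    return points[idx]
```

Varying `num_proposals` per scene is what lets a single trained model trade accuracy for compute without retraining.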

A Strategy of MR Brain Tissue Images' Suggestive Annotation Based on Modified U-Net

no code implementations 19 Jul 2018 Yang Deng, Yao Sun, Yongpei Zhu, Mingwang Zhu, Wei Han, Kehong Yuan

How to choose an appropriate training subset from a limited labeled dataset, rather than using the whole set, is also of great significance in saving training time.

Learning 3D-FilterMap for Deep Convolutional Neural Networks

no code implementations 5 Jan 2018 Yingzhen Yang, Jianchao Yang, Ning Xu, Wei Han

Due to the weight sharing scheme, the parameter size of the 3D-FilterMap is much smaller than that of the filters to be learned in the conventional convolution layer when the 3D-FilterMap generates the same number of filters.

Dilated Recurrent Neural Networks

2 code implementations NeurIPS 2017 Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.

Sequential Image Classification
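A dilated recurrent layer replaces the step-by-step recurrence s_t = f(s_{t-1}, x_t) with a skip recurrence s_t = f(s_{t-d}, x_t), which shortens the path between distant timesteps. A minimal sketch with a pluggable cell (names and shapes are illustrative, and state and input share one dimension here for simplicity):

```python
import numpy as np

def dilated_recurrence(inputs, dilation, cell):
    """Run a recurrence where state at step t depends on state at t - dilation.

    inputs: array (seq_len, dim); cell: callable (prev_state, x) -> new_state.
    Equivalent to running `dilation` independent interleaved chains, so the
    recurrent path between steps t and t+k has length ~k/dilation.
    """
    seq_len, dim = inputs.shape
    states = np.zeros((seq_len, dim))
    for t in range(seq_len):
        prev = states[t - dilation] if t >= dilation else np.zeros(dim)
        states[t] = cell(prev, inputs[t])
    return states
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) is what gives the architecture its long effective memory.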

A Learning-Based Approach for Lane Departure Warning Systems with a Personalized Driver Model

no code implementations 4 Feb 2017 Wenshuo Wang, Ding Zhao, Junqiang Xi, Wei Han

Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will demonstrate an LDB or a DCB.

Robust Single Image Super-Resolution via Deep Networks With Sparse Prior

1 code implementation 2016 (IEEE journal) Ding Liu, Zhaowen Wang, Bihan Wen, Jianchao Yang, Wei Han, Thomas S. Huang

We demonstrate that a sparse coding model particularly designed for SR can be incarnated as a neural network with the merit of end-to-end optimization over training data.

Image Super-Resolution

Seq-NMS for Video Object Detection

1 code implementation 26 Feb 2016 Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.

Frame General Classification +3

Deep Networks for Image Super-Resolution with Sparse Prior

no code implementations ICCV 2015 Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, Thomas Huang

We show that a sparse coding model particularly designed for super-resolution can be incarnated as a neural network, and trained in a cascaded structure from end to end.

Image Restoration Image Super-Resolution

An Analysis of Unsupervised Pre-training in Light of Recent Advances

2 code implementations 20 Dec 2014 Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang

We discover unsupervised pre-training, as expected, helps when the ratio of unsupervised to supervised samples is high, and surprisingly, hurts when the ratio is low.

Data Augmentation Image Classification +2
