Search Results for author: Wei Han

Found 72 papers, 24 papers with code

Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations

no code implementations • 12 Apr 2024 • Xiaoze Xu, Xiuyu Sun, Wei Han, Xiaohui Zhong, Lei Chen, Hao Li

Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance.

Weather Forecasting

Ultrafast Adaptive Primary Frequency Tuning and Secondary Frequency Identification for S/S WPT system

no code implementations • 26 Mar 2024 • Chang Liu, Wei Han, Guangyu Yan, Bowang Zhang, Chunlin Li

The swift response of SCC and two-step perturb-and-observe algorithm mitigate output disturbances, thereby expediting the frequency tuning process.

INSTRAUG: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning

1 code implementation • 22 Feb 2024 • Wei Han, Hui Chen, Soujanya Poria

Fine-tuning large language models (LLMs) on multi-task instruction-following data has been proven to be a powerful learning paradigm for improving their zero-shot capabilities on new tasks.

Instruction Following

Retrieval Augmented End-to-End Spoken Dialog Models

no code implementations • 2 Feb 2024 • Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM.

dialog state tracking In-Context Learning +3

Extending Context Window of Large Language Models via Semantic Compression

no code implementations • 15 Dec 2023 • Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses.

Few-Shot Learning Information Retrieval +3

SLM: Bridge the thin gap between speech and text foundation models

no code implementations • 30 Sep 2023 • Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu

We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models.

Instruction Following Language Modelling +3

High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

no code implementations • 27 Sep 2023 • Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz

We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver.

Denoising

Multimodal Modeling For Spoken Language Identification

no code implementations • 19 Sep 2023 • Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.

Language Identification Spoken language identification

Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models

2 code implementations • 9 Jul 2023 • Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

Video question-answering is a fundamental task in the field of video understanding.

Question Answering Video Question Answering +2

AudioPaLM: A Large Language Model That Can Speak and Listen

no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank

AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Language Modelling Large Language Model +5

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

no code implementations • 8 Jun 2023 • Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations.

dialog state tracking Language Modelling +1

Label Aware Speech Representation Learning For Language Identification

no code implementations • 7 Jun 2023 • Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task.

Language Identification Missing Labels +3

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations • 30 May 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.

Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction

no code implementations • 23 May 2023 • Yew Ken Chia, Hui Chen, Wei Han, Guizhen Chen, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, their expressed sentiment, and the corresponding aspect targets.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

1 code implementation • 3 Mar 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.

Speech Denoising Speech Enhancement

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

no code implementations • 2 Mar 2023 • Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Noise2Music: Text-conditioned Music Generation with Diffusion Models

no code implementations • 8 Feb 2023 • Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han

We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.

Ranked #2 on Text-to-Music Generation on MusicCaps

Music Generation Text-to-Music Generation

Efficient Domain Adaptation for Speech Foundation Models

no code implementations • 3 Feb 2023 • Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data.

Domain Adaptation speech-recognition +2

Speech Aware Dialog System Technology Challenge (DSTC11)

no code implementations • 16 Dec 2022 • Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda

The research on this topic is stymied by the lack of a public corpus.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

An Interpretable Neuron Embedding for Static Knowledge Distillation

no code implementations • 14 Nov 2022 • Wei Han, Yangqiming Wang, Christian Böhm, Junming Shao

The visualization of semantic vectors allows for a qualitative explanation of the neural network.

Knowledge Distillation

Accelerating RNN-T Training and Inference Using CTC guidance

no code implementations • 29 Oct 2022 • Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani

We propose a novel method to accelerate the training and inference of the recurrent neural network transducer (RNN-T) based on guidance from a co-trained connectionist temporal classification (CTC) model.

SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training

1 code implementation • 23 Oct 2022 • Hui Chen, Wei Han, Soujanya Poria

Self-training methods have been explored in recent years and have exhibited great performance in improving semi-supervised learning.

Pseudo Label Semi-Supervised Text Classification

MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences

1 code implementation • 23 Oct 2022 • Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

Existing multimodal tasks mostly target the complete input modality setting, i.e., each modality is either complete or completely missing in both training and test sets.

Denoising Imputation

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

1 code implementation • COLING 2022 • Hui Chen, Wei Han, Diyi Yang, Soujanya Poria

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification.

Sentence Text Augmentation +2

SANCL: Multimodal Review Helpfulness Prediction with Selective Attention and Natural Contrastive Learning

1 code implementation • COLING 2022 • Wei Han, Hui Chen, Zhen Hai, Soujanya Poria, Lidong Bing

With the boom of e-commerce, Multimodal Review Helpfulness Prediction (MRHP), which aims to sort product reviews according to the predicted helpfulness scores, has become a research hotspot.

Contrastive Learning

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Ranked #1 on Text-to-Image Generation on LAION COCO

Machine Translation Text-to-Image Generation +1

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

no code implementations • 16 May 2022 • Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition.

Accented Speech Recognition Benchmarking +1

Unsupervised Data Selection via Discrete Speech Representation for ASR

no code implementations • 5 Apr 2022 • Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Co-training Transformer with Videos and Images Improves Action Recognition

no code implementations • 14 Dec 2021 • BoWen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha

We term this approach Co-training Videos and Images for Action Recognition (CoVeR).

Ranked #8 on Action Classification on MiT (using extra training data)

Action Classification Action Recognition In Videos +2

A Tensor-BTD-based Modulation for Massive Unsourced Random Access

no code implementations • 5 Dec 2021 • Zhenting Luan, Yuchi Wu, Shansuo Liang, Liping Zhang, Wei Han, Bo Bai

In this letter, we propose a novel tensor-based modulation scheme for massive unsourced random access.

Tensor Decomposition

Harmonic Retrieval with $L_1$-Tucker Tensor Decomposition

no code implementations • 29 Nov 2021 • Zhenting Luan, Zhenyu Ming, Yuchi Wu, Wei Han, Xiang Chen, Bo Bai, Liping Zhang

We also develop a novel subcarrier recovery method for the proposed model.

Retrieval Tensor Decomposition

TAPL: Dynamic Part-based Visual Tracking via Attention-guided Part Localization

no code implementations • 25 Oct 2021 • Wei Han, Hantao Huang, Xiaoxi Yu

Holistic object representation-based trackers suffer from performance drop under large appearance change such as deformation and occlusion.

Object Visual Tracking

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

no code implementations • 9 Oct 2021 • Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech.

Multi-trends Enhanced Dynamic Micro-video Recommendation

no code implementations • 8 Oct 2021 • Yujie Lu, Yingxuan Huang, Shengyu Zhang, Wei Han, Hui Chen, Zhou Zhao, Fei Wu

In this paper, we propose the DMR framework to explicitly model dynamic multi-trends of users' current preference and make predictions based on both the history and future potential trends.

Recommendation Systems

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.

Ranked #1 on Speech Recognition on Common Voice

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis

2 code implementations • EMNLP 2021 • Wei Han, Hui Chen, Soujanya Poria

In this work, we propose a framework named MultiModal InfoMax (MMIM), which hierarchically maximizes the Mutual Information (MI) in unimodal input pairs (inter-modality) and between multimodal fusion result and unimodal input in order to maintain task-related information through multimodal fusion.

Ranked #5 on Multimodal Sentiment Analysis on CMU-MOSI

Multimodal Sentiment Analysis

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

3 code implementations • 7 Aug 2021 • Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Contrastive Learning Language Modelling +3

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

2 code implementations • 28 Jul 2021 • Wei Han, Hui Chen, Alexander Gelbukh, Amir Zadeh, Louis-Philippe Morency, Soujanya Poria

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data.

Multimodal Deep Learning Multimodal Sentiment Analysis

Supervised Contrastive Learning for Accented Speech Recognition

no code implementations • 2 Jul 2021 • Tao Han, Hantao Huang, Ziang Yang, Wei Han

Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents.

Accented Speech Recognition Contrastive Learning +3

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

no code implementations • 25 Apr 2021 • Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

To improve streaming models, a recent study [1] proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teachers' predictions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

no code implementations • 6 Apr 2021 • Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao

To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Better and Faster End-to-End Model for Streaming ASR

no code implementations • 21 Nov 2020 • Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing Sound

Superconductor-metal quantum transition at the EuO-KTaO3 interface

no code implementations • 23 Oct 2020 • Yang Ma, Jiasen Niu, Wenyu Xing, Yunyan Yao, Ranran Cai, Jirong Sun, X. C. Xie, Xi Lin, Wei Han

Superconductivity has been one of the most fascinating quantum states of matter for several decades.

Superconductivity Mesoscale and Nanoscale Physics Materials Science

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation • 21 Oct 2020 • Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

1 code implementation • 20 Oct 2020 • Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering

no code implementations • 17 Oct 2020 • Hantao Huang, Tao Han, Wei Han, Deep Yap, Cheng-Ming Chiang

From the human perspective, to answer a visual question, one needs to read the question and then refer to the image to generate an answer.

Question Answering Visual Question Answering

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

no code implementations • ICLR 2021 • Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering

no code implementations • COLING 2020 • Wei Han, Hantao Huang, Tao Han

Positional information of text is underused and there is a lack of evidence for the generated answer.

Optical Character Recognition Optical Character Recognition (OCR) +2

Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks

1 code implementation • 10 Sep 2020 • Hui Chen, Pengfei Hong, Wei Han, Navonil Majumder, Soujanya Poria

This graph is fed to a graph attention network for context propagation among relevant nodes, which effectively captures the dialogue context.

Ranked #7 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction Graph Attention +2

Improved Noisy Student Training for Automatic Speech Recognition

1 code implementation • 19 May 2020 • Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le

Noisy student training is an iterative self-training method that leverages augmentation to improve network performance.

Ranked #5 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Conformer: Convolution-augmented Transformer for Speech Recognition

24 code implementations • 16 May 2020 • Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).

Ranked #12 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

6 code implementations • 7 May 2020 • Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Ranked #12 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Object Detection for 3-D Point Clouds

no code implementations • ECCV 2020 • Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

This built-in data capture latency is artificial, and based on treating the point cloud as a camera image in order to leverage camera-inspired architectures.

Action Recognition Autonomous Vehicles +4

FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides

1 code implementation • 3 Jan 2020 • Xiebo Geng, Sibo Liu, Wei Han, Xu Li, Jiabo Ma, Jingya Yu, Xiuli Liu, Shaoqun Zeng, Li Chen, Shenghua Cheng

However, although existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they need multiple images with different focus depths in the same field of view.

Generative Adversarial Network Semantic Segmentation +1

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

8 code implementations • CVPR 2020 • Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov

In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset.

Autonomous Driving

A comparison of end-to-end models for long-form speech recognition

no code implementations • 6 Nov 2019 • Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu

In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Statistical Inference in Mean-Field Variational Bayes

no code implementations • 4 Nov 2019 • Wei Han, Yun Yang

We conduct non-asymptotic analysis on the mean-field variational inference for approximating posterior distributions in complex Bayesian models that may involve latent variables.

Variational Inference

StarNet: Targeted Computation for Object Detection in Point Clouds

no code implementations • 29 Aug 2019 • Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan

We show how our redesign (namely, using only local information and using sampling instead of learned proposals) leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics.

3D Object Detection Object +3

A Strategy of MR Brain Tissue Images' Suggestive Annotation Based on Modified U-Net

no code implementations • 19 Jul 2018 • Yang Deng, Yao Sun, Yongpei Zhu, Mingwang Zhu, Wei Han, Kehong Yuan

Choosing an appropriate training subset from a limited labeled dataset, rather than using the whole dataset, is also of great significance in saving training time.

Segmentation

Image Super-Resolution via Dual-State Recurrent Networks

1 code implementation • CVPR 2018 • Wei Han, Shiyu Chang, Ding Liu, Mo Yu, Michael Witbrock, Thomas S. Huang

Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks.

Ranked #42 on Image Super-Resolution on BSD100 - 4x upscaling

Image Super-Resolution

Learning 3D-FilterMap for Deep Convolutional Neural Networks

no code implementations • 5 Jan 2018 • Yingzhen Yang, Jianchao Yang, Ning Xu, Wei Han

Due to the weight sharing scheme, the parameter size of the 3D-FilterMap is much smaller than that of the filters to be learned in the conventional convolution layer when 3D-FilterMap generates the same number of filters.

Dilated Recurrent Neural Networks

2 code implementations • NeurIPS 2017 • Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.

Ranked #24 on Sequential Image Classification on Sequential MNIST

Sequential Image Classification

A Learning-Based Approach for Lane Departure Warning Systems with a Personalized Driver Model

no code implementations • 4 Feb 2017 • Wenshuo Wang, Ding Zhao, Junqiang Xi, Wei Han

Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will demonstrate an LDB or a DCB.

Robust Single Image Super-Resolution via Deep Networks With Sparse Prior

1 code implementation • journals 2016 • Ding Liu, Zhaowen Wang, Bihan Wen, Jianchao Yang, Wei Han, Thomas S. Huang

We demonstrate that a sparse coding model particularly designed for SR can be incarnated as a neural network with the merit of end-to-end optimization over training data.

Image Super-Resolution

Seq-NMS for Video Object Detection

1 code implementation • 26 Feb 2016 • Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.

General Classification Object +4

Deep Networks for Image Super-Resolution with Sparse Prior

no code implementations • ICCV 2015 • Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, Thomas Huang

We show that a sparse coding model particularly designed for super-resolution can be incarnated as a neural network, and trained in a cascaded structure from end to end.

Image Restoration Image Super-Resolution

Learning Semantic Relationships for Better Action Retrieval in Images

no code implementations • CVPR 2015 • Vignesh Ramanathan, Cong-Cong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Charles Rosenberg, Li Fei-Fei

Human actions capture a wide variety of interactions between people and objects.

Retrieval

Self-Tuned Deep Super Resolution

no code implementations • 22 Apr 2015 • Zhangyang Wang, Yingzhen Yang, Zhaowen Wang, Shiyu Chang, Wei Han, Jianchao Yang, Thomas S. Huang

Deep learning has been successfully applied to image super resolution (SR).

Denoising Image Super-Resolution

An Analysis of Unsupervised Pre-training in Light of Recent Advances

2 code implementations • 20 Dec 2014 • Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang

We discover that unsupervised pre-training, as expected, helps when the ratio of unsupervised to supervised samples is high, and surprisingly, hurts when the ratio is low.

Ranked #92 on Image Classification on STL-10

Data Augmentation Image Classification +2

Attribute based Chinese Named Entity Recognition and Disambiguation

no code implementations • WS 2012 • Wei Han, Guang Liu, Yuzhao Mao, Zhenni Huang

Attribute Chinese Named Entity Recognition +5
