Search Results for author: Wei Han

Found 72 papers, 24 papers with code

Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations

no code implementations 12 Apr 2024 Xiaoze Xu, Xiuyu Sun, Wei Han, Xiaohui Zhong, Lei Chen, Hao Li

Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance.

Weather Forecasting

Ultrafast Adaptive Primary Frequency Tuning and Secondary Frequency Identification for S/S WPT system

no code implementations 26 Mar 2024 Chang Liu, Wei Han, Guangyu Yan, Bowang Zhang, Chunlin Li

The swift response of the SCC and the two-step perturb-and-observe algorithm mitigates output disturbances, thereby expediting the frequency tuning process.

INSTRAUG: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning

1 code implementation 22 Feb 2024 Wei Han, Hui Chen, Soujanya Poria

Fine-tuning large language models (LLMs) on multi-task instruction-following data has been proven to be a powerful learning paradigm for improving their zero-shot capabilities on new tasks.

Instruction Following

Retrieval Augmented End-to-End Spoken Dialog Models

no code implementations 2 Feb 2024 Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM.

dialog state tracking In-Context Learning +3

Extending Context Window of Large Language Models via Semantic Compression

no code implementations 15 Dec 2023 Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses.

Few-Shot Learning Information Retrieval +3

High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

no code implementations 27 Sep 2023 Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz

We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver.

Denoising

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

no code implementations 8 Jun 2023 Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations.

dialog state tracking Language Modelling +1

Label Aware Speech Representation Learning For Language Identification

no code implementations 7 Jun 2023 Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task.

Language Identification Missing Labels +3

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations 30 May 2023 Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.

Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction

no code implementations 23 May 2023 Yew Ken Chia, Hui Chen, Wei Han, Guizhen Chen, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, its expressed sentiment, and the corresponding aspect target.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

1 code implementation 3 Mar 2023 Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various audio degradations and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.

Speech Denoising Speech Enhancement

An Interpretable Neuron Embedding for Static Knowledge Distillation

no code implementations 14 Nov 2022 Wei Han, Yangqiming Wang, Christian Böhm, Junming Shao

The visualization of semantic vectors allows for a qualitative explanation of the neural network.

Knowledge Distillation

Accelerating RNN-T Training and Inference Using CTC guidance

no code implementations 29 Oct 2022 Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani

We propose a novel method to accelerate the training and inference of the recurrent neural network transducer (RNN-T) based on guidance from a co-trained connectionist temporal classification (CTC) model.

SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training

1 code implementation 23 Oct 2022 Hui Chen, Wei Han, Soujanya Poria

Self-training methods have been explored in recent years and have exhibited great performance in improving semi-supervised learning.

Pseudo Label Semi-Supervised Text Classification

MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences

1 code implementation 23 Oct 2022 Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

Existing multimodal tasks mostly target the complete input modality setting, i.e., each modality is either complete or completely missing in both training and test sets.

Denoising Imputation

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

1 code implementation COLING 2022 Hui Chen, Wei Han, Diyi Yang, Soujanya Poria

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification.

Sentence Text Augmentation +2

SANCL: Multimodal Review Helpfulness Prediction with Selective Attention and Natural Contrastive Learning

1 code implementation COLING 2022 Wei Han, Hui Chen, Zhen Hai, Soujanya Poria, Lidong Bing

With the boom of e-commerce, Multimodal Review Helpfulness Prediction (MRHP), which aims to sort product reviews according to the predicted helpfulness scores, has become a research hotspot.

Contrastive Learning

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations 22 Jun 2022 Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Machine Translation Text-to-Image Generation +1

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

no code implementations 16 May 2022 Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition.

Accented Speech Recognition Benchmarking +1

A Tensor-BTD-based Modulation for Massive Unsourced Random Access

no code implementations 5 Dec 2021 Zhenting Luan, Yuchi Wu, Shansuo Liang, Liping Zhang, Wei Han, Bo Bai

In this letter, we propose a novel tensor-based modulation scheme for massive unsourced random access.

Tensor Decomposition

TAPL: Dynamic Part-based Visual Tracking via Attention-guided Part Localization

no code implementations 25 Oct 2021 Wei Han, Hantao Huang, Xiaoxi Yu

Holistic object representation-based trackers suffer a performance drop under large appearance changes such as deformation and occlusion.

Object Visual Tracking

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

no code implementations 9 Oct 2021 Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech.

Multi-trends Enhanced Dynamic Micro-video Recommendation

no code implementations 8 Oct 2021 Yujie Lu, Yingxuan Huang, Shengyu Zhang, Wei Han, Hui Chen, Zhou Zhao, Fei Wu

In this paper, we propose the DMR framework to explicitly model dynamic multi-trends of users' current preference and make predictions based on both the history and future potential trends.

Recommendation Systems

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis

2 code implementations EMNLP 2021 Wei Han, Hui Chen, Soujanya Poria

In this work, we propose a framework named MultiModal InfoMax (MMIM), which hierarchically maximizes the Mutual Information (MI) in unimodal input pairs (inter-modality) and between multimodal fusion result and unimodal input in order to maintain task-related information through multimodal fusion.

Multimodal Sentiment Analysis

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

3 code implementations 7 Aug 2021 Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Contrastive Learning Language Modelling +3

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

2 code implementations 28 Jul 2021 Wei Han, Hui Chen, Alexander Gelbukh, Amir Zadeh, Louis-Philippe Morency, Soujanya Poria

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data.

Multimodal Deep Learning Multimodal Sentiment Analysis

Supervised Contrastive Learning for Accented Speech Recognition

no code implementations 2 Jul 2021 Tao Han, Hantao Huang, Ziang Yang, Wei Han

Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents.

Accented Speech Recognition Contrastive Learning +3

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

no code implementations 25 Apr 2021 Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

To improve streaming models, a recent study [1] proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teachers' predictions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

no code implementations 6 Apr 2021 Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao

To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Better and Faster End-to-End Model for Streaming ASR

no code implementations 21 Nov 2020 Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing Sound

Superconductor-metal quantum transition at the EuO-KTaO3 interface

no code implementations 23 Oct 2020 Yang Ma, Jiasen Niu, Wenyu Xing, Yunyan Yao, Ranran Cai, Jirong Sun, X. C. Xie, Xi Lin, Wei Han

Superconductivity has been one of the most fascinating quantum states of matter for several decades.

Superconductivity Mesoscale and Nanoscale Physics Materials Science

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

no code implementations 22 Oct 2020 Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation 21 Oct 2020 Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

1 code implementation 20 Oct 2020 Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering

no code implementations 17 Oct 2020 Hantao Huang, Tao Han, Wei Han, Deep Yap, Cheng-Ming Chiang

From the human perspective, to answer a visual question, one needs to read the question and then refer to the image to generate an answer.

Question Answering Visual Question Answering

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

no code implementations ICLR 2021 Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks

1 code implementation 10 Sep 2020 Hui Chen, Pengfei Hong, Wei Han, Navonil Majumder, Soujanya Poria

This graph is fed to a graph attention network for context propagation among relevant nodes, which effectively captures the dialogue context.

Ranked #7 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction Graph Attention +2

Conformer: Convolution-augmented Transformer for Speech Recognition

24 code implementations 16 May 2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).

Ranked #12 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

6 code implementations 7 May 2020 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations 7 May 2020 Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Object Detection for 3-D Point Clouds

no code implementations ECCV 2020 Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

This built-in data capture latency is artificial, and based on treating the point cloud as a camera image in order to leverage camera-inspired architectures.

Action Recognition Autonomous Vehicles +4

FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides

1 code implementation 3 Jan 2020 Xiebo Geng, Sibo Liua, Wei Han, Xu Li, Jiabo Ma, Jingya Yu, Xiuli Liu, Sahoqun Zeng, Li Chen, Shenghua Cheng

However, although existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they need multiple images with different focus depths in the same field of view.

Generative Adversarial Network Semantic Segmentation +1

Statistical Inference in Mean-Field Variational Bayes

no code implementations 4 Nov 2019 Wei Han, Yun Yang

We conduct non-asymptotic analysis on the mean-field variational inference for approximating posterior distributions in complex Bayesian models that may involve latent variables.

Variational Inference

StarNet: Targeted Computation for Object Detection in Point Clouds

no code implementations 29 Aug 2019 Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan

We show how our redesign (namely, using only local information and using sampling instead of learned proposals) leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics.

3D Object Detection Object +3

A Strategy of MR Brain Tissue Images' Suggestive Annotation Based on Modified U-Net

no code implementations 19 Jul 2018 Yang Deng, Yao Sun, Yongpei Zhu, Mingwang Zhu, Wei Han, Kehong Yuan

Choosing an appropriate training dataset from a limited labeled dataset, rather than using the whole, is also of great significance in saving training time.

Segmentation

Image Super-Resolution via Dual-State Recurrent Networks

1 code implementation CVPR 2018 Wei Han, Shiyu Chang, Ding Liu, Mo Yu, Michael Witbrock, Thomas S. Huang

Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks.

Image Super-Resolution

Learning 3D-FilterMap for Deep Convolutional Neural Networks

no code implementations 5 Jan 2018 Yingzhen Yang, Jianchao Yang, Ning Xu, Wei Han

Due to the weight sharing scheme, the parameter size of the 3D-FilterMap is much smaller than that of the filters to be learned in the conventional convolution layer when 3D-FilterMap generates the same number of filters.

Dilated Recurrent Neural Networks

2 code implementations NeurIPS 2017 Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.

Sequential Image Classification

A Learning-Based Approach for Lane Departure Warning Systems with a Personalized Driver Model

no code implementations 4 Feb 2017 Wenshuo Wang, Ding Zhao, Junqiang Xi, Wei Han

Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will demonstrate an LDB or a DCB.

Robust Single Image Super-Resolution via Deep Networks With Sparse Prior

1 code implementation journals 2016 Ding Liu, Zhaowen Wang, Bihan Wen, Jianchao Yang, Wei Han, Thomas S. Huang

We demonstrate that a sparse coding model particularly designed for SR can be incarnated as a neural network with the merit of end-to-end optimization over training data.

Image Super-Resolution

Seq-NMS for Video Object Detection

1 code implementation 26 Feb 2016 Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.

General Classification Object +4

Deep Networks for Image Super-Resolution with Sparse Prior

no code implementations ICCV 2015 Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, Thomas Huang

We show that a sparse coding model particularly designed for super-resolution can be incarnated as a neural network, and trained in a cascaded structure from end to end.

Image Restoration Image Super-Resolution

An Analysis of Unsupervised Pre-training in Light of Recent Advances

2 code implementations 20 Dec 2014 Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang

We discover that unsupervised pre-training, as expected, helps when the ratio of unsupervised to supervised samples is high, and surprisingly, hurts when the ratio is low.

Data Augmentation Image Classification +2
