We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver.
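As a rough illustration of the channel step shared by DeepJSCC-style pipelines, the sketch below power-normalizes the encoder's channel symbols and adds white Gaussian noise at a target SNR; the surrounding encoder, decoder, and diffusion-denoiser modules are assumptions, not the paper's exact architecture.

```python
import torch

def awgn_channel(z: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Power-normalize channel symbols, then add white Gaussian noise at snr_db."""
    power = z.pow(2).mean(dim=-1, keepdim=True)
    z = z / torch.sqrt(power + 1e-8)            # unit average symbol power
    sigma = 10.0 ** (-snr_db / 20.0)            # noise std for the requested SNR
    return z + sigma * torch.randn_like(z)

# Hypothetical end-to-end flow: x_hat = denoiser(decoder(awgn_channel(encoder(x), 10.0)))
```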
no code implementations • 19 Sep 2023 • Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
Video question answering is a fundamental task in the field of video understanding.
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.
Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations.
In this paper, we propose a novel framework that combines self-supervised representation learning with language label information for the pre-training task.
The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.
Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, its expressed sentiment, and the corresponding aspect target; for example, in "The battery life is great", the extracted triplet is ("battery life", "great", positive).
Experiments show that Miipher (i) is robust against various audio degradations and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.
no code implementations • 2 Mar 2023 • Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages.
no code implementations • 8 Feb 2023 • Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
Ranked #2 on Text-to-Music Generation on MusicCaps
The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data.
The research on this topic is stymied by the lack of a public corpus.
We propose a novel method to accelerate training and inference process of recurrent neural network transducer (RNN-T) based on the guidance from a co-trained connectionist temporal classification (CTC) model.
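A minimal sketch of the co-training objective, under the assumption of a shared encoder feeding both a CTC head and an RNN-T joiner; the paper's actual guidance mechanism goes further than this joint loss, and all tensor names here are illustrative.

```python
import torch
import torch.nn.functional as F
import torchaudio

def cotrain_loss(enc, ctc_head, joiner_logits, targets, enc_lens, tgt_lens, alpha=0.3):
    """RNN-T loss plus an auxiliary CTC loss on a shared encoder.

    enc:           (B, T, D) shared encoder output
    joiner_logits: (B, T, U + 1, V) RNN-T joiner output
    targets:       (B, U) label ids (blank id 0 in both branches)
    """
    log_probs = F.log_softmax(ctc_head(enc), dim=-1).transpose(0, 1)  # (T, B, V)
    ctc = F.ctc_loss(log_probs, targets, enc_lens, tgt_lens, blank=0)
    rnnt = torchaudio.functional.rnnt_loss(
        joiner_logits, targets.int(), enc_lens.int(), tgt_lens.int(), blank=0)
    return rnnt + alpha * ctc
```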
Self-training methods have been explored in recent years and have proven effective at improving semi-supervised learning.
Existing multimodal tasks mostly target the complete-input-modality setting, i.e., each modality is either complete or completely missing in both training and test sets.
With the boom of e-commerce, Multimodal Review Helpfulness Prediction (MRHP), which aims to sort product reviews according to their predicted helpfulness scores, has become a research hotspot.
This paper proposes a simple yet effective interpolation-based data augmentation approach, termed DoubleMix, to improve the robustness of models in text classification.
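As a minimal sketch of the interpolation idea: perturbed variants are mixed first, then the mixture is interpolated back into the original hidden state with the original kept dominant. The Beta parameters and the clamp are assumptions, not the paper's exact recipe.

```python
import torch

def double_mix(h_orig, h_aug1, h_aug2, beta=torch.distributions.Beta(2.0, 2.0)):
    """Two-step hidden-space interpolation for text-classification robustness.

    h_orig, h_aug1, h_aug2: (B, D) hidden states of an example and two
    perturbed versions of it (e.g., via synonym replacement or dropout).
    """
    lam = beta.sample()
    h_aug = lam * h_aug1 + (1 - lam) * h_aug2   # step 1: mix the perturbations
    mu = beta.sample().clamp(min=0.5)           # step 2: keep the original dominant
    return mu * h_orig + (1 - mu) * h_aug
```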
1 code implementation • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Ranked #1 on Text-to-Image Generation on LAION COCO
no code implementations • 16 May 2022 • Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang
However, there are not enough datasets for accented speech, and for those already available, more training approaches need to be explored to improve the quality of accented speech recognition.
Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR).
We term this approach Co-training Videos and Images for Action Recognition (CoVeR).
Ranked #6 on Action Classification on Moments in Time (using extra training data)
In this letter, we propose a novel tensor-based modulation scheme for massive unsourced random access.
Holistic object representation-based trackers suffer from performance drops under large appearance changes such as deformation and occlusion.
Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech.
In this paper, we propose the DMR framework to explicitly model dynamic multi-trends of users' current preference and make predictions based on both the history and future potential trends.
no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.
In this work, we propose a framework named MultiModal InfoMax (MMIM), which hierarchically maximizes the Mutual Information (MI) in unimodal input pairs (inter-modality) and between multimodal fusion result and unimodal input in order to maintain task-related information through multimodal fusion.
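As a generic illustration of contrastive MI maximization, the InfoNCE objective below lower-bounds the mutual information between paired embeddings (e.g., a unimodal representation and the fusion result); MMIM's actual estimators differ in detail, so treat this as a sketch.

```python
import torch
import torch.nn.functional as F

def infonce(x: torch.Tensor, y: torch.Tensor, temperature: float = 0.1):
    """InfoNCE loss on (B, D) embedding batches; matched rows are positives,
    all other rows in the batch serve as negatives."""
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    logits = x @ y.t() / temperature                      # (B, B) similarities
    labels = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, labels)                # minimize to raise the MI bound
```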
Ranked #5 on Multimodal Sentiment Analysis on CMU-MOSI
In particular, when compared to published models such as Conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.
Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)
Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data.
Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents.
To improve streaming models, a recent study proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teacher's predictions.
To attack RNN-T, we find that prepending a perturbation is more effective than adding one, and that it can mislead the model into predicting the same short target on utterances of arbitrary length.
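A hedged sketch of the prepending idea: optimize a short audio prefix, rather than an additive perturbation over the whole utterance, so that decoding yields a chosen target. `model.loss` is a hypothetical interface returning the training loss against `target`.

```python
import torch

def prepend_attack(model, audio, target, prefix_len=1600, steps=100, eps=0.01):
    """Learn a small-amplitude prefix that steers the transducer's output."""
    prefix = torch.zeros(prefix_len, requires_grad=True)
    opt = torch.optim.Adam([prefix], lr=1e-3)
    for _ in range(steps):
        adv = torch.cat([prefix, audio])          # prepend instead of adding
        loss = model.loss(adv.unsqueeze(0), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            prefix.clamp_(-eps, eps)              # bound the prefix amplitude
    return prefix.detach()
```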
no code implementations • 21 Nov 2020 • Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu
To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers, which have shown good improvements for ASR.
Audio and Speech Processing; Sound
Superconductivity has been one of the most fascinating quantum states of matter for several decades.
Superconductivity; Mesoscale and Nanoscale Physics; Materials Science
We propose a novel and effective learning method that leverages a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which are then used to distill knowledge into streaming ASR models.
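A minimal sketch of the distillation loop, with `teacher.transcribe` and `student.loss` as hypothetical interfaces: the full-context teacher pseudo-labels each unlabeled utterance, and the streaming student is trained on the generated transcript.

```python
import torch

def distill(teacher, student, unlabeled_audio, optimizer):
    for audio in unlabeled_audio:
        with torch.no_grad():
            transcript = teacher.transcribe(audio)   # full-context pseudo-label
        loss = student.loss(audio, transcript)       # streaming model update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```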
FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.
Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)
From the human perspective, to answer a visual question, one needs to read the question and then refer to the image to generate an answer.
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses.
Positional information of text is underused, and there is a lack of evidence supporting the generated answers.
This graph is fed to a graph attention network for context propagation among relevant nodes, which effectively captures the dialogue context.
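As a toy illustration of context propagation with graph attention (here via PyTorch Geometric's GATConv layer; the dialogue-specific graph construction is omitted and the sizes are arbitrary):

```python
import torch
from torch_geometric.nn import GATConv

x = torch.randn(5, 128)                                   # 5 nodes, 128-dim features
edge_index = torch.tensor([[0, 1, 2, 3],                  # directed edges between
                           [1, 2, 3, 4]])                 # related dialogue nodes
gat = GATConv(in_channels=128, out_channels=64, heads=4)
out = gat(x, edge_index)                                  # (5, 256) attended features
```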
Ranked #7 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)
Noisy student training is an iterative self-training method that leverages augmentation to improve network performance.
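A compact sketch of that iteration, with every callable a hypothetical placeholder: each generation trains a fresh student on labeled data plus teacher pseudo-labels, augmentation supplies the noise, and the student becomes the next teacher.

```python
def noisy_student(teacher, make_student, labeled, unlabeled, augment, generations=3):
    for _ in range(generations):
        pseudo = [(x, teacher.transcribe(x)) for x in unlabeled]   # pseudo-label
        student = make_student()                                   # fresh student
        student.fit([(augment(x), y) for x, y in labeled + pseudo])
        teacher = student                                          # promote student
    return teacher
```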
Ranked #4 on Speech Recognition on LibriSpeech test-clean (using extra training data)
Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs).
Ranked #10 on Speech Recognition on LibriSpeech test-clean
We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without an external language model (LM), 1.9%/4.1% with an LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.
Ranked #10 on Speech Recognition on LibriSpeech test-clean
no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu
On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.
This built-in data capture latency is artificial, and stems from treating the point cloud as a camera image in order to leverage camera-inspired architectures.
However, existing image fusion techniques, including both traditional and deep learning-based algorithms, require multiple images with different focus depths of the same field of view in order to generate a high-quality fused image.
8 code implementations • Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov
In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high-quality, diverse dataset.
no code implementations • 6 Nov 2019 • Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu
In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription.
We conduct a non-asymptotic analysis of mean-field variational inference for approximating posterior distributions in complex Bayesian models that may involve latent variables.
no code implementations • 29 Aug 2019 • Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan
We show how our redesign, namely using only local information and using sampling instead of learned proposals, leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics.
Choosing an appropriate training subset from a limited labeled dataset, rather than using the whole set, is also of great significance for saving training time.
Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks.
Ranked #38 on Image Super-Resolution on Urban100 - 4x upscaling
Due to the weight sharing scheme, the parameter size of the 3D-FilterMap is much smaller than that of the filters to be learned in a conventional convolution layer when the 3D-FilterMap generates the same number of filters.
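A hedged sketch of that weight sharing: overlapping sub-volumes of a single compact tensor are reused as many filters, so the parameter count stays well below the total number of generated filter weights. The window sizes and strides below are assumptions for illustration.

```python
import torch

filter_map = torch.randn(16, 8, 8)          # one compact learned parameter tensor
filters = (filter_map
           .unfold(0, 3, 1)                 # overlapping windows along depth,
           .unfold(1, 3, 1)                 # height,
           .unfold(2, 3, 1)                 # and width
           .reshape(-1, 3, 3, 3))           # -> hundreds of 3x3x3 filters
print(filters.shape[0], "filters from", filter_map.numel(), "parameters")
```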
To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.
Ranked #24 on Sequential Image Classification on Sequential MNIST
Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will demonstrate an LDB or a DCB.
We demonstrate that a sparse coding model particularly designed for SR can be incarnated as a neural network with the merit of end-to-end optimization over training data.
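A generic sketch of "sparse coding incarnated as a network", in the style of learned ISTA (LISTA): a few soft-thresholding iterations with learned weights, trainable end to end. This illustrates the idea rather than the paper's exact SR model.

```python
import torch
import torch.nn as nn

class LISTA(nn.Module):
    def __init__(self, in_dim, code_dim, n_iters=3, theta=0.1):
        super().__init__()
        self.W = nn.Linear(in_dim, code_dim, bias=False)    # input projection
        self.S = nn.Linear(code_dim, code_dim, bias=False)  # recurrent update
        self.theta = nn.Parameter(torch.full((code_dim,), theta))
        self.n_iters = n_iters

    def forward(self, x):
        b = self.W(x)
        z = torch.zeros_like(b)
        for _ in range(self.n_iters):
            u = b + self.S(z)
            # Soft-thresholding: the proximal operator of the L1 sparsity penalty.
            z = torch.sign(u) * torch.relu(u.abs() - self.theta)
        return z
```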
Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.
We show that a sparse coding model particularly designed for super-resolution can be incarnated as a neural network, and trained in a cascaded structure from end to end.
Human actions capture a wide variety of interactions between people and objects.
We discover unsupervised pre-training, as expected, helps when the ratio of unsupervised to supervised samples is high, and surprisingly, hurts when the ratio is low.
Ranked #93 on Image Classification on STL-10