Search Results for author: Jiahui Yu

Found 35 papers, 15 papers with code

CoCa: Contrastive Captioners are Image-Text Foundation Models

no code implementations • 4 May 2022 • Jiahui Yu, ZiRui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu

We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the outputs of the multimodal decoder, which predicts text tokens autoregressively.

Action Classification • Image Captioning • +9
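
To make CoCa's two objectives concrete, here is a minimal PyTorch sketch of the dual loss described above, assuming precomputed unimodal embeddings and multimodal decoder logits; the function name, temperature, and loss weighting are illustrative rather than CoCa's released implementation.

```python
import torch
import torch.nn.functional as F

def coca_style_loss(img_emb, txt_emb, caption_logits, caption_targets,
                    temperature=0.07, caption_weight=2.0):
    """Contrastive loss on unimodal embeddings plus an autoregressive
    captioning loss on the multimodal decoder's logits."""
    # Contrastive part: symmetric InfoNCE over L2-normalized embeddings.
    img = F.normalize(img_emb, dim=-1)              # (B, D)
    txt = F.normalize(txt_emb, dim=-1)              # (B, D)
    logits = img @ txt.t() / temperature            # (B, B) similarities
    labels = torch.arange(img.size(0), device=img.device)
    contrastive = (F.cross_entropy(logits, labels) +
                   F.cross_entropy(logits.t(), labels)) / 2
    # Captioning part: next-token cross-entropy on decoder outputs.
    captioning = F.cross_entropy(caption_logits.flatten(0, 1),  # (B*T, V)
                                 caption_targets.flatten())     # (B*T,)
    return contrastive + caption_weight * captioning
```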

Self-supervised Learning with Random-projection Quantizer for Speech Recognition

no code implementations • 3 Feb 2022 • Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu

In particular, the quantizer projects speech inputs with a randomly initialized matrix and does a nearest-neighbor lookup in a randomly initialized codebook.

Self-Supervised Learning • Speech Recognition
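
The quantizer above is simple enough to sketch directly. A hedged PyTorch version follows, with all sizes illustrative; both the projection and the codebook are randomly initialized once and kept frozen, as the abstract specifies (the paper's exact distance/normalization details are elided).

```python
import torch

def random_projection_quantize(features, proj, codebook):
    """Project inputs with a frozen random matrix, then return the index
    of the nearest codeword (nearest-neighbor lookup, L2 distance here)."""
    z = features @ proj                              # (B, T, F) -> (B, T, D)
    dists = torch.cdist(z, codebook.expand(z.size(0), -1, -1))
    return dists.argmin(dim=-1)                      # (B, T) discrete targets

proj = torch.randn(80, 16)          # e.g. 80-dim log-mel -> 16-dim codes
codebook = torch.randn(8192, 16)    # 8192 random codewords
targets = random_projection_quantize(torch.randn(4, 100, 80), proj, codebook)
```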

Combined Scaling for Open-Vocabulary Image Classification

no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

Second, while increasing the dataset size and the model size has been the de facto method for improving the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastively trained image-text models is not well understood.

Ranked #2 on Zero-Shot Transfer Image Classification on ImageNet (using extra training data)

Classification • Contrastive Learning • +3

Vector-quantized Image Modeling with Improved VQGAN

1 code implementation • ICLR 2022 • Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.

Image Generation • Representation Learning • +1
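
The pretraining objective reduces to GPT-style next-token prediction over the quantized image grid. A minimal sketch, where `transformer` stands for any hypothetical decoder-only model mapping token ids to logits:

```python
import torch.nn.functional as F

def vim_pretrain_loss(transformer, image_tokens):
    """Autoregressive loss over rasterized image tokens: flatten the (H, W)
    grid of discrete codes row-major, then predict each token from its
    predecessors."""
    tokens = image_tokens.flatten(1)        # (B, H, W) -> (B, H*W)
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = transformer(inputs)            # (B, H*W - 1, vocab_size)
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())
```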

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

no code implementations • ICLR 2022 • ZiRui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.

Image Captioning • Language Modelling • +3

A Better and Faster End-to-End Model for Streaming ASR

no code implementations • 21 Nov 2020 • Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing • Sound

Normalization effects on shallow neural networks and related asymptotic expansions

no code implementations • 20 Nov 2020 • Jiahui Yu, Konstantinos Spiliopoulos

In addition, we show that, to leading order in $N$, the variance of the neural network's statistical output decays as the normalization implied by the scaling parameter approaches the mean-field normalization.
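
For orientation, papers in this line study a shallow network whose output is scaled by a parameter $\gamma$; the following sketch of that standard setup uses assumed notation rather than quoting the paper.

```latex
% Shallow network with N hidden units and scaling parameter gamma:
g_N^\gamma(x) = \frac{1}{N^\gamma} \sum_{i=1}^{N} c_i \, \sigma(w_i \cdot x),
\qquad \gamma \in [1/2, 1].
% gamma = 1 is the mean-field normalization; the abstract's claim is that,
% to leading order in N, the variance of the output decays as gamma -> 1.
```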

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation • 21 Oct 2020 • Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition • Word Alignment

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

no code implementations • ICLR 2021 • Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses.

Automatic Speech Recognition • Knowledge Distillation
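
One ingredient of unifying the two modes under fully shared weights is switching only the masking; a hedged sketch of that idea (the paper's full recipe also covers causal convolution/pooling and distills full-context predictions into the streaming mode):

```python
import torch

def dual_mode_attention_mask(seq_len, streaming):
    """Same layers, same weights, two behaviors: streaming mode lets each
    frame attend only to itself and the past; full-context sees everything."""
    if streaming:
        return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return torch.ones(seq_len, seq_len, dtype=torch.bool)
```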

Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications

no code implementations • 6 Aug 2020 • Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya

The generative adversarial network (GAN) framework has emerged as a powerful tool for various image and video synthesis tasks, allowing the synthesis of visual content in an unconditional or input-conditional manner.

Neural Rendering • Translation

Cross-Supervised Object Detection

no code implementations • 26 Jun 2020 • Zitian Chen, Zhiqiang Shen, Jiahui Yu, Erik Learned-Miller

After learning a new object category from image-level annotations (with no object bounding boxes), humans are remarkably good at precisely localizing those objects.

Object Detection

Neural Sparse Representation for Image Restoration

1 code implementation • NeurIPS 2020 • Yuchen Fan, Jiahui Yu, Yiqun Mei, Yulun Zhang, Yun Fu, Ding Liu, Thomas S. Huang

Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks.

Image Compression • Image Denoising • +2

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

no code implementations • 16 May 2020 • Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints.

Automatic Speech Recognition

Conformer: Convolution-augmented Transformer for Speech Recognition

17 code implementations • 16 May 2020 • Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs).

Automatic Speech Recognition
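
The macro-structure of a Conformer block is well known: two half-step feed-forward modules sandwiching self-attention and a convolution module. A simplified PyTorch rendering (the real block uses relative positional attention and batch norm inside the conv module, both elided here):

```python
import torch.nn as nn
import torch.nn.functional as F

class ConformerBlockSketch(nn.Module):
    """FFN/2 -> self-attention -> conv module -> FFN/2, all residual."""
    def __init__(self, dim, heads=4, kernel_size=31):
        super().__init__()
        self.ffn1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                  nn.SiLU(), nn.Linear(4 * dim, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        self.pw1 = nn.Conv1d(dim, 2 * dim, 1)             # pointwise, feeds GLU
        self.dw = nn.Conv1d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)  # depthwise
        self.pw2 = nn.Conv1d(dim, dim, 1)
        self.ffn2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                  nn.SiLU(), nn.Linear(4 * dim, dim))
        self.out_norm = nn.LayerNorm(dim)

    def forward(self, x):                                 # x: (B, T, D)
        x = x + 0.5 * self.ffn1(x)                        # half-step FFN
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        y = self.conv_norm(x).transpose(1, 2)             # (B, D, T) for conv
        y = F.glu(self.pw1(y), dim=1)                     # gated linear unit
        x = x + self.pw2(F.silu(self.dw(y))).transpose(1, 2)
        x = x + 0.5 * self.ffn2(x)                        # second half-step FFN
        return self.out_norm(x)
```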

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

4 code implementations • 7 May 2020 • Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Automatic Speech Recognition
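
ContextNet's "global context" comes from squeeze-and-excitation gating on its 1D convolution blocks. A minimal sketch of that mechanism, with the reduction ratio and placement illustrative:

```python
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    """Average the feature map over time, pass the summary through a
    bottleneck MLP, and rescale every frame's channels with the result."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (B, C, T)
        context = x.mean(dim=-1)          # squeeze: global average over time
        gate = self.fc(context)           # excite: per-channel gate in (0, 1)
        return x * gate.unsqueeze(-1)     # broadcast the gate over all frames
```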

Pyramid Attention Networks for Image Restoration

2 code implementations • 28 Apr 2020 • Yiqun Mei, Yuchen Fan, Yulun Zhang, Jiahui Yu, Yuqian Zhou, Ding Liu, Yun Fu, Thomas S. Huang, Humphrey Shi

Self-similarity refers to the image prior, widely used in image restoration algorithms, that small but similar patterns tend to occur at different locations and scales.

Demosaicking • Image Denoising • +1

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

1 code implementation • ECCV 2020 • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le

Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.

Neural Architecture Search
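
Training one set of shared weights that serves many child sizes typically follows the sandwich rule from the slimmable-networks line of work. A hedged sketch, where `min_child`, `max_child`, and `sample_child` are hypothetical helpers returning sub-networks that slice the shared weights:

```python
def sandwich_train_step(supernet, images, labels, loss_fn, n_random=2):
    """Accumulate gradients from the smallest child, the largest child,
    and a few random children, then take a single shared-weights step."""
    children = [supernet.min_child(), supernet.max_child()]
    children += [supernet.sample_child() for _ in range(n_random)]
    total = sum(loss_fn(child(images), labels) for child in children)
    total.backward()        # caller then runs optimizer.step()/zero_grad()
```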

Scale-wise Convolution for Image Restoration

1 code implementation • 19 Dec 2019 • Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang

In this paper, we show that properly modeling scale-invariance into neural networks can bring significant benefits to image restoration performance.

Data Augmentation • Image Compression • +3

Scaling Up Neural Architecture Search with Big Single-Stage Models

no code implementations • 25 Sep 2019 • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Quoc Le

In this work, we propose BigNAS, an approach that simplifies this workflow and scales up neural architecture search to target a wide range of model sizes simultaneously.

Neural Architecture Search

Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement

no code implementations • 22 Aug 2019 • Zhiqiang Shen, Zhankui He, Wanyun Cui, Jiahui Yu, Yutong Zheng, Chenchen Zhu, Marios Savvides

In order to distill diverse knowledge from different trained (teacher) models, we propose an adversarial-based learning strategy in which a block-wise training loss guides and optimizes the predefined student network to recover the knowledge in the teacher models, while a discriminator network is simultaneously promoted to distinguish teacher vs. student features.

Knowledge Distillation

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

3 code implementations • ICLR 2020 • Jiahui Yu, Thomas Huang

Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs).

Neural Architecture Search

Universally Slimmable Networks and Improved Training Techniques

1 code implementation • ICCV 2019 • Jiahui Yu, Thomas Huang

We also evaluate the proposed US-Nets and improved training techniques on tasks of image super-resolution and deep reinforcement learning.

Image Super-Resolution

FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary

no code implementations • ICLR 2020 • Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang

FSNet has the same architecture as that of the baseline CNN to be compressed, and each convolution layer of FSNet draws the same number of filters from the filter summary (FS) as the corresponding layer of the baseline CNN in the forward pass.

General Classification • Image Classification • +3

An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity

no code implementations • 3 Feb 2019 • Yingzhen Yang, Jiahui Yu, Xingjian Li, Jun Huan, Thomas S. Huang

In this paper, we investigate the role of Rademacher complexity in improving generalization of DNNs and propose a novel regularizer rooted in Local Rademacher Complexity (LRC).

Neural Architecture Search

Foreground-aware Image Inpainting

no code implementations • CVPR 2019 • Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo

We show that by such disentanglement, the contour completion model predicts reasonable contours of objects, and further substantially improves the performance of image inpainting.

Disentanglement • Image Inpainting

Slimmable Neural Networks

3 code implementations • ICLR 2019 • Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang

Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization.

Instance Segmentation • Keypoint Detection • +2
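
Switchable batch normalization is concrete enough to sketch: convolution weights are shared across widths, but each width keeps its own BatchNorm statistics. A minimal PyTorch version, names illustrative:

```python
import torch.nn as nn

class SwitchableBatchNorm2d(nn.Module):
    """One BatchNorm2d per supported width, so every sub-network keeps its
    own activation statistics while convolution weights stay shared."""
    def __init__(self, widths):
        super().__init__()
        self.widths = widths
        self.bns = nn.ModuleList(nn.BatchNorm2d(w) for w in widths)
        self.active = len(widths) - 1   # index of the currently active width

    def forward(self, x):               # x has widths[self.active] channels
        return self.bns[self.active](x)
```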

A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization

no code implementations • 23 Nov 2018 • Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

While training on independently and identically distributed (i.i.d.) samples has been the de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner, progressively and from selected examples.

General Classification • Image Classification • +5

Free-Form Image Inpainting with Gated Convolution

31 code implementations • ICCV 2019 • Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang

We present a generative image inpainting system to complete images with free-form masks and guidance.

Image Inpainting
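
The gated convolution at the core of this system replaces hard, hand-crafted mask updates with a learned soft gate; a minimal sketch (activation choices here are illustrative):

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """One branch produces features, a parallel branch produces a soft mask,
    and the output is their elementwise product."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        # Learned per-pixel, per-channel gating lets the layer handle
        # free-form holes that a hard 0/1 mask cannot express.
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))
```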

Generative Image Inpainting with Contextual Attention

28 code implementations • CVPR 2018 • Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang

Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.

Image Inpainting
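
The "explicitly utilize surrounding image features" step is the paper's contextual attention layer: hole features query known-region features by cosine similarity and copy back a weighted combination. A 1x1-patch simplification (the paper matches 3x3 patches and masks out invalid ones):

```python
import torch
import torch.nn.functional as F

def contextual_attention(foreground, background, temperature=10.0):
    """Fill hole features by attending over known-region features:
    cosine similarity -> scaled softmax -> weighted copy."""
    B, C, H, W = foreground.shape
    fg = F.normalize(foreground.flatten(2), dim=1)   # (B, C, H*W)
    bg = F.normalize(background.flatten(2), dim=1)   # (B, C, H*W)
    attn = torch.softmax(temperature * fg.transpose(1, 2) @ bg, dim=-1)
    out = attn @ background.flatten(2).transpose(1, 2)   # borrow bg features
    return out.transpose(1, 2).reshape(B, C, H, W)
```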

Improving Object Detection from Scratch via Gated Feature Reuse

2 code implementations • 4 Dec 2017 • Zhiqiang Shen, Honghui Shi, Jiahui Yu, Hai Phan, Rogerio Feris, Liangliang Cao, Ding Liu, Xinchao Wang, Thomas Huang, Marios Savvides

In this paper, we present a simple and parameter-efficient drop-in module for one-stage object detectors like SSD when learning from scratch (i.e., without pre-trained models).

Object Detection

UnitBox: An Advanced Object Detection Network

no code implementations • 4 Aug 2016 • Jiahui Yu, Yuning Jiang, Zhangyang Wang, Zhimin Cao, Thomas Huang

In current object detection systems, deep convolutional neural networks (CNNs) are used to predict bounding boxes of object candidates, and have gained performance advantages over traditional region-proposal methods.

Face Detection • Object Detection • +1
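
UnitBox's headline contribution is the IoU loss, which optimizes the four box offsets jointly instead of as independent regression targets. A sketch with boxes parameterized, as in the paper, by each pixel's distances to the four box sides (offsets assumed non-negative):

```python
import torch

def iou_loss(pred, target, eps=1e-6):
    """UnitBox-style IoU loss: -ln(IoU) between a predicted and a ground-
    truth box, each given as (left, top, right, bottom) side distances."""
    pl, pt, pr, pb = pred.unbind(-1)
    tl, tt, tr, tb = target.unbind(-1)
    pred_area = (pl + pr) * (pt + pb)
    target_area = (tl + tr) * (tt + tb)
    iw = torch.min(pl, tl) + torch.min(pr, tr)    # intersection width
    ih = torch.min(pt, tt) + torch.min(pb, tb)    # intersection height
    inter = iw * ih
    union = pred_area + target_area - inter
    return -torch.log((inter + eps) / (union + eps)).mean()
```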
