Search Results for author: Jiahui Yu

Found 49 papers, 24 papers with code

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

no code implementations27 Nov 2023 Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu

To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs.

Language Modelling Text Retrieval +1
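A minimal sketch of the information-gain scoring idea described above, assuming a captioner that can be run with and without the image and exposes per-token logits; the interface and the plain subtraction are assumptions, not the paper's exact formulation:

```python
import torch

def sequence_logprob(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # Sum of log p(token_t | tokens_<t) over the label's token sequence.
    logp = torch.log_softmax(logits, dim=-1)           # (T, V)
    return logp.gather(-1, tokens.unsqueeze(-1)).sum()

def information_gain_score(captioner, label_tokens, image):
    # Gain of information brought by the visual input: caption likelihood
    # given the image, minus the likelihood under the language prior alone.
    cond = sequence_logprob(captioner(label_tokens, image=image), label_tokens)
    prior = sequence_logprob(captioner(label_tokens, image=None), label_tokens)
    return cond - prior

# Zero-shot classification: pick the class label with the highest gain, e.g.
# best = max(labels, key=lambda y: information_gain_score(model, y, image))
```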

Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis

no code implementations16 Nov 2023 Kexin Chen, Junyou Li, Kunyi Wang, Yuyang Du, Jiahui Yu, Jiamin Lu, Lanqing Li, Jiezhong Qiu, Qun Fang, Pheng Ann Heng, Guangyong Chen

Artificial intelligence (AI) for reaction condition optimization has become an important topic in the pharmaceutical industry, given that a data-driven AI model can assist drug discovery and accelerate reaction design.

Contrastive Learning Drug Discovery +2

PaLM 2 Technical Report

1 code implementation17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Ranked #1 on Question Answering on TriviaQA (using extra training data)

Language Modelling Question Answering

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

no code implementations31 Mar 2023 Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers.

VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining

1 code implementation CVPR 2023 Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, Feng Yang

Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset, and it has powerful zero-shot capability for aesthetic tasks such as zero-shot style classification and zero-shot IAA, surpassing many supervised baselines.

Language Modelling

Deep object detection for waterbird monitoring using aerial imagery

1 code implementation10 Oct 2022 Krish Kabra, Alexander Xiong, Wenbin Li, Minxuan Luo, William Lu, Raul Garcia, Dhananjay Vijay, Jiahui Yu, Maojie Tang, Tianjiao Yu, Hank Arnold, Anna Vallery, Richard Gibbons, Arko Barman

In this work, we present a deep learning pipeline that can be used to precisely detect, count, and monitor waterbirds using aerial imagery collected by a commercial drone.

Management Object Detection

Normalization effects on deep neural networks

1 code implementation2 Sep 2022 Jiahui Yu, Konstantinos Spiliopoulos

A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2, 1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set.
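A toy sketch of the scaling under study, with each layer's $N_{i}$ activations read out through a $1/N_{i}^{\gamma_{i}}$ factor ($\gamma_{i}=1/2$ and $\gamma_{i}=1$ being the two ends of the allowed range); the architecture details here are illustrative only:

```python
import torch
import torch.nn as nn

class ScaledMLP(nn.Module):
    # Layer i has N_i hidden units whose read-out carries a 1 / N_i ** gamma_i factor.
    def __init__(self, widths=(500, 500), gammas=(0.75, 1.0), d_in=784, d_out=10):
        super().__init__()
        dims = (d_in,) + tuple(widths)
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1], bias=False) for i in range(len(widths)))
        self.readout = nn.Linear(widths[-1], d_out, bias=False)
        self.scales = [1.0 / n ** g for n, g in zip(widths, gammas)]

    def forward(self, x):
        for layer, scale in zip(self.layers, self.scales):
            # The N_i activations enter the next layer's sum scaled by 1/N_i**gamma_i.
            x = scale * torch.tanh(layer(x))
        return self.readout(x)
```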

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations22 Jun 2022 Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Machine Translation World Knowledge

CoCa: Contrastive Captioners are Image-Text Foundation Models

3 code implementations4 May 2022 Jiahui Yu, ZiRui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu

We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the outputs of the multimodal decoder, which predicts text tokens autoregressively.

Action Classification Image Captioning +9
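A condensed sketch of the two-part objective described above, with batched tensors; the temperature and loss weighting are placeholders rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def coca_loss(image_emb, text_emb, decoder_logits, target_tokens,
              temperature=0.07, caption_weight=1.0):
    # Shapes: (B, D), (B, D), (B, T, V), (B, T).
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = img @ txt.t() / temperature                    # (B, B) similarities
    labels = torch.arange(img.size(0), device=img.device)   # matched pairs on diagonal
    contrastive = (F.cross_entropy(logits, labels)
                   + F.cross_entropy(logits.t(), labels)) / 2
    captioning = F.cross_entropy(                           # next-token prediction
        decoder_logits.flatten(0, 1), target_tokens.flatten())
    return contrastive + caption_weight * captioning
```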

Self-supervised Learning with Random-projection Quantizer for Speech Recognition

3 code implementations3 Feb 2022 Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu

In particular, the quantizer projects speech inputs with a randomly initialized matrix and does a nearest-neighbor lookup in a randomly initialized codebook.

Self-Supervised Learning speech-recognition +1
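A minimal sketch of the quantizer as described: a frozen random projection followed by a nearest-neighbor lookup in a frozen random codebook. The sizes are illustrative, not the paper's configuration:

```python
import torch
import torch.nn.functional as F

class RandomProjectionQuantizer:
    # Both the projection matrix and the codebook are randomly initialized
    # and never trained; they only provide self-supervised targets.
    def __init__(self, input_dim, codebook_size=8192, code_dim=16, seed=0):
        g = torch.Generator().manual_seed(seed)
        self.proj = torch.randn(input_dim, code_dim, generator=g)
        self.codebook = F.normalize(
            torch.randn(codebook_size, code_dim, generator=g), dim=-1)

    def __call__(self, frames: torch.Tensor) -> torch.Tensor:
        z = F.normalize(frames @ self.proj, dim=-1)    # (T, code_dim)
        # With both sides normalized, nearest neighbor == argmax inner product.
        return (z @ self.codebook.t()).argmax(dim=-1)  # (T,) target ids
```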

Combined Scaling for Zero-shot Transfer Learning

no code implementations19 Nov 2021 Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

Second, while increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well understood.

Classification Contrastive Learning +3

Vector-quantized Image Modeling with Improved VQGAN

3 code implementations ICLR 2022 Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.

Image Generation Representation Learning +1
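A sketch of the stage-two pretraining step, assuming a frozen tokenizer with a hypothetical `encode` method and a decoder-only Transformer; both interfaces are assumptions:

```python
import torch
import torch.nn.functional as F

def vim_pretraining_step(tokenizer, transformer, images):
    # Stage 1 (frozen here): the improved VQGAN turns each image into a
    # raster scan of discrete codes.
    with torch.no_grad():
        codes = tokenizer.encode(images)     # (B, H*W) token ids
    # Stage 2: predict each token from the ones before it
    # (a BOS token is omitted for brevity).
    logits = transformer(codes[:, :-1])      # (B, H*W - 1, vocab)
    return F.cross_entropy(logits.flatten(0, 1), codes[:, 1:].flatten())
```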

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

2 code implementations ICLR 2022 ZiRui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.

Image Captioning Language Modelling +2

A Better and Faster End-to-End Model for Streaming ASR

no code implementations21 Nov 2020 Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing Sound

Normalization effects on shallow neural networks and related asymptotic expansions

1 code implementation20 Nov 2020 Jiahui Yu, Konstantinos Spiliopoulos

In addition, we show that to leading order in $N$, the variance of the neural network's statistical output decays as the implied normalization by the scaling parameter approaches the mean field normalization.

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation21 Oct 2020 Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

no code implementations ICLR 2021 Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications

no code implementations6 Aug 2020 Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya

The generative adversarial network (GAN) framework has emerged as a powerful tool for various image and video synthesis tasks, allowing the synthesis of visual content in an unconditional or input-conditional manner.

Neural Rendering Translation

Cross-Supervised Object Detection

no code implementations26 Jun 2020 Zitian Chen, Zhiqiang Shen, Jiahui Yu, Erik Learned-Miller

After learning a new object category from image-level annotations (with no object bounding boxes), humans are remarkably good at precisely localizing those objects.

object-detection Object Detection

Neural Sparse Representation for Image Restoration

1 code implementation NeurIPS 2020 Yuchen Fan, Jiahui Yu, Yiqun Mei, Yulun Zhang, Yun Fu, Ding Liu, Thomas S. Huang

Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks.

Image Compression Image Denoising +2

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

no code implementations16 May 2020 Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Conformer: Convolution-augmented Transformer for Speech Recognition

23 code implementations16 May 2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

6 code implementations7 May 2020 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Pyramid Attention Networks for Image Restoration

2 code implementations28 Apr 2020 Yiqun Mei, Yuchen Fan, Yulun Zhang, Jiahui Yu, Yuqian Zhou, Ding Liu, Yun Fu, Thomas S. Huang, Humphrey Shi

Self-similarity refers to the image prior, widely used in image restoration algorithms, that small but similar patterns tend to occur at different locations and scales.

Demosaicking Image Denoising +1

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

1 code implementation ECCV 2020 Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le

Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.

Neural Architecture Search

Scale-wise Convolution for Image Restoration

1 code implementation19 Dec 2019 Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang

In this paper, we show that properly modeling scale-invariance into neural networks can bring significant benefits to image restoration performance.

Data Augmentation Image Compression +3

Scaling Up Neural Architecture Search with Big Single-Stage Models

no code implementations25 Sep 2019 Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Quoc Le

In this work, we propose BigNAS, an approach that simplifies this workflow and scales up neural architecture search to target a wide range of model sizes simultaneously.

Neural Architecture Search

Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement

no code implementations22 Aug 2019 Zhiqiang Shen, Zhankui He, Wanyun Cui, Jiahui Yu, Yutong Zheng, Chenchen Zhu, Marios Savvides

In order to distill diverse knowledge from different trained (teacher) models, we propose to use an adversarial-based learning strategy where we define a block-wise training loss to guide and optimize the predefined student network to recover the knowledge in teacher models, and to promote the discriminator network to distinguish teacher vs. student features simultaneously.

Knowledge Distillation
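A rough sketch of this block-wise adversarial setup, assuming `discriminator` maps a feature block to a real/fake logit; the exact loss composition in the paper may differ:

```python
import torch
import torch.nn.functional as F

def akd_losses(student_feat, teacher_feat, discriminator):
    # Discriminator: learn to tell teacher features ("real") from student ones.
    t_logit = discriminator(teacher_feat.detach())
    s_logit = discriminator(student_feat.detach())
    d_loss = (F.binary_cross_entropy_with_logits(t_logit, torch.ones_like(t_logit))
              + F.binary_cross_entropy_with_logits(s_logit, torch.zeros_like(s_logit)))
    # Student: recover the teacher's block-wise features and fool the discriminator.
    g_logit = discriminator(student_feat)
    s_loss = (F.mse_loss(student_feat, teacher_feat.detach())
              + F.binary_cross_entropy_with_logits(g_logit, torch.ones_like(g_logit)))
    return d_loss, s_loss
```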

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

9 code implementations ICLR 2020 Jiahui Yu, Thomas Huang

Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs).

Neural Architecture Search

Universally Slimmable Networks and Improved Training Techniques

1 code implementation ICCV 2019 Jiahui Yu, Thomas Huang

We also evaluate the proposed US-Nets and improved training techniques on tasks of image super-resolution and deep reinforcement learning.

Image Super-Resolution

FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary

no code implementations ICLR 2020 Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang

FSNet has the same architecture as that of the baseline CNN to be compressed, and each convolution layer of FSNet draws the same number of filters from the FS as the corresponding layer of the baseline CNN in the forward process.

General Classification Image Classification +5
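A sketch of the filter-summary idea, where every filter of a layer is an overlapping window of one shared 1D parameter vector; the window stride is an assumption, not the paper's extraction rule:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FilterSummaryConv2d(nn.Module):
    # All out_ch filters are overlapping slices of one summary vector, so the
    # layer stores far fewer weights than out_ch * in_ch * k * k.
    def __init__(self, in_ch, out_ch, k, window_stride=None):
        super().__init__()
        self.shape = (out_ch, in_ch, k, k)
        filt_len = in_ch * k * k
        self.stride = window_stride or max(filt_len // 4, 1)
        self.summary = nn.Parameter(
            torch.randn(filt_len + (out_ch - 1) * self.stride) * 0.01)

    def forward(self, x):
        filt_len = self.shape[1] * self.shape[2] * self.shape[3]
        # Extract out_ch overlapping windows from the shared summary vector.
        weight = self.summary.unfold(0, filt_len, self.stride).reshape(self.shape)
        return F.conv2d(x, weight, padding=self.shape[2] // 2)
```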

An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity

no code implementations3 Feb 2019 Yingzhen Yang, Jiahui Yu, Xingjian Li, Jun Huan, Thomas S. Huang

In this paper, we investigate the role of Rademacher complexity in improving generalization of DNNs and propose a novel regularizer rooted in Local Rademacher Complexity (LRC).

Neural Architecture Search

Foreground-aware Image Inpainting

no code implementations CVPR 2019 Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo

We show that by such disentanglement, the contour completion model predicts reasonable contours of objects, and further substantially improves the performance of image inpainting.

Disentanglement Image Inpainting

Slimmable Neural Networks

3 code implementations ICLR 2019 Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang

Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization.

Instance Segmentation Keypoint Detection +3
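A minimal sketch of the switchable batch normalization named above: each width multiplier gets its own BN, since feature statistics differ across widths of the shared network. The multiplier list is the commonly used one, not a requirement:

```python
import torch
import torch.nn as nn

class SwitchableBatchNorm2d(nn.Module):
    def __init__(self, num_features, width_mults=(0.25, 0.5, 0.75, 1.0)):
        super().__init__()
        self.width_mults = width_mults
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(int(num_features * w)) for w in width_mults)
        self.active = len(width_mults) - 1   # start at full width

    def switch(self, width_mult):
        # Select the private BN matching the currently executed width.
        self.active = self.width_mults.index(width_mult)

    def forward(self, x):
        # x is assumed already sliced to the active width's channels.
        return self.bns[self.active](x)
```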

A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization

no code implementations23 Nov 2018 Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

While training on samples drawn from an independent and identical distribution has been the de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner and from selected examples progressively.

General Classification Image Classification +6

Generative Image Inpainting with Contextual Attention

28 code implementations CVPR 2018 Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang

Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.

Image Inpainting Test
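A single-image sketch of the idea of borrowing surrounding features: match hole-region feature patches against background patches by normalized cross-correlation, then rebuild the hole as an attention-weighted blend of those patches. The full contextual-attention layer adds masking, strides, and spatial propagation:

```python
import torch
import torch.nn.functional as F

def contextual_attention(foreground, background, patch=3, softmax_scale=10.0):
    # foreground, background: (C, H, W) feature maps of one image.
    C, H, W = background.shape
    # Background patches become conv kernels: (N, C, patch, patch).
    kernels = background.unfold(1, patch, 1).unfold(2, patch, 1)
    kernels = kernels.permute(1, 2, 0, 3, 4).reshape(-1, C, patch, patch)
    normed = kernels / kernels.flatten(1).norm(dim=1).clamp(min=1e-4).view(-1, 1, 1, 1)
    # Cosine similarity of every foreground location to every background patch.
    scores = F.conv2d(foreground.unsqueeze(0), normed, padding=patch // 2)
    attn = F.softmax(scores * softmax_scale, dim=1)     # over background patches
    # Transposed conv pastes the patches back, weighted by attention.
    out = F.conv_transpose2d(attn, kernels, padding=patch // 2)
    return out.squeeze(0) / (patch * patch)  # ~patch**2 overlapping patches per pixel
```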

Improving Object Detection from Scratch via Gated Feature Reuse

2 code implementations4 Dec 2017 Zhiqiang Shen, Honghui Shi, Jiahui Yu, Hai Phan, Rogerio Feris, Liangliang Cao, Ding Liu, Xinchao Wang, Thomas Huang, Marios Savvides

In this paper, we present a simple and parameter-efficient drop-in module for one-stage object detectors like SSD when learning from scratch (i.e., without pre-trained models).

object-detection Object Detection

UnitBox: An Advanced Object Detection Network

no code implementations4 Aug 2016 Jiahui Yu, Yuning Jiang, Zhangyang Wang, Zhimin Cao, Thomas Huang

In present object detection systems, the deep convolutional neural networks (CNNs) are utilized to predict bounding boxes of object candidates, and have gained performance advantages over the traditional region proposal methods.

Face Detection object-detection +2
