Search Results for author: Youlong Cheng

Found 10 papers, 7 papers with code

Mesh-TensorFlow: Deep Learning for Supercomputers

1 code implementation • NeurIPS 2018 • Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

We use Mesh-TensorFlow to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model.

Ranked #10 on Language Modelling on One Billion Word

Language Modelling

1,554

Paper
Code

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

13 code implementations • NeurIPS 2019 • Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks.

Ranked #4 on Fine-Grained Image Classification on Birdsnap (using extra training data)

Fine-Grained Image Classification Machine Translation +1

2,779

Paper
Code

Image Classification at Supercomputer Scale

no code implementations • 16 Nov 2018 • Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng

Deep learning is extremely computationally intensive, and hardware vendors have responded by building faster accelerators in large clusters.

Classification General Classification +1

Paper
Add Code

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

2,779

Paper
Code

High Resolution Medical Image Analysis with Spatial Partitioning

1 code implementation • 6 Sep 2019 • Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song

It is infeasible to train CNN models directly on such high resolution images, because neural activations of a single image do not fit in the memory of a single GPU/TPU, and naive data and model parallelism approaches do not work.

Vocal Bursts Intensity Prediction

1,554

Paper
Code

Talking-Heads Attention

4 code implementations • 5 Mar 2020 • Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou

We introduce "talking-heads attention" - a variation on multi-head attention which includes linearprojections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additionalcomputation, talking-heads attention leads to better perplexities on masked language modeling tasks, aswell as better quality when transfer-learning to language comprehension and question answering tasks.

Language Modelling Masked Language Modeling +2

76,589

Paper
Code

Toward Annotator Group Bias in Crowdsourcing

no code implementations • ACL 2022 • Haochen Liu, Joseph Thekinen, Sinem Mollaoglu, Da Tang, Ji Yang, Youlong Cheng, Hui Liu, Jiliang Tang

We conduct experiments on both synthetic and real-world datasets.

Paper
Add Code

Enhanced Exploration in Neural Feature Selection for Deep Click-Through Rate Prediction Models via Ensemble of Gating Layers

no code implementations • 7 Dec 2021 • Lin Guan, Xia Xiao, Ming Chen, Youlong Cheng

Inspired by gradient-based neural architecture search (NAS) and network pruning methods, people have tackled the NFS problem with Gating approach that inserts a set of differentiable binary gates to drop less informative features.

Click-Through Rate Prediction Ensemble Learning +3

Paper
Add Code

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU

1 code implementation • 13 Apr 2022 • Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qin, Youlong Cheng, Yang You

Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks.

Click-Through Rate Prediction

155

Paper
Code

Monolith: Real Time Recommendation System With Collisionless Embedding Table

1 code implementation • 16 Sep 2022 • Zhuoran Liu, Leqi Zou, Xuan Zou, Caihua Wang, Biao Zhang, Da Tang, Bolin Zhu, Yijie Zhu, Peng Wu, Ke Wang, Youlong Cheng

In this paper, we present Monolith, a system tailored for online training.

796

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.