no code implementations • ICML 2020 • Ting Chen, Lala Li, Yizhou Sun
Embedding layers are commonly used to map discrete symbols into continuous embedding vectors that reflect their semantic meanings.
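At its core, an embedding layer is a learned lookup table from integer symbol IDs to dense vectors. A minimal sketch (vocabulary size, dimension, and random initialization are illustrative choices, not values from the paper):

```python
import numpy as np

# Minimal sketch of an embedding layer: a lookup table mapping
# discrete symbol IDs to continuous vectors. In training, `table`
# would be a learned parameter; here it is randomly initialized.
vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embed_dim))

def embed(token_ids):
    """Map a sequence of integer symbol IDs to their embedding vectors."""
    return table[np.asarray(token_ids)]

vectors = embed([1, 3, 3])
print(vectors.shape)  # (3, 4)
# Identical symbols map to identical vectors:
print(np.allclose(vectors[1], vectors[2]))  # True
```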
no code implementations • ICCV 2023 • Simon Kornblith, Lala Li, ZiRui Wang, Thao Nguyen
We further explore the use of language models to guide the decoding process. This yields small improvements over the Pareto frontier of reference-free vs. reference-based captioning metrics that arises from classifier-free guidance, and it substantially improves the quality of captions generated from a model trained only on minimally curated web data.
1 code implementation • 22 May 2023 • Ting Chen, Lala Li
We employ two types of transformer layers: local layers operate on data tokens within each group, while global layers operate on a smaller set of introduced latent tokens.
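The local/global split above can be sketched as follows. The real local and global layers are transformer (attention) blocks; here a simple mean-centering stand-in marks where each type of layer would act, and the group sizes and pooling scheme are illustrative assumptions:

```python
import numpy as np

# Sketch of interleaved local/global processing: local layers see only
# the data tokens inside one group, while global layers see a much
# smaller set of latent tokens summarizing the groups.

def local_layer(group_tokens):
    # Stand-in for a transformer block over one group's data tokens.
    return group_tokens - group_tokens.mean(axis=0, keepdims=True)

def global_layer(latents):
    # Stand-in for a transformer block over the latent tokens.
    return latents - latents.mean(axis=0, keepdims=True)

tokens = np.arange(24, dtype=float).reshape(12, 2)  # 12 data tokens, dim 2
groups = tokens.reshape(4, 3, 2)                    # 4 groups of 3 tokens

# Local pass: each group is processed independently.
groups = np.stack([local_layer(g) for g in groups])

# Each group summarizes into one latent token (here: mean pooling).
latents = groups.mean(axis=1)                       # (4, 2) latent tokens

# Global pass mixes information across groups via the latents only.
latents = global_layer(latents)
print(latents.shape)  # (4, 2)
```

The key cost saving is that the expensive global mixing happens over 4 latents rather than all 12 data tokens.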
1 code implementation • ICCV 2023 • Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
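The task definition above can be made concrete with a tiny label map. One common convention (used here purely as an illustration, not necessarily this paper's encoding) packs both labels into a single integer per pixel as `semantic_id * OFFSET + instance_id`:

```python
import numpy as np

# Sketch of a panoptic label map: every pixel carries both a semantic
# class and an instance ID, packed into one integer for storage.
OFFSET = 1000

semantic = np.array([[0, 0], [7, 7]])   # class per pixel (e.g. sky, car)
instance = np.array([[0, 0], [1, 2]])   # instance per pixel (0 for stuff)

panoptic = semantic * OFFSET + instance
print(panoptic)
# [[   0    0]
#  [7001 7002]]

# Both labels are recoverable from the packed map:
assert np.array_equal(panoptic // OFFSET, semantic)
assert np.array_equal(panoptic % OFFSET, instance)
```

Note how the two right-hand pixels share a semantic class (7) but carry distinct instance IDs, which is exactly what separates panoptic from plain semantic segmentation.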
1 code implementation • 15 Jun 2022 • Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton
By formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that a neural network with a single model architecture and loss function can be trained on all these tasks with no task-specific customization.
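A hypothetical serializer illustrates the unified-interface idea: a structured output such as a set of detection boxes becomes one flat token sequence, so the same model and loss apply regardless of task. The special task tokens, bin count, and box format below are illustrative assumptions, not the paper's exact vocabulary:

```python
# Sketch of a unified sequence interface: every task's output becomes
# a sequence of discrete tokens from one shared vocabulary.

NUM_BINS = 1000          # quantization bins for continuous coordinates
TASK_DETECT = NUM_BINS   # hypothetical token marking the detection task

def quantize(x, lo=0.0, hi=1.0):
    """Map a continuous value in [lo, hi] to one of NUM_BINS tokens."""
    return min(NUM_BINS - 1, int((x - lo) / (hi - lo) * NUM_BINS))

def detection_to_tokens(boxes):
    """Serialize (ymin, xmin, ymax, xmax, class_id) boxes as tokens."""
    seq = [TASK_DETECT]
    for ymin, xmin, ymax, xmax, cls in boxes:
        seq += [quantize(ymin), quantize(xmin),
                quantize(ymax), quantize(xmax), cls]
    return seq

tokens = detection_to_tokens([(0.1, 0.2, 0.5, 0.8, 3)])
print(tokens)  # [1000, 100, 200, 500, 800, 3]
```

Because every task reduces to next-token prediction over such sequences, a single cross-entropy loss covers all of them.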
4 code implementations • 23 May 2022 • Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
Ranked #17 on Text-to-Image Generation on MS COCO (using extra training data)
6 code implementations • ICLR 2022 • Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton
We present Pix2Seq, a simple and generic framework for object detection.
Ranked #77 on Object Detection on COCO minival (using extra training data)
3 code implementations • NeurIPS 2021 • Ting Chen, Calvin Luo, Lala Li
We construct datasets with explicit and controllable competing features, and show that, for contrastive learning, a few bits of easy-to-learn shared features can suppress, and even fully prevent, the learning of other sets of competing features.
no code implementations • WS 2019 • Lala Li, William Chan
The Insertion Transformer is well suited for long form text generation due to its parallel generation capabilities, requiring $O(\log_2 n)$ generation steps to generate $n$ tokens.
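The $O(\log_2 n)$ step count comes from balanced-tree insertion: at every step, a token can be inserted into the middle of each current gap simultaneously, so the number of parallel steps equals the depth of a balanced binary tree over the sequence. A small sketch, replaying that insertion order for a known target (the real model predicts tokens and positions; here we only count steps):

```python
import math

# Count the parallel generation steps of midpoint ("balanced-tree")
# insertion, which is how O(log2 n) steps suffice for n tokens.

def generation_steps(target):
    """Parallel steps to produce `target` via simultaneous midpoint inserts."""
    steps = 0
    spans = [(0, len(target))] if target else []  # open gaps as index ranges
    while spans:
        steps += 1
        next_spans = []
        for lo, hi in spans:
            mid = (lo + hi) // 2    # token inserted into this gap now
            if lo < mid:
                next_spans.append((lo, mid))
            if mid + 1 < hi:
                next_spans.append((mid + 1, hi))
        spans = next_spans
    return steps

n = 1000
print(generation_steps(list(range(n))))  # 10
print(math.ceil(math.log2(n + 1)))       # 10
```

Each iteration of the loop is one parallel decode step: every open gap receives its middle token at once, halving the remaining gaps.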
no code implementations • 25 Sep 2019 • Ting Chen, Lala Li, Yizhou Sun
2 code implementations • 26 Aug 2019 • Ting Chen, Lala Li, Yizhou Sun
1 code implementation • NeurIPS 2019 • Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse
Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.