no code implementations • 19 Dec 2024 • Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, KaiFeng Chen, Abhijit Ogale, Jonathon Shlens
One can observe this limitation in representations learned through captions or contrastive learning -- where the learned model treats an image essentially as a bag of words.
1 code implementation • 15 Aug 2024 • Robert Geirhos, Priyank Jaini, Austin Stone, Sourabh Medapati, Xi Yi, George Toderici, Abhijit Ogale, Jonathon Shlens
Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once the process is completed, editing the knowledge in a network is nearly impossible, since all information is distributed across the network's weights.
no code implementations • 29 Apr 2024 • Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-Baptiste Alayrac, Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa Dominowska, S. M. Ali Eslami, Katherine Chou, Claire Cui, Oriol Vinyals, Koray Kavukcuoglu, James Manyika, Jeff Dean, Demis Hassabis, Yossi Matias, Dale Webster, Joelle Barral, Greg Corrado, Christopher Semturs, S. Sara Mahdavi, Juraj Gottweis, Alan Karthikesalingam, Vivek Natarajan
We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin.
Ranked #1 on Question Answering on MedQA (using extra training data)
1 code implementation • 13 Jun 2023 • Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.
no code implementations • 10 Apr 2023 • Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev
Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.
no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.
no code implementations • 24 Oct 2022 • Zhaoqi Leng, Shuyang Cheng, Benjamin Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov
To alleviate the cost of hyperparameter tuning and iterative pseudo labeling, we develop a population-based data augmentation framework for 3D detection, named AutoPseudoAugment.
2 code implementations • ICCV 2023 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens
In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
no code implementations • ICLR 2022 • Jiquan Ngiam, Vijay Vasudevan, Benjamin Caine, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David J Weiss, Ben Sapp, Zhifeng Chen, Jonathon Shlens
In this work, we formulate a model for predicting the behavior of all agents jointly, producing consistent futures that account for interactions between agents.
no code implementations • NeurIPS 2021 • Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs
When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy.
4 code implementations • 15 Jun 2021 • Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David Weiss, Ben Sapp, Zhifeng Chen, Jonathon Shlens
In this work, we formulate a model for predicting the behavior of all agents jointly, producing consistent futures that account for interactions between agents.
no code implementations • 20 Apr 2021 • Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov
Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.
7 code implementations • CVPR 2021 • Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens
Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolutional models such as ResNet-50.
Ranked #226 on Image Classification on ImageNet
3 code implementations • NeurIPS 2021 • Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1. 7x - 2. 7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
no code implementations • 2 Mar 2021 • Benjamin Caine, Rebecca Roelofs, Vijay Vasudevan, Jiquan Ngiam, Yuning Chai, Zhifeng Chen, Jonathon Shlens
To safely deploy autonomous vehicles, onboard perception systems must work reliably at high accuracy across a diverse set of environments and geographies.
4 code implementations • 1 Mar 2021 • Philipp Jund, Chris Sweeney, Nichola Abdo, Zhifeng Chen, Jonathon Shlens
In this work, we introduce a new large-scale dataset for scene flow estimation derived from corresponding tracked 3D objects, which is $\sim$1, 000$\times$ larger than previous real-world datasets in terms of the number of annotated frames.
Ranked #6 on Scene Flow Estimation on Argoverse 2
13 code implementations • CVPR 2021 • Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84. 7% top-1 accuracy on the ImageNet benchmark while being up to 1. 64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.
Ranked #54 on Instance Segmentation on COCO minival
no code implementations • ICCV 2021 • Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov
Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.
1 code implementation • 15 Dec 2020 • Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer
We find that binning-based estimators with bins of equal mass (number of instances) have lower bias than estimators with bins of equal width.
1 code implementation • ECCV 2020 • Liang-Chieh Chen, Raphael Gontijo Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens
We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences and extra images to surpass state-of-the-art performance on core computer vision tasks.
no code implementations • ECCV 2020 • Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen
This built-in data capture latency is artificial, and based on treating the point cloud as a camera image in order to leverage camera-inspired architectures.
no code implementations • ECCV 2020 • Shuyang Cheng, Zhaoqi Leng, Ekin Dogus Cubuk, Barret Zoph, Chunyan Bai, Jiquan Ngiam, Yang song, Benjamin Caine, Vijay Vasudevan, Cong-Cong Li, Quoc V. Le, Jonathon Shlens, Dragomir Anguelov
Data augmentation has been widely adopted for object detection in 3D point clouds.
no code implementations • ICML 2020 • Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith
Convolutional neural networks are among the most successful architectures in deep learning with this success at least partially attributable to the efficacy of spatial invariance as an inductive bias.
9 code implementations • CVPR 2020 • Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov
In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset.
18 code implementations • NeurIPS 2020 • Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le
Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size.
Ranked #12 on Data Augmentation on ImageNet
no code implementations • 29 Aug 2019 • Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan
We show how our redesign---namely using only local information and using sampling instead of learned proposals---leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics.
6 code implementations • ECCV 2020 • Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le
Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy.
Ranked #6 on Robust Object Detection on Cityscapes
no code implementations • NeurIPS 2019 • Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D. Cubuk, Justin Gilmer
Achieving robustness to distributional shift is a longstanding and challenging goal of computer vision.
8 code implementations • NeurIPS 2019 • Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.
5 code implementations • 12 Jun 2019 • Aakanksha Chowdhery, Pete Warden, Jonathon Shlens, Andrew Howard, Rocky Rhodes
To facilitate the development of microcontroller friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models.
no code implementations • 8 Jun 2019 • Luke Metz, Niru Maheswaranathan, Jonathon Shlens, Jascha Sohl-Dickstein, Ekin D. Cubuk
State-of-the art vision models can achieve superhuman performance on image classification tasks when testing and training data come from the same distribution.
no code implementations • 22 Apr 2019 • Keren Gu, Brandon Yang, Jiquan Ngiam, Quoc Le, Jonathon Shlens
Compared to previous studies on adversarial examples and synthetic distortions, natural robustness captures a more diverse set of common image transformations that occur in the natural environment.
14 code implementations • ICCV 2019 • Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le
Convolutional networks have been the paradigm of choice in many computer vision applications.
Ranked #115 on Image Classification on CIFAR-100 (using extra training data)
2 code implementations • ICCV 2019 • Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens
Dramatic advances in generative models have resulted in near photographic quality for artificially rendered faces, animals and other objects in the natural world.
1 code implementation • 3 Mar 2019 • Jasmine Collins, Johannes Balle, Jonathon Shlens
We find that a standardization loss accelerates training on both small- and large-scale image classification experiments, works with a variety of architectures, and is largely robust to training across different batch sizes.
1 code implementation • NeurIPS 2018 • Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens
Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks.
Ranked #1 on Human Part Segmentation on PASCAL-Person-Part
no code implementations • CVPR 2019 • Simon Kornblith, Jonathon Shlens, Quoc V. Le
Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer.
3 code implementations • ECCV 2018 • Guangyu Robert Yang, Igor Ganichev, Xiao-Jing Wang, Jonathon Shlens, David Sussillo
COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory -- problems that remain challenging for modern deep learning architectures.
no code implementations • ICLR 2018 • Nishal P Shah, Sasidhar Madugula, EJ Chichilnisky, Yoram Singer, Jonathon Shlens
Retinal prostheses for treating incurable blindness are designed to electrically stimulate surviving retinal neurons, causing them to send artificial visual signals to the brain.
18 code implementations • ECCV 2018 • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy
We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.
Ranked #17 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)
no code implementations • 28 Nov 2017 • Lane McIntosh, Niru Maheswaranathan, David Sussillo, Jonathon Shlens
Importantly, the RNN may be deployed across a range of computational budgets by merely running the model for a variable number of iterations.
18 code implementations • CVPR 2018 • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le
In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named "NASNet architecture".
Ranked #6 on Classification on InDL
no code implementations • 19 May 2017 • Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy
Then, given the generated low-resolution color image and the original grayscale image as inputs, we train a second CNN to generate a high-resolution colorization of an image.
Ranked #3 on Colorization on ImageNet val
20 code implementations • 18 May 2017 • Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens
In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair.
1 code implementation • ICCV 2017 • Ryan Dahl, Mohammad Norouzi, Jonathon Shlens
A low resolution image may correspond to multiple plausible high resolution images, thus modeling the super resolution process with a pixel independent conditional model often results in averaging different details--hence blurry edges.
no code implementations • CVPR 2017 • Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, Vincent Vanhoucke
We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB).
36 code implementations • ICML 2017 • Augustus Odena, Christopher Olah, Jonathon Shlens
We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models.
Ranked #14 on Conditional Image Generation on CIFAR-10 (Inception score metric)
12 code implementations • 24 Oct 2016 • Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings.
4 code implementations • 14 Mar 2016 • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms.
112 code implementations • CVPR 2016 • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks.
Ranked #8 on Retinal OCT Disease Classification on OCT2017
29 code implementations • 18 Nov 2015 • Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey
In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.
Ranked #7 on Unsupervised MNIST on MNIST
3 code implementations • 18 Nov 2015 • Tianqi Chen, Ian Goodfellow, Jonathon Shlens
Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network.
no code implementations • 23 Dec 2014 • Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik
Deep neural networks have been extremely successful at various image, speech, video recognition tasks because of their ability to model deep structures within the data.
60 code implementations • 20 Dec 2014 • Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence.
Ranked #44 on Image Classification on MNIST
3 code implementations • 11 Apr 2014 • Jonathon Shlens
This tutorial provides an introduction to ICA based on linear algebra formulating an intuition for ICA from first principles.
no code implementations • 8 Apr 2014 • Jonathon Shlens
Experimental neuroscience increasingly requires tractable models for analyzing and predicting the behavior of neurons and networks.
7 code implementations • 3 Apr 2014 • Jonathon Shlens
Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood.
2 code implementations • 19 Dec 2013 • Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean
In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.
Ranked #8 on Multi-label zero-shot learning on Open Images V4
no code implementations • 19 Dec 2013 • Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer
Albeit the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.
no code implementations • CVPR 2013 • Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts.