Search Results for author: Jonathon Shlens

Found 57 papers, 33 papers with code

MOFI: Learning Image Representations from Noisy Entity Annotated Images

1 code implementation 13 Jun 2023 Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.

Image Classification Image Retrieval +3

On Robustness in Multimodal Learning

no code implementations 10 Apr 2023 Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.

Representation Learning

PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds

no code implementations 24 Oct 2022 Zhaoqi Leng, Shuyang Cheng, Benjamin Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov

To alleviate the cost of hyperparameter tuning and iterative pseudo labeling, we develop a population-based data augmentation framework for 3D detection, named AutoPseudoAugment.

Data Augmentation Pseudo Label

Soft Calibration Objectives for Neural Networks

no code implementations NeurIPS 2021 Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs

When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy.

Decision Making

Scaling Local Self-Attention for Parameter Efficient Visual Backbones

7 code implementations CVPR 2021 Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens

Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolutional models such as ResNet-50.

Image Classification Instance Segmentation +4

Revisiting ResNets: Improved Training and Scaling Strategies

3 code implementations NeurIPS 2021 Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

Action Classification Document Image Classification +2

Pseudo-labeling for Scalable 3D Object Detection

no code implementations 2 Mar 2021 Benjamin Caine, Rebecca Roelofs, Vijay Vasudevan, Jiquan Ngiam, Yuning Chai, Zhifeng Chen, Jonathon Shlens

To safely deploy autonomous vehicles, onboard perception systems must work reliably at high accuracy across a diverse set of environments and geographies.

3D Object Detection Autonomous Vehicles +5

Scalable Scene Flow from Point Clouds in the Real World

4 code implementations 1 Mar 2021 Philipp Jund, Chris Sweeney, Nichola Abdo, Zhifeng Chen, Jonathon Shlens

In this work, we introduce a new large-scale dataset for scene flow estimation derived from corresponding tracked 3D objects, which is $\sim$1,000$\times$ larger than previous real-world datasets in terms of the number of annotated frames.

Autonomous Vehicles Motion Estimation +1

Bottleneck Transformers for Visual Recognition

13 code implementations CVPR 2021 Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.

Image Classification Instance Segmentation +3

Mitigating Bias in Calibration Error Estimation

1 code implementation 15 Dec 2020 Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

We find that binning-based estimators with bins of equal mass (number of instances) have lower bias than estimators with bins of equal width.
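
The two binning schemes named above can be compared with a minimal sketch. It assumes the standard binned expected calibration error (a weighted average of the gap between accuracy and mean confidence per bin); the function names `ece_equal_width`, `ece_equal_mass`, and `_binned_ece` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def _binned_ece(confidences, correct, bin_index, n_bins):
    """Weighted average of |accuracy - mean confidence| over non-empty bins."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        mask = bin_index == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += (mask.sum() / n) * gap
    return total


def ece_equal_width(confidences, correct, n_bins=10):
    """ECE with bins that split [0, 1] into intervals of equal width."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields indices 0 .. n_bins - 1.
    bin_index = np.digitize(confidences, edges[1:-1])
    return _binned_ece(confidences, correct, bin_index, n_bins)


def ece_equal_mass(confidences, correct, n_bins=10):
    """ECE with bins that each hold (nearly) the same number of instances."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(confidences, kind="stable")
    bin_index = np.empty(len(confidences), dtype=int)
    # array_split produces n_bins chunks whose sizes differ by at most one.
    for b, chunk in enumerate(np.array_split(order, n_bins)):
        bin_index[chunk] = b
    return _binned_ece(confidences, correct, bin_index, n_bins)
```

Both functions take per-example confidences and 0/1 correctness indicators; the only difference is how examples are assigned to bins.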

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

1 code implementation ECCV 2020 Liang-Chieh Chen, Raphael Gontijo Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens

We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences and extra images to surpass state-of-the-art performance on core computer vision tasks.

Image Segmentation Optical Flow Estimation +4

Streaming Object Detection for 3-D Point Clouds

no code implementations ECCV 2020 Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

This built-in data capture latency is artificial, and based on treating the point cloud as a camera image in order to leverage camera-inspired architectures.

Action Recognition Autonomous Vehicles +4

Revisiting Spatial Invariance with Low-Rank Local Connectivity

no code implementations ICML 2020 Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith

Convolutional neural networks are among the most successful architectures in deep learning with this success at least partially attributable to the efficacy of spatial invariance as an inductive bias.

Inductive Bias

RandAugment: Practical automated data augmentation with a reduced search space

16 code implementations NeurIPS 2020 Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size.

Data Augmentation Domain Generalization +3

StarNet: Targeted Computation for Object Detection in Point Clouds

no code implementations 29 Aug 2019 Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan

We show how our redesign---namely using only local information and using sampling instead of learned proposals---leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics.

3D Object Detection Object +3

Stand-Alone Self-Attention in Vision Models

8 code implementations NeurIPS 2019 Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens

The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.

object-detection Object Detection

Visual Wake Words Dataset

5 code implementations 12 Jun 2019 Aakanksha Chowdhery, Pete Warden, Jonathon Shlens, Andrew Howard, Rocky Rhodes

To facilitate the development of microcontroller friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models.

Using learned optimizers to make models robust to input noise

no code implementations 8 Jun 2019 Luke Metz, Niru Maheswaranathan, Jonathon Shlens, Jascha Sohl-Dickstein, Ekin D. Cubuk

State-of-the-art vision models can achieve superhuman performance on image classification tasks when testing and training data come from the same distribution.

General Classification Image Classification +1

Using Videos to Evaluate Image Model Robustness

no code implementations 22 Apr 2019 Keren Gu, Brandon Yang, Jiquan Ngiam, Quoc Le, Jonathon Shlens

Compared to previous studies on adversarial examples and synthetic distortions, natural robustness captures a more diverse set of common image transformations that occur in the natural environment.

A Learned Representation for Scalable Vector Graphics

2 code implementations ICCV 2019 Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens

Dramatic advances in generative models have resulted in near photographic quality for artificially rendered faces, animals and other objects in the natural world.

Vector Graphics

Accelerating Training of Deep Neural Networks with a Standardization Loss

1 code implementation 3 Mar 2019 Jasmine Collins, Johannes Balle, Jonathon Shlens

We find that a standardization loss accelerates training on both small- and large-scale image classification experiments, works with a variety of architectures, and is largely robust to training across different batch sizes.

Image Classification

Do Better ImageNet Models Transfer Better?

no code implementations CVPR 2019 Simon Kornblith, Jonathon Shlens, Quoc V. Le

Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer.

Fine-Grained Image Classification General Classification +1

A Dataset and Architecture for Visual Reasoning with a Working Memory

2 code implementations ECCV 2018 Guangyu Robert Yang, Igor Ganichev, Xiao-Jing Wang, Jonathon Shlens, David Sussillo

COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory -- problems that remain challenging for modern deep learning architectures.

Logical Reasoning Visual Question Answering (VQA) +1

Learning a neural response metric for retinal prosthesis

no code implementations ICLR 2018 Nishal P Shah, Sasidhar Madugula, EJ Chichilnisky, Yoram Singer, Jonathon Shlens

Retinal prostheses for treating incurable blindness are designed to electrically stimulate surviving retinal neurons, causing them to send artificial visual signals to the brain.

Progressive Neural Architecture Search

18 code implementations ECCV 2018 Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.

Evolutionary Algorithms General Classification +3

Recurrent Segmentation for Variable Computational Budgets

no code implementations 28 Nov 2017 Lane McIntosh, Niru Maheswaranathan, David Sussillo, Jonathon Shlens

Importantly, the RNN may be deployed across a range of computational budgets by merely running the model for a variable number of iterations.

Image Segmentation Segmentation +3

Learning Transferable Architectures for Scalable Image Recognition

17 code implementations CVPR 2018 Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le

In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named "NASNet architecture".

Classification Image Classification +1

PixColor: Pixel Recursive Colorization

no code implementations 19 May 2017 Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy

Then, given the generated low-resolution color image and the original grayscale image as inputs, we train a second CNN to generate a high-resolution colorization of an image.

Colorization

Exploring the structure of a real-time, arbitrary neural artistic stylization network

20 code implementations 18 May 2017 Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens

In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair.

Style Transfer

Pixel Recursive Super Resolution

1 code implementation ICCV 2017 Ryan Dahl, Mohammad Norouzi, Jonathon Shlens

A low resolution image may correspond to multiple plausible high resolution images, thus modeling the super resolution process with a pixel independent conditional model often results in averaging different details--hence blurry edges.

regression Super-Resolution

Conditional Image Synthesis With Auxiliary Classifier GANs

36 code implementations ICML 2017 Augustus Odena, Christopher Olah, Jonathon Shlens

We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models.

Ranked #13 on Conditional Image Generation on CIFAR-10 (Inception score metric)

Conditional Image Generation Image Quality Assessment

A Learned Representation For Artistic Style

12 code implementations 24 Oct 2016 Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur

In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings.

Adversarial Autoencoders

28 code implementations 18 Nov 2015 Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.

Clustering Data Visualization +5

Net2Net: Accelerating Learning via Knowledge Transfer

3 code implementations 18 Nov 2015 Tianqi Chen, Ian Goodfellow, Jonathon Shlens

Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network.

Transfer Learning

Deep Networks With Large Output Spaces

no code implementations 23 Dec 2014 Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik

Deep neural networks have been extremely successful at various image, speech, and video recognition tasks because of their ability to model deep structures within the data.

Video Recognition

Explaining and Harnessing Adversarial Examples

59 code implementations 20 Dec 2014 Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence.

Image Classification
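
As an illustration of the "small but intentionally worst-case perturbations" described in the entry above, here is a minimal sketch of a sign-of-gradient perturbation (the fast gradient sign method associated with this paper) applied to a toy logistic-regression classifier. The model, the weights, and the function names `sigmoid` and `fgsm_perturb` are assumptions chosen for illustration, not code from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_perturb(x, y, w, b, epsilon):
    """Move x by epsilon in the direction that increases the loss.

    Assumes a logistic model p = sigmoid(w . x + b) with binary
    cross-entropy loss, whose gradient with respect to x is (p - y) * w.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)


# Hypothetical toy model and input, for illustration only.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.5)
p_clean = sigmoid(np.dot(w, x) + b)    # confidence in the true class before
p_adv = sigmoid(np.dot(w, x_adv) + b)  # confidence in the true class after
```

Each input coordinate changes by at most `epsilon`, yet the model's confidence in the true class drops because every coordinate moves in the direction that raises the loss.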

A Tutorial on Independent Component Analysis

3 code implementations 11 Apr 2014 Jonathon Shlens

This tutorial provides an introduction to ICA based on linear algebra formulating an intuition for ICA from first principles.

BIG-bench Machine Learning

Notes on Generalized Linear Models of Neurons

no code implementations 8 Apr 2014 Jonathon Shlens

Experimental neuroscience increasingly requires tractable models for analyzing and predicting the behavior of neurons and networks.

A Tutorial on Principal Component Analysis

7 code implementations 3 Apr 2014 Jonathon Shlens

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood.

Using Web Co-occurrence Statistics for Improving Image Categorization

no code implementations 19 Dec 2013 Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.

Common Sense Reasoning Image Categorization +1

Zero-Shot Learning by Convex Combination of Semantic Embeddings

2 code implementations 19 Dec 2013 Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.

Multi-label zero-shot learning

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

no code implementations CVPR 2013 Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik

Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts.

object-detection Object Detection
