Search Results for author: Alexander Toshev

Found 41 papers, 16 papers with code

Scalable Pre-training of Large Autoregressive Image Models

2 code implementations · 16 Jan 2024 · Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #331 on Image Classification on ImageNet (using extra training data)

Image Classification

Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation

no code implementations · 27 Nov 2023 · Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev

Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling.

Language Modelling, Text-to-Image Generation

Data Filtering Networks

2 code implementations · 29 Sep 2023 · Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander Toshev, Vaishaal Shankar

Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks: for instance, a model that performs well on ImageNet can yield worse training sets than a model with low ImageNet accuracy that is trained on a small amount of high-quality data.

Language Modelling

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

no code implementations · 8 Sep 2023 · Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.

Value function estimation using conditional diffusion models for control

no code implementations · 9 Jun 2023 · Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind

A fairly reliable trend in deep reinforcement learning is that performance scales with the number of parameters, provided there is a complementary scaling in the amount of training data.

Continuous Control

On Robustness in Multimodal Learning

no code implementations · 10 Apr 2023 · Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.

Representation Learning

Gesture2Path: Imitation Learning for Gesture-aware Navigation

no code implementations · 19 Sep 2022 · Catie Cuan, Edward Lee, Emre Fisher, Anthony Francis, Leila Takayama, Tingnan Zhang, Alexander Toshev, Sören Pirk

Our experiments indicate that our method is able to successfully interpret complex human gestures and to use them as a signal to generate socially compliant trajectories for navigation tasks.

Imitation Learning, Model Predictive Control

GAUDI: A Neural Architect for Immersive 3D Scene Generation

1 code implementation · 27 Jul 2022 · Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation, Scene Generation

Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

no code implementations · 28 Mar 2022 · Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone

Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.

Imitation Learning, Navigate

ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

no code implementations · 18 Aug 2020 · Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese

To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base.

Continuous Control, Hierarchical Reinforcement Learning

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations · 23 Jun 2020 · Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Object

Modeling Long-horizon Tasks as Sequential Interaction Landscapes

no code implementations · 8 Jun 2020 · Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari

We show that complex plans can be carried out when executing the robotic task, and that the robot can interactively adapt to changes in the environment and recover from failure cases.

Robot Manipulation

Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

1 code implementation · 30 Oct 2019 · Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese

We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task.

Robot Navigation

Long Range Neural Navigation Policies for the Real World

no code implementations · 23 Mar 2019 · Ayzaan Wahid, Alexander Toshev, Marek Fiser, Tsang-Wei Edward Lee

Learned neural-network-based policies have shown promising results for robot navigation.

Robot Navigation

Self-supervisory Signals for Object Discovery and Detection

no code implementations · 8 Jun 2018 · Etienne Pot, Alexander Toshev, Jana Kosecka

In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data.

Clustering, Object

Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control

no code implementations · CVPR 2018 · Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine

In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback.

Robot Manipulation

Visual Representations for Semantic Target Driven Navigation

3 code implementations · 15 May 2018 · Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson

We propose using high-level semantic and contextual features, including segmentation and detection masks obtained from off-the-shelf state-of-the-art vision models, as observations, and using a deep network to learn the navigation policy.

Domain Adaptation, Visual Navigation

Sim2Real View Invariant Visual Servoing by Recurrent Control

no code implementations · 20 Dec 2017 · Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine

To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object.

No Fuss Distance Metric Learning using Proxies

2 code implementations · ICCV 2017 · Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh

Traditionally, supervision for this problem is expressed in the form of sets of points that follow an ordinal relationship: an anchor point $x$ is similar to a set of positive points $Y$ and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized.

Metric Learning, Semantic Similarity
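For illustration, the ordinal supervision described in the snippet above (anchor $x$, positives $Y$, negatives $Z$) can be sketched as a hinge loss over every (positive, negative) pair. This is a minimal sketch of that "traditional" triplet-style setup, not the proxy-based method named in the paper title; the squared Euclidean distance, the hinge form, the averaging, and the hypothetical `margin` parameter are all assumptions made for the example.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance (an assumed choice of distance).
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def ordinal_triplet_loss(
    x: Vector,
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    margin: float = 1.0,  # hypothetical margin, chosen for illustration
) -> float:
    """Average hinge loss asking x to be closer to each y in Y than to each z in Z."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative point")
    total = 0.0
    for y in positives:
        d_pos = squared_distance(x, y)
        for z in negatives:
            d_neg = squared_distance(x, z)
            # Penalize pairs where the negative is not at least `margin` farther away.
            total += max(0.0, margin + d_pos - d_neg)
    return total / (len(positives) * len(negatives))
```

With the anchor at the origin, a positive at (0, 1) and a negative at (3, 0) already satisfy the margin and contribute zero loss, while a negative at (1, 0) sits as close as the positive and contributes exactly the margin. The number of (positive, negative) pairs grows quickly with the set sizes, which is the kind of cost that motivates alternatives to this formulation.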

Towards Accurate Multi-person Pose Estimation in the Wild

no code implementations · CVPR 2017 · George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy

Trained on COCO data alone, our final system achieves an average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods.

Human Detection, Keypoint Detection

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

19 code implementations · 21 Sep 2016 · Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.

Image Captioning, Sentence

Chained Predictions Using Convolutional Neural Networks

no code implementations · 8 May 2016 · Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly

In this model, the output variables for a given input are predicted sequentially using neural networks.

Pose Estimation

The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

1 code implementation · 20 Nov 2015 · Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Current approaches for fine-grained recognition do the following: first, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.

Ranked #4 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)

Active Learning, Fine-Grained Image Classification

Generation and Comprehension of Unambiguous Object Descriptions

1 code implementation · CVPR 2016 · Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.

Image Captioning, Object

Deep Convolutional Ranking for Multilabel Image Annotation

no code implementations · 17 Dec 2013 · Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe

Multilabel image annotation is one of the most important challenges in computer vision with many real-world applications.

Scalable Object Detection using Deep Neural Networks

6 code implementations · CVPR 2014 · Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012).

Object, object-detection
