Search Results for author: Alexander Toshev

Found 41 papers, 15 papers with code

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang

Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.

Ranked #20 on Visual Question Answering on MM-Vet

In-Context Learning Visual Question Answering


Scalable Pre-training of Large Autoregressive Image Models

2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #336 on Image Classification on ImageNet (using extra training data)

Image Classification

2,764


Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation

no code implementations • 27 Nov 2023 • Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev

Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling.

Language Modelling Text-to-Image Generation


Large Language Models as Generalizable Policies for Embodied Tasks

no code implementations • 26 Oct 2023 • Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev

We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks.

Language Modelling Large Language Model +1


Data Filtering Networks

2 code implementations • 29 Sep 2023 • Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander Toshev, Vaishaal Shankar

Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks: for instance, a model that performs well on ImageNet can yield worse training sets than a model with low ImageNet accuracy that is trained on a small amount of high-quality data.

Language Modelling

362


Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

no code implementations • 8 Sep 2023 • Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.


Principles and Guidelines for Evaluating Social Robot Navigation Algorithms

no code implementations • 29 Jun 2023 • Anthony Francis, Claudia Pérez-D'Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Rachid Alami, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Rohan Chandra, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Jonathan P. How, Haresh Karnan, Tsang-Wei Edward Lee, Luis J. Manso, Reuth Mirksy, Sören Pirk, Phani Teja Singamaneni, Peter Stone, Ada V. Taylor, Peter Trautman, Nathan Tsoi, Marynel Vázquez, Xuesu Xiao, Peng Xu, Naoki Yokoyama, Alexander Toshev, Roberto Martín-Martín

A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation.

Benchmarking Social Navigation


Value function estimation using conditional diffusion models for control

no code implementations • 9 Jun 2023 • Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind

A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complementary scaling in amount of training data.

Continuous Control


On Robustness in Multimodal Learning

no code implementations • 10 Apr 2023 • Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.

Representation Learning


STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang

We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.

Information Retrieval Retrieval +1


Perceptual Grouping in Contrastive Vision-Language Models

1 code implementation • ICCV 2023 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens

In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.

Ranked #1 on Unsupervised Semantic Segmentation with Language-image Pre-training on MS COCO

Object Localization Representation Learning +1


Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation


Gesture2Path: Imitation Learning for Gesture-aware Navigation

no code implementations • 19 Sep 2022 • Catie Cuan, Edward Lee, Emre Fisher, Anthony Francis, Leila Takayama, Tingnan Zhang, Alexander Toshev, Sören Pirk

Our experiments indicate that our method is able to successfully interpret complex human gestures and to use them as a signal to generate socially compliant trajectories for navigation tasks.

Imitation Learning Model Predictive Control +2


GAUDI: A Neural Architect for Immersive 3D Scene Generation

1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Ranked #1 on Image Generation on ARKitScenes

Image Generation Scene Generation

604


Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

3 code implementations • 4 Apr 2022 • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng

We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.

Decision Making Language Modelling +1

163


Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

no code implementations • 28 Mar 2022 • Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone

Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.

Imitation Learning Navigate +1


Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning

no code implementations • ICLR 2022 • Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter

Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions.

Hierarchical Reinforcement Learning reinforcement-learning +2


ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

no code implementations • 18 Aug 2020 • Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese

To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base.

Continuous Control Hierarchical Reinforcement Learning +2


Adversarial Generative Grammars for Human Activity Prediction

no code implementations • ECCV 2020 • AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

In this paper we propose an adversarial generative grammar model for future prediction.

Activity Prediction Future prediction +1


ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations • 23 Jun 2020 • Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Object

1,731


Modeling Long-horizon Tasks as Sequential Interaction Landscapes

no code implementations • 8 Jun 2020 • Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari

We show that complex plans can be carried out when executing the robotic task and the robot can interactively adapt to changes in the environment and recover from failure cases.

Robot Manipulation


Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

1 code implementation • 30 Oct 2019 • Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese

We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task.

Robot Navigation

606


Long Range Neural Navigation Policies for the Real World

no code implementations • 23 Mar 2019 • Ayzaan Wahid, Alexander Toshev, Marek Fiser, Tsang-Wei Edward Lee

Learned Neural Network based policies have shown promising results for robot navigation.

Robot Navigation


Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

no code implementations • CVPR 2019 • Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.

Decision Making reinforcement-learning +2


Evolving Space-Time Neural Architectures for Videos

no code implementations • ICCV 2019 • AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

We present a new method for finding video CNN architectures that capture rich spatio-temporal information in videos.

Ranked #22 on Action Classification on MiT

Action Classification Action Recognition In Videos


Self-supervisory Signals for Object Discovery and Detection

no code implementations • 8 Jun 2018 • Etienne Pot, Alexander Toshev, Jana Kosecka

In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data.

Clustering Object +1


Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control

no code implementations • CVPR 2018 • Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine

In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback.

Robot Manipulation


Visual Representations for Semantic Target Driven Navigation

3 code implementations • 15 May 2018 • Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson

We propose using high-level semantic and contextual features, including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision models, as observations, and using a deep network to learn the navigation policy.

Domain Adaptation Visual Navigation

76,628


Sim2Real View Invariant Visual Servoing by Recurrent Control

no code implementations • 20 Dec 2017 • Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine

To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object.


No Fuss Distance Metric Learning using Proxies

2 code implementations • ICCV 2017 • Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh

Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship -- an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized.

Metric Learning Semantic Similarity +2

180


Towards Accurate Multi-person Pose Estimation in the Wild

no code implementations • CVPR 2017 • George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy

Trained on COCO data alone, our final system achieves average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods.

Ranked #6 on Keypoint Detection on COCO test-challenge

Human Detection Keypoint Detection +1


Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

19 code implementations • 21 Sep 2016 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.

Image Captioning Sentence +1

65,339


Chained Predictions Using Convolutional Neural Networks

no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly

In this model the output variables for a given input are predicted sequentially using neural networks.

Pose Estimation


The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

1 code implementation • 20 Nov 2015 • Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.

Ranked #5 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)

Active Learning Fine-Grained Image Classification


Generation and Comprehension of Unambiguous Object Descriptions

1 code implementation • CVPR 2016 • Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.

Image Captioning Object +1

157


Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

no code implementations • 1 Jul 2015 • Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang

We present a method for learning an embedding that places images of humans in similar poses nearby.

Retrieval


Show and Tell: A Neural Image Caption Generator

75 code implementations • CVPR 2015 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.

Ranked #3 on Image Retrieval with Multi-Modal Query on MIT-States

Image Captioning Image Retrieval with Multi-Modal Query +4

327


DeepPose: Human Pose Estimation via Deep Neural Networks

7 code implementations • CVPR 2014 • Alexander Toshev, Christian Szegedy

We propose a method for human pose estimation based on Deep Neural Networks (DNNs).

Pose Estimation regression

5,045


Deep Convolutional Ranking for Multilabel Image Annotation

no code implementations • 17 Dec 2013 • Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe

Multilabel image annotation is one of the most important challenges in computer vision with many real-world applications.


Scalable Object Detection using Deep Neural Networks

6 code implementations • CVPR 2014 • Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012).

Object object-detection +2

2,965


Deep Neural Networks for Object Detection

no code implementations • NeurIPS 2013 • Christian Szegedy, Alexander Toshev, Dumitru Erhan

Deep Neural Networks (DNNs) have recently shown outstanding performance on the task of whole image classification.

General Classification Image Classification +4

