no code implementations • 4 Apr 2022 • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.
no code implementations • 28 Mar 2022 • Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
no code implementations • ICLR 2022 • Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter
Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions.
no code implementations • 18 Aug 2020 • Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese
To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base.
no code implementations • ECCV 2020 • AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo
In this paper we propose an adversarial generative grammar model for future prediction.
3 code implementations • 23 Jun 2020 • Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans
In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.
no code implementations • 8 Jun 2020 • Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari
We show that complex plans can be carried out when executing the robotic task and the robot can interactively adapt to changes in the environment and recover from failure cases.
1 code implementation • 30 Oct 2019 • Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese
We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task.
no code implementations • 23 Mar 2019 • Ayzaan Wahid, Alexander Toshev, Marek Fiser, Tsang-Wei Edward Lee
Learned neural-network-based policies have shown promising results for robot navigation.
no code implementations • CVPR 2019 • Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese
Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.
no code implementations • ICCV 2019 • AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo
We present a new method for finding video CNN architectures that capture rich spatio-temporal information in videos.
Ranked #18 on Action Classification on Moments in Time
no code implementations • 8 Jun 2018 • Etienne Pot, Alexander Toshev, Jana Kosecka
In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data.
no code implementations • CVPR 2018 • Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine
In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback.
3 code implementations • 15 May 2018 • Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson
We propose using high-level semantic and contextual features, including segmentation and detection masks obtained from off-the-shelf state-of-the-art vision models, as observations, and using a deep network to learn the navigation policy.
no code implementations • 20 Dec 2017 • Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine
To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object.
2 code implementations • ICCV 2017 • Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh
Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship -- an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized.
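The traditional supervision described above can be sketched as a hinge loss over every (positive, negative) pair for one anchor. This is a minimal illustration, not the method proposed in the paper: the function names, the squared Euclidean distance, and the `margin` value are all assumptions chosen for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def triplet_hinge_loss(x, Y, Z, margin=1.0):
    """Average hinge loss for anchor x, positives Y, and negatives Z.

    A pair (y, z) contributes zero once the negative z is farther from
    the anchor than the positive y by at least `margin` (an assumed
    hyperparameter, not taken from the source).
    """
    total = 0.0
    for y in Y:
        for z in Z:
            d_pos = squared_distance(x, y)
            d_neg = squared_distance(x, z)
            total += max(0.0, d_pos - d_neg + margin)
    return total / (len(Y) * len(Z))


anchor = [0.0, 0.0]
positives = [[1.0, 0.0]]
far_negatives = [[3.0, 0.0]]   # already well separated from the anchor
near_negatives = [[0.0, 1.0]]  # as close to the anchor as the positive

loss_far = triplet_hinge_loss(anchor, positives, far_negatives)
loss_near = triplet_hinge_loss(anchor, positives, near_negatives)
```

With these toy points the well-separated negative incurs no loss, while the negative that sits as close to the anchor as the positive is penalized by the full margin.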
no code implementations • CVPR 2017 • George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy
Trained on COCO data alone, our final system achieves an average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods.
Ranked #6 on Keypoint Detection on COCO test-challenge
19 code implementations • 21 Sep 2016 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.
no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly
In this model the output variables for a given input are predicted sequentially using neural networks.
1 code implementation • 20 Nov 2015 • Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei
Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.
1 code implementation • CVPR 2016 • Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy
We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.
no code implementations • 1 Jul 2015 • Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang
We present a method for learning an embedding that places images of humans in similar poses nearby.
70 code implementations • CVPR 2015 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.
Ranked #3 on Image Retrieval with Multi-Modal Query on MIT-States
7 code implementations • CVPR 2014 • Alexander Toshev, Christian Szegedy
We propose a method for human pose estimation based on Deep Neural Networks (DNNs).
no code implementations • 17 Dec 2013 • Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe
Multilabel image annotation is one of the most important challenges in computer vision with many real-world applications.
6 code implementations • CVPR 2014 • Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012).
no code implementations • NeurIPS 2013 • Christian Szegedy, Alexander Toshev, Dumitru Erhan
Deep Neural Networks (DNNs) have recently shown outstanding performance on the task of whole image classification.