no code implementations • 8 Sep 2023 • Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du
We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.
no code implementations • 29 Jun 2023 • Anthony Francis, Claudia Pérez-D'Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Rachid Alami, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Rohan Chandra, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Jonathan P. How, Haresh Karnan, Tsang-Wei Edward Lee, Luis J. Manso, Reuth Mirsky, Sören Pirk, Phani Teja Singamaneni, Peter Stone, Ada V. Taylor, Peter Trautman, Nathan Tsoi, Marynel Vázquez, Xuesu Xiao, Peng Xu, Naoki Yokoyama, Alexander Toshev, Roberto Martín-Martín
A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation.
no code implementations • 9 Jun 2023 • Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind
A fairly reliable trend in deep reinforcement learning is that performance scales with the number of parameters, provided a complementary scaling in the amount of training data.
no code implementations • 10 Apr 2023 • Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev
Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.
no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.
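As a rough sketch of the idea of matching in a sparse token space (illustrative only; the function and token names below are hypothetical, not the STAIR implementation), image-text similarity reduces to a dot product over the few tokens with non-zero weight:

```python
# Hypothetical sketch of retrieval in a sparse token space: each image
# or text is mapped to a sparse vector over a shared token vocabulary,
# and similarity is a dot product over the overlapping non-zero tokens.

def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as token -> weight dicts."""
    # Iterate over the smaller dict so cost scales with the sparser side.
    if len(b) < len(a):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)

# Toy sparse embeddings with non-zero weights on interpretable tokens.
image_embedding = {"dog": 1.2, "grass": 0.8, "ball": 0.5}
text_embedding = {"dog": 1.0, "ball": 0.7, "park": 0.3}

score = sparse_dot(image_embedding, text_embedding)  # 1.2*1.0 + 0.5*0.7 = 1.55
```

Because most weights are zero, such representations can be served with inverted-index infrastructure rather than dense nearest-neighbor search.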
1 code implementation • 18 Oct 2022 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens
In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
Ranked #1 on Unsupervised Semantic Segmentation on PASCAL VOC 2012 val (using extra training data)
no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu
We present a retrospective on the state of Embodied AI research.
no code implementations • 19 Sep 2022 • Catie Cuan, Edward Lee, Emre Fisher, Anthony Francis, Leila Takayama, Tingnan Zhang, Alexander Toshev, Sören Pirk
Our experiments indicate that our method is able to successfully interpret complex human gestures and to use them as a signal to generate socially compliant trajectories for navigation tasks.
1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
3 code implementations • 4 Apr 2022 • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.
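The combination described above can be sketched as follows (a minimal illustration with hypothetical names and toy scores, not the paper's actual code): the language model rates how relevant each skill is to the instruction, the skill's value function rates how feasible it is in the current state, and the two scores are combined multiplicatively.

```python
# Hedged sketch: pick the low-level skill whose language-model relevance
# times value-function feasibility is highest. All names and scores here
# are illustrative.

def select_skill(instruction, skills, llm_score, value_fn):
    """Return the skill maximizing relevance * feasibility."""
    combined = {s: llm_score(instruction, s) * value_fn(s) for s in skills}
    return max(combined, key=combined.get)

# Toy scoring functions for two candidate skills.
llm = lambda instr, s: {"pick up sponge": 0.9, "go to table": 0.4}.get(s, 0.0)
value = lambda s: {"pick up sponge": 0.8, "go to table": 0.9}.get(s, 0.0)

best = select_skill("clean the spill", ["pick up sponge", "go to table"], llm, value)
# 0.9*0.8 = 0.72 beats 0.4*0.9 = 0.36, so "pick up sponge" is selected
```

The value function supplies the grounding: a skill the language model likes is still rejected if it is infeasible in the current physical state.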
no code implementations • 28 Mar 2022 • Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
no code implementations • ICLR 2022 • Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter
Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions.
no code implementations • 18 Aug 2020 • Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese
To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base.
no code implementations • ECCV 2020 • AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo
In this paper we propose an adversarial generative grammar model for future prediction.
2 code implementations • 23 Jun 2020 • Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans
In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.
no code implementations • 8 Jun 2020 • Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari
We show that complex plans can be carried out when executing the robotic task, and that the robot can interactively adapt to changes in the environment and recover from failure cases.
1 code implementation • 30 Oct 2019 • Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese
We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task.
no code implementations • 23 Mar 2019 • Ayzaan Wahid, Alexander Toshev, Marek Fiser, Tsang-Wei Edward Lee
Learned neural-network-based policies have shown promising results for robot navigation.
no code implementations • CVPR 2019 • Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese
Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.
no code implementations • ICCV 2019 • AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo
We present a new method for finding video CNN architectures that capture rich spatio-temporal information in videos.
Ranked #20 on Action Classification on Moments in Time
no code implementations • 8 Jun 2018 • Etienne Pot, Alexander Toshev, Jana Kosecka
In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data.
no code implementations • CVPR 2018 • Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine
In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback.
3 code implementations • 15 May 2018 • Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson
We propose using high-level semantic and contextual features, including segmentation and detection masks obtained from off-the-shelf state-of-the-art vision models, as observations, and use a deep network to learn the navigation policy.
no code implementations • 20 Dec 2017 • Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine
To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object.
2 code implementations • ICCV 2017 • Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh
Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship -- an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized.
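The traditional supervision described above can be sketched as a margin-based hinge loss over anchor-positive-negative distances (a minimal illustration of the baseline formulation the paper improves on with proxies; all names are hypothetical):

```python
# Minimal sketch of a triplet-style metric-learning loss: the anchor x
# should be closer to every positive y in Y than to every negative z in
# Z, by at least a margin.
import math

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(x, Y, Z, margin=1.0):
    """Sum of hinge losses over all (positive, negative) pairs."""
    return sum(max(0.0, dist(x, y) - dist(x, z) + margin)
               for y in Y for z in Z)

x = [0.0, 0.0]
Y = [[0.1, 0.0]]   # positives near the anchor
Z = [[3.0, 0.0]]   # negatives far from the anchor
loss = triplet_loss(x, Y, Z)  # max(0, 0.1 - 3.0 + 1.0) = 0.0, margin satisfied
```

The cost of this formulation is that it is defined over pairs or triplets of points, so the number of supervision terms grows combinatorially with the dataset, which is the sampling burden proxy-based losses avoid.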
no code implementations • CVPR 2017 • George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy
Trained on COCO data alone, our final system achieves an average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods.
Ranked #6 on Keypoint Detection on COCO test-challenge
19 code implementations • 21 Sep 2016 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.
no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly
In this model the output variables for a given input are predicted sequentially using neural networks.
1 code implementation • 20 Nov 2015 • Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei
Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.
Ranked #4 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)
1 code implementation • CVPR 2016 • Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy
We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.
no code implementations • 1 Jul 2015 • Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang
We present a method for learning an embedding that places images of humans in similar poses nearby.
76 code implementations • CVPR 2015 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.
Ranked #3 on Image Retrieval with Multi-Modal Query on MIT-States
no code implementations • 17 Dec 2013 • Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe
Multilabel image annotation is one of the most important challenges in computer vision with many real-world applications.
7 code implementations • CVPR 2014 • Alexander Toshev, Christian Szegedy
We propose a method for human pose estimation based on Deep Neural Networks (DNNs).
6 code implementations • CVPR 2014 • Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012).
no code implementations • NeurIPS 2013 • Christian Szegedy, Alexander Toshev, Dumitru Erhan
Deep Neural Networks (DNNs) have recently shown outstanding performance on the task of whole image classification.