Search Results for author: Silvio Savarese

Found 171 papers, 69 papers with code

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

no code implementations • 11 Apr 2024 • Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity.

Benchmarking

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

no code implementations • 14 Mar 2024 • Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, Ivan Villa-Renteria, Jerry Huayang Tang, Claire Tang, Fei Xia, Yunzhu Li, Silvio Savarese, Hyowon Gweon, C. Karen Liu, Jiajun Wu, Li Fei-Fei

We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics.

AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

1 code implementation • 23 Feb 2024 • Zhiwei Liu, Weiran Yao, JianGuo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese

Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating LLM agent reasoning, architectures, and applications with ease.

293

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

2 code implementations • 23 Feb 2024 • JianGuo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training.

Unified Training of Universal Time Series Forecasting Transformers

1 code implementation • 4 Feb 2024 • Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo

Deep learning for time series forecasting has traditionally operated within a one-model-per-dataset framework, limiting its potential to leverage the game-changing impact of large pre-trained models.

Time Series Time Series Forecasting

433

Causal Layering via Conditional Entropy

no code implementations • 19 Jan 2024 • Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

Under appropriate assumptions and conditioning, we can separate the sources or sinks from the remainder of the nodes by comparing their conditional entropy to the unconditional entropy of their noise.

Causal Discovery

Editing Arbitrary Propositions in LLMs without Subject Labels

no code implementations • 15 Jan 2024 • Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

On datasets of binary propositions derived from the CounterFact dataset, we show that our method, without access to subject labels, performs close to state-of-the-art L&E methods, which have access to subject labels.

Language Modelling Large Language Model +1

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

1 code implementation • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

Visual Reasoning

Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change

no code implementations • 15 Nov 2023 • Tao Sun, Yan Hao, Shengyu Huang, Silvio Savarese, Konrad Schindler, Marc Pollefeys, Iro Armeni

To this end, we introduce the Nothing Stands Still (NSS) benchmark, which focuses on the spatiotemporal registration of 3D scenes undergoing large spatial and temporal change, ultimately creating one coherent spatiotemporal map.

Point Cloud Registration

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

no code implementations • 16 Oct 2023 • Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai

Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting.

In-Context Learning

XGen-7B Technical Report

1 code implementation • 7 Sep 2023 • Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong

Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context.

2k 8k

713

Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

no code implementations • 16 Aug 2023 • JianGuo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models.

Natural Language Understanding Retrieval +1

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

2 code implementations • 11 Aug 2023 • Zhiwei Liu, Weiran Yao, JianGuo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs).

Benchmarking Decision Making

293

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

1 code implementation • 4 Aug 2023 • Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.

Language Modelling

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

1 code implementation • 19 Jul 2023 • JianGuo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong

Despite advancements in conversational AI, language models encounter challenges in handling diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness.

Few-Shot Learning Language Modelling +1

438

REX: Rapid Exploration and eXploitation for AI Agents

no code implementations • 18 Jul 2023 • Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.

Decision Making Reinforcement Learning (RL)

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

1 code implementation • 1 Jun 2023 • Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.

Multi-Task Learning Visual Navigation

Modeling Dynamic Environments with Scene Graph Memory

no code implementations • 27 May 2023 • Andrey Kurenkov, Michael Lingelbach, Tanmay Agarwal, Emily Jin, Chengshu Li, Ruohan Zhang, Li Fei-Fei, Jiajun Wu, Silvio Savarese, Roberto Martín-Martín

We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen at homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines both in terms of new scene adaptability and overall accuracy.

Link Prediction

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

1 code implementation • NeurIPS 2023 • Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, Ran Xu

Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages.

Image Generation

581

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation • 14 May 2023 • Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

It achieves a new SOTA of 50.6% (top-1) on Objaverse-LVIS and 84.7% (top-1) on ModelNet40 in zero-shot classification.

Ranked #6 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Classification 3D Point Cloud Classification +4

359

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

2 code implementations • 3 May 2023 • Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou

In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and, (4) data distributions.

Causal Language Modeling Decoder +3

4,778

An Extensible Multimodal Multi-task Object Dataset with Materials

no code implementations • 29 Apr 2023 • Trevor Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese

For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image.

Attribute Multi-Task Learning +1

Procedure-Aware Pretraining for Instructional Video Understanding

1 code implementation • CVPR 2023 • Honglu Zhou, Roberto Martín-Martín, Mubbasir Kapadia, Silvio Savarese, Juan Carlos Niebles

This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form to generalize to multiple procedure understanding tasks.

Video Understanding

HIVE: Harnessing Human Feedback for Instructional Visual Editing

1 code implementation • 16 Mar 2023 • Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences.

Text-based Image Editing

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

12 code implementations • 30 Jan 2023 • Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.

Ranked #1 on Image Retrieval on Flickr30k

Generative Visual Question Answering Image Captioning +10

125,725

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

1 code implementation • CVPR 2023 • Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Then, ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets.

Ranked #3 on Training-free 3D Point Cloud Classification on ModelNet40 (using extra training data)

3D Architecture 3D Classification +5

359

Best-$k$ Search Algorithm for Neural Text Generation

no code implementations • 22 Nov 2022 • Jiacheng Xu, Caiming Xiong, Silvio Savarese, Yingbo Zhou

We first investigate the vanilla best-first search (BFS) algorithm and then propose the Best-$k$ Search algorithm.

Question Generation Question-Generation +2

Online Distribution Shift Detection via Recency Prediction

no code implementations • 17 Nov 2022 • Rachel Luo, Rohan Sinha, Yixiao Sun, Ali Hindy, Shengjia Zhao, Silvio Savarese, Edward Schmerling, Marco Pavone

When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical.

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

2 code implementations • 17 Oct 2022 • Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, Steven C. H. Hoi

Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting.

Ranked #2 on Visual Question Answering (VQA) on VQA v2 val

Image Captioning Network Interpretation +2

8,830

Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation

LAVIS: A Library for Language-Vision Intelligence

1 code implementation • 15 Sep 2022 • Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Benchmarking Image Captioning +8

8,830

Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking

no code implementations • 22 Aug 2022 • JunYoung Gwak, Silvio Savarese, Jeannette Bohg

In this work, we present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves object detection and tracking.

3D Object Detection Multi-Object Tracking +3

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

2 code implementations • 5 Jul 2022 • Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C. H. Hoi

To address the limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL).

Ranked #1 on Code Generation on APPS

Code Generation Decoder +3

2,602

Masked Unsupervised Self-training for Label-free Image Classification

1 code implementation • 7 Jun 2022 • Junnan Li, Silvio Savarese, Steven C. H. Hoi

We demonstrate the efficacy of MUST on a variety of downstream tasks, where it improves upon CLIP by a large margin.

Image Classification Representation Learning +1

103

OmniXAI: A Library for Explainable AI

2 code implementations • 1 Jun 2022 • Wenzhuo Yang, Hung Le, Tanmay Laud, Silvio Savarese, Steven C. H. Hoi

We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice.

counterfactual Counterfactual Explanation +5

813

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

5 code implementations • 25 Mar 2022 • Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong

To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER.

Ranked #81 on Code Generation on HumanEval

Code Generation Language Modelling +2

11,820

Long Document Summarization with Top-down and Bottom-up Inference

no code implementations • 15 Mar 2022 • Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong

Critical to the success of a summarization model is the faithful inference of latent representations of words or tokens in the source documents.

Ranked #1 on Text Summarization on Pubmed

ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation

no code implementations • 14 Mar 2022 • Bokui Shen, Zhenyu Jiang, Christopher Choy, Leonidas J. Guibas, Silvio Savarese, Anima Anandkumar, Yuke Zhu

Manipulating volumetric deformable objects in the real world, like plush toys and pizza dough, brings substantial challenges due to infinite shape variations, non-rigid motions, and partial observability.

Contrastive Learning Deformable Object Manipulation

Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation

no code implementations • 9 Dec 2021 • Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín

Doing this is challenging for two reasons: on the data side, current interfaces make collecting high-quality human demonstrations difficult, and on the learning side, policies trained on limited data can suffer from covariate shift when deployed.

Imitation Learning Navigate

Long Document Summarization with Top-Down and Bottom-Up Representation Inference

no code implementations • 29 Sep 2021 • Bo Pang, Erik Nijkamp, Wojciech Maciej Kryscinski, Silvio Savarese, Yingbo Zhou, Caiming Xiong

Critical to the success of a summarization model is the faithful inference of latent representations of words or tokens in the source documents.

Document Summarization

Sample-Efficient Safety Assurances using Conformal Prediction

no code implementations • 28 Sep 2021 • Rachel Luo, Shengjia Zhao, Jonathan Kuck, Boris Ivanovic, Silvio Savarese, Edward Schmerling, Marco Pavone

When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial.

Conformal Prediction Robotic Grasping

Merlion: A Machine Learning Library for Time Series

2 code implementations • 20 Sep 2021 • Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang

We introduce Merlion, an open-source machine learning library for time series.

Anomaly Detection BIG-bench Machine Learning +2

3,269

Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

no code implementations • 2 Sep 2021 • Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn

However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks.

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

1 code implementation • 13 Aug 2021 • Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu, Li Fei-Fei, C. Karen Liu, Silvio Savarese

Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator.

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

1 code implementation • 6 Aug 2021 • Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín

Based on the study, we derive a series of lessons including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation.

Imitation Learning reinforcement-learning +2

433

iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

1 code implementation • 6 Aug 2021 • Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI.

Imitation Learning

606

BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

no code implementations • 6 Aug 2021 • Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation.

Discovering Generalizable Skills via Automated Generation of Diverse Tasks

no code implementations • 26 Jun 2021 • Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks.

Hierarchical Reinforcement Learning reinforcement-learning +1

JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection

no code implementations • CVPR 2022 • Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi

However, learning to recognise human actions and their social interactions in an unconstrained real-world environment comprising numerous people, with potentially highly unbalanced and long-tailed distributed action labels from a stream of sensory data captured from a mobile robot platform remains a significant challenge, not least owing to the lack of a reflective large-scale dataset.

Action Detection Action Understanding +1

TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

no code implementations • ICCV 2021 • Vida Adeli, Mahsa Ehsanpour, Ian Reid, Juan Carlos Niebles, Silvio Savarese, Ehsan Adeli, Hamid Rezatofighi

Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems.

Autonomous Driving Human-Object Interaction Detection

LASER: Learning a Latent Action Space for Efficient Reinforcement Learning

no code implementations • 29 Mar 2021 • Arthur Allshire, Roberto Martín-Martín, Charles Lin, Shawn Manuel, Silvio Savarese, Animesh Garg

Additionally, similar tasks or instances of the same task family impose latent manifold constraints on the most effective action space: the task family can be best solved with actions in a manifold of the entire action space of the robot.

reinforcement-learning Reinforcement Learning (RL)

Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control

no code implementations • 28 Feb 2021 • Chen Wang, Rui Wang, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu

Key to such capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their movements at task-relevant objects and be invariant to the objects' absolute spatial location.

Imitation Learning Zero-shot Generalization

Local Calibration: Metrics and Recalibration

no code implementations • 22 Feb 2021 • Rachel Luo, Aadyot Bhatnagar, Yu Bai, Shengjia Zhao, Huan Wang, Caiming Xiong, Silvio Savarese, Stefano Ermon, Edward Schmerling, Marco Pavone

In this work, we propose the local calibration error (LCE) to span the gap between average and individual reliability.

Decision Making Fairness

Embodied Intelligence via Learning and Evolution

1 code implementation • 3 Feb 2021 • Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei

However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.

148

Learning Multi-Arm Manipulation Through Collaborative Teleoperation

no code implementations • 12 Dec 2020 • Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks.

Imitation Learning

Human-in-the-Loop Imitation Learning using Remote Teleoperation

no code implementations • 12 Dec 2020 • Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

We develop a simple and effective algorithm to train the policy iteratively on new data collected by the system that encourages the policy to learn how to traverse bottlenecks through the interventions.

Imitation Learning Robot Manipulation

Topological Planning with Transformers for Vision-and-Language Navigation

no code implementations • CVPR 2021 • Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese

Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments.

Vision and Language Navigation

Semantic and Geometric Modeling with Neural Message Passing in 3D Scene Graphs for Hierarchical Mechanical Search

no code implementations • 7 Dec 2020 • Andrey Kurenkov, Roberto Martín-Martín, Jeff Ichnowski, Ken Goldberg, Silvio Savarese

We propose to use a 3D scene graph representation to capture the hierarchical, semantic, and geometric aspects of this problem.

Object

iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes

2 code implementations • 5 Dec 2020 • Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Claudia Pérez-D'Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese

We present iGibson 1.0, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.

Imitation Learning

606

Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

no code implementations • 13 Nov 2020 • Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto

Vision-based robotics often separates the control loop into one module for perception and a separate module for control.

Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning

2 code implementations • 16 Oct 2020 • Claudia Pérez-D'Arpino, Can Liu, Patrick Goebel, Roberto Martín-Martín, Silvio Savarese

Navigating fluently around pedestrians is a necessary capability for mobile robots deployed in human environments, such as buildings and homes.

Pose Estimation reinforcement-learning +2

Privacy Preserving Recalibration under Domain Shift

no code implementations • 21 Aug 2020 • Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

In an extensive empirical study, we find that our algorithm improves calibration on domain-shift benchmarks under the constraints of differential privacy.

Privacy Preserving

ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

no code implementations • 18 Aug 2020 • Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese

To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base.

Continuous Control Hierarchical Reinforcement Learning +2

Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter

no code implementations • 13 Aug 2020 • Andrey Kurenkov, Joseph Taglic, Rohun Kulkarni, Marcus Dominguez-Kuhne, Animesh Garg, Roberto Martín-Martín, Silvio Savarese

When searching for objects in cluttered environments, it is often necessary to perform complex interactions in order to move occluding objects out of the way and fully reveal the object of interest and make it graspable.

Object Reinforcement Learning (RL) +1

How Trustworthy are Performance Evaluations for Basic Vision Tasks?

no code implementations • 8 Aug 2020 • Tran Thien Dat Nguyen, Hamid Rezatofighi, Ba-Ngu Vo, Ba-Tuong Vo, Silvio Savarese, Ian Reid

This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation, and multi-object tracking.

Multi-Object Tracking object-detection +1

Paper
Add Code

Goal-Aware Prediction: Learning to Model What Matters

no code implementations • ICML 2020 • Suraj Nair, Silvio Savarese, Chelsea Finn

In this paper, we propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space, resulting in a learning objective that more closely matches the downstream task.

Paper
Add Code

Adaptive Procedural Task Generation for Hard-Exploration Problems

no code implementations • ICLR 2021 • Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.

Paper
Add Code

Generative Sparse Detection Networks for 3D Single-shot Object Detection

4 code implementations • ECCV 2020 • JunYoung Gwak, Christopher Choy, Silvio Savarese

3D object detection has been widely studied due to its potential applicability to many promising areas such as robotics and augmented reality.

Ranked #6 on 3D Object Detection on S3DIS

3D Object Detection Decoder +2

2,306

Paper
Code

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations

no code implementations • 13 Mar 2020 • Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei

In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations.

Imitation Learning

Paper
Add Code

JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset

1 code implementation • 19 Feb 2020 • Abhijeet Shenoi, Mihir Patel, JunYoung Gwak, Patrick Goebel, Amir Sadeghian, Hamid Rezatofighi, Roberto Martín-Martín, Silvio Savarese

In this work we present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time, state-of-the-art tracking performance.

Ranked #8 on Multiple Object Tracking on KITTI Tracking test

Autonomous Navigation Motion Planning +2

142

Paper
Code

Learning to Navigate Using Mid-Level Visual Priors

1 code implementation • 23 Dec 2019 • Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

How much does having visual priors about the world (e.g., the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g., navigating a complex environment)?

Navigate reinforcement-learning +2

107

Paper
Code

IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

no code implementations • 13 Nov 2019 • Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox

For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task.

Robot Manipulation

Paper
Add Code

Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity

no code implementations • 11 Nov 2019 • Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

We evaluate the quality of our platform, the diversity of demonstrations in our dataset, and the utility of our dataset via quantitative and qualitative analysis.

Robot Manipulation

Paper
Add Code

Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

1 code implementation • 30 Oct 2019 • Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese

We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task.

Robot Navigation

606

Paper
Code

Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation

no code implementations • 29 Oct 2019 • Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal.

Variational Inference

Paper
Add Code

KETO: Learning Keypoint Representations for Tool Manipulation

no code implementations • 26 Oct 2019 • Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese

For this purpose, we present KETO, a framework of learning keypoint representations of tool-based manipulation.

Robotics

Paper
Add Code

JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments

1 code implementation • 25 Oct 2019 • Roberto Martín-Martín, Mihir Patel, Hamid Rezatofighi, Abhijeet Shenoi, JunYoung Gwak, Eric Frankel, Amir Sadeghian, Silvio Savarese

We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot.

Autonomous Navigation Human Detection

142

Paper
Code

HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators

1 code implementation • 24 Oct 2019 • Chengshu Li, Fei Xia, Roberto Martin-Martin, Silvio Savarese

Different from other HRL solutions, HRL4IN handles the heterogeneous nature of the Interactive Navigation task by creating subgoals in different spaces in different phases of the task.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Code

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints

2 code implementations • 23 Oct 2019 • Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu

We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data.

Ranked #1 on 6D Pose Estimation using RGBD on REAL275 (Rerr metric)

6D Pose Estimation 6D Pose Estimation using RGBD +2

286

Paper
Code

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

1 code implementation • ICCV 2019 • Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese

Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.)

212

Paper
Code

Causal Induction from Visual Observations for Goal Directed Tasks

2 code implementations • 3 Oct 2019 • Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world.

Paper
Code

Regression Planning Networks

1 code implementation • NeurIPS 2019 • Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Recent learning-to-plan methods have shown promising results on planning directly from observation space.

regression

Paper
Code

SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning

no code implementations • 27 Sep 2019 • Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei

We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL).

OpenAI Gym reinforcement-learning +2

Paper
Add Code

AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers

1 code implementation • 9 Sep 2019 • Andrey Kurenkov, Ajay Mandlekar, Roberto Martin-Martin, Silvio Savarese, Animesh Garg

The exploration mechanism used by a Deep Reinforcement Learning (RL) agent plays a key role in determining its sample efficiency.

Reinforcement Learning (RL)

Paper
Code

Situational Fusion of Visual Representation for Visual Navigation

no code implementations • ICCV 2019 • Bokui Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese

A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities.

Visual Navigation

Paper
Add Code

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

no code implementations • 16 Aug 2019 • De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures.

Imitation Learning

Paper
Add Code

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

1 code implementation • 28 Jul 2019 • Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback.

Representation Learning Self-Supervised Learning

Paper
Code

Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants

1 code implementation • 24 Jul 2019 • Mason Swofford, John Charles Peruzzi, Nathan Tsoi, Sydney Thompson, Roberto Martín-Martín, Silvio Savarese, Marynel Vázquez

We propose a data-driven approach to detect conversational groups by identifying spatial arrangements typical of these focused social encounters.

Clustering Graph Clustering +1

Paper
Code

Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

no code implementations • NeurIPS 2019 • Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, S. Hamid Rezatofighi, Silvio Savarese

This problem is compounded by the presence of social interactions between humans and their physical interactions with the scene.

Ranked #16 on Trajectory Prediction on ETH/UCY

Autonomous Vehicles Decoder +3

Paper
Add Code

Time-Varying Interaction Estimation Using Ensemble Methods

no code implementations • 25 Jun 2019 • Brandon Oselio, Amir Sadeghian, Silvio Savarese, Alfred Hero

Directed information (DI) is a useful tool to explore time-directed interactions in multivariate data.

Ensemble Learning

Paper
Add Code

Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks

no code implementations • 20 Jun 2019 • Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg

This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks.

Reinforcement Learning (RL)

Paper
Add Code

Which Tasks Should Be Learned Together in Multi-task Learning?

1 code implementation • ICML 2020 • Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

Many computer vision applications require solving multiple tasks in real-time.

Multi-Task Learning

Paper
Code

Deep Local Trajectory Replanning and Control for Robot Navigation

no code implementations • 13 May 2019 • Ashwini Pokle, Roberto Martín-Martín, Patrick Goebel, Vincent Chow, Hans M. Ewald, Junwei Yang, Zhenkai Wang, Amir Sadeghian, Dorsa Sadigh, Silvio Savarese, Marynel Vázquez

We present a navigation system that combines ideas from hierarchical planning and machine learning.

Robot Navigation

Paper
Add Code

4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

7 code implementations • CVPR 2019 • Christopher Choy, JunYoung Gwak, Silvio Savarese

To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space.

Ranked #1 on Robust 3D Semantic Segmentation on WOD-C

4D Spatio Temporal Semantic Segmentation Robust 3D Semantic Segmentation

2,306

Paper
Code

Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera

no code implementations • 1 Apr 2019 • Jingfan Wang, Lyne P. Tchapmi, Arvind P. Ravikumara, Mike McGuire, Clay S. Bell, Daniel Zimmerle, Silvio Savarese, Adam R. Brandt

We find that the detection accuracy can reach as high as 99%, and the overall detection accuracy can exceed 95% for a case across all leak sizes and imaging distances.

Change Detection Optical Flow Estimation

Paper
Add Code

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

no code implementations • CVPR 2019 • Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.

Decision Making reinforcement-learning +2

Paper
Add Code

Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter

no code implementations • 4 Mar 2019 • Michael Danielczuk, Andrey Kurenkov, Ashwin Balakrishna, Matthew Matl, David Wang, Roberto Martín-Martín, Animesh Garg, Silvio Savarese, Ken Goldberg

In this paper, we formalize Mechanical Search and study a version where distractor objects are heaped over the target object in a bin.

Robotics

Paper
Add Code

A Behavioral Approach to Visual Navigation with Graph Localization Networks

no code implementations • 1 Mar 2019 • Kevin Chen, Juan Pablo de Vicente, Gabriel Sepulveda, Fei Xia, Alvaro Soto, Marynel Vazquez, Silvio Savarese

Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps.

Navigate Visual Navigation

Paper
Add Code

Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

10 code implementations • CVPR 2019 • Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese

By incorporating this generalized $IoU$ ($GIoU$) as a loss into the state-of-the-art object detection frameworks, we show a consistent improvement on their performance using both the standard, $IoU$ based, and new, $GIoU$ based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.

Object object-detection +2

12,120

Paper
Code
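
The excerpt above names GIoU as both a metric and a loss but does not spell out the formula. The sketch below uses the commonly cited definition, GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest axis-aligned box enclosing both inputs, and takes the loss to be 1 - GIoU. The function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration and are not taken from the listing or from any of the linked implementations.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (x2 > x1 and y2 > y1).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width or height is clamped to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box C that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / union
    # Penalize the part of C that is covered by neither box.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, which is 0 for every pair of disjoint boxes, this value keeps decreasing as disjoint boxes move apart, which is what makes it usable as a regression loss.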

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

8 code implementations • CVPR 2019 • Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese

A key technical challenge in performing 6D object pose estimation from RGB-D image is to fully leverage the two complementary data sources.

Ranked #4 on 6D Pose Estimation on LineMOD

6D Pose Estimation 6D Pose Estimation using RGBD +1

1,041

Paper
Code

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

1 code implementation • 31 Dec 2018 • Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik

This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images.

Object Detection

107

Paper
Code

Coupled Recurrent Network (CRN)

no code implementations • 25 Dec 2018 • Lin Sun, Kui Jia, Yuejia Shen, Silvio Savarese, Dit Yan Yeung, Bertram E. Shi

To learn from these heterogeneous input sources, existing methods rely on two-stream architectural designs that contain independent, parallel streams of Recurrent Neural Networks (RNNs).

Action Recognition In Videos Multi-Person Pose Estimation +2

Paper
Add Code

RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

no code implementations • 7 Nov 2018 • Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei

Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification.

Imitation Learning

Paper
Add Code

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

2 code implementations • 24 Oct 2018 • Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback.

Reinforcement Learning (RL) Self-Supervised Learning

Paper
Code

Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation

no code implementations • EMNLP 2018 • Xiaoxue Zang, Ashwini Pokle, Marynel Vázquez, Kevin Chen, Juan Carlos Niebles, Alvaro Soto, Silvio Savarese

We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation.

Robot Navigation Translation

Paper
Add Code

Gibson Env: Real-World Perception for Embodied Agents

5 code implementations • CVPR 2018 • Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

Developing visual perception models for active agents and sensorimotor control is cumbersome in the physical world, as existing algorithms are too slow to efficiently learn in real time and robots are fragile and costly.

Domain Adaptation General Reinforcement Learning +1

826

Paper
Code

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

no code implementations • CVPR 2019 • De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model.

Paper
Add Code

Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

no code implementations • 25 Jun 2018 • Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering.

Paper
Add Code

VUNet: Dynamic Scene View Synthesis for Traversability Estimation using an RGB Camera

no code implementations • 22 Jun 2018 • Noriaki Hirose, Amir Sadeghian, Fei Xia, Roberto Martin-Martin, Silvio Savarese

We present VUNet, a novel view (VU) synthesis method for mobile robots in dynamic environments, and its application to the estimation of future traversability.

Autonomous Vehicles

Paper
Add Code

SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints

1 code implementation • CVPR 2019 • Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, S. Hamid Rezatofighi, Silvio Savarese

Meanwhile, the social attention component aggregates information across the different agent interactions and extracts the most important trajectory information from the surrounding neighbors.

Ranked #4 on Trajectory Prediction on Stanford Drone (ADE (8/12) @K=5 metric)

Generative Adversarial Network Self-Driving Cars +1

Paper
Code

Demo2Vec: Reasoning Object Affordances From Online Videos

no code implementations • CVPR 2018 • Kuan Fang, Te-Lin Wu, Daniel Yang, Silvio Savarese, Joseph J. Lim

Watching expert demonstrations is an important way for humans and robots to reason about affordances of unseen objects.

Ranked #2 on Video-to-image Affordance Grounding on OPRA (28x28)

Object Video-to-image Affordance Grounding

Paper
Add Code

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations • CVPR 2018 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤50%) in the form of an RGB-D image.

Paper
Add Code

Generalizing to Unseen Domains via Adversarial Data Augmentation

2 code implementations • NeurIPS 2018 • Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John Duchi, Vittorio Murino, Silvio Savarese

Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the current model.

Data Augmentation Semantic Segmentation

141

Paper
Code

Deep Learning under Privileged Information Using Heteroscedastic Dropout

1 code implementation • CVPR 2018 • John Lambert, Ozan Sener, Silvio Savarese

This is what the Learning Under Privileged Information (LUPI) paradigm endeavors to model by utilizing extra knowledge only available during training.

Image Classification Machine Translation +1

Paper
Code

Taskonomy: Disentangling Task Transfer Learning

1 code implementation • CVPR 2018 • Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

The product is a computational taxonomic map for task transfer learning.

Multi-Task Learning

831

Paper
Code

Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

7 code implementations • CVPR 2018 • Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi

Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.

Ranked #4 on Trajectory Prediction on ETH

Collision Avoidance Motion Forecasting +4

796

Paper
Code

Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings

2 code implementations • 22 Mar 2018 • Kevin Chen, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas Funkhouser, Silvio Savarese

To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes.

Metric Learning Retrieval

Paper
Code

GONet: A Semi-Supervised Deep Learning Approach For Traversability Estimation

no code implementations • 8 Mar 2018 • Noriaki Hirose, Amir Sadeghian, Marynel Vázquez, Patrick Goebel, Silvio Savarese

We present semi-supervised deep learning approaches for traversability estimation from fisheye images.

Paper
Add Code

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations • 12 Dec 2017 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

Paper
Add Code

CAR-Net: Clairvoyant Attentive Recurrent Network

no code implementations • ECCV 2018 • Amir Sadeghian, Ferdinand Legros, Maxime Voisin, Ricky Vesel, Alexandre Alahi, Silvio Savarese

We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene.

Trajectory Forecasting

Paper
Add Code

Adversarial Feature Augmentation for Unsupervised Domain Adaptation

2 code implementations • CVPR 2018 • Riccardo Volpi, Pietro Morerio, Silvio Savarese, Vittorio Murino

Recent works showed that Generative Adversarial Networks (GANs) can be successfully applied in unsupervised domain adaptation, where, given a labeled source dataset and an unlabeled target dataset, the goal is to train powerful classifiers for the target samples.

Data Augmentation Unsupervised Domain Adaptation

127

Paper
Code

Recurrent Autoregressive Networks for Online Multi-Object Tracking

no code implementations • 7 Nov 2017 • Kuan Fang, Yu Xiang, Xiaocheng Li, Silvio Savarese

The external memory explicitly stores previous inputs of each trajectory in a time window, while the internal memory learns to summarize long-term tracking history and associate detections by processing the external memory.

Multi-Object Tracking Object +1

Paper
Add Code

Generic 3D Representation via Pose Estimation and Matching

1 code implementation • 23 Oct 2017 • Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited.

Object Pose Estimation +1

423

Paper
Code

SEGCloud: Semantic Segmentation of 3D Point Clouds

no code implementations • 20 Oct 2017 • Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese

Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation.

Ranked #13 on Semantic Segmentation on Semantic3D

Paper
Add Code
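
The excerpt above mentions transferring coarse voxel predictions back to raw 3D points via trilinear interpolation. The sketch below shows that interpolation step on its own for a single scalar score grid. It is an illustration under stated assumptions, not the authors' implementation: the function name `trilinear_sample`, the nested-list grid layout, and the convention that voxel centers sit at integer coordinates are all hypothetical.

```python
import math


def trilinear_sample(grid, x, y, z):
    """Interpolate a scalar voxel grid at a continuous point (x, y, z).

    grid[i][j][k] holds the value at voxel center (i, j, k). The grid must
    have at least 2 voxels along each axis, and the point is expected to
    lie within the span of the voxel centers.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    # Lower corner of the enclosing cell, clamped so the upper corner stays in range.
    x0 = min(max(int(math.floor(x)), 0), nx - 2)
    y0 = min(max(int(math.floor(y)), 0), ny - 2)
    z0 = min(max(int(math.floor(z)), 0), nz - 2)

    # Fractional offsets of the point inside the cell.
    tx, ty, tz = x - x0, y - y0, z - z0

    value = 0.0
    # Weighted sum over the 8 corners of the cell.
    for dx in (0, 1):
        wx = tx if dx else 1.0 - tx
        for dy in (0, 1):
            wy = ty if dy else 1.0 - ty
            for dz in (0, 1):
                wz = tz if dz else 1.0 - tz
                value += wx * wy * wz * grid[x0 + dx][y0 + dy][z0 + dz]
    return value
```

For per-class predictions, the same function could be applied once per class channel, so that each raw point receives a score vector blended from its eight nearest voxels.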

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas

We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database.

3D Part Segmentation 3D Reconstruction +1

1,994

Paper
Code

Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation

no code implementations • 17 Oct 2017 • Kuan Fang, Yunfei Bai, Stefan Hinterstoisser, Silvio Savarese, Mrinal Kalakrishnan

Learning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels.

Domain Adaptation Instance Segmentation +2

Paper
Add Code

Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

1 code implementation • 4 Oct 2017 • Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese

In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction.

Few-Shot Learning Program induction +1

Paper
Code

To Go or Not To Go? A Near Unsupervised Learning Approach For Robot Navigation

no code implementations • 16 Sep 2017 • Noriaki Hirose, Amir Sadeghian, Patrick Goebel, Silvio Savarese

It is important for robots to be able to decide whether they can go through a space or not, as they navigate through a dynamic environment.

Anomaly Detection Navigate +1

Paper
Add Code

Lattice Long Short-Term Memory for Human Action Recognition

no code implementations • ICCV 2017 • Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese

This method effectively enhances the ability to model dynamics across time and addresses the non-stationary issue of long-term motion dynamics without significantly increasing the model complexity.

Action Recognition Optical Flow Estimation +1

Paper
Add Code

DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image

no code implementations • 11 Aug 2017 • Andrey Kurenkov, Jingwei Ji, Animesh Garg, Viraj Mehta, JunYoung Gwak, Christopher Choy, Silvio Savarese

We evaluate our approach on the ShapeNet dataset and show that (a) the Free-Form Deformation layer is a powerful new building block for Deep Learning models that manipulate 3D data; (b) DeformNet uses this FFD layer combined with shape retrieval for smooth and detail-preserving 3D reconstruction of qualitatively plausible point clouds with respect to a single query image; and (c) compared to other state-of-the-art 3D reconstruction methods, DeformNet quantitatively matches or outperforms their benchmarks by significant margins.

3D Reconstruction 3D Shape Reconstruction +1

Paper
Add Code

Active Learning for Convolutional Neural Networks: A Core-Set Approach

10 code implementations • ICLR 2018 • Ozan Sener, Silvio Savarese

active learning).

Ranked #5 on Active Learning on CIFAR10 (10,000)

Active Learning Image Classification

521

Paper
Code

Weakly supervised 3D Reconstruction with Adversarial Constraint

2 code implementations • 31 May 2017 • JunYoung Gwak, Christopher B. Choy, Animesh Garg, Manmohan Chandraker, Silvio Savarese

Supervised 3D reconstruction has witnessed significant progress through the use of deep neural networks.

3D Reconstruction

1,329

Paper
Code

Deep View Morphing

no code implementations • CVPR 2017 • Dinghuang Ji, Junghyun Kwon, Max McFarland, Silvio Savarese

An encoder-decoder network then generates dense correspondences between the rectified images and blending masks to predict the visibility of pixels of the rectified images in the middle view.

Decoder

Paper
Add Code

Joint 2D-3D-Semantic Data for Indoor Scene Understanding

3 code implementations • 3 Feb 2017 • Iro Armeni, Sasha Sax, Amir R. Zamir, Silvio Savarese

We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations.

Scene Understanding

455

Paper
Code

Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies

no code implementations • ICCV 2017 • Amir Sadeghian, Alexandre Alahi, Silvio Savarese

To address this challenge, we present a structure of Recurrent Neural Networks (RNN) that jointly reasons on multiple cues over a temporal window.

Paper
Add Code

Feedback Networks

1 code implementation • CVPR 2017 • Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer.

Paper
Code

Learning Transferrable Representations for Unsupervised Domain Adaptation

no code implementations • NeurIPS 2016 • Ozan Sener, Hyun Oh Song, Ashutosh Saxena, Silvio Savarese

Supervised learning with large scale labelled datasets and deep layered models has caused a paradigm shift in diverse areas in learning and recognition.

Object Recognition Unsupervised Domain Adaptation

Paper
Add Code

Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

no code implementations • CVPR 2017 • Timur Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, Silvio Savarese

We present a unified framework for understanding human social behaviors in raw image sequences.

Ranked #2 on Action Recognition on Volleyball

Action Localization Action Recognition +1

Paper
Add Code

Human Centred Object Co-Segmentation

no code implementations • 12 Jun 2016 • Chenxia Wu, Jiemi Zhang, Ashutosh Saxena, Silvio Savarese

Co-segmentation is the automatic extraction of the common semantic regions given a set of images.

Human-Object Interaction Detection Object +1

Paper
Add Code

Universal Correspondence Network

no code implementations • NeurIPS 2016 • Christopher B. Choy, JunYoung Gwak, Silvio Savarese, Manmohan Chandraker

We present a deep learning framework for accurate visual correspondences and demonstrate its effectiveness for both geometric and semantic matching, spanning from rigid motions to intra-class shape or appearance variations.

Metric Learning Semantic Similarity +1

Paper
Add Code

DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes

no code implementations • CVPR 2016 • Saumitro Dasgupta, Kuan Fang, Kevin Chen, Silvio Savarese

We consider the problem of estimating the spatial layout of an indoor scene from a monocular RGB image, modeled as the projection of a 3D cuboid.

Paper
Add Code

Social LSTM: Human Trajectory Prediction in Crowded Spaces

no code implementations • CVPR 2016 • Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, Silvio Savarese

Different from the conventional LSTM, we share the information between multiple LSTMs through a new pooling layer.

Ranked #1 on Trajectory Prediction on Stanford Drone (FDE(8/12) @K=5 metric)

Collision Avoidance Navigate +1

Paper
Add Code

3D Semantic Parsing of Large-Scale Indoor Spaces

no code implementations • CVPR 2016 • Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, Silvio Savarese

In this paper, we propose a method for semantically parsing the 3D point cloud of an entire building using a hierarchical approach: first, the raw data is parsed into semantically meaningful spaces (e.g., rooms) that are aligned into a canonical reference coordinate system.

Semantic Parsing

Paper
Add Code

Unsupervised Semantic Action Discovery from Video Collections

no code implementations • 11 May 2016 • Ozan Sener, Amir Roshan Zamir, Chenxia Wu, Silvio Savarese, Ashutosh Saxena

Our method can also provide a textual description for each of the identified semantic steps and video segments.

Paper
Add Code

Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection

1 code implementation • 16 Apr 2016 • Yu Xiang, Wongun Choi, Yuanqing Lin, Silvio Savarese

In CNN-based object detection methods, region proposal becomes a bottleneck when objects exhibit significant scale variation, occlusion or truncation.

Ranked #4 on Vehicle Pose Estimation on KITTI Cars Hard

General Classification Object +4

361

Paper
Code

Learning to Track at 100 FPS with Deep Regression Networks

3 code implementations • 6 Apr 2016 • David Held, Sebastian Thrun, Silvio Savarese

We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps.

regression

876

Paper
Code

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

14 code implementations • 2 Apr 2016 • Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese

Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2).

Ranked #4 on 3D Reconstruction on Data3D−R2N2

3D Object Reconstruction 3D Reconstruction +1

1,329

Paper
Code

Knowledge Transfer for Scene-specific Motion Prediction

no code implementations • 22 Mar 2016 • Lamberto Ballan, Francesco Castaldo, Alexandre Alahi, Francesco Palmieri, Silvio Savarese

When given a single frame of a video, humans can not only interpret the content of the scene but also forecast the near future.

motion prediction Trajectory Prediction +1

Paper
Add Code

Watch-n-Patch: Unsupervised Learning of Actions and Relations

no code implementations • 11 Mar 2016 • Chenxia Wu, Jiemi Zhang, Ozan Sener, Bart Selman, Silvio Savarese, Ashutosh Saxena

For evaluation, we contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects.

Action Segmentation Clustering

Paper
Add Code

Unsupervised Transductive Domain Adaptation

no code implementations • 10 Feb 2016 • Ozan Sener, Hyun Oh Song, Ashutosh Saxena, Silvio Savarese

We incorporate the domain shift and the transductive target inference into our framework by jointly solving for an asymmetric similarity metric and the optimal transductive target label assignment.

Object Recognition Unsupervised Domain Adaptation

Paper
Add Code

Forecasting Social Navigation in Crowded Complex Scenes

no code implementations • 5 Jan 2016 • Alexandre Robicquet, Alexandre Alahi, Amir Sadeghian, Bryan Anenberg, John Doherty, Eli Wu, Silvio Savarese

We present an extensive evaluation where different methods for trajectory forecasting are evaluated and compared.

Common Sense Reasoning Navigate +2

Paper
Add Code

Watch-Bot: Unsupervised Learning for Reminding Humans of Forgotten Actions

no code implementations • 14 Dec 2015 • Chenxia Wu, Jiemi Zhang, Bart Selman, Silvio Savarese, Ashutosh Saxena

We show that our approach not only improves the unsupervised action segmentation and action cluster assignment performance, but also effectively detects the forgotten actions on a challenging human activity RGB-D video dataset.

Action Segmentation Object

Paper
Add Code

ShapeNet: An Information-Rich 3D Model Repository

15 code implementations • 9 Dec 2015 • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu

We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects.

Data Visualization

65,339

Paper
Code

Learning to Track: Online Multi-Object Tracking by Decision Making

no code implementations • ICCV 2015 • Yu Xiang, Alexandre Alahi, Silvio Savarese

Online Multi-Object Tracking (MOT) has wide applications in time-critical video analysis scenarios, such as robot navigation and autonomous driving.

Ranked #19 on Multiple Object Tracking on KITTI Tracking test

Autonomous Driving Decision Making +5

Paper
Add Code

Deep Metric Learning via Lifted Structured Feature Embedding

3 code implementations • CVPR 2016 • Hyun Oh Song, Yu Xiang, Stefanie Jegelka, Silvio Savarese

Additionally, we collected the Online Products dataset: 120k images of 23k classes of online products for metric learning.

Metric Learning Structured Prediction

3,961

Paper
Code

Structural-RNN: Deep Learning on Spatio-Temporal Graphs

2 code implementations • CVPR 2016 • Ashesh Jain, Amir R. Zamir, Silvio Savarese, Ashutosh Saxena

The proposed method is generic and principled, as it can be used to transform any spatio-temporal graph by applying a set of well-defined steps.

Ranked #4 on Skeleton Based Action Recognition on CAD-120

Human Pose Forecasting Skeleton Based Action Recognition

254

Paper
Code

Semantic Cross-View Matching

no code implementations • 31 Oct 2015 • Francesco Castaldo, Amir Zamir, Roland Angst, Francesco Palmieri, Silvio Savarese

In this paper, we therefore explore this idea and propose an automatic method for detecting and representing the semantic information of an RGB image with the goal of performing cross-view matching with a (non-RGB) geographic information system (GIS).

Paper
Add Code

Action Recognition by Hierarchical Mid-level Action Elements

no code implementations • ICCV 2015 • Tian Lan, Yuke Zhu, Amir Roshan Zamir, Silvio Savarese

Realistic videos of human actions exhibit rich spatiotemporal structures at multiple levels of granularity: an action can always be decomposed into multiple finer-grained elements in both space and time.

Action Parsing Action Recognition +2

Paper
Add Code

Deep Learning for Single-View Instance Recognition

no code implementations • 29 Jul 2015 • David Held, Sebastian Thrun, Silvio Savarese

We show that feedforward neural networks outperform state-of-the-art methods for recognizing objects from novel viewpoints even when trained from just a single image per object.

Paper
Add Code

Unsupervised Semantic Parsing of Video Collections

no code implementations • ICCV 2015 • Ozan Sener, Amir Zamir, Silvio Savarese, Ashutosh Saxena

The proposed method is capable of providing a semantic "storyline" of the video composed of its objective steps.

Unsupervised semantic parsing

Paper
Add Code

Data-Driven 3D Voxel Patterns for Object Category Recognition

no code implementations • CVPR 2015 • Yu Xiang, Wongun Choi, Yuanqing Lin, Silvio Savarese

Despite the great progress achieved in recognizing objects as 2D bounding boxes in images, it is still very challenging to detect occluded objects and estimate the 3D properties of multiple objects from a single image.

Object Object Recognition +1

Paper
Add Code

Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation

no code implementations • CVPR 2015 • Christopher Bongsoo Choy, Michael Stark, Sam Corbett-Davies, Silvio Savarese

We propose an efficient method for synthesizing templates from 3D models that runs on the fly -- that is, it quickly produces detectors for an arbitrary viewpoint of a 3D model without expensive dataset-dependent training or template storage.

object-detection Object Detection +2

Paper
Add Code

Watch-n-Patch: Unsupervised Understanding of Actions and Relations

no code implementations • CVPR 2015 • Chenxia Wu, Jiemi Zhang, Silvio Savarese, Ashutosh Saxena

For evaluation, we also contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects.

Action Segmentation

Paper
Add Code

A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition

no code implementations • CVPR 2015 • Roozbeh Mottaghi, Yu Xiang, Silvio Savarese

Although object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently of each other because of the huge space of parameters.

3D Pose Estimation Object +2

Paper
Add Code

Learning an Image-based Motion Context for Multiple People Tracking

no code implementations • CVPR 2014 • Laura Leal-Taixe, Michele Fenzi, Alina Kuznetsova, Bodo Rosenhahn, Silvio Savarese

We present a novel method for multiple people tracking that leverages a generalized model for capturing interactions among individuals.

Multiple People Tracking

Paper
Add Code

Shrinkage Optimized Directed Information using Pictorial Structures for Action Recognition

no code implementations • 12 Apr 2014 • Xu Chen, Alfred Hero, Silvio Savarese

In this paper, we propose a novel action recognition framework.

Action Recognition Temporal Action Localization

Paper
Add Code

Understanding Indoor Scenes Using 3D Geometric Phrases

no code implementations • CVPR 2013 • Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification.

Ranked #7 on Room Layout Estimation on SUN RGB-D

General Classification Object +5

Paper
Add Code

Dense Object Reconstruction with Semantic Priors

no code implementations • CVPR 2013 • Sid Yingze Bao, Manmohan Chandraker, Yuanqing Lin, Silvio Savarese

Given multiple images of an unseen instance, we collate information from 2D object detectors to align the structure from motion point cloud with the mean shape, which is subsequently warped and refined to approach the actual shape.

Object object-detection +2

Paper
Add Code

Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

no code implementations • CVPR 2013 • Roni Mittelman, Honglak Lee, Benjamin Kuipers, Silvio Savarese

In order to address this issue, we propose a weakly supervised approach to learn mid-level features, where only class-level supervision is provided during training.

Object Recognition Weakly-supervised Learning

Paper
Add Code

Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

no code implementations • CVPR 2013 • Byung-soo Kim, Shili Xu, Silvio Savarese

In this paper we focus on the problem of detecting objects in 3D from RGB-D images.

Object Object Recognition

Paper
Add Code

Learning Hierarchical Linguistic Descriptions of Visual Datasets

no code implementations • WS 2013 • Roni Mittelman, Min Sun, Benjamin Kuipers, Silvio Savarese

Image Retrieval

Paper
Add Code
