Search Results for author: Li Fei-Fei

Found 193 papers, 70 papers with code

Exploring Functional Connectivities of the Human Brain using Multivariate Information Analysis

no code implementations NeurIPS 2009 Barry Chai, Dirk Walther, Diane Beck, Li Fei-Fei

In this study, we present a method for estimating the mutual information for a localized pattern of fMRI data.

Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions

no code implementations NeurIPS 2009 Bangpeng Yao, Dirk Walther, Diane Beck, Li Fei-Fei

In this paper, we propose to model such connections in a Hidden Conditional Random Field (HCRF) framework, where the classifier of one region of interest (ROI) makes predictions based on not only its voxels but also the classifier predictions from ROIs that it connects to.

Classification General Classification

Large Margin Learning of Upstream Scene Understanding Models

no code implementations NeurIPS 2010 Jun Zhu, Li-Jia Li, Li Fei-Fei, Eric P. Xing

This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced.

General Classification Scene Classification +2

Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

no code implementations NeurIPS 2010 Li-Jia Li, Hao Su, Li Fei-Fei, Eric P. Xing

Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning.

General Classification Object +2

Discriminative Segment Annotation in Weakly Labeled Video

no code implementations CVPR 2013 Kevin Tang, Rahul Sukthankar, Jay Yagnik, Li Fei-Fei

Second, we ensure that CRANE is robust to label noise, both in terms of tagged videos that fail to contain the concept and occasional negative videos that do.

Fine-Grained Crowdsourcing for Fine-Grained Recognition

no code implementations CVPR 2013 Jia Deng, Jonathan Krause, Li Fei-Fei

In this work, we include humans in the loop to help computers select discriminative features.

feature selection

Socially-aware Large-scale Crowd Forecasting

no code implementations CVPR 2014 Alexandre Alahi, Vignesh Ramanathan, Li Fei-Fei

In crowded spaces such as city centers or train stations, human mobility looks complex, but is often influenced only by a few causes.

Large-Scale Video Classification with Convolutional Neural Networks

1 code implementation 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014 Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei

We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).

Action Recognition Classification +3

VideoSET: Video Summary Evaluation through Text

no code implementations 23 Jun 2014 Serena Yeung, Alireza Fathi, Li Fei-Fei

In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video.

ImageNet Large Scale Visual Recognition Challenge

12 code implementations 1 Sep 2014 Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images.

General Classification Image Classification +4

Affordances Provide a Fundamental Categorization Principle for Visual Scenes

no code implementations 19 Nov 2014 Michelle R. Greene, Christopher Baldassano, Andre Esteva, Diane M. Beck, Li Fei-Fei

Traditional models of visual perception posit that scene categorization is achieved through the recognition of a scene's objects, yet these models cannot account for the mounting evidence that human observers are relatively insensitive to the local details in an image.

Visual Noise from Natural Scene Statistics Reveals Human Scene Category Representations

no code implementations 19 Nov 2014 Michelle R. Greene, Abraham P. Botros, Diane M. Beck, Li Fei-Fei

In this work, we visualize observers' internal representations of a visual scene category (street) using an experiment in which the observer views the naturalistic visual noise and collaborates with the algorithm to externalize his internal representation.

Improving Image Classification with Location Context

no code implementations ICCV 2015 Kevin Tang, Manohar Paluri, Li Fei-Fei, Rob Fergus, Lubomir Bourdev

With the widespread availability of cellphones and cameras that have GPS capabilities, it is common for images being uploaded to the Internet today to have GPS coordinates associated with them.

Classification General Classification +1

Image Retrieval Using Scene Graphs

no code implementations CVPR 2015 Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, Li Fei-Fei

We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval.

Image Retrieval Object Localization +1

Fine-Grained Recognition Without Part Annotations

no code implementations CVPR 2015 Jonathan Krause, Hailin Jin, Jianchao Yang, Li Fei-Fei

Scaling up fine-grained recognition to all domains of fine-grained objects is a challenge the computer vision community will need to face in order to realize its goal of recognizing all object categories.

Best of Both Worlds: Human-Machine Collaboration for Object Annotation

no code implementations CVPR 2015 Olga Russakovsky, Li-Jia Li, Li Fei-Fei

This paper brings together the latest advancements in object detection and in crowd engineering into a principled framework for accurately and efficiently localizing objects in images.

Object object-detection +1

Visualizing and Understanding Recurrent Networks

3 code implementations 5 Jun 2015 Andrej Karpathy, Justin Johnson, Li Fei-Fei

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data.

What's the Point: Semantic Segmentation with Point Supervision

1 code implementation 6 Jun 2015 Amy Bearman, Olga Russakovsky, Vittorio Ferrari, Li Fei-Fei

The semantic image segmentation task presents a trade-off between test-time accuracy and training-time annotation cost.

Image Segmentation Semantic Segmentation

Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

no code implementations 20 Jul 2015 Yuke Zhu, Ce Zhang, Christopher Ré, Li Fei-Fei

The complexity of the visual world creates significant challenges for comprehensive visual understanding.

Retrieval

Detecting events and key actors in multi-person videos

no code implementations CVPR 2016 Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event.

Event Detection General Classification

The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

1 code implementation 20 Nov 2015 Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.

Ranked #5 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)

Active Learning Fine-Grained Image Classification

End-to-end Learning of Action Detection from Frame Glimpses in Videos

1 code implementation CVPR 2016 Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei

In this work we introduce a fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions.

Ranked #9 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.2 metric)

Action Detection Temporal Action Localization

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

1 code implementation CVPR 2016 Justin Johnson, Andrej Karpathy, Li Fei-Fei

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language.

Dense Captioning Image Captioning +4

RGB-W: When Vision Meets Wireless

no code implementations ICCV 2015 Alexandre Alahi, Albert Haque, Li Fei-Fei

Inspired by the recent success of RGB-D cameras, we propose the enrichment of RGB data with an additional "quasi-free" modality, namely, the wireless signal (e.g., wifi or Bluetooth) emitted by individuals' cell phones, referred to as RGB-W.

Embracing Error to Enable Rapid Crowdsourcing

no code implementations 14 Feb 2016 Ranjay Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David A. Shamma, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data.

General Classification Sentiment Analysis +2

Locally-Optimized Inter-Subject Alignment of Functional Cortical Regions

no code implementations 7 Jun 2016 Marius Cătălin Iordan, Armand Joulin, Diane M. Beck, Li Fei-Fei

Our method outperforms the two most commonly used alternatives (anatomical landmark-based AFNI alignment and cortical convexity-based FreeSurfer alignment) in overlap between predicted region and functionally-defined LOC.

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

no code implementations 28 Jul 2016 De-An Huang, Li Fei-Fei, Juan Carlos Niebles

We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time.

General Classification

Visual Relationship Detection with Language Priors

no code implementations 31 Jul 2016 Cewu Lu, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.

Content-Based Image Retrieval Relationship Detection +3

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

no code implementations 15 Sep 2016 Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets.

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations 16 Sep 2016 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.

3D Reconstruction Feature Engineering +3

Crowdsourcing in Computer Vision

no code implementations 7 Nov 2016 Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman

Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts.

Object Recognition

A Hierarchical Approach for Generating Descriptive Image Paragraphs

3 code implementations CVPR 2017 Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

Dense Captioning Descriptive +3

Recurrent Attention Models for Depth-Based Person Identification

no code implementations CVPR 2016 Albert Haque, Alexandre Alahi, Li Fei-Fei

We present an attention-based model that reasons on human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark.

Person Identification reinforcement-learning +1

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

5 code implementations CVPR 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.

Question Answering Visual Question Answering +1

Unsupervised Learning of Long-Term Motion Dynamics for Videos

no code implementations CVPR 2017 Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos.

Representation Learning

Scene Graph Generation by Iterative Message Passing

5 code implementations CVPR 2017 Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei

In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image.

Graph Generation Panoptic Scene Graph Generation

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

no code implementations 22 Feb 2017 Timnit Gebru, Jonathan Krause, Yi-Lun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei

The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors.

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

no code implementations CVPR 2017 De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles

We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., "dressing") to the action (e.g., "mix yogurt") that produced it.

Referring Expression

Inferring and Executing Programs for Visual Reasoning

5 code implementations ICCV 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

Visual Question Answering (VQA) Visual Reasoning

Tackling Over-pruning in Variational Autoencoders

no code implementations 9 Jun 2017 Serena Yeung, Anitha Kannan, Yann Dauphin, Li Fei-Fei

The so-called epitomes of this model are groups of mutually exclusive latent factors that compete to explain the data.

Learning to Learn from Noisy Web Videos

no code implementations CVPR 2017 Serena Yeung, Vignesh Ramanathan, Olga Russakovsky, Liyue Shen, Greg Mori, Li Fei-Fei

Our method uses Q-learning to learn a data labeling policy on a small labeled training dataset, and then uses this to automatically label noisy web data for new visual concepts.

Action Recognition Q-Learning +1

Scalable Annotation of Fine-Grained Categories Without Experts

no code implementations 7 Sep 2017 Timnit Gebru, Jonathan Krause, Jia Deng, Li Fei-Fei

We present a crowdsourcing workflow to collect image annotations for visually similar synthetic categories without requiring experts.

Fine-Grained Car Detection for Visual Census Estimation

no code implementations 7 Sep 2017 Timnit Gebru, Jonathan Krause, Yi-Lun Wang, Duyun Chen, Jia Deng, Li Fei-Fei

In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data.

Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach

no code implementations ICCV 2017 Timnit Gebru, Judy Hoffman, Li Fei-Fei

While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild.

Attribute Domain Adaptation +1

Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

1 code implementation 4 Oct 2017 Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese

In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction.

Few-Shot Learning Program induction +1

Thoracic Disease Identification and Localization with Limited Supervision

1 code implementation CVPR 2018 Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-Jia Li, Li Fei-Fei

Accurate identification and localization of abnormalities from radiology images play an integral part in clinical diagnosis and treatment planning.

General Classification

Graph Distillation for Action Detection with Privileged Modalities

1 code implementation ECCV 2018 Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei

We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available.

Action Classification Action Detection +1

Progressive Neural Architecture Search

17 code implementations ECCV 2018 Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.

Evolutionary Algorithms General Classification +3

Emergence of Structured Behaviors from Curiosity-Based Intrinsic Motivation

no code implementations 21 Feb 2018 Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

Moreover, the world model that the agent learns supports improved performance on object dynamics prediction and localization tasks.

motion prediction Object

Learning to Play with Intrinsically-Motivated Self-Aware Agents

no code implementations 21 Feb 2018 Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering.

motion prediction Object

Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks

2 code implementations 24 Feb 2018 Amy Jin, Serena Yeung, Jeffrey Jopling, Jonathan Krause, Dan Azagury, Arnold Milstein, Li Fei-Fei

We show that our method both effectively detects the spatial bounds of tools and significantly outperforms existing methods on tool presence detection.

Referring Relationships

2 code implementations CVPR 2018 Ranjay Krishna, Ines Chami, Michael Bernstein, Li Fei-Fei

We formulate the cyclic condition between the entities in a relationship by modelling predicates that connect the entities as shifts in attention from one entity to another.

Iterative Visual Reasoning Beyond Convolutions

no code implementations CVPR 2018 Xinlei Chen, Li-Jia Li, Li Fei-Fei, Abhinav Gupta

The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module.

Visual Reasoning

Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

7 code implementations CVPR 2018 Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi

Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.

Collision Avoidance Motion Forecasting +4

Image Generation from Scene Graphs

4 code implementations CVPR 2018 Justin Johnson, Agrim Gupta, Li Fei-Fei

To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships.

Image Generation from Scene Graphs Layout-to-Image Generation

Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

no code implementations CVPR 2018 De-An Huang, Shyamal Buch, Lucio Dery, Animesh Garg, Li Fei-Fei, Juan Carlos Niebles

In this work, we propose to tackle this new task with a weakly-supervised framework for reference-aware visual grounding in instructional videos, where only the temporal alignment between the transcription and the video segment is available for supervision.

Multiple Instance Learning Sentence +1

Flexible Neural Representation for Physics Prediction

no code implementations NeurIPS 2018 Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Joshua B. Tenenbaum, Daniel L. K. Yamins

Humans have a remarkable capacity to understand the physical dynamics of objects in their environment, flexibly capturing complex structures and interactions at multiple levels of detail.

Relation Network

Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

no code implementations 25 Jun 2018 Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering.

Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?

no code implementations ICML 2018 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter Glynn, Yinyu Ye, Li-Jia Li, Li Fei-Fei

One of the most widely used optimization methods for large-scale machine learning problems is distributed asynchronous stochastic gradient descent (DASGD).

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

no code implementations CVPR 2019 De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model.

HiDDeN: Hiding Data With Deep Networks

6 code implementations ECCV 2018 Jiren Zhu, Russell Kaplan, Justin Johnson, Li Fei-Fei

We show that these encodings are competitive with existing data hiding algorithms, and further that they can be made robust to noise: our models learn to reconstruct hidden information in an encoded image despite the presence of Gaussian blurring, pixel-wise dropout, cropping, and JPEG compression.

Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos

no code implementations ECCV 2018 Bingbin Liu, Serena Yeung, Edward Chou, De-An Huang, Li Fei-Fei, Juan Carlos Niebles

A major challenge in computer vision is scaling activity understanding to the long tail of complex activities without requiring collecting large quantities of data for new actions.

Retrieval Video Retrieval

Neural Graph Matching Networks for Fewshot 3D Action Recognition

no code implementations ECCV 2018 Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples.

Few-Shot Learning Graph Matching +1

RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

no code implementations 7 Nov 2018 Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei

Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification.

Imitation Learning

Measuring Depression Symptom Severity from Spoken Language and 3D Facial Expressions

no code implementations 21 Nov 2018 Albert Haque, Michelle Guo, Adam S. Miner, Li Fei-Fei

This technology could be deployed to cell phones worldwide and facilitate low-cost universal access to mental health care.

Specificity speech-recognition +1

Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference

no code implementations 25 Nov 2018 Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque, Li Fei-Fei

Homomorphic encryption enables arbitrary computation over data while it remains encrypted.

Cryptography and Security

Privacy-Preserving Action Recognition for Smart Hospitals using Low-Resolution Depth Images

no code implementations 25 Nov 2018 Edward Chou, Matthew Tan, Cherry Zou, Michelle Guo, Albert Haque, Arnold Milstein, Li Fei-Fei

Computer-vision hospital systems can greatly assist healthcare workers and improve medical facility treatment, but often face patient resistance due to the perceived intrusiveness and violation of privacy associated with visual surveillance.

Action Recognition Privacy Preserving +2

Vision-Based Gait Analysis for Senior Care

no code implementations 1 Dec 2018 David Xue, Anin Sayana, Evan Darke, Kelly Shen, Jun-Ting Hsieh, Zelun Luo, Li-Jia Li, N. Lance Downing, Arnold Milstein, Li Fei-Fei

As the senior population rapidly increases, it is challenging yet crucial to provide effective long-term care for seniors who live at home or in senior care facilities.

Composing Text and Image for Image Retrieval - An Empirical Odyssey

4 code implementations CVPR 2019 Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.

Image Retrieval Image Retrieval with Multi-Modal Query +1

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

no code implementations CVPR 2019 Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable.

Dynamic Time Warping Segmentation +1

Audio-Linguistic Embeddings for Spoken Sentences

1 code implementation 20 Feb 2019 Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

We propose spoken sentence embeddings which capture both acoustic and linguistic content.

Emotion Recognition Sentence +4

Information Maximizing Visual Question Generation

no code implementations CVPR 2019 Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We build a model that maximizes mutual information between the image, the expected answer and the generated question.

Clustering Question Generation +1

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

no code implementations NeurIPS 2019 Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein

We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time.

Image Generation Unconditional Image Generation

Scene Graph Prediction with Limited Labels

1 code implementation ICCV 2019 Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei

All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each.

Knowledge Base Completion Question Answering +2

Eidetic 3D LSTM: A Model for Video Prediction and Beyond

3 code implementations ICLR 2019 Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei

We first evaluate the E3D-LSTM network on widely-used future video prediction datasets and achieve state-of-the-art performance.

Ranked #1 on Video Prediction on KTH (Cond metric)

Activity Recognition Video Prediction +1

Procedure Planning in Instructional Videos

no code implementations ECCV 2020 Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking.

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

no code implementations 16 Aug 2019 De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures.

Imitation Learning

Situational Fusion of Visual Representation for Visual Navigation

no code implementations ICCV 2019 Bokui Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese

A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities.

Visual Navigation

Stochastic Neural Physics Predictor

no code implementations 25 Sep 2019 Piotr Tatarczyk, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins, Nils Thuerey

Recently, neural-network based forward dynamics models have been proposed that attempt to learn the dynamics of physical systems in a deterministic way.

DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

1 code implementation 28 Sep 2019 Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum

A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.

Continuous Control

Regression Planning Networks

1 code implementation NeurIPS 2019 Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Recent learning-to-plan methods have shown promising results on planning directly from observation space.

regression

Causal Induction from Visual Observations for Goal Directed Tasks

2 code implementations 3 Oct 2019 Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world.

Representation Learning with Statistical Independence to Mitigate Bias

2 code implementations 8 Oct 2019 Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

The presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications and has prompted pivotal debates in recent years.

Face Recognition Gender Classification +1

KETO: Learning Keypoint Representations for Tool Manipulation

no code implementations 26 Oct 2019 Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese

For this purpose, we present KETO, a framework for learning keypoint representations for tool-based manipulation.

Robotics

Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation

no code implementations 29 Oct 2019 Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal.

Variational Inference

Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

1 code implementation 30 Oct 2019 Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese

We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task.

Robot Navigation

Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity

no code implementations 11 Nov 2019 Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

We evaluate the quality of our platform, the diversity of demonstrations in our dataset, and the utility of our dataset via quantitative and qualitative analysis.

Robot Manipulation

Motion Reasoning for Goal-Based Imitation Learning

no code implementations 13 Nov 2019 De-An Huang, Yu-Wei Chao, Chris Paxton, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox

We further show that by using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment.

Imitation Learning Motion Planning +1

IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

no code implementations 13 Nov 2019 Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox

For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task.

Robot Manipulation

Deep Bayesian Active Learning for Multiple Correct Outputs

no code implementations 2 Dec 2019 Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

The assumption that these tasks always have exactly one correct answer has resulted in the creation of numerous uncertainty-based measurements, such as entropy and least confidence, which operate over a model's outputs.

Active Learning Answer Generation +4

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

1 code implementation 15 Dec 2019 Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

Next, by decomposing and learning the temporal changes in visual relationships that result in an action, we demonstrate the utility of a hierarchical event decomposition by enabling few-shot action recognition, achieving 42.7% mAP using as few as 10 examples.

Few-Shot action recognition Few Shot Action Recognition +1

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations

no code implementations 13 Mar 2020 Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei

In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations.

Imitation Learning

Learning Physical Graph Representations from Visual Scenes

1 code implementation NeurIPS 2020 Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins

To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts.

Object Object Categorization +1

Adaptive Procedural Task Generation for Hard-Exploration Problems

no code implementations ICLR 2021 Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.

Vision-based Estimation of MDS-UPDRS Gait Scores for Assessing Parkinson's Disease Motor Severity

no code implementations 17 Jul 2020 Mandy Lu, Kathleen Poston, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Kilian M. Pohl, Juan Carlos Niebles, Ehsan Adeli

This is the first benchmark for classifying PD patients based on MDS-UPDRS gait severity and could be an objective biomarker for disease severity.

Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

no code implementations 5 Aug 2020 Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein

In a third study, we assess effects of metaphor choices on potential users' desire to try out the system and find that users are drawn to systems that project higher competence and warmth.

Learning Multi-Arm Manipulation Through Collaborative Teleoperation

no code implementations 12 Dec 2020 Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks.

Imitation Learning

Human-in-the-Loop Imitation Learning using Remote Teleoperation

no code implementations 12 Dec 2020 Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

We develop a simple and effective algorithm to train the policy iteratively on new data collected by the system that encourages the policy to learn how to traverse bottlenecks through the interventions.

Imitation Learning Robot Manipulation

Embodied Intelligence via Learning and Evolution

1 code implementation 3 Feb 2021 Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei

However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.

Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control

no code implementations 28 Feb 2021 Chen Wang, Rui Wang, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu

Key to such capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their movements at task-relevant objects and be invariant to the objects' absolute spatial location.

Imitation Learning Zero-shot Generalization

Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction

no code implementations CVPR 2021 Bohan Wu, Suraj Nair, Roberto Martín-Martín, Li Fei-Fei, Chelsea Finn

Our key insight is that greedy and modular optimization of hierarchical autoencoders can simultaneously address both the memory constraints and the optimization challenges of large-scale video prediction.

Video Prediction

A Study of Face Obfuscation in ImageNet

1 code implementation 10 Mar 2021 Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng, Olga Russakovsky

In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark.

Attribute Object +5

Metadata Normalization

1 code implementation CVPR 2021 Mandy Lu, Qingyu Zhao, Jiequan Zhang, Kilian M. Pohl, Li Fei-Fei, Juan Carlos Niebles, Ehsan Adeli

Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods.

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning

1 code implementation CVPR 2022 Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, Daniel Rubin

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution.

Federated Learning

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

1 code implementation 17 Jun 2021 Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.

Autonomous Driving Image Augmentation +3

Scalable Differential Privacy With Sparse Network Finetuning

no code implementations CVPR 2021 Zelun Luo, Daniel J. Wu, Ehsan Adeli, Li Fei-Fei

We propose a novel method for privacy-preserving training of deep neural networks leveraging public, out-domain data.

Privacy Preserving Transfer Learning

Discovering Generalizable Skills via Automated Generation of Diverse Tasks

no code implementations 26 Jun 2021 Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks.

Hierarchical Reinforcement Learning reinforcement-learning +1

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

1 code implementation ACL 2021 Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition.

Active Learning Object Recognition +3

Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning

no code implementations 20 Jul 2021 Kaylee Burns, Christopher D. Manning, Li Fei-Fei

Although virtual agents are increasingly situated in environments where natural language is the most effective mode of interaction with humans, these exchanges are rarely used as an opportunity for learning.

Grounded language learning

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

1 code implementation 6 Aug 2021 Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín

Based on the study, we derive a series of lessons including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation.

Imitation Learning reinforcement-learning +2

iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

1 code implementation 6 Aug 2021 Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI.

Imitation Learning

BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

no code implementations 6 Aug 2021 Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation.

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

1 code implementation 13 Aug 2021 Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu, Li Fei-Fei, C. Karen Liu, Silvio Savarese

Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator.

On the Opportunities and Risks of Foundation Models

2 code implementations 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks

no code implementations 21 Sep 2021 Bohan Wu, Suraj Nair, Li Fei-Fei, Chelsea Finn

In this paper, we study the problem of learning a repertoire of low-level skills from raw images that can be sequenced to complete long-horizon visuomotor tasks.

Model-based Reinforcement Learning reinforcement-learning +1

Visual Intelligence through Human Interaction

no code implementations 12 Nov 2021 Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein

Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, aiding robots maneuver around physical spaces and even generating novel visual content.

Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation

no code implementations 9 Dec 2021 Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín

Doing this is challenging for two reasons: on the data side, current interfaces make collecting high-quality human demonstrations difficult, and on the learning side, policies trained on limited data can suffer from covariate shift when deployed.

Imitation Learning Navigate

MetaMorph: Learning Universal Controllers with Transformers

2 code implementations ICLR 2022 Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei

Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large scale pre-training followed by task specific fine tuning.

Zero-shot Generalization

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

1 code implementation CVPR 2022 Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects.

Object

PrivHAR: Recognizing Human Actions From Privacy-preserving Lens

no code implementations 8 Jun 2022 Carlos Hinojosa, Miguel Marquez, Henry Arguello, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

The accelerated use of digital cameras prompts an increasing concern about privacy and security, particularly in applications such as action recognition.

Action Recognition Privacy Preserving +1

BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents

no code implementations 13 Jun 2022 Ziang Liu, Roberto Martín-Martín, Fei Xia, Jiajun Wu, Li Fei-Fei

Robots excel in performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not been yet extended to embodied AI agents providing assistance in household tasks.

Benchmarking

MaskViT: Masked Visual Pre-Training for Video Prediction

no code implementations 23 Jun 2022 Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei

This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.

Scheduling Video Prediction

GaitForeMer: Self-Supervised Pre-Training of Transformers via Human Motion Forecasting for Few-Shot Gait Impairment Severity Estimation

1 code implementation 30 Jun 2022 Mark Endo, Kathleen L. Poston, Edith V. Sullivan, Li Fei-Fei, Kilian M. Pohl, Ehsan Adeli

Because of this clinical data scarcity and inspired by the recent advances in self-supervised large-scale language models like GPT-3, we use human motion forecasting as an effective self-supervised pre-training task for the estimation of motor impairment severity.

Motion Forecasting severity prediction

VIMA: General Robot Manipulation with Multimodal Prompts

2 code implementations 6 Oct 2022 Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan

We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens.

Imitation Learning Language Modelling +3

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

1 code implementation 9 Oct 2022 Zixian Ma, Rose Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna

These results identify tasks where expectation alignment is a more useful strategy than curiosity-driven exploration for multi-agent coordination, enabling agents to do zero-shot coordination.

Multi-agent Reinforcement Learning

Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks

no code implementations 11 Nov 2022 Kuan Fang, Toki Migimatsu, Ajay Mandlekar, Li Fei-Fei, Jeannette Bohg

ATR selects suitable tasks, which consist of an initial environment state and manipulation goal, for learning robust skills by balancing the diversity and feasibility of the tasks.

Modeling Dynamic Environments with Scene Graph Memory

no code implementations 27 May 2023 Andrey Kurenkov, Michael Lingelbach, Tanmay Agarwal, Emily Jin, Chengshu Li, Ruohan Zhang, Li Fei-Fei, Jiajun Wu, Silvio Savarese, Roberto Martín-Martín

We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen at homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines both in terms of new scene adaptability and overall accuracy.

Link Prediction

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

no code implementations CVPR 2023 Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu

We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.

Benchmarking Object +1

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

1 code implementation 1 Jun 2023 Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.

Multi-Task Learning Visual Navigation

HomE: Homography-Equivariant Video Representation Learning

1 code implementation 2 Jun 2023 Anirudh Sriram, Adrien Gaidon, Jiajun Wu, Juan Carlos Niebles, Li Fei-Fei, Ehsan Adeli

In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE).

Action Classification Action Recognition +2

Differentially Private Video Activity Recognition

no code implementations 27 Jun 2023 Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar

In recent years, differential privacy has seen significant advancements in image classification; however, its application to video activity recognition remains under-explored.

Activity Recognition Classification +2

Dynamic-Resolution Model Learning for Object Pile Manipulation

no code implementations 29 Jun 2023 YiXuan Wang, Yunzhu Li, Katherine Driggs-Campbell, Li Fei-Fei, Jiajun Wu

Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks.

Model Predictive Control Object

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

1 code implementation 12 Jul 2023 Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei

The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations.

Language Modelling Robot Manipulation

Primitive Skill-based Robot Learning from Human Evaluative Feedback

no code implementations 28 Jul 2023 Ayano Hiranaka, Minjune Hwang, Sharon Lee, Chen Wang, Li Fei-Fei, Jiajun Wu, Ruohan Zhang

By combining them, SEED reduces the human effort required in RLHF and increases safety in training robot manipulation with RL in real-world settings.

reinforcement-learning Reinforcement Learning (RL) +1

Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

no code implementations 2 Sep 2023 Yuanpei Chen, Chen Wang, Li Fei-Fei, C. Karen Liu

However, the challenges arise due to the high-dimensional action space of dexterous hand and complex compositional dynamics of the long-horizon tasks.

Reinforcement Learning (RL)

MindAgent: Emergent Gaming Interaction

no code implementations 18 Sep 2023 Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao

Large Language Models (LLMs) have the capacity of performing complex scheduling in a multi-agent system and can coordinate these agents into completing sophisticated tasks that require extensive collaboration.

In-Context Learning Scheduling

D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

no code implementations 28 Sep 2023 YiXuan Wang, Zhuoran Li, Mingtong Zhang, Katherine Driggs-Campbell, Jiajun Wu, Li Fei-Fei, Yunzhu Li

These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks.

Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI

1 code implementation 3 Oct 2023 Emily Jin, Jiaheng Hu, Zhuoyi Huang, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, Roberto Martín-Martín

We present Mini-BEHAVIOR, a novel benchmark for embodied AI that challenges agents to use reasoning and decision-making skills to solve complex activities that resemble everyday human challenges.

Decision Making

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

no code implementations 27 Oct 2023 Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.

Novel View Synthesis

NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

no code implementations 2 Nov 2023 Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu

We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals.

EEG

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

no code implementations 7 Dec 2023 Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter

For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable).

Language Modelling

Model-Based Control with Sparse Neural Dynamics

no code implementations NeurIPS 2023 Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li

In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms.

Wild2Avatar: Rendering Humans Behind Occlusions

no code implementations 31 Dec 2023 Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli

In this work, we present Wild2Avatar, a neural rendering approach catered for occluded in-the-wild monocular videos.

Neural Rendering

Agent AI: Surveying the Horizons of Multimodal Interaction

1 code implementation 7 Jan 2024 Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions.

DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation

no code implementations 12 Mar 2024 Chen Wang, Haochen Shi, Weizhuo Wang, Ruohan Zhang, Li Fei-Fei, C. Karen Liu

Imitation learning from human hand motion data presents a promising avenue for imbuing robots with human-like dexterity in real-world manipulation tasks.

Imitation Learning

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition

1 code implementation ECCV 2020 Linxi Fan, Shyamal Buch, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei

We analyze the suitability of our new primitive for video action recognition and explore several novel variations of our approach to enable stronger representational flexibility while maintaining an efficient design.

Action Recognition Temporal Action Localization +1
