Search Results for author: De-An Huang

Found 41 papers, 12 papers with code

What is Point Supervision Worth in Video Instance Segmentation?

no code implementations • 1 Apr 2024 • Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos.

Instance Segmentation, Object, +2 more

LITA: Language Instructed Temporal-Localization Assistant

1 code implementation • 27 Mar 2024 • De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz

In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.

Ranked #4 on Video-based Generative Performance Benchmarking on VideoInstruct

Instruction Following, Temporal Localization, +2 more

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

no code implementations • 21 Mar 2024 • Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

To tackle this issue, we propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation.

Video Generation

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation

Deep Multimodal Fusion for Surgical Feedback Classification

no code implementations • 6 Dec 2023 • Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities.

Classification

Eureka: Human-Level Reward Design via Coding Large Language Models

1 code implementation • 19 Oct 2023 • Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar

The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating.

Decision Making, In-Context Learning, +1 more

GitHub stars: 2,580

Differentially Private Video Activity Recognition

no code implementations • 27 Jun 2023 • Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar

In recent years, differential privacy has seen significant advancements in image classification; however, its application to video activity recognition remains under-explored.

Activity Recognition, Classification, +2 more

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

no code implementations • 13 Feb 2023 • Chulin Xie, De-An Huang, Wenda Chu, Daguang Xu, Chaowei Xiao, Bo Li, Anima Anandkumar

In this paper, we propose PerAda, a parameter-efficient personalized federated learning (pFL) framework that reduces communication and computational costs and exhibits superior generalization performance, especially under test-time distribution shifts.

Generalization Bounds, Knowledge Distillation, +2 more

I²SB: Image-to-Image Schrödinger Bridge

1 code implementation • 12 Feb 2023 • Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar

We propose Image-to-Image Schrödinger Bridge (I²SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions.

Deblurring, Image Restoration, +1 more

GitHub stars: 205

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has achieved state-of-the-art results in image-to-text generation.

Few-Shot Learning, Image Captioning, +3 more

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

2 code implementations • 15 Sep 2022 • Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

In evaluating cross-dataset generalization with unseen categories, test-time prompt tuning (TPT) performs on par with state-of-the-art approaches that use additional training data.

Image Classification, Zero-shot Generalization

GitHub stars: 117

Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding

no code implementations • 22 Aug 2022 • Stephen Su, Samuel Kwong, Qingyu Zhao, De-An Huang, Juan Carlos Niebles, Ehsan Adeli

In this work, we propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on.

Action Recognition, Multi-Task Learning, +3 more

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

2 code implementations • 3 Aug 2022 • De-An Huang, Zhiding Yu, Anima Anandkumar

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Ranked #13 on Video Instance Segmentation on YouTube-VIS validation

Instance Segmentation, Segmentation, +2 more

GitHub stars: 261

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

1 code implementation • 17 Jun 2022 • Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar

Autonomous agents have made great strides in specialist domains like Atari games and Go.

Atari Games

GitHub stars: 1,655

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning, Language Modelling

GitHub stars: 383

Scaling Fair Learning to Hundreds of Intersectional Groups

no code implementations • 29 Sep 2021 • Eric Zhao, De-An Huang, Hao Liu, Zhiding Yu, Anqi Liu, Olga Russakovsky, Anima Anandkumar

In real-world applications, however, there are multiple protected attributes yielding a large number of intersectional protected groups.

Attribute, Fairness, +1 more

Auditing AI models for Verified Deployment under Semantic Specifications

no code implementations • 25 Sep 2021 • Homanga Bharadhwaj, De-An Huang, Chaowei Xiao, Anima Anandkumar, Animesh Garg

We enable such unit tests through variations in a semantically-interpretable latent space of a generative model.

Face Recognition

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

1 code implementation • 17 Jun 2021 • Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.

Autonomous Driving, Image Augmentation, +3 more

Transferable Unsupervised Robust Representation Learning

no code implementations • 1 Jan 2021 • De-An Huang, Zhiding Yu, Anima Anandkumar

We upend this view and show that URRL improves both the natural accuracy of unsupervised representation learning and its robustness to corruptions and adversarial noise.

Data Augmentation, Representation Learning, +1 more

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

no code implementations • CVPR 2020 • Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time.

Knowledge Distillation, Object, +2 more

Motion Reasoning for Goal-Based Imitation Learning

no code implementations • 13 Nov 2019 • De-An Huang, Yu-Wei Chao, Chris Paxton, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox

We further show that by using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment.

Imitation Learning, Motion Planning, +1 more

Regression Planning Networks

1 code implementation • NeurIPS 2019 • Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Recent learning-to-plan methods have shown promising results on planning directly from observation space.

Regression

Imitation Learning for Human Pose Prediction

no code implementations • ICCV 2019 • Borui Wang, Ehsan Adeli, Hsu-kuang Chiu, De-An Huang, Juan Carlos Niebles

Modeling and prediction of human motion dynamics has long been a challenging problem in computer vision, and most existing methods rely on the end-to-end supervised training of various architectures of recurrent neural networks.

Ranked #2 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Human Pose Forecasting, Imitation Learning, +3 more

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

no code implementations • 16 Aug 2019 • De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures.

Imitation Learning

Procedure Planning in Instructional Videos

no code implementations • ECCV 2020 • Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking.

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

no code implementations • CVPR 2019 • Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable.

Ranked #5 on Weakly Supervised Action Segmentation (Transcript) on Breakfast

Dynamic Time Warping, Segmentation, +1 more

Action-Agnostic Human Pose Forecasting

1 code implementation • 23 Oct 2018 • Hsu-kuang Chiu, Ehsan Adeli, Borui Wang, De-An Huang, Juan Carlos Niebles

In this paper, we propose a new action-agnostic method for short- and long-term human pose forecasting.

Ranked #5 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Human Dynamics, Human Pose Forecasting

Neural Graph Matching Networks for Fewshot 3D Action Recognition

no code implementations • ECCV 2018 • Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples.

Ranked #1 on Skeleton Based Action Recognition on CAD-120

Few-Shot Learning, Graph Matching, +1 more

Dynamic Task Prioritization for Multitask Learning

no code implementations • ECCV 2018 • Michelle Guo, Albert Haque, De-An Huang, Serena Yeung, Li Fei-Fei

We propose dynamic task prioritization for multitask learning.

Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos

no code implementations • ECCV 2018 • Bingbin Liu, Serena Yeung, Edward Chou, De-An Huang, Li Fei-Fei, Juan Carlos Niebles

A major challenge in computer vision is scaling activity understanding to the long tail of complex activities without requiring the collection of large quantities of data for new actions.

Retrieval, Video Retrieval

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

no code implementations • CVPR 2019 • De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model.

Learning to Decompose and Disentangle Representations for Video Prediction

1 code implementation • NeurIPS 2018 • Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, Juan Carlos Niebles

Our goal is to predict future video frames given a sequence of input frames.

Disentanglement, Predict Future Video Frames

GitHub stars: 134

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

no code implementations • CVPR 2018 • De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Lorenzo Torresani, Manohar Paluri, Li Fei-Fei, Juan Carlos Niebles

The ability to capture temporal information has been critical to the development of video understanding models.

Video Understanding

Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

no code implementations • CVPR 2018 • De-An Huang, Shyamal Buch, Lucio Dery, Animesh Garg, Li Fei-Fei, Juan Carlos Niebles

In this work, we propose to tackle this new task with a weakly-supervised framework for reference-aware visual grounding in instructional videos, where only the temporal alignment between the transcription and the video segment is available for supervision.

Multiple Instance Learning, Sentence, +1 more

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations • ICCV 2017 • Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

no code implementations • CVPR 2017 • De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles

We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., "dressing") to the action (e.g., "mix yogurt") that produced it.

Referring Expression

Unsupervised Learning of Long-Term Motion Dynamics for Videos

no code implementations • CVPR 2017 • Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos.

Representation Learning

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

no code implementations • 28 Jul 2016 • De-An Huang, Li Fei-Fei, Juan Carlos Niebles

We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training.

General Classification

Forecasting Interactive Dynamics of Pedestrians with Fictitious Play

no code implementations • CVPR 2017 • Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani

We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and deep learning-based visual analysis to estimate person-specific behavior parameters.

Decision Making

Robust Classification by Pre-conditioned LASSO and Transductive Diffusion Component Analysis

no code implementations • 19 Nov 2015 • Yanwei Fu, De-An Huang, Leonid Sigal

Collecting datasets in this way, however, requires robust and efficient ways for detecting and excluding outliers that are common and prevalent.

BIG-bench Machine Learning, Classification, +3 more

How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps

no code implementations • CVPR 2015 • De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani

Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.

Clustering, Online Clustering