Search Results for author: De-An Huang

Found 46 papers, 15 papers with code

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

no code implementations 14 Jan 2025 Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma

To achieve consistent region representation across spatio-temporal dimensions, we introduce Token Mark, a set of tokens highlighting the target regions within the visual feature space.

Language Modeling Language Modelling +5

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

2 code implementations 28 Aug 2024 Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu

We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies.

Optical Character Recognition

ARDuP: Active Region Video Diffusion for Universal Policies

no code implementations 19 Jun 2024 Shuaiyi Huang, Mara Levy, Zhenyu Jiang, Anima Anandkumar, Yuke Zhu, Linxi Fan, De-An Huang, Abhinav Shrivastava

Sequential decision-making can be formulated as a text-conditioned video generation problem, where a video planner, guided by a text-defined goal, generates future frames visualizing planned actions, from which control actions are subsequently derived.

Decision Making Sequential Decision Making +1

X-VILA: Cross-Modality Alignment for Large Language Model

no code implementations 29 May 2024 Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.

Instruction Following Language Modeling +2

LITA: Language Instructed Temporal-Localization Assistant

1 code implementation 27 Mar 2024 De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz

In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.

Instruction Following Temporal Localization +2

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

no code implementations 21 Mar 2024 Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

To tackle this issue, we propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation.

Video Generation

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation 21 Feb 2024 Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation

Deep Multimodal Fusion for Surgical Feedback Classification

no code implementations 6 Dec 2023 Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities.


Eureka: Human-Level Reward Design via Coding Large Language Models

1 code implementation 19 Oct 2023 Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar

The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating.

Decision Making In-Context Learning +2

Differentially Private Video Activity Recognition

no code implementations 27 Jun 2023 Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar

In recent years, differential privacy has seen significant advancements in image classification; however, its application to video activity recognition remains under-explored.

Activity Recognition Classification +2

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

1 code implementation CVPR 2024 Chulin Xie, De-An Huang, Wenda Chu, Daguang Xu, Chaowei Xiao, Bo Li, Anima Anandkumar

In this paper, we propose PerAda, a parameter-efficient pFL framework that reduces communication and computational costs and exhibits superior generalization performance, especially under test-time distribution shifts.

Generalization Bounds Knowledge Distillation +2

I$^2$SB: Image-to-Image Schrödinger Bridge

1 code implementation 12 Feb 2023 Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar

We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions.

Deblurring Image Restoration +1

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

1 code implementation 15 Sep 2022 Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.

Image Classification Zero-shot Generalization

Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding

no code implementations 22 Aug 2022 Stephen Su, Samuel Kwong, Qingyu Zhao, De-An Huang, Juan Carlos Niebles, Ehsan Adeli

In this work, we propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on.

Action Recognition Multi-Task Learning +3

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

2 code implementations 3 Aug 2022 De-An Huang, Zhiding Yu, Anima Anandkumar

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Instance Segmentation Segmentation +2

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation 3 Feb 2022 Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning Language Modeling +2

Scaling Fair Learning to Hundreds of Intersectional Groups

no code implementations 29 Sep 2021 Eric Zhao, De-An Huang, Hao Liu, Zhiding Yu, Anqi Liu, Olga Russakovsky, Anima Anandkumar

In real-world applications, however, there are multiple protected attributes yielding a large number of intersectional protected groups.

Attribute Fairness +1

Auditing AI models for Verified Deployment under Semantic Specifications

no code implementations 25 Sep 2021 Homanga Bharadhwaj, De-An Huang, Chaowei Xiao, Anima Anandkumar, Animesh Garg

We enable such unit tests through variations in a semantically-interpretable latent space of a generative model.

Face Recognition

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

1 code implementation 17 Jun 2021 Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.

Autonomous Driving Image Augmentation +3

Transferable Unsupervised Robust Representation Learning

no code implementations 1 Jan 2021 De-An Huang, Zhiding Yu, Anima Anandkumar

We upend this view and show that URRL improves both the natural accuracy of unsupervised representation learning and its robustness to corruptions and adversarial noise.

Data Augmentation Representation Learning +1

Motion Reasoning for Goal-Based Imitation Learning

no code implementations 13 Nov 2019 De-An Huang, Yu-Wei Chao, Chris Paxton, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox

We further show that by using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment.

Imitation Learning Motion Planning +1

Regression Planning Networks

1 code implementation NeurIPS 2019 Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Recent learning-to-plan methods have shown promising results on planning directly from observation space.


Imitation Learning for Human Pose Prediction

no code implementations ICCV 2019 Borui Wang, Ehsan Adeli, Hsu-kuang Chiu, De-An Huang, Juan Carlos Niebles

Modeling and prediction of human motion dynamics has long been a challenging problem in computer vision, and most existing methods rely on the end-to-end supervised training of various architectures of recurrent neural networks.

Ranked #2 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Deep Reinforcement Learning Human Pose Forecasting +4

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

no code implementations 16 Aug 2019 De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures.

Imitation Learning

Procedure Planning in Instructional Videos

no code implementations ECCV 2020 Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking.

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

no code implementations CVPR 2019 Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable.

Dynamic Time Warping Segmentation +1

Action-Agnostic Human Pose Forecasting

1 code implementation 23 Oct 2018 Hsu-kuang Chiu, Ehsan Adeli, Borui Wang, De-An Huang, Juan Carlos Niebles

In this paper, we propose a new action-agnostic method for short- and long-term human pose forecasting.

Ranked #5 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Human Dynamics Human Pose Forecasting

Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos

no code implementations ECCV 2018 Bingbin Liu, Serena Yeung, Edward Chou, De-An Huang, Li Fei-Fei, Juan Carlos Niebles

A major challenge in computer vision is scaling activity understanding to the long tail of complex activities without requiring collecting large quantities of data for new actions.

Retrieval Video Retrieval

Neural Graph Matching Networks for Fewshot 3D Action Recognition

no code implementations ECCV 2018 Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples.

Few-Shot Learning Graph Matching +1

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

no code implementations CVPR 2019 De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model.

Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

no code implementations CVPR 2018 De-An Huang, Shyamal Buch, Lucio Dery, Animesh Garg, Li Fei-Fei, Juan Carlos Niebles

In this work, we propose to tackle this new task with a weakly-supervised framework for reference-aware visual grounding in instructional videos, where only the temporal alignment between the transcription and the video segment is available for supervision.

Multiple Instance Learning Sentence +1

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations ICCV 2017 Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation Reinforcement Learning

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

no code implementations CVPR 2017 De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles

We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., "dressing") to the action (e.g., "mix yogurt") that produced it.

Referring Expression

Unsupervised Learning of Long-Term Motion Dynamics for Videos

no code implementations CVPR 2017 Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos.

Decoder Representation Learning

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

no code implementations 28 Jul 2016 De-An Huang, Li Fei-Fei, Juan Carlos Niebles

We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time.

General Classification

Forecasting Interactive Dynamics of Pedestrians with Fictitious Play

no code implementations CVPR 2017 Wei-Chiu Ma, De-An Huang, Namhoon Lee, Kris M. Kitani

We develop predictive models of pedestrian dynamics by encoding the coupled nature of multi-pedestrian interaction using game theory, and deep learning-based visual analysis to estimate person-specific behavior parameters.

Decision Making

Robust Classification by Pre-conditioned LASSO and Transductive Diffusion Component Analysis

no code implementations 19 Nov 2015 Yanwei Fu, De-An Huang, Leonid Sigal

Collecting datasets in this way, however, requires robust and efficient ways for detecting and excluding outliers that are common and prevalent.

BIG-bench Machine Learning Classification +3

How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps

no code implementations CVPR 2015 De-An Huang, Minghuang Ma, Wei-Chiu Ma, Kris M. Kitani

Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies.

Clustering Online Clustering
