Search Results for author: Chuang Gan

Found 170 papers, 72 papers with code

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference

no code implementations • ECCV 2020 • Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, Song Han

Third, our solution is efficient on the edge since the majority of the workload is delegated to the cloud, and our mixing and de-mixing processes introduce very few extra computations.

Privacy Preserving speech-recognition +1

Virtual Foundry Graphnet for Metal Sintering Deformation Prediction

no code implementations • 17 Apr 2024 • Rachel Chen, Juheon Lee, Chuang Gan, Zijiang Yang, Mohammad Amin Nabian, Jun Zeng

Metal sintering is a necessary step for parts made by metal injection molding and by binder jetting, as in HP's metal 3D printer.

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

no code implementations • 16 Apr 2024 • Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan

In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only partial egocentric views of the world.

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

1 code implementation • 14 Mar 2024 • Zhiqing Sun, Longhui Yu, Yikang Shen, Weiyang Liu, Yiming Yang, Sean Welleck, Chuang Gan

This paper answers this question in the context of tackling hard reasoning tasks (e.g., level 4-5 MATH problems) via learning from human annotations on easier tasks (e.g., level 1-3 MATH problems), which we term easy-to-hard generalization.

Math Reinforcement Learning (RL) +1

3D-VLA: A 3D Vision-Language-Action Generative World Model

no code implementations • 14 Mar 2024 • Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.

Language Modelling Large Language Model +1

ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

no code implementations • 9 Feb 2024 • Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, Chuang Gan

We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy, which shows that the current AI models still lack physical commonsense for the continuum, especially soft-bodies, and illustrates the value of the proposed dataset.

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble

no code implementations • 30 Jan 2024 • Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, Chuang Gan

Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values.

Language Modelling Large Language Model +1

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

1 code implementation • 23 Jan 2024 • Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world.

Common Sense Reasoning Decision Making +1

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

no code implementations • 16 Jan 2024 • Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan

Human beings possess the capability to multiply a melange of multisensory cues while actively exploring and interacting with the 3D world.

Language Modelling Large Language Model

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

1 code implementation • 17 Dec 2023 • Phuc D. A. Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen

We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes.

Ranked #1 on 3D Open-Vocabulary Instance Segmentation on S3DIS

3D Instance Segmentation 3D Open-Vocabulary Instance Segmentation +4

DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics

no code implementations • NeurIPS 2023 • Zhiao Huang, Feng Chen, Yewen Pu, Chunru Lin, Hao Su, Chuang Gan

Combining gradient-based trajectory optimization with differentiable physics simulation is an efficient technique for solving soft-body manipulation problems.

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

no code implementations • 10 Dec 2023 • Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou, Mingkui Tan, Chuang Gan

In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent.

Multi-agent Reinforcement Learning reinforcement-learning +2

GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

no code implementations • 8 Nov 2023 • Zhenfang Chen, Rui Sun, Wenjun Liu, Yining Hong, Chuang Gan

If not, we initialize a new module needed by the task and specify the inputs and outputs of this new module.

Question Answering Referring Expression +3

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

no code implementations • 6 Nov 2023 • Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan

A communication token is generated by the LLM following a visual entity or a relation, to inform the detection network to propose regions that are relevant to the sentence generated so far.

CoLA Question Answering +5

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

no code implementations • 2 Nov 2023 • Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Zackory Erickson, David Held, Chuang Gan

We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.

Motion Planning

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

no code implementations • 26 Oct 2023 • Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han

On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data).

Privacy Preserving

Autonomous Tree-search Ability of Large Language Models

no code implementations • 14 Oct 2023 • Zheyu Zhang, Zhuorui Ye, Yikang Shen, Chuang Gan

This approach yields a greater improvement compared to the ones fine-tuned on CoT data.

Decision Making

Sparse Universal Transformer

no code implementations • 11 Oct 2023 • Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan

The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers.

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions

no code implementations • ICCV 2023 • Chengyang Zhao, Yikang Shen, Zhenfang Chen, Mingyu Ding, Chuang Gan

To tackle this problem, we propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator, with several novel techniques.

Graph Generation Panoptic Scene Graph Generation +1

SALMON: Self-Alignment with Instructable Reward Models

1 code implementation • 9 Oct 2023 • Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM-based AI agents.

In-Context Learning Language Modelling

Generalizable Long-Horizon Manipulations with Large Language Models

no code implementations • 3 Oct 2023 • Haoyu Zhou, Mingyu Ding, Weikun Peng, Masayoshi Tomizuka, Lin Shao, Chuang Gan

This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks.

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

no code implementations • 28 Sep 2023 • Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull

We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts.

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

no code implementations • 15 Aug 2023 • Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan

We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction annotation data.

Navigate Robot Navigation +1

3D-LLM: Injecting the 3D World into Large Language Models

5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Ranked #4 on 3D Question Answering (3D-QA) on ScanQA Test w/ objects

3D Question Answering (3D-QA) Dense Captioning +1

Learning Vision-and-Language Navigation from YouTube Videos

1 code implementation • ICCV 2023 • Kunyang Lin, Peihao Chen, Diwei Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

In this paper, we propose to learn an agent from these videos by creating a large-scale dataset which comprises reasonable path-instruction pairs from house tour videos and pre-training the agent on it.

Navigate Vision and Language Navigation

Reparameterized Policy Learning for Multimodal Trajectory Optimization

no code implementations • 20 Jul 2023 • Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency.

Reinforcement Learning (RL)

Building Cooperative Embodied Agents Modularly with Large Language Models

1 code implementation • 5 Jul 2023 • Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.

Text Generation

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training

no code implementations • 29 Jun 2023 • Zitian Chen, Mingyu Ding, Yikang Shen, Wei Zhan, Masayoshi Tomizuka, Erik Learned-Miller, Chuang Gan

We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.

Continual Learning Multi-Task Learning

ModuleFormer: Modularity Emerges from Mixture-of-Experts

1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan

In our experiments, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency: since ModuleFormer only activates a subset of its modules for each input token, it can achieve the same performance as dense LLMs with more than twice the throughput; 2) Extendability: ModuleFormer is more immune to catastrophic forgetting than dense LLMs and can be easily extended with new modules to learn new knowledge that is not included in the training data; 3) Specialisation: finetuning ModuleFormer can specialize a subset of modules to the finetuning task, and the task-unrelated modules can be easily pruned for lightweight deployment.

Language Modelling

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

5 code implementations • 1 Jun 2023 • Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, Chuang Gan, Song Han

Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).

Common Sense Reasoning Language Modelling +1

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models

no code implementations • 31 May 2023 • Wei Xiao, Tsun-Hsuan Wang, Chuang Gan, Daniela Rus

Diffusion model-based approaches have shown promise in data-driven planning, but they offer no safety guarantees, which makes them hard to apply in safety-critical applications.

Denoising

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

1 code implementation • NeurIPS 2023 • Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable.

In-Context Learning Language Modelling

Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics

no code implementations • 27 Apr 2023 • Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik

Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models).

Out-of-Distribution Generalization

EC²: Emergent Communication for Embodied Control

no code implementations • 19 Apr 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan

We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.

Contrastive Learning Language Modelling

Learning Situation Hyper-Graphs for Video Question Answering

1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.

Ranked #6 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)

Question Answering Video Question Answering +1

Hyper-Decision Transformer for Efficient Online Policy Adaptation

no code implementations • 17 Apr 2023 • Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan

To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner.

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

no code implementations • 7 Apr 2023 • Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies to execute each program.

Instruction Following Self-Supervised Learning

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

no code implementations • CVPR 2023 • Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.

DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics

no code implementations • 27 Mar 2023 • Sizhe Li, Zhiao Huang, Tao Chen, Tao Du, Hao Su, Joshua B. Tenenbaum, Chuang Gan

Reinforcement learning approaches for dexterous rigid object manipulation would struggle in this setting due to the complexity of physics interaction with deformable objects.

Deformable Object Manipulation Object

3D Concept Learning and Reasoning from Multi-View Images

no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.

Question Answering Visual Question Answering +1

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments

no code implementations • 16 Mar 2023 • Tsun-Hsuan Wang, Pingchuan Ma, Andrew Everett Spielberg, Zhou Xian, Hao Zhang, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan

Existing work has typically been tailored for particular environments or representations.

PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification

no code implementations • 9 Mar 2023 • Xuan Li, Yi-Ling Qiao, Peter Yichen Chen, Krishna Murthy Jatavallabhula, Ming Lin, Chenfanfu Jiang, Chuang Gan

In this work, we aim to identify parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology.

Neural Rendering Object

Planning with Large Language Models for Code Generation

no code implementations • 9 Mar 2023 • Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, Chuang Gan

Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process.

Code Generation Language Modelling +1

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation

1 code implementation • 4 Mar 2023 • Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan

We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.

Benchmarking

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning

no code implementations • 12 Jan 2023 • Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Hao Zhang, Chuang Gan

The see stage scans the image and grounds the visual concept candidates with a visual perception model.

Few-Shot Learning Image Captioning +4

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

no code implementations • CVPR 2023 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik G. Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction

no code implementations • ICCV 2023 • Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

Without performance loss on Cityscapes, our EfficientViT provides up to 8.8x and 3.8x GPU latency reduction over SegFormer and SegNeXt, respectively.

Autonomous Driving Super-Resolution

EC2: Emergent Communication for Embodied Control

no code implementations • CVPR 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan

Contrastive Learning Language Modelling

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

no code implementations • 15 Dec 2022 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

CLAWSAT: Towards Both Robust and Accurate Code Models

1 code implementation • 21 Nov 2022 • Jinghan Jia, Shashank Srikant, Tamara Mitrovska, Chuang Gan, Shiyu Chang, Sijia Liu, Una-May O'Reilly

We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models.

Code Generation Code Summarization +2

Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation

no code implementations • 27 Oct 2022 • Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held

Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels.

Deformable Object Manipulation

JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions

1 code implementation • 18 Oct 2022 • Mo Yu, Yi Gu, Xiaoxiao Guo, Yufei Feng, Xiaodan Zhu, Michael Greenspan, Murray Campbell, Chuang Gan

Hence, in order to achieve higher performance on our tasks, models need to effectively utilize such functional knowledge to infer the outcomes of actions, rather than relying solely on memorizing facts.

Reading Comprehension

Revisiting the Roles of "Text" in Text Games

no code implementations • 15 Oct 2022 • Yi Gu, Shunyu Yao, Chuang Gan, Joshua B. Tenenbaum, Mo Yu

Text games present opportunities for natural language understanding (NLU) methods to tackle reinforcement learning (RL) challenges.

Natural Language Understanding Passage Retrieval +2

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

1 code implementation • 14 Oct 2022 • Peihao Chen, Dongyu Ji, Kunyang Lin, Runhao Zeng, Thomas H. Li, Mingkui Tan, Chuang Gan

To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.

Navigate Vision and Language Navigation

Learning Active Camera for Multi-Object Navigation

no code implementations • 14 Oct 2022 • Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics.

Navigate Object

Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation

Learning Physical Dynamics with Subequivariant Graph Neural Networks

no code implementations • 13 Oct 2022 • Jiaqi Han, Wenbing Huang, Hengbo Ma, Jiachen Li, Joshua B. Tenenbaum, Chuang Gan

Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.

Inductive Bias

Masked Motion Encoding for Self-Supervised Video Representation Learning

2 code implementations • CVPR 2023 • Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan

The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.

Ranked #2 on Self-Supervised Action Recognition on HMDB51

Optical Flow Estimation Representation Learning +2

On the Forward Invariance of Neural ODEs

no code implementations • 10 Oct 2022 • Wei Xiao, Tsun-Hsuan Wang, Ramin Hasani, Mathias Lechner, Yutong Ban, Chuang Gan, Daniela Rus

We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation.

Autonomous Vehicles Collision Avoidance +2

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning

no code implementations • 17 Sep 2022 • Kefan Su, Siyuan Zhou, Jiechuan Jiang, Chuang Gan, Xiangjun Wang, Zongqing Lu

Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL).

Multi-agent Reinforcement Learning Q-Learning +2

Gait Recognition in the Wild with Multi-hop Temporal Switch

1 code implementation • 1 Sep 2022 • Jinkai Zheng, Xinchen Liu, Xiaoyan Gu, Yaoqi Sun, Chuang Gan, Jiyong Zhang, Wu Liu, Chenggang Yan

Current methods that obtain state-of-the-art performance on in-the-lab benchmarks achieve much worse accuracy on the recently proposed in-the-wild datasets because these methods can hardly model the varied temporal dynamics of gait sequences in unconstrained scenes.

Gait Recognition in the Wild

Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation

1 code implementation • 22 Jul 2022 • Hongbin Lin, Yifan Zhang, Zhen Qiu, Shuaicheng Niu, Chuang Gan, Yanxia Liu, Mingkui Tan

2) Prototype-based alignment and replay: based on the identified label prototypes, we align both domains and enforce the model to retain previous knowledge.

Unsupervised Domain Adaptation

3D Concept Grounding on Neural Fields

no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, and also outperforms existing models on the challenging 3D-aware visual reasoning tasks.

Instance Segmentation Question Answering +3

Finding Fallen Objects Via Asynchronous Audio-Visual Integration

no code implementations • CVPR 2022 • Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh McDermott, Antonio Torralba

The way an object looks and the way it sounds provide complementary reflections of its physical properties.

Imitation Learning Object +1

Weakly Supervised Grounding for VQA in Vision-Language Transformers

1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah

Transformers for visual-language representation learning have attracted a lot of interest and have shown tremendous performance on visual question answering (VQA) and grounding.

Question Answering Representation Learning +1

On-Device Training Under 256KB Memory

1 code implementation • 30 Jun 2022 • Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

To reduce the memory footprint, we propose Sparse Update to skip the gradient computation of less important layers and sub-tensors.

Quantization Transfer Learning

Prompting Decision Transformer for Few-Shot Policy Generalization

no code implementations • 27 Jun 2022 • Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan

Humans can leverage prior experience and learn novel tasks from a handful of demonstrations.

Few-Shot Learning Inductive Bias +2

SNAKE: Shape-aware Neural 3D Keypoint Field

1 code implementation • 3 Jun 2022 • Chengliang Zhong, Peixing You, Xiaoxue Chen, Hao Zhao, Fuchun Sun, Guyue Zhou, Xiaodong Mu, Chuang Gan, Wenbing Huang

Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?

Keypoint Detection

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

5 code implementations • 29 May 2022 • Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

Without performance loss on Cityscapes, our EfficientViT provides up to 13.9x and 6.2x GPU latency reduction over SegFormer and SegNeXt, respectively.

Ranked #23 on Semantic Segmentation on Cityscapes val

Autonomous Driving Image Classification +7

RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation

no code implementations • ICLR 2022 • Pingchuan Ma, Tao Du, Joshua B. Tenenbaum, Wojciech Matusik, Chuang Gan

To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering.

Imitation Learning

Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

no code implementations • ICLR 2022 • Sizhe Li, Zhiao Huang, Tao Du, Hao Su, Joshua B. Tenenbaum, Chuang Gan

Extensive experimental results suggest that: 1) on multi-stage tasks that are infeasible for the vanilla differentiable physics solver, our approach discovers contact points that efficiently guide the solver to completion; 2) on tasks where the vanilla solver performs sub-optimally or near-optimally, our contact point discovery method performs better than or on par with the manipulation performance obtained with handcrafted contact points.

Paper
Add Code

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

no code implementations • CVPR 2022 • Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.

Paper
Add Code

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

no code implementations • ICLR 2022 • Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.

Paper
Add Code

Learning Neural Acoustic Fields

1 code implementation • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.

113

Paper
Code
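
For the Neural Acoustic Fields entry above: under the linear time-invariant assumption, the sound heard at the listener is the convolution of the emitted sound with the impulse response for that emitter-listener pair. The following is a minimal sketch of that final step only, in plain Python. The function name `apply_impulse_response` and the toy impulse response are hypothetical and are not taken from the paper or its code; the network that would predict the impulse response is not shown.

```python
def apply_impulse_response(signal, impulse_response):
    """Discrete linear convolution of a dry signal with an impulse response.

    Both arguments are lists of float samples at the same sample rate.
    """
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


# Toy values (made up): a direct path plus one echo two samples later at half amplitude.
toy_ir = [1.0, 0.0, 0.5]
dry = [1.0, 2.0, 3.0]
wet = apply_impulse_response(dry, toy_ir)
print(wet)
```

Because the system is assumed linear and time-invariant, the same impulse response can be reused for any input sound, which is what "applied to arbitrary sounds" refers to in the abstract.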

DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools

no code implementations • ICLR 2022 • Xingyu Lin, Zhiao Huang, Yunzhu Li, Joshua B. Tenenbaum, David Held, Chuang Gan

We consider the problem of sequential robotic manipulation of deformable objects using tools.

Deformable Object Manipulation Object +2

Paper
Add Code

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations

no code implementations • ICLR 2022 • Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum

We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts.

Meta-Learning Novel Concepts +1

Paper
Add Code

Linking Emergent and Natural Languages via Corpus Transfer

1 code implementation • ICLR 2022 • Shunyu Yao, Mo Yu, Yang Zhang, Karthik R Narasimhan, Joshua B. Tenenbaum, Chuang Gan

In this work, we propose a novel way to establish such a link by corpus transfer, i.e., pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters.

Attribute Disentanglement +2

Paper
Code

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

1 code implementation • CVPR 2022 • Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi

To solve the above issues, we propose AutoGPart, a generic method enabling training generalizable 3D part segmentation networks with the task prior considered.

3D Part Segmentation Domain Generalization +1

Paper
Code

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

no code implementations • NeurIPS 2021 • Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.

Instance Segmentation Object +2

Paper
Add Code

STAR: A Benchmark for Situated Reasoning in Real-World Videos

1 code implementation • NeurIPS 2021 • Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated Reasoning in Real-World Videos (STAR).

Logical Reasoning Question Answering

Paper
Code

Memory-efficient Patch-based Inference for Tiny Deep Learning

no code implementations • NeurIPS 2021 • Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification Neural Architecture Search +3

Paper
Add Code

Graph Convolutional Module for Temporal Action Localization in Videos

no code implementations • 1 Dec 2021 • Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms.

Ranked #2 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)

Action Recognition Temporal Action Localization

Paper
Add Code

When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?

2 code implementations • NeurIPS 2021 • Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, Chuang Gan

We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.

Adversarial Robustness Contrastive Learning +1

Paper
Code

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

no code implementations • NeurIPS 2021 • Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.

counterfactual Visual Reasoning

Paper
Add Code

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

1 code implementation • 28 Oct 2021 • Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification Neural Architecture Search +3

397

Paper
Code

Counterfactual Debiasing Inference for Compositional Action Recognition

1 code implementation • ACM International Conference on Multimedia 2021 • Pengzhan Sun, Bo Wu, Xunsong Li, Wen Li, Lixin Duan, Chuang Gan

By doing that, our proposed CDN method can better recognize unseen action instances by debiasing the effect of appearances.

Action Recognition counterfactual +1

Paper
Code

Network Augmentation for Tiny Deep Learning

no code implementations • ICLR 2022 • Han Cai, Chuang Gan, Ji Lin, Song Han

We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.

Data Augmentation Image Classification +2

Paper
Add Code

OPEn: An Open-ended Physics Environment for Learning Without a Task

1 code implementation • 13 Oct 2021 • Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola

We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.

Contrastive Learning Representation Learning

Paper
Code

Inducing Reusable Skills From Demonstrations with Option-Controller Network

no code implementations • 29 Sep 2021 • Siyuan Zhou, Yikang Shen, Yuchen Lu, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

With the isolation of information and the synchronous calling mechanism, we can impose a division of work between the controller and options in an end-to-end training regime.

Paper
Add Code

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device

4 code implementations • 27 Sep 2021 • Ji Lin, Chuang Gan, Kuan Wang, Song Han

Secondly, TSM has high efficiency; it achieves a high frame rate of 74fps and 29fps for online video recognition on Jetson Nano and Galaxy Note8, respectively.

Video Recognition Video Understanding

2,015

Paper
Code

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

1 code implementation • 2 Aug 2021 • Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu

By fine-tuning the models on a number of commonly used remote sensing datasets, we show that our approach outperforms existing pre-training strategies for remote sensing imagery.

Ranked #2 on Cross-Modal Retrieval on SoundingEarth

Cross-Modal Retrieval Representation Learning +1

Paper
Code

Certifiably Robust Interpretation via Renyi Differential Privacy

no code implementations • 4 Jul 2021 • Ao Liu, Xiaoyu Chen, Sijia Liu, Lirong Xia, Chuang Gan

The advantages of our Renyi-Robust-Smooth (RDP-based interpretation method) are threefold.

Computational Efficiency

Paper
Add Code

Global Rhythm Style Transfer Without Text Transcriptions

1 code implementation • 16 Jun 2021 • Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.

Representation Learning Style Transfer

249

Paper
Code

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning

no code implementations • 13 Jun 2021 • Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang

Cross-modal correlation provides an inherent supervision for video unsupervised representation learning.

Contrastive Learning Representation Learning

Paper
Add Code

Temporal and Object Quantification Networks

no code implementations • 10 Jun 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman

We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.

Object Temporal Sequences

Paper
Add Code

Adversarial Option-Aware Hierarchical Imitation Learning

1 code implementation • 10 Jun 2021 • Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei LI

In particular, we propose an Expectation-Maximization (EM)-style algorithm: an E-step that samples the options of the expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of the agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent.

Imitation Learning

Paper
Code

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering Visual Question Answering

Paper
Code

Curious Representation Learning for Embodied Intelligence

1 code implementation • ICCV 2021 • Yilun Du, Chuang Gan, Phillip Isola

Instead, it must explore its environment to acquire the data it will learn from.

Representation Learning

Paper
Code

PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics

1 code implementation • ICLR 2021 • Zhiao Huang, Yuanming Hu, Tao Du, Siyuan Zhou, Hao Su, Joshua B. Tenenbaum, Chuang Gan

Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning.

Reinforcement Learning (RL)

124

Paper
Code

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

no code implementations • 30 Mar 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan

We study the problem of dynamic visual reasoning on raw videos.

counterfactual Object +3

Paper
Add Code

TransCenter: Transformers with Dense Representations for Multiple-Object Tracking

2 code implementations • 28 Mar 2021 • Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda

Methodologically, we propose the use of image-related dense detection queries and efficient sparse tracking queries produced by our carefully designed query learning networks (QLN).

Ranked #13 on Multi-Object Tracking on MOT20 (using extra training data)

Image Classification Multi-Object Tracking +4

106

Paper
Code

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI

1 code implementation • 25 Mar 2021 • Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh Mcdermott, Antonio Torralba, Joshua B. Tenenbaum

To complete the task, an embodied agent must plan a sequence of actions to change the state of a large number of objects in the face of realistic physical constraints.

Motion Planning Task and Motion Planning

164

Paper
Code

Learning Task Decomposition with Ordered Memory Policy Network

no code implementations • 19 Mar 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration.

Inductive Bias

Paper
Add Code

AGENT: A Benchmark for Core Psychological Reasoning

no code implementations • 24 Feb 2021 • Tianmin Shu, Abhishek Bhandwaldar, Chuang Gan, Kevin A. Smith, Shari Liu, Dan Gutfreund, Elizabeth Spelke, Joshua B. Tenenbaum, Tomer D. Ullman

For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life.

Ranked #1 on Core Psychological Reasoning on AGENT

Core Psychological Reasoning

Paper
Add Code

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

1 code implementation • ICLR 2021 • Ren Wang, Kaidi Xu, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Chuang Gan, Meng Wang

Despite the generalization power of the meta-model, it remains unclear how adversarial robustness can be maintained by MAML in few-shot learning.

Adversarial Attack Adversarial Robustness +3

Paper
Code

Temporal and Object Quantification Nets

no code implementations • 1 Jan 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman

We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.

Event Detection Inductive Bias +1

Paper
Add Code

Learning Task Decomposition with Order-Memory Policy Network

no code implementations • ICLR 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

Many complex real-world tasks are composed of several levels of sub-tasks.

Imitation Learning Inductive Bias

Paper
Add Code

Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning

no code implementations • ICLR 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan

We study the problem of dynamic visual reasoning on raw videos.

counterfactual Object +3

Paper
Add Code

Augmenting Policy Learning with Routines Discovered from a Single Demonstration

2 code implementations • 23 Dec 2020 • Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum

Humans can abstract prior knowledge from very little data and use it to boost skill learning.

Atari Games Imitation Learning +2

Paper
Code

Object-Centric Diagnosis of Visual Reasoning

no code implementations • 21 Dec 2020 • Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.

Object Question Answering +2

Paper
Add Code

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

3 code implementations • 13 Dec 2020 • Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Existing state-of-the-art methods have achieved excellent accuracy regardless of complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance.

Ranked #33 on Action Recognition on Something-Something V1

Action Classification Action Recognition +2

140

Paper
Code

Synthetic Training for Monocular Human Mesh Recovery

no code implementations • 27 Oct 2020 • Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei

To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available.

Computational Efficiency Human Mesh Recovery

Paper
Add Code

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

1 code implementation • 27 Oct 2020 • Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition.

Ranked #11 on Self-Supervised Action Recognition on UCF101

Representation Learning Retrieval +2

Paper
Code

Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning

1 code implementation • EMNLP 2020 • Xiaoxiao Guo, Mo Yu, Yupeng Gao, Chuang Gan, Murray Campbell, Shiyu Chang

Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques.

Reading Comprehension reinforcement-learning +4

Paper
Code

Location-aware Graph Convolutional Networks for Video Question Answering

1 code implementation • 7 Aug 2020 • Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan

In this work, we propose to represent the contents in the video as a location-aware graph by incorporating the location information of an object into the graph construction.

Action Recognition graph construction +3

Paper
Code

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events

no code implementations • 27 Jul 2020 • Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum

Humans integrate multiple sensory modalities (e.g., visual and audio) to build a causal understanding of the physical world.

Atari Games Reinforcement Learning (RL)

Paper
Add Code

TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

1 code implementation • NeurIPS 2020 • Han Cai, Chuang Gan, Ligeng Zhu, Song Han

Furthermore, combined with feature extractor adaptation, TinyTL provides 7.3-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.

Transfer Learning

712

Paper
Code

Foley Music: Learning to Generate Music from Videos

no code implementations • ECCV 2020 • Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.

Music Generation Translation

Paper
Add Code

MCUNet: Tiny Deep Learning on IoT Devices

1 code implementation • NeurIPS 2020 • Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones.

BIG-bench Machine Learning Neural Architecture Search +1

397

Paper
Code

Generating Visually Aligned Sound from Videos

1 code implementation • 14 Jul 2020 • Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features.

Paper
Code

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

1 code implementation • 9 Jul 2020 • Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation.

Scene Understanding

460

Paper
Code

Language Guided Networks for Cross-modal Moment Retrieval

no code implementations • 18 Jun 2020 • Kun Liu, Huadong Ma, Chuang Gan

In this paper, we present Language Guided Networks (LGN), a new framework that leverages the sentence embedding to guide the whole process of moment retrieval.

Moment Retrieval Retrieval +3

Paper
Add Code

A Real-time Action Representation with Temporal Encoding and Deep Compression

no code implementations • 17 Jun 2020 • Kun Liu, Wu Liu, Huadong Ma, Mingkui Tan, Chuang Gan

Our method achieves clear improvements on the UCF101 action recognition benchmark over state-of-the-art real-time methods: 5.4% higher accuracy and 2 times faster inference, with a model that needs less than 5MB of storage.

Action Recognition

Paper
Add Code

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

4 code implementations • ACL 2020 • Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search.

Ranked #21 on Machine Translation on WMT2014 English-French

Machine Translation Neural Architecture Search +1

322

Paper
Code

Deep Audio Priors Emerge From Harmonic Convolutional Networks

no code implementations • ICLR 2020 • Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.

Paper
Add Code

Once for All: Train One Network and Specialize it for Efficient Deployment

1 code implementation • ICLR 2020 • Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.

Neural Architecture Search

1,836

Paper
Code

Music Gesture for Visual Sound Separation

no code implementations • CVPR 2020 • Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba

Recent deep learning approaches have achieved impressive performance on visual sound separation tasks.

Optical Flow Estimation

Paper
Add Code

Dense Regression Network for Video Grounding

1 code implementation • CVPR 2020 • Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan

The key idea of this paper is to use the distances between the frame within the ground truth and the starting (ending) frame as dense supervisions to improve the video grounding accuracy.

Ranked #6 on Natural Language Moment Retrieval on ActivityNet Captions

Natural Language Moment Retrieval Natural Language Queries +2

Paper
Code
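
For the Dense Regression Network entry above: the dense supervision can be pictured as one regression target per frame inside the ground-truth segment, namely its distance to the starting frame and to the ending frame. The following is a minimal sketch under that reading. The function name `dense_boundary_targets` and the use of integer frame indices are assumptions for illustration, not the authors' implementation.

```python
def dense_boundary_targets(num_frames, gt_start, gt_end):
    """Return {frame_index: (distance_to_start, distance_to_end)}.

    Only frames inside the ground-truth segment [gt_start, gt_end] receive a
    target; frames outside the segment are left unsupervised in this sketch.
    """
    targets = {}
    for t in range(num_frames):
        if gt_start <= t <= gt_end:
            targets[t] = (t - gt_start, gt_end - t)
    return targets


# Toy example (made-up numbers): an 8-frame clip whose described moment spans frames 2..5.
targets = dense_boundary_targets(num_frames=8, gt_start=2, gt_end=5)
print(targets)
```

Every frame in the segment yields its own training signal, rather than only the two boundary frames, which is the sense in which the supervision is dense.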

Visual Concept-Metaconcept Learning

1 code implementation • NeurIPS 2019 • Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color).

Paper
Code

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

1 code implementation • 25 Dec 2019 • Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.

Navigate

Paper
Code

Cross-channel Communication Networks

1 code implementation • NeurIPS 2019 • Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh

Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers.

Paper
Code

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations • ICCV 2019 • Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

Paper
Add Code

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation

no code implementations • 14 Oct 2019 • Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen

In this work, we introduce a new problem, named story-preserving long video truncation, that requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos, each containing an unbroken story.

Highlight Detection Video Summarization

Paper
Add Code

Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

no code implementations • NeurIPS 2019 • Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan

This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations.

Imitation Learning

Paper
Add Code

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

3 code implementations • ICLR 2020 • Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

counterfactual Descriptive +1

102

Paper
Code

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

no code implementations • 1 Oct 2019 • Ji Lin, Chuang Gan, Song Han

With such hardware-aware model design, we are able to scale up the training on the Summit supercomputer and reduce the training time on the Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74.0%, which is 1.6x and 2.9x faster than previous 3D video models with higher accuracy.

Video Recognition

Paper
Add Code

Graph Convolutional Networks for Temporal Action Localization

1 code implementation • ICCV 2019 • Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization.

Ranked #4 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)

Action Classification Temporal Action Localization

321

Paper
Code

Deep Concept-wise Temporal Convolutional Networks for Action Localization

2 code implementations • 26 Aug 2019 • Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, WangMeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen

In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly because all channels of the 1D feature map, which are generally highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.

Action Classification Action Localization

6,867

Paper
Code

Once-for-All: Train One Network and Specialize it for Efficient Deployment

10 code implementations • 26 Aug 2019 • Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and $CO_2$ emission by many orders of magnitude.

Ranked #76 on Neural Architecture Search on ImageNet

Neural Architecture Search

1,836

Paper
Code

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

2 code implementations • ICLR 2019 • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation.

Ranked #5 on Visual Question Answering (VQA) on CLEVR

Object Question Answering +4

407

Paper
Code

Self-Supervised Audio-Visual Co-Segmentation

no code implementations • 18 Apr 2019 • Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.

Image Segmentation Segmentation +1

Paper
Add Code

Defensive Quantization: When Efficiency Meets Robustness

no code implementations • ICLR 2019 • Ji Lin, Chuang Gan, Song Han

This paper aims to raise people's awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.

Adversarial Attack Quantization

Paper
Add Code

The Sound of Motions

1 code implementation • ICCV 2019 • Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

Sounds originate from object motions and vibrations of surrounding air.

Paper
Code

Interpreting Adversarial Examples by Activation Promotion and Suppression

no code implementations • 3 Apr 2019 • Kaidi Xu, Sijia Liu, Gaoyuan Zhang, Mengshu Sun, Pu Zhao, Quanfu Fan, Chuang Gan, Xue Lin

It is widely known that convolutional neural networks (CNNs) are vulnerable to adversarial examples: images with imperceptible perturbations crafted to fool classifiers.

Adversarial Robustness

Paper
Add Code

Weakly Supervised Dense Event Captioning in Videos

no code implementations • NeurIPS 2018 • Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang

Dense event captioning aims to detect and describe all events of interest contained in a video.

Sentence

Paper
Add Code

TSM: Temporal Shift Module for Efficient Video Understanding

13 code implementations • ICCV 2019 • Ji Lin, Chuang Gan, Song Han

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Ranked #2 on 3D Action Recognition on Assembly101

3D Action Recognition Action Classification +6

3,876

Paper
Code

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

8 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

Action Recognition Temporal Action Localization

334

Paper
Code

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

2 code implementations • NeurIPS 2018 • Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Ranked #1 on Visual Question Answering (VQA) on CLEVR

Question Answering Representation Learning +1

253

Paper
Code

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation

no code implementations • 9 Aug 2018 • Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong

The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip.

Facial expression generation Image-to-Image Translation +2

Paper
Add Code

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning

no code implementations • CVPR 2018 • Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas

In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together.

Action Recognition Representation Learning +5

Paper
Add Code

Sparse, Smart Contours to Represent and Edit Images

no code implementations • CVPR 2018 • Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman

We study the problem of reconstructing an image from information stored at contour locations.

Face Recognition Image Manipulation

Paper
Add Code

The Sound of Pixels

2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

362

Paper
Code

End-to-End Learning of Motion Representation for Video Understanding

1 code implementation • CVPR 2018 • Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks.

Ranked #42 on Action Recognition on UCF101

Action Recognition Optical Flow Estimation +1

290

Paper
Code

Smart, Sparse Contours to Represent and Edit Images

no code implementations • 21 Dec 2017 • Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman

We study the problem of reconstructing an image from information stored at contour locations.

Face Recognition Image Manipulation

Paper
Add Code

Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

1 code implementation • ECCV 2018 • Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qi-Xing Huang

In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image.

Keypoint Estimation Unsupervised Domain Adaptation

Paper
Code

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

5 code implementations • CVPR 2018 • Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.

Classification General Classification +1

334

Paper
Code

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

1 code implementation • ICCV 2017 • Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes, and even the same set of images (e.g., of COCO).

Language Modelling Multiple-choice +4

Paper
Code

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification

no code implementations • 12 Aug 2017 • Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie Zhou, Shilei Wen, Yuanqing Lin

Experiment results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve existing approaches in the large-scale video recognition tasks.

Ranked #163 on Action Classification on Kinetics-400

Action Classification General Classification +2

Paper
Add Code

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

1 code implementation • 14 Jul 2017 • Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, Shilei Wen

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd.

Video Recognition Video Understanding

113

Paper
Code

StyleNet: Generating Attractive Visual Captions With Styles

no code implementations • CVPR 2017 • Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng

We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles.

Caption Generation

Paper
Add Code

Recurrent Topic-Transition GAN for Visual Paragraph Generation

no code implementations • ICCV 2017 • Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.

Ranked #6 on Image Paragraph Captioning on Image Paragraph Captioning

Generative Adversarial Network Image Paragraph Captioning +1

Paper
Add Code

Video Captioning with Multi-Faceted Attention

no code implementations • TACL 2018 • Xiang Long, Chuang Gan, Gerard de Melo

Recently, video captioning has been attracting an increasing amount of interest, due to its potential for improving accessibility and information retrieval.

Information Retrieval Retrieval +2

Paper
Add Code

Semantic Compositional Networks for Visual Captioning

1 code implementation • CVPR 2017 • Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng

The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag.

Image Captioning Semantic Composition +1

Paper
Code

Strategies for Searching Video Content with Text Queries or Video Examples

no code implementations • 17 Jun 2016 • Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search.

Event Detection Retrieval +1

Paper
Add Code

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

no code implementations • CVPR 2016 • Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.

Action Recognition Event Detection +1

Paper
Add Code

Learning Attributes Equals Multi-Source Domain Generalization

no code implementations • CVPR 2016 • Chuang Gan, Tianbao Yang, Boqing Gong

Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval.

Attribute Domain Generalization +3

Paper
Add Code

Automatic Concept Discovery from Parallel Text and Visual Corpora

no code implementations • ICCV 2015 • Chen Sun, Chuang Gan, Ram Nevatia

Humans connect language and vision to perceive the world.

Retrieval Sentence

Paper
Add Code

DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

no code implementations • CVPR 2015 • Chuang Gan, Naiyan Wang, Yi Yang, Dit-Yan Yeung, Alex G. Hauptmann

Taking key frames of videos as input, we first detect the event of interest at the video level by aggregating the CNN features of the key frames.

Action Recognition Event Detection +2

Paper
Add Code
