no code implementations • ECCV 2020 • Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, Song Han
Third, our solution is extit{efficient} on the edge since the majority of the workload is delegated to the cloud, and our mixing and de-mixing processes introduce very few extra computations.
no code implementations • 15 Aug 2023 • Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan
We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction annotation data.
no code implementations • 24 Jul 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
1 code implementation • 22 Jul 2023 • Kunyang Lin, Peihao Chen, Diwei Huang, Thomas H. Li, Mingkui Tan, Chuang Gan
In this paper, we propose to learn an agent from these videos by creating a large-scale dataset which comprises reasonable path-instruction pairs from house tour videos and pre-training the agent on it.
no code implementations • 20 Jul 2023 • Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency.
no code implementations • 5 Jul 2023 • Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains.
no code implementations • 29 Jun 2023 • Zitian Chen, Mingyu Ding, Yikang Shen, Wei Zhan, Masayoshi Tomizuka, Erik Learned-Miller, Chuang Gan
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
no code implementations • 27 Jun 2023 • Hsiao-Yu Tung, Mingyu Ding, Zhenfang Chen, Daniel Bear, Chuang Gan, Joshua B. Tenenbaum, Daniel LK Yamins, Judith E Fan, Kevin A. Smith
Specifically, we test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability, and where the values of those properties can only be inferred by observing how objects move and interact with other objects or fluids.
1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan
In our experiment, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency, since ModuleFormer only activates a subset of its modules for each input token, thus it could achieve the same performance as dense LLMs with more than two times throughput; 2) Extendability, ModuleFormer is more immune to catastrophic forgetting than dense LLMs and can be easily extended with new modules to learn new knowledge that is not included in the training data; 3) Specialisation, finetuning ModuleFormer could specialize a subset of modules to the finetuning task and the task-unrelated modules could be easily pruned for a lightweight deployment.
no code implementations • 31 May 2023 • Wei Xiao, Tsun-Hsuan Wang, Chuang Gan, Daniela Rus
Diffusion model-based approaches have shown promise in data-driven planning, but there are no safety guarantees, thus making it hard to be applied for safety-critical applications.
1 code implementation • 4 May 2023 • Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable.
no code implementations • 27 Apr 2023 • Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik
Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models).
no code implementations • 19 Apr 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan
We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.
1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah
The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.
no code implementations • 17 Apr 2023 • Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan
To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner.
no code implementations • 7 Apr 2023 • Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies to execute each program.
1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.
no code implementations • CVPR 2023 • Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.
no code implementations • 27 Mar 2023 • Sizhe Li, Zhiao Huang, Tao Chen, Tao Du, Hao Su, Joshua B. Tenenbaum, Chuang Gan
Reinforcement learning approaches for dexterous rigid object manipulation would struggle in this setting due to the complexity of physics interaction with deformable objects.
no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan
We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.
no code implementations • 16 Mar 2023 • Tsun-Hsuan Wang, Pingchuan Ma, Andrew Everett Spielberg, Zhou Xian, Hao Zhang, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan
Existing work has typically been tailored for particular environments or representations.
no code implementations • 9 Mar 2023 • Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, Chuang Gan
Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process.
no code implementations • 9 Mar 2023 • Xuan Li, Yi-Ling Qiao, Peter Yichen Chen, Krishna Murthy Jatavallabhula, Ming Lin, Chenfanfu Jiang, Chuang Gan
In this work, we aim to identify parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology.
1 code implementation • 4 Mar 2023 • Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan
We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.
no code implementations • 12 Jan 2023 • Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Hao Zhang, Chuang Gan
The see stage scans the image and grounds the visual concept candidates with a visual perception model.
no code implementations • CVPR 2023 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik G. Learned-Miller, Chuang Gan
To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
no code implementations • CVPR 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan
We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.
no code implementations • 15 Dec 2022 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan
To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
1 code implementation • 21 Nov 2022 • Jinghan Jia, Shashank Srikant, Tamara Mitrovska, Chuang Gan, Shiyu Chang, Sijia Liu, Una-May O'Reilly
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models.
no code implementations • 27 Oct 2022 • Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held
Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels.
1 code implementation • 18 Oct 2022 • Mo Yu, Yi Gu, Xiaoxiao Guo, Yufei Feng, Xiaodan Zhu, Michael Greenspan, Murray Campbell, Chuang Gan
Hence, in order to achieve higher performance on our tasks, models need to effectively utilize such functional knowledge to infer the outcomes of actions, rather than relying solely on memorizing facts.
no code implementations • 15 Oct 2022 • Yi Gu, Shunyu Yao, Chuang Gan, Joshua B. Tenenbaum, Mo Yu
Text games present opportunities for natural language understanding (NLU) methods to tackle reinforcement learning (RL) challenges.
1 code implementation • 14 Oct 2022 • Peihao Chen, Dongyu Ji, Kunyang Lin, Runhao Zeng, Thomas H. Li, Mingkui Tan, Chuang Gan
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
no code implementations • 14 Oct 2022 • Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas H. Li, Mingkui Tan, Chuang Gan
How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics.
no code implementations • 13 Oct 2022 • Jiaqi Han, Wenbing Huang, Hengbo Ma, Jiachen Li, Joshua B. Tenenbaum, Chuang Gan
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu
We present a retrospective on the state of Embodied AI research.
2 code implementations • CVPR 2023 • Xinyu Sun, Peihao Chen, LiangWei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan
The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.
Ranked #2 on
Self-Supervised Action Recognition
on HMDB51
no code implementations • 10 Oct 2022 • Wei Xiao, Tsun-Hsuan Wang, Ramin Hasani, Mathias Lechner, Yutong Ban, Chuang Gan, Daniela Rus
We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation.
no code implementations • 17 Sep 2022 • Kefan Su, Siyuan Zhou, Jiechuan Jiang, Chuang Gan, Xiangjun Wang, Zongqing Lu
Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL).
1 code implementation • 1 Sep 2022 • Jinkai Zheng, Xinchen Liu, Xiaoyan Gu, Yaoqi Sun, Chuang Gan, Jiyong Zhang, Wu Liu, Chenggang Yan
Current methods that obtain state-of-the-art performance on in-the-lab benchmarks achieve much worse accuracy on the recently proposed in-the-wild datasets because these methods can hardly model the varied temporal dynamics of gait sequences in unconstrained scenes.
1 code implementation • 22 Jul 2022 • Hongbin Lin, Yifan Zhang, Zhen Qiu, Shuaicheng Niu, Chuang Gan, Yanxia Liu, Mingkui Tan
2) Prototype-based alignment and replay: based on the identified label prototypes, we align both domains and enforce the model to retain previous knowledge.
no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan
Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, as well as outperforms existing models on the challenging 3D aware visual reasoning tasks.
no code implementations • CVPR 2022 • Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh Mcdermott, Antonio Torralba
The way an object looks and sounds provide complementary reflections of its physical properties.
1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah
Transformers for visual-language representation learning have been getting a lot of interest and shown tremendous performance on visual question answering (VQA) and grounding.
1 code implementation • 30 Jun 2022 • Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
To reduce the memory footprint, we propose Sparse Update to skip the gradient computation of less important layers and sub-tensors.
no code implementations • 27 Jun 2022 • Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations.
1 code implementation • 3 Jun 2022 • Chengliang Zhong, Peixing You, Xiaoxue Chen, Hao Zhao, Fuchun Sun, Guyue Zhou, Xiaodong Mu, Chuang Gan, Wenbing Huang
Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?
3 code implementations • 29 May 2022 • Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han
Unlike prior semantic segmentation models that rely on heavy self-attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performances, our lightweight multi-scale attention achieves a global receptive field and multi-scale learning (two critical features for semantic segmentation models) with only lightweight and hardware-efficient operations.
Ranked #19 on
Semantic Segmentation
on Cityscapes val
no code implementations • ICLR 2022 • Pingchuan Ma, Tao Du, Joshua B. Tenenbaum, Wojciech Matusik, Chuang Gan
To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering.
no code implementations • ICLR 2022 • Sizhe Li, Zhiao Huang, Tao Du, Hao Su, Joshua B. Tenenbaum, Chuang Gan
Extensive experimental results suggest that: 1) on multi-stage tasks that are infeasible for the vanilla differentiable physics solver, our approach discovers contact points that efficiently guide the solver to completion; 2) on tasks where the vanilla solver performs sub-optimally or near-optimally, our contact point discovery method performs better than or on par with the manipulation performance obtained with handcrafted contact points.
no code implementations • CVPR 2022 • Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.
no code implementations • ICLR 2022 • Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.
no code implementations • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.
no code implementations • ICLR 2022 • Xingyu Lin, Zhiao Huang, Yunzhu Li, Joshua B. Tenenbaum, David Held, Chuang Gan
We consider the problem of sequential robotic manipulation of deformable objects using tools.
no code implementations • ICLR 2022 • Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum
We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts.
1 code implementation • ICLR 2022 • Shunyu Yao, Mo Yu, Yang Zhang, Karthik R Narasimhan, Joshua B. Tenenbaum, Chuang Gan
In this work, we propose a novel way to establish such a link by corpus transfer, i. e. pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters.
1 code implementation • CVPR 2022 • Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi
To solve the above issues, we propose AutoGPart, a generic method enabling training generalizable 3D part segmentation networks with the task prior considered.
no code implementations • NeurIPS 2021 • Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.
1 code implementation • NeurIPS 2021 • Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan
This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated Reasoning in Real-World Videos (STAR).
no code implementations • NeurIPS 2021 • Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.
no code implementations • 1 Dec 2021 • Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan
To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms.
Ranked #2 on
Temporal Action Localization
on THUMOS’14
(mAP IOU@0.1 metric)
2 code implementations • NeurIPS 2021 • Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, Chuang Gan
We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.
no code implementations • NeurIPS 2021 • Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
1 code implementation • 28 Oct 2021 • Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.
no code implementations • ICLR 2022 • Han Cai, Chuang Gan, Ji Lin, Song Han
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
1 code implementation • ACM International Conference on Multimedia 2021 • Pengzhan Sun, Bo Wu, Xunsong Li, Wen Li, Lixin Duan, Chuang Gan
By doing that, our proposed CDN method can better recognize unseen action instances by debiasing the effect of appearances.
no code implementations • 13 Oct 2021 • Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola
We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.
no code implementations • 29 Sep 2021 • Siyuan Zhou, Yikang Shen, Yuchen Lu, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
With the isolation of information and the synchronous calling mechanism, we can impose a division of works between the controller and options in an end-to-end training regime.
2 code implementations • 27 Sep 2021 • Ji Lin, Chuang Gan, Kuan Wang, Song Han
Secondly, TSM has high efficiency; it achieves a high frame rate of 74fps and 29fps for online video recognition on Jetson Nano and Galaxy Note8.
1 code implementation • 2 Aug 2021 • Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu
By fine-tuning the models on a number of commonly used remote sensing datasets, we show that our approach outperforms existing pre-training strategies for remote sensing imagery.
no code implementations • 4 Jul 2021 • Ao Liu, Xiaoyu Chen, Sijia Liu, Lirong Xia, Chuang Gan
The advantages of our Renyi-Robust-Smooth (RDP-based interpretation method) are three-folds.
1 code implementation • 16 Jun 2021 • Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson
In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.
no code implementations • 13 Jun 2021 • Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang
Cross-modal correlation provides an inherent supervision for video unsupervised representation learning.
no code implementations • 10 Jun 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman
We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.
1 code implementation • 10 Jun 2021 • Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei LI
In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent.
1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.
1 code implementation • ICCV 2021 • Yilun Du, Chuang Gan, Phillip Isola
Instead, it must explore its environment to acquire the data it will learn from.
1 code implementation • ICLR 2021 • Zhiao Huang, Yuanming Hu, Tao Du, Siyuan Zhou, Hao Su, Joshua B. Tenenbaum, Chuang Gan
Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning.
no code implementations • 30 Mar 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan
We study the problem of dynamic visual reasoning on raw videos.
2 code implementations • 28 Mar 2021 • Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda
Methodologically, we propose the use of image-related dense detection queries and efficient sparse tracking queries produced by our carefully designed query learning networks (QLN).
Ranked #11 on
Multi-Object Tracking
on MOT20
(using extra training data)
no code implementations • 25 Mar 2021 • Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh Mcdermott, Antonio Torralba, Joshua B. Tenenbaum
To complete the task, an embodied agent must plan a sequence of actions to change the state of a large number of objects in the face of realistic physical constraints.
no code implementations • 19 Mar 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstruc-tured demonstration.
no code implementations • 24 Feb 2021 • Tianmin Shu, Abhishek Bhandwaldar, Chuang Gan, Kevin A. Smith, Shari Liu, Dan Gutfreund, Elizabeth Spelke, Joshua B. Tenenbaum, Tomer D. Ullman
For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life.
Ranked #1 on
Core Psychological Reasoning
on AGENT
1 code implementation • ICLR 2021 • Ren Wang, Kaidi Xu, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Chuang Gan, Meng Wang
Despite the generalization power of the meta-model, it remains elusive that how adversarial robustness can be maintained by MAML in few-shot learning.
no code implementations • ICLR 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
Many complex real-world tasks are composed of several levels of sub-tasks.
no code implementations • ICLR 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan
We study the problem of dynamic visual reasoning on raw videos.
no code implementations • 1 Jan 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman
We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.
2 code implementations • 23 Dec 2020 • Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum
Humans can abstract prior knowledge from very little data and use it to boost skill learning.
no code implementations • 21 Dec 2020 • Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan
In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.
3 code implementations • 13 Dec 2020 • Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding
Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance.
Ranked #31 on
Action Recognition
on Something-Something V1
1 code implementation • 27 Oct 2020 • Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition.
Ranked #10 on
Self-Supervised Action Recognition
on UCF101
no code implementations • 27 Oct 2020 • Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei
To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available.
1 code implementation • EMNLP 2020 • Xiaoxiao Guo, Mo Yu, Yupeng Gao, Chuang Gan, Murray Campbell, Shiyu Chang
Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques.
1 code implementation • 7 Aug 2020 • Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan
In this work, we propose to represent the contents in the video as a location-aware graph by incorporating the location information of an object into the graph construction.
no code implementations • 27 Jul 2020 • Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum
Humans integrate multiple sensory modalities (e. g. visual and audio) to build a causal understanding of the physical world.
1 code implementation • NeurIPS 2020 • Han Cai, Chuang Gan, Ligeng Zhu, Song Han
Furthermore, combined with feature extractor adaptation, TinyTL provides 7. 3-12. 9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.
no code implementations • ECCV 2020 • Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.
1 code implementation • NeurIPS 2020 • Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han
Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones.
1 code implementation • 14 Jul 2020 • Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan
During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features.
1 code implementation • 9 Jul 2020 • Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins
We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation.
no code implementations • 18 Jun 2020 • Kun Liu, Huadong Ma, Chuang Gan
In this paper, we present Language Guided Networks (LGN), a new framework that leverages the sentence embedding to guide the whole process of moment retrieval.
no code implementations • 17 Jun 2020 • Kun Liu, Wu Liu, Huadong Ma, Mingkui Tan, Chuang Gan
Our method achieves clear improvements on UCF101 action recognition benchmark against state-of-the-art real-time methods by 5. 4% in terms of accuracy and 2 times faster in terms of inference speed with a less than 5MB storage model.
4 code implementations • ACL 2020 • Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search.
Ranked #21 on
Machine Translation
on WMT2014 English-French
1 code implementation • ICLR 2020 • Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.
no code implementations • ICLR 2020 • Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.
no code implementations • CVPR 2020 • Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba
Recent deep learning approaches have achieved impressive performance on visual sound separation tasks.
1 code implementation • CVPR 2020 • Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan
The key idea of this paper is to use the distances between the frame within the ground truth and the starting (ending) frame as dense supervisions to improve the video grounding accuracy.
Natural Language Moment Retrieval
Natural Language Queries
+2
1 code implementation • NeurIPS 2019 • Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu
Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i. e., the color).
1 code implementation • 25 Dec 2019 • Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum
In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.
1 code implementation • NeurIPS 2019 • Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh
Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers.
no code implementations • ICCV 2019 • Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba
At test time, the stereo-sound student network can work independently to perform object localization us-ing just stereo audio and camera meta-data, without any visual input.
no code implementations • 14 Oct 2019 • Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen
In this work, we introduce a new problem, named as {\em story-preserving long video truncation}, that requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos with each one containing an unbroken story.
no code implementations • NeurIPS 2019 • Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan
This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations.
3 code implementations • ICLR 2020 • Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum
While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.
no code implementations • 1 Oct 2019 • Ji Lin, Chuang Gan, Song Han
With such hardware-aware model design, we are able to scale up the training on Summit supercomputer and reduce the training time on Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74. 0%, which is 1. 6x and 2. 9x faster than previous 3D video models with higher accuracy.
1 code implementation • ICCV 2019 • Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan
Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization.
Ranked #4 on
Temporal Action Localization
on THUMOS’14
(mAP IOU@0.1 metric)
2 code implementations • 26 Aug 2019 • Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, WangMeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen
In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly ascribing to that all channels of 1D feature map, which generally are highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.
10 code implementations • 26 Aug 2019 • Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4. 0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1. 5x faster than MobileNetV3, 2. 6x faster than EfficientNet w. r. t measured latency) while reducing many orders of magnitude GPU hours and $CO_2$ emission.
Ranked #76 on
Neural Architecture Search
on ImageNet
2 code implementations • ICLR 2019 • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu
To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation.
Ranked #5 on
Visual Question Answering (VQA)
on CLEVR
no code implementations • 18 Apr 2019 • Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba
Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.
no code implementations • ICLR 2019 • Ji Lin, Chuang Gan, Song Han
This paper aims to raise people's awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.
1 code implementation • ICCV 2019 • Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba
Sounds originate from object motions and vibrations of surrounding air.
no code implementations • 3 Apr 2019 • Kaidi Xu, Sijia Liu, Gaoyuan Zhang, Mengshu Sun, Pu Zhao, Quanfu Fan, Chuang Gan, Xue Lin
It is widely known that convolutional neural networks (CNNs) are vulnerable to adversarial examples: images with imperceptible perturbations crafted to fool classifiers.
no code implementations • NeurIPS 2018 • Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang
Dense event captioning aims to detect and describe all events of interest contained in a video.
12 code implementations • ICCV 2019 • Ji Lin, Chuang Gan, Song Han
The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
Ranked #22 on
Video Object Detection
on ImageNet VID
6 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen
In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.
2 code implementations • NeurIPS 2018 • Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum
Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.
Ranked #1 on
Visual Question Answering (VQA)
on CLEVR
no code implementations • 9 Aug 2018 • Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong
The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip.
no code implementations • CVPR 2018 • Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas
In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together.
no code implementations • CVPR 2018 • Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman
We study the problem of reconstructing an image from information stored at contour locations.
2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.
1 code implementation • CVPR 2018 • Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang
Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks.
Ranked #39 on
Action Recognition
on UCF101
no code implementations • 21 Dec 2017 • Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman
We study the problem of reconstructing an image from information stored at contour locations.
1 code implementation • ECCV 2018 • Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qi-Xing Huang
In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image.
3 code implementations • CVPR 2018 • Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.
1 code implementation • ICCV 2017 • Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong
Many seemingly distant annotations (e. g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes --- and even the same set of images (e. g., of COCO).
no code implementations • 12 Aug 2017 • Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie zhou, Shilei Wen, Yuanqing Lin
Experiment results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve existing approaches in the large-scale video recognition tasks.
Ranked #148 on
Action Classification
on Kinetics-400
1 code implementation • 14 Jul 2017 • Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie zhou, Shilei Wen
This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge that ranked the 3rd place.
no code implementations • CVPR 2017 • Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng
We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles.
no code implementations • ICCV 2017 • Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.
no code implementations • TACL 2018 • Xiang Long, Chuang Gan, Gerard de Melo
Recently, video captioning has been attracting an increasing amount of interest, due to its potential for improving accessibility and information retrieval.
1 code implementation • CVPR 2017 • Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng
The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag.
no code implementations • 17 Jun 2016 • Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang
The large number of user-generated videos uploaded on to the Internet everyday has led to many commercial video search engines, which mainly rely on text metadata for search.
no code implementations • CVPR 2016 • Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei
The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.
no code implementations • CVPR 2016 • Chuang Gan, Tianbao Yang, Boqing Gong
Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval.
no code implementations • ICCV 2015 • Chen Sun, Chuang Gan, Ram Nevatia
Humans connect language and vision to perceive the world.
no code implementations • CVPR 2015 • Chuang Gan, Naiyan Wang, Yi Yang, Dit-yan Yeung, Alex G. Hauptmann
Taking key frames of videos as input, we first detect the event of interest at the video level by aggregating the CNN features of the key frames.