Search Results for author: Chuang Gan

Found 105 papers, 44 papers with code

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference

no code implementations ECCV 2020 Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, Song Han

Third, our solution is extit{efficient} on the edge since the majority of the workload is delegated to the cloud, and our mixing and de-mixing processes introduce very few extra computations.

Computer Vision Privacy Preserving +2

On-Device Training Under 256KB Memory

no code implementations30 Jun 2022 Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

Our framework is the first practical solution for on-device transfer learning of visual recognition on tiny IoT devices (e. g., a microcontroller with only 256KB SRAM), using less than 1/100 of the memory of existing frameworks while matching the accuracy of cloud training+edge deployment for the tinyML application VWW.

Quantization Transfer Learning

SNAKE: Shape-aware Neural 3D Keypoint Field

1 code implementation3 Jun 2022 Chengliang Zhong, Peixing You, Xiaoxue Chen, Hao Zhao, Fuchun Sun, Guyue Zhou, Xiaodong Mu, Chuang Gan, Wenbing Huang

Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?

Keypoint Detection

EfficientViT: Enhanced Linear Attention for High-Resolution Low-Computation Visual Recognition

no code implementations29 May 2022 Han Cai, Chuang Gan, Song Han

Existing methods (e. g., Swin, PVT) restrict the softmax attention within local windows or reduce the resolution of key/value tensors to reduce the cost, which sacrifices ViT's core advantages on global feature extractions.

object-detection Object Detection +1

Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

no code implementations ICLR 2022 Sizhe Li, Zhiao Huang, Tao Du, Hao Su, Joshua B. Tenenbaum, Chuang Gan

Extensive experimental results suggest that: 1) on multi-stage tasks that are infeasible for the vanilla differentiable physics solver, our approach discovers contact points that efficiently guide the solver to completion; 2) on tasks where the vanilla solver performs sub-optimally or near-optimally, our contact point discovery method performs better than or on par with the manipulation performance obtained with handcrafted contact points.

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

no code implementations CVPR 2022 Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

no code implementations ICLR 2022 Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.

Learning Neural Acoustic Fields

no code implementations4 Apr 2022 Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations

no code implementations ICLR 2022 Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum

We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts.


Linking Emergent and Natural Languages via Corpus Transfer

1 code implementation ICLR 2022 Shunyu Yao, Mo Yu, Yang Zhang, Karthik R Narasimhan, Joshua B. Tenenbaum, Chuang Gan

In this work, we propose a novel way to establish such a link by corpus transfer, i. e. pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters.

Disentanglement Image Captioning +1

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

no code implementations CVPR 2022 Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi

To solve the above issues, we propose AutoGPart, a generic method enabling training generalizable 3D part segmentation networks with the task prior considered.

3D Part Segmentation Domain Generalization

Finding Fallen Objects via Asynchronous Audio-Visual Integration

no code implementations CVPR 2022 Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh H. McDermott, Antonio Torralba

To study this problem, we have generated a large-scale dataset -- the Fallen Objects dataset -- that includes 8000 instances of 30 physical object categories in 64 rooms.

Imitation Learning Object Localization

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

no code implementations NeurIPS 2021 Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.

Instance Segmentation Semantic Segmentation +1

STAR: A Benchmark for Situated Reasoning in Real-World Videos

1 code implementation NeurIPS 2021 Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated Reasoning in Real-World Videos (STAR).

Question Answering

Memory-efficient Patch-based Inference for Tiny Deep Learning

no code implementations NeurIPS 2021 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification Neural Architecture Search +2

Graph Convolutional Module for Temporal Action Localization in Videos

no code implementations1 Dec 2021 Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms.

Ranked #2 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)

Action Recognition Computer Vision

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

no code implementations NeurIPS 2021 Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.

Visual Reasoning

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

1 code implementation28 Oct 2021 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification Neural Architecture Search +2

Network Augmentation for Tiny Deep Learning

no code implementations ICLR 2022 Han Cai, Chuang Gan, Ji Lin, Song Han

We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.

Data Augmentation Image Classification +2

OPEn: An Open-ended Physics Environment for Learning Without a Task

no code implementations13 Oct 2021 Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola

We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.

Contrastive Learning Representation Learning

Inducing Reusable Skills From Demonstrations with Option-Controller Network

no code implementations29 Sep 2021 Siyuan Zhou, Yikang Shen, Yuchen Lu, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

With the isolation of information and the synchronous calling mechanism, we can impose a division of works between the controller and options in an end-to-end training regime.

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device

1 code implementation27 Sep 2021 Ji Lin, Chuang Gan, Kuan Wang, Song Han

Secondly, TSM has high efficiency; it achieves a high frame rate of 74fps and 29fps for online video recognition on Jetson Nano and Galaxy Note8.

Video Recognition Video Understanding

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

1 code implementation2 Aug 2021 Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu

By fine-tuning the models on a number of commonly used remote sensing datasets, we show that our approach outperforms existing pre-training strategies for remote sensing imagery.

Representation Learning Transfer Learning

Certifiably Robust Interpretation via Renyi Differential Privacy

no code implementations4 Jul 2021 Ao Liu, Xiaoyu Chen, Sijia Liu, Lirong Xia, Chuang Gan

The advantages of our Renyi-Robust-Smooth (RDP-based interpretation method) are three-folds.

Global Rhythm Style Transfer Without Text Transcriptions

no code implementations16 Jun 2021 Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.

Representation Learning Style Transfer

Temporal and Object Quantification Networks

no code implementations10 Jun 2021 Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman

We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.

Adversarial Option-Aware Hierarchical Imitation Learning

1 code implementation10 Jun 2021 Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei LI

In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent.

Imitation Learning

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation CVPR 2021 Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering Visual Question Answering

PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics

1 code implementation ICLR 2021 Zhiao Huang, Yuanming Hu, Tao Du, Siyuan Zhou, Hao Su, Joshua B. Tenenbaum, Chuang Gan

Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning.

Learning Task Decomposition with Ordered Memory Policy Network

no code implementations19 Mar 2021 Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstruc-tured demonstration.

Inductive Bias

AGENT: A Benchmark for Core Psychological Reasoning

no code implementations24 Feb 2021 Tianmin Shu, Abhishek Bhandwaldar, Chuang Gan, Kevin A. Smith, Shari Liu, Dan Gutfreund, Elizabeth Spelke, Joshua B. Tenenbaum, Tomer D. Ullman

For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life.

Core Psychological Reasoning

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

1 code implementation ICLR 2021 Ren Wang, Kaidi Xu, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Chuang Gan, Meng Wang

Despite the generalization power of the meta-model, it remains elusive that how adversarial robustness can be maintained by MAML in few-shot learning.

Adversarial Attack Adversarial Robustness +3

Temporal and Object Quantification Nets

no code implementations1 Jan 2021 Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman

We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.

Event Detection Inductive Bias

Object-Centric Diagnosis of Visual Reasoning

no code implementations21 Dec 2020 Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.

Question Answering Visual Question Answering +1

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

3 code implementations13 Dec 2020 Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance.

Action Classification Action Recognition +1

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

1 code implementation27 Oct 2020 Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition.

Representation Learning Self-Supervised Action Recognition +1

Synthetic Training for Monocular Human Mesh Recovery

no code implementations27 Oct 2020 Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei

To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available.

Computer Vision Human Mesh Recovery

Location-aware Graph Convolutional Networks for Video Question Answering

1 code implementation7 Aug 2020 Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan

In this work, we propose to represent the contents in the video as a location-aware graph by incorporating the location information of an object into the graph construction.

Action Recognition graph construction +2

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events

no code implementations27 Jul 2020 Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum

Humans integrate multiple sensory modalities (e. g. visual and audio) to build a causal understanding of the physical world.

Atari Games

TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

1 code implementation NeurIPS 2020 Han Cai, Chuang Gan, Ligeng Zhu, Song Han

Furthermore, combined with feature extractor adaptation, TinyTL provides 7. 3-12. 9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.

Transfer Learning

Foley Music: Learning to Generate Music from Videos

no code implementations ECCV 2020 Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.

Music Generation Translation

MCUNet: Tiny Deep Learning on IoT Devices

1 code implementation NeurIPS 2020 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones.

Neural Architecture Search

Generating Visually Aligned Sound from Videos

1 code implementation14 Jul 2020 Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features.

Language Guided Networks for Cross-modal Moment Retrieval

no code implementations18 Jun 2020 Kun Liu, Huadong Ma, Chuang Gan

In this paper, we present Language Guided Networks (LGN), a new framework that leverages the sentence embedding to guide the whole process of moment retrieval.

Moment Retrieval Sentence Embedding +1

A Real-time Action Representation with Temporal Encoding and Deep Compression

no code implementations17 Jun 2020 Kun Liu, Wu Liu, Huadong Ma, Mingkui Tan, Chuang Gan

Our method achieves clear improvements on UCF101 action recognition benchmark against state-of-the-art real-time methods by 5. 4% in terms of accuracy and 2 times faster in terms of inference speed with a less than 5MB storage model.

Action Recognition

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

4 code implementations ACL 2020 Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search.

Machine Translation Natural Language Processing +2

Once for All: Train One Network and Specialize it for Efficient Deployment

1 code implementation ICLR 2020 Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.

Neural Architecture Search

Deep Audio Priors Emerge From Harmonic Convolutional Networks

no code implementations ICLR 2020 Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.

Dense Regression Network for Video Grounding

1 code implementation CVPR 2020 Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan

The key idea of this paper is to use the distances between the frame within the ground truth and the starting (ending) frame as dense supervisions to improve the video grounding accuracy.

Natural Language Moment Retrieval Natural Language Queries +1

Visual Concept-Metaconcept Learning

1 code implementation NeurIPS 2019 Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i. e., the color).

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

1 code implementation25 Dec 2019 Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.

Cross-channel Communication Networks

1 code implementation NeurIPS 2019 Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh

Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers.

Computer Vision

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations ICCV 2019 Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization us-ing just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation

no code implementations14 Oct 2019 Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen

In this work, we introduce a new problem, named as {\em story-preserving long video truncation}, that requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos with each one containing an unbroken story.

Highlight Detection Video Summarization

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

3 code implementations ICLR 2020 Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

Visual Reasoning

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

no code implementations1 Oct 2019 Ji Lin, Chuang Gan, Song Han

With such hardware-aware model design, we are able to scale up the training on Summit supercomputer and reduce the training time on Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74. 0%, which is 1. 6x and 2. 9x faster than previous 3D video models with higher accuracy.

Video Recognition

Graph Convolutional Networks for Temporal Action Localization

1 code implementation ICCV 2019 Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization.

Ranked #4 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)

Action Classification Temporal Action Localization

Once-for-All: Train One Network and Specialize it for Efficient Deployment

8 code implementations26 Aug 2019 Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4. 0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1. 5x faster than MobileNetV3, 2. 6x faster than EfficientNet w. r. t measured latency) while reducing many orders of magnitude GPU hours and $CO_2$ emission.

Computer Vision Neural Architecture Search

Deep Concept-wise Temporal Convolutional Networks for Action Localization

2 code implementations26 Aug 2019 Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, WangMeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen

In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly ascribing to that all channels of 1D feature map, which generally are highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.

Action Classification Action Localization

Self-Supervised Audio-Visual Co-Segmentation

no code implementations18 Apr 2019 Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.

Semantic Segmentation

Defensive Quantization: When Efficiency Meets Robustness

no code implementations ICLR 2019 Ji Lin, Chuang Gan, Song Han

This paper aims to raise people's awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.

Adversarial Attack Quantization

The Sound of Motions

no code implementations ICCV 2019 Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

Sounds originate from object motions and vibrations of surrounding air.

Interpreting Adversarial Examples by Activation Promotion and Suppression

no code implementations3 Apr 2019 Kaidi Xu, Sijia Liu, Gaoyuan Zhang, Mengshu Sun, Pu Zhao, Quanfu Fan, Chuang Gan, Xue Lin

It is widely known that convolutional neural networks (CNNs) are vulnerable to adversarial examples: images with imperceptible perturbations crafted to fool classifiers.

Adversarial Robustness

Weakly Supervised Dense Event Captioning in Videos

no code implementations NeurIPS 2018 Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang

Dense event captioning aims to detect and describe all events of interest contained in a video.

TSM: Temporal Shift Module for Efficient Video Understanding

12 code implementations ICCV 2019 Ji Lin, Chuang Gan, Song Han

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Action Classification Action Recognition +4

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

5 code implementations5 Nov 2018 Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

Action Recognition

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

1 code implementation NeurIPS 2018 Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Question Answering Representation Learning +1

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation

no code implementations9 Aug 2018 Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong

The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip.

Image-to-Image Translation Translation +1

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning

no code implementations CVPR 2018 Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas

In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together.

Action Recognition Representation Learning +4

The Sound of Pixels

2 code implementations ECCV 2018 Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

End-to-End Learning of Motion Representation for Video Understanding

1 code implementation CVPR 2018 Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks.

Action Recognition Optical Flow Estimation +1

Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

1 code implementation ECCV 2018 Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qi-Xing Huang

In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image.

Keypoint Estimation Unsupervised Domain Adaptation

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

3 code implementations CVPR 2018 Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.

Classification General Classification +1

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

1 code implementation ICCV 2017 Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

Many seemingly distant annotations (e. g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes --- and even the same set of images (e. g., of COCO).

Language Modelling Multiple-choice +3

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification

no code implementations12 Aug 2017 Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie zhou, Shilei Wen, Yuanqing Lin

Experiment results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve existing approaches in the large-scale video recognition tasks.

Action Classification General Classification +2

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

1 code implementation14 Jul 2017 Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie zhou, Shilei Wen

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge that ranked the 3rd place.

Video Recognition Video Understanding

StyleNet: Generating Attractive Visual Captions With Styles

no code implementations CVPR 2017 Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng

We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles.

Recurrent Topic-Transition GAN for Visual Paragraph Generation

no code implementations ICCV 2017 Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.

Image Paragraph Captioning

Video Captioning with Multi-Faceted Attention

no code implementations TACL 2018 Xiang Long, Chuang Gan, Gerard de Melo

Recently, video captioning has been attracting an increasing amount of interest, due to its potential for improving accessibility and information retrieval.

Information Retrieval Video Captioning

Semantic Compositional Networks for Visual Captioning

1 code implementation CVPR 2017 Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng

The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag.

Image Captioning Semantic Composition +1

Strategies for Searching Video Content with Text Queries or Video Examples

no code implementations17 Jun 2016 Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang

The large number of user-generated videos uploaded on to the Internet everyday has led to many commercial video search engines, which mainly rely on text metadata for search.

Event Detection Video Retrieval

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

no code implementations CVPR 2016 Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.

Action Recognition Event Detection

Learning Attributes Equals Multi-Source Domain Generalization

no code implementations CVPR 2016 Chuang Gan, Tianbao Yang, Boqing Gong

Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval.

Computer Vision Domain Generalization +2

DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

no code implementations CVPR 2015 Chuang Gan, Naiyan Wang, Yi Yang, Dit-yan Yeung, Alex G. Hauptmann

Taking key frames of videos as input, we first detect the event of interest at the video level by aggregating the CNN features of the key frames.

Action Recognition Event Detection +1

Cannot find the paper you are looking for? You can Submit a new open access paper.