Search Results for author: Song-Chun Zhu

Found 210 papers, 67 papers with code

GRICE: A Grammar-based Dataset for Recovering Implicature and Conversational rEasoning

no code implementations • Findings (ACL) 2021 • Zilong Zheng, Shuwen Qiu, Lifeng Fan, Yixin Zhu, Song-Chun Zhu

CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization

no code implementations • EMNLP 2021 • Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut

One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify if it is the shifts in visual or language features that play a key role.

Answer Generation Question-Answer-Generation

Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions

no code implementations • EMNLP 2021 • Arjun Akula, Spandana Gella, Keze Wang, Song-Chun Zhu, Siva Reddy

Our model outperforms the state-of-the-art NMN model on the CLEVR-Ref+ dataset, with a +8.1% improvement in accuracy on the single-referent test set and +4.3% on the full test set.

Towards Socially Intelligent Agents with Mental State Transition and Human Value

no code implementations • SIGDIAL (ACL) 2022 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu

One of these challenges is to track the agent’s mental state transitions and teach the agent to make decisions guided by its values, as a human does.

SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

no code implementations • 3 May 2024 • Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu

Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks.

Denoising Multi-agent Reinforcement Learning

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

1 code implementation • 26 Apr 2024 • Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation.

Imitation Learning

PhyRecon: Physically Plausible Neural Scene Reconstruction

no code implementations • 25 Apr 2024 • Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

In this paper, we introduce PhyRecon, the first approach to harness both differentiable rendering and differentiable physics simulation to learn implicit surface representations.

3D Reconstruction Multi-View 3D Reconstruction

LLM^3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning

1 code implementation • 18 Mar 2024 • Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu

Through a series of simulations in a box-packing domain, we quantitatively demonstrate the effectiveness of LLM^3 in solving TAMP problems and its efficiency in selecting action parameters.

Language Modelling Large Language Model

Fast Peer Adaptation with Context-aware Exploration

no code implementations • 4 Feb 2024 • Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang

To do so, it is crucial for the agent to efficiently probe and identify the peer's strategy, as this is the prerequisite for carrying out the best response in adaptation.

On the Emergence of Symmetrical Reality

no code implementations • 26 Jan 2024 • Zhenliang Zhang, Zeyu Zhang, Ziyuan Jiao, Yao Su, Hangxin Liu, Wei Wang, Song-Chun Zhu

Artificial intelligence (AI) has revolutionized human cognitive abilities and facilitated the development of new AI entities capable of interacting with humans in both physical and virtual environments.

Mixed Reality

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

1 code implementation • 19 Jan 2024 • Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu

Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning.

Decision Making

CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update

no code implementations • 18 Dec 2023 • Zhi Gao, Yuntao Du, Xintong Zhang, Xiaojian Ma, Wenjuan Han, Song-Chun Zhu, Qing Li

However, these methods often overlook the potential for continual learning, typically by freezing the utilized tools, thus limiting their adaptation to environments requiring new knowledge.

Continual Learning Question Answering

Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models

no code implementations • 9 Dec 2023 • Zhou Ziheng, YingNian Wu, Song-Chun Zhu, Demetri Terzopoulos

We introduce Aligner, a novel Parameter-Efficient Fine-Tuning (PEFT) method for aligning multi-billion-parameter Large Language Models (LLMs).

Instruction Following

An Embodied Generalist Agent in 3D World

1 code implementation • 18 Nov 2023 • Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

However, several significant challenges remain: (i) most of these models rely on 2D images and have limited capacity for 3D input; (ii) these models rarely explore tasks inherently defined in the 3D world, e.g., 3D grounding, embodied reasoning, and acting.

3D dense captioning Question Answering

AI Alignment: A Comprehensive Survey

no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

1 code implementation • 16 Oct 2023 • Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang

We also devise a neuro-symbolic reasoning approach that reconciles LLMs and VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.

Ranked #1 on Visual Reasoning on Bongard-OpenWorld

Few-Shot Learning Logical Reasoning

CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity

no code implementations • 23 Sep 2023 • Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, LiWei Wang, Zhouchen Lin, Song-Chun Zhu

With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers.

Distributed Optimization

MindAgent: Emergent Gaming Interaction

no code implementations • 18 Sep 2023 • Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao

Large Language Models (LLMs) have the capacity to perform complex scheduling in a multi-agent system and can coordinate these agents to complete sophisticated tasks that require extensive collaboration.

In-Context Learning Scheduling

ProAgent: Building Proactive Cooperative Agents with Large Language Models

no code implementations • 22 Aug 2023 • Ceyao Zhang, Kaijie Yang, Siyi Hu, ZiHao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

Building agents with adaptive behavior in cooperative tasks is a central goal in multi-agent systems.

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

1 code implementation • ICCV 2023 • Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu

Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy.

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

no code implementations • 7 Jul 2023 • Yuxi Ma, Chi Zhang, Song-Chun Zhu

In this perspective paper, we first comprehensively review existing evaluations of Large Language Models (LLMs) using both standardized tests and ability-oriented benchmarks.

Unity

MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation

no code implementations • 27 Jun 2023 • Shuwen Qiu, Song-Chun Zhu, Zilong Zheng

We design an explicit mind module that can track three-level beliefs -- the speaker's belief, the speaker's prediction of the listener's belief, and the common belief based on the gap between the first two.

Dialogue Generation Theory of Mind Modeling

Heterogeneous Value Alignment Evaluation for Large Language Models

2 code implementations • 26 May 2023 • Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang

We conduct evaluations with a new automatic metric, value rationality, to represent the ability of LLMs to align with specific values.

Attribute

Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners

1 code implementation • 24 May 2023 • Xiaojuan Tang, Zilong Zheng, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, Muhan Zhang

On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities.

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

1 code implementation • NeurIPS 2023 • Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao

At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools, which are executed to generate the final response.

Logical Reasoning

ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

1 code implementation • ICCV 2023 • Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes.

Object

Rearrange Indoor Scenes for Human-Robot Co-Activity

no code implementations • 10 Mar 2023 • Weiqi Wang, Zihang Zhao, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

We present an optimization-based framework for rearranging indoor furniture to better accommodate human-robot co-activities.

Diffusion-based Generation, Optimization, and Planning in 3D Scenes

2 code implementations • CVPR 2023 • Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu

SceneDiffuser provides a unified model for solving scene-conditioned generation, optimization, and planning.

Denoising Grasp Generation

A Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps

no code implementations • 14 Jan 2023 • Hangxin Liu, Zeyu Zhang, Ziyuan Jiao, Zhenliang Zhang, Minchen Li, Chenfanfu Jiang, Yixin Zhu, Song-Chun Zhu

In this work, we present a reconfigurable data glove design to capture different modes of human hand-object interactions, which are critical in training embodied artificial intelligence (AI) agents for fine manipulation tasks.

On the Complexity of Bayesian Generalization

1 code implementation • 20 Nov 2022 • Yu-Zhe Shi, Manjie Xu, John E. Hopcroft, Kun He, Joshua B. Tenenbaum, Song-Chun Zhu, Ying Nian Wu, Wenjuan Han, Yixin Zhu

Specifically, at the representational level, we seek to answer how the complexity varies when a visual concept is mapped to the representation space.

Attribute

Learning Probabilistic Models from Generator Latent Spaces with Hat EBM

1 code implementation • 29 Oct 2022 • Mitch Hill, Erik Nijkamp, Jonathan Mitchell, Bo Pang, Song-Chun Zhu

This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM).

RulE: Neural-Symbolic Knowledge Graph Reasoning with Rule Embedding

1 code implementation • 24 Oct 2022 • Xiaojuan Tang, Song-Chun Zhu, Yitao Liang, Muhan Zhang

In this paper, we propose a novel and principled framework called RulE (short for Rule Embedding) to effectively leverage logical rules to enhance KG reasoning.

Knowledge Graph Embedding Knowledge Graphs

SQA3D: Situated Question Answering in 3D Scenes

1 code implementation • 14 Oct 2022 • Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang

We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D).

Ranked #1 on Referring Expression on SQA3D

Question Answering Referring Expression

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

1 code implementation • 8 Oct 2022 • Baoxiong Jia, Ting Lei, Song-Chun Zhu, Siyuan Huang

The challenges of such capability lie in the difficulty of generating a detailed understanding of situated actions, their effects on object states (i.e., state changes), and their causal dependencies.

Action Localization counterfactual

Neural-Symbolic Recursive Machine for Systematic Generalization

no code implementations • 4 Oct 2022 • Qing Li, Yixin Zhu, Yitao Liang, Ying Nian Wu, Song-Chun Zhu, Siyuan Huang

We evaluate NSR's efficacy across four challenging benchmarks designed to probe systematic generalization capabilities: SCAN for semantic parsing, PCFG for string manipulation, HINT for arithmetic reasoning, and a compositional machine translation task.

Arithmetic Reasoning Machine Translation

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

2 code implementations • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan

However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.

Logical Reasoning Math

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.

Ranked #5 on Science Question Answering on ScienceQA

Multimodal Deep Learning Multimodal Reasoning

Sequential Manipulation Planning on Scene Graph

1 code implementation • 10 Jul 2022 • Ziyuan Jiao, Yida Niu, Zeyu Zhang, Song-Chun Zhu, Yixin Zhu, Hangxin Liu

We devise a 3D scene graph representation, contact graph+ (cg+), for efficient sequential task planning.

Stochastic Optimization valid

Understanding Physical Effects for Effective Tool-use

no code implementations • 30 Jun 2022 • Zeyu Zhang, Ziyuan Jiao, Weiqi Wang, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

We present a robot learning and planning framework that produces an effective tool-use strategy with the least joint effort and can handle objects that differ from those seen in training.

Motion Planning regression

VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse

no code implementations • 23 Jun 2022 • Yizhou Zhao, Steven Gong, Xiaofeng Gao, Wensi Ai, Song-Chun Zhu

With recent progress in simulation through 3D modeling software and game engines, many researchers have focused on Embodied AI tasks in virtual environments.

Benchmarking Indoor Scene Synthesis

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

1 code implementation • 17 Jun 2022 • Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang

In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.

Few-Shot Learning Offline RL

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

2 code implementations • 13 Jun 2022 • Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu

Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling.

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

1 code implementation • CVPR 2022 • Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning about novel concepts.

Ranked #1 on Few-Shot Image Classification on Bongard-HOI

Benchmarking Few-Shot Image Classification

EBM Life Cycle: MCMC Strategies for Synthesis, Defense, and Density Modeling

1 code implementation • 24 May 2022 • Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu

This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories.

Adversarial Defense Image Generation

RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

1 code implementation • ICLR 2022 • Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.

Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO

Human-Object Interaction Detection Object

Triangular Character Animation Sampling with Motion, Emotion, and Relation

no code implementations • 9 Mar 2022 • Yizhou Zhao, Liang Qiu, Wensi Ai, Pan Lu, Song-Chun Zhu

We propose a Spatial-Temporal And-Or graph (ST-AOG), a stochastic grammar model, to encode the contextual relationship between motion, emotion, and relation, forming a triangle in a conditional random field.

Relation

PartAfford: Part-level Affordance Discovery from 3D Objects

no code implementations • 28 Feb 2022 • Chao Xu, Yixin Chen, He Wang, Song-Chun Zhu, Yixin Zhu, Siyuan Huang

We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization, without dense supervision.

Object

Attention cannot be an Explanation

no code implementations • 26 Jan 2022 • Arjun R Akula, Song-Chun Zhu

Motivated by this, we ask a follow-up question: "Assuming that we only consider the tasks where attention weights correlate well with feature importance, how effective are these attention-based explanations in increasing human trust and reliance on the underlying models?"

Feature Importance

Discourse Analysis for Evaluating Coherence in Video Paragraph Captions

no code implementations • 17 Jan 2022 • Arjun R Akula, Song-Chun Zhu

We also introduce DisNet, a novel dataset containing the proposed visual discourse annotations of 3000 videos and their paragraphs.

Video Captioning Visual Dialog

Learning from the Tangram to Solve Mini Visual Tasks

1 code implementation • 12 Dec 2021 • Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu

Current pre-training methods in computer vision focus on natural images in the daily-life context.

Few-Shot Learning

ValueNet: A New Dataset for Human Value Driven Dialogue System

no code implementations • 12 Dec 2021 • Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu

To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first to try to incorporate a value model into emotionally intelligent dialogue systems.

Dialogue Generation Emotion Recognition

Robust Visual Reasoning via Language Guided Neural Module Networks

no code implementations • NeurIPS 2021 • Arjun Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu

Neural module networks (NMN) are a popular approach for solving multi-modal tasks such as visual question answering (VQA) and visual referring expression recognition (REF).

Question Answering Referring Expression

Emergent Graphical Conventions in a Visual Communication Game

no code implementations • 28 Nov 2021 • Shuwen Qiu, Sirui Xie, Lifeng Fan, Tao Gao, Jungseock Joo, Song-Chun Zhu, Yixin Zhu

In addition to symbolic languages, humans communicate with graphical sketches.

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

no code implementations • 25 Nov 2021 • Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization.

Abstract Algebra Systematic Generalization

Unsupervised Foreground Extraction via Deep Region Competition

2 code implementations • NeurIPS 2021 • Peiyu Yu, Sirui Xie, Xiaojian Ma, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background.

Image Segmentation Inductive Bias

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

1 code implementation • 25 Oct 2021 • Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu

We also develop a strong IconQA baseline, Patch-TRM, which applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.

Ranked #1 on Visual Question Answering (VQA) on IconQA

Arithmetic Reasoning Math Word Problem Solving

Iterative Teacher-Aware Learning

1 code implementation • NeurIPS 2021 • Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu

Recently, multiple works have demonstrated the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces.

Emergence of Theory of Mind Collaboration in Multiagent Systems

no code implementations • 30 Sep 2021 • Luyao Yuan, Zipeng Fu, Linqi Zhou, Kexin Yang, Song-Chun Zhu

Currently, in the study of multiagent systems, the intentions of agents are usually ignored.

Decision Making

MCMC Should Mix: Learning Energy-Based Model with Flow-Based Backbone

no code implementations • ICLR 2022 • Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu

However, MCMC sampling of EBMs in high-dimensional data space generally does not mix, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space.

YouRefIt: Embodied Reference Understanding with Language and Gesture

no code implementations • ICCV 2021 • Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

To the best of our knowledge, this is the first embodied reference dataset that allows us to study referring expressions in daily physical scenes to understand referential behavior, human communication, and human-robot interaction.

CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models

1 code implementation • 3 Sep 2021 • Arjun R. Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Chai, Song-Chun Zhu

More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user.

counterfactual Explainable Artificial Intelligence (XAI)

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

1 code implementation • ICCV 2021 • Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of 3D scene understanding tasks and their immense variations introduced by camera views, lighting, occlusions, etc.

Ranked #4 on 3D Object Detection on SUN-RGBD

3D Object Detection 3D Point Cloud Classification

Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis

1 code implementation • 15 Jul 2021 • Feng Shi, Chonghan Lee, Mohammad Khairul Bashar, Nikhil Shukla, Song-Chun Zhu, Vijaykrishnan Narayanan

Our model has a scale-free structure that can process instances of varying size.

BIG-bench Machine Learning Computational Efficiency

STAR: Sparse Transformer-based Action Recognition

1 code implementation • 15 Jul 2021 • Feng Shi, Chonghan Lee, Liang Qiu, Yizhou Zhao, Tianyi Shen, Shivran Muralidhar, Tian Han, Song-Chun Zhu, Vijaykrishnan Narayanan

The cognitive system for human action and behavior has evolved into a deep learning regime, and especially the advent of Graph Convolution Networks has transformed the field in recent years.

Action Recognition Temporal Action Localization

SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues

no code implementations • ACL 2021 • Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, Song-Chun Zhu

Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly.

Ranked #5 on Dialog Relation Extraction on DialogRE

Dialog Relation Extraction Relation

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

1 code implementation • ACL 2021 • Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu

We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).

Ranked #1 on Mathematical Question Answering on GeoS

Arithmetic Reasoning Geometry Problem Solving

VersaGNN: a Versatile accelerator for Graph neural networks

no code implementations • 4 May 2021 • Feng Shi, Ahren Yiqiao Jin, Song-Chun Zhu

As GNNs operate on non-Euclidean data, their irregular data access patterns cause considerable computational costs and overhead on conventional architectures, such as GPU and CPU.

Graph Generation Graph Matching

Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

1 code implementation • CVPR 2021 • Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents.

Scene Understanding

Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis

1 code implementation • CVPR 2021 • Yaxuan Zhu, Ruiqi Gao, Siyuan Huang, Song-Chun Zhu, Ying Nian Wu

Specifically, the camera pose and 3D scene are represented as vectors and the local camera movement is represented as a matrix operating on the vector of the camera pose.
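
A toy illustration of the vector/matrix idea, built under our own assumptions rather than taken from the paper's learned model: treat a single heading angle θ as the whole pose, encode it as the unit vector v(θ) = (cos θ, sin θ), and represent a shift Δθ by the 2×2 rotation matrix M(Δθ). Then v(θ + Δθ) = M(Δθ) v(θ), and chaining two shifts corresponds to a matrix product. The names `pose_vector`, `shift_matrix`, `apply`, and `compose` are hypothetical.

```python
import math
from typing import Tuple

# Hypothetical toy types: a 2-D pose vector and a 2x2 shift matrix.
Vec = Tuple[float, float]
Mat = Tuple[Tuple[float, float], Tuple[float, float]]


def pose_vector(theta: float) -> Vec:
    """Encode a heading angle (the toy 'pose') as a 2-D unit vector."""
    return (math.cos(theta), math.sin(theta))


def shift_matrix(dtheta: float) -> Mat:
    """Represent a pose shift of dtheta as a 2x2 rotation matrix."""
    c, s = math.cos(dtheta), math.sin(dtheta)
    return ((c, -s), (s, c))


def apply(m: Mat, v: Vec) -> Vec:
    """Matrix-vector product m @ v: move the pose vector by the shift m."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def compose(a: Mat, b: Mat) -> Mat:
    """Matrix product a @ b, i.e. apply shift b first, then shift a."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )
```

For example, `apply(shift_matrix(0.5), pose_vector(0.3))` agrees with `pose_vector(0.8)` up to floating-point error. In the paper both representations are learned via view synthesis, so this hand-built rotation only shows the algebraic structure being targeted.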

Decoder Novel View Synthesis

Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments

1 code implementation • 30 Mar 2021 • Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

In this paper, we rethink the problem of scene reconstruction from an embodied agent's perspective: While the classic view focuses on the reconstruction accuracy, our new perspective emphasizes the underlying functions and constraints such that the reconstructed scenes provide actionable information for simulating interactions with agents.

Common Sense Reasoning

Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance

1 code implementation • 26 Mar 2021 • Xu Xie, Chi Zhang, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Predicting agents' future trajectories plays a crucial role in modern AI systems, yet it is challenging due to intricate interactions exhibited in multi-agent systems, especially when it comes to collision avoidance.

Collision Avoidance Trajectory Prediction

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

no code implementations • CVPR 2021 • Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu

To fill in this gap, we propose a neuro-symbolic Probabilistic Abduction and Execution (PrAE) learner; central to the PrAE learner is the process of probabilistic abduction and execution on a probabilistic scene representation, akin to the mental manipulation of objects.

Attribute Logical Reasoning

ACRE: Abstract Causal REasoning Beyond Covariation

no code implementations • CVPR 2021 • Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data.

Blocking Causal Discovery

VLGrammar: Grounded Grammar Induction of Vision and Language

1 code implementation • ICCV 2021 • Yining Hong, Qing Li, Song-Chun Zhu, Siyuan Huang

In this work, we study grounded grammar induction of vision and language in a joint learning framework.

Clustering Contrastive Learning

Towards Socially Intelligent Agents with Mental State Transition and Human Utility

no code implementations • 12 Mar 2021 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu

One of these challenges is to track the agent's mental state transitions and teach the agent to make decisions guided by its values, as a human does.

Learning Cycle-Consistent Cooperative Networks via Alternating MCMC Teaching for Unsupervised Cross-Domain Translation

no code implementations • 7 Mar 2021 • Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu

This paper studies the unsupervised cross-domain translation problem by proposing a generative framework, in which the probability distribution of each domain is represented by a generative cooperative network that consists of an energy-based model and a latent variable model.

Decoder Translation +1

Paper
Add Code

Show Me What You Can Do: Capability Calibration on Reachable Workspace for Human-Robot Collaboration

no code implementations • 6 Mar 2021 • Xiaofeng Gao, Luyao Yuan, Tianmin Shu, Hongjing Lu, Song-Chun Zhu

Our experiments with human participants demonstrate that a short calibration using REMP can effectively bridge the gap between what a non-expert user thinks a robot can reach and the ground truth.

Motion Planning

Paper
Add Code

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics

no code implementations • 2 Mar 2021 • Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We believe the HINT dataset and the experimental findings are of great interest to the learning community on systematic generalization.

Few-Shot Learning Program Synthesis +1

Paper
Add Code

HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving

no code implementations • 22 Feb 2021 • Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Leveraging these concepts, they could understand the internal structure of this task, without seeing all of the problem instances.

Paper
Add Code

Transformers satisfy

no code implementations • 1 Jan 2021 • Feng Shi, Chen Li, Shijie Bian, Yiqiao Jin, Ziheng Xu, Tian Han, Song-Chun Zhu

The Propositional Satisfiability Problem (SAT), and more generally, the Constraint Satisfaction Problem (CSP), are mathematical questions defined as finding an assignment to a set of objects that satisfies a series of constraints.

Paper
Add Code
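The SAT definition quoted above (find an assignment that satisfies every constraint) can be made concrete with a brute-force checker. This is a minimal sketch for illustration only, not the Transformer-based approach of the paper. The clause encoding (integer k for "variable k is True", -k for "variable k is False", as in the DIMACS CNF convention) and the function names are assumptions.

```python
from __future__ import annotations

from itertools import product


def satisfies(assignment: dict[int, bool], cnf: list[list[int]]) -> bool:
    """True if every clause contains at least one literal made true by `assignment`.

    A literal k > 0 means "variable k is True"; k < 0 means "variable |k| is False".
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )


def brute_force_sat(cnf: list[list[int]], num_vars: int) -> dict[int, bool] | None:
    """Try all 2**num_vars assignments; return the first satisfying one, else None."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if satisfies(assignment, cnf):
            return assignment
    return None


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
example = [[1, -2], [2, 3], [-1, -3]]
print(brute_force_sat(example, 3))  # {1: False, 2: False, 3: True}
print(brute_force_sat([[1], [-1]], 1))  # None: x1 AND NOT x1 is unsatisfiable
```

Enumeration costs 2^n assignments for n variables, which is why learned or heuristic solvers matter for larger instances.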

Learning Algebraic Representation for Abstract Spatial-Temporal Reasoning

no code implementations • 1 Jan 2021 • Chi Zhang, Sirui Xie, Baoxiong Jia, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We further show that the algebraic representation learned can be decoded by isomorphism and used to generate an answer.

Abstract Algebra Systematic Generalization

Paper
Add Code

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar

no code implementations • 27 Dec 2020 • Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability.

Math Mathematical Reasoning

Paper
Add Code

Generative VoxelNet: Learning Energy-Based Models for 3D Shape Synthesis and Analysis

no code implementations • 25 Dec 2020 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu

3D data that contains rich geometry information of objects and scenes is valuable for understanding 3D physical world.

3D Object Classification Super-Resolution

Paper
Add Code

Learning by Fixing: Solving Math Word Problems with Weak Supervision

1 code implementation • 19 Dec 2020 • Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

To generate more diverse solutions, tree regularization is applied to guide the efficient shrinkage and exploration of the solution space, and a memory buffer is designed to track and save the discovered various fixes for each problem.

Ranked #1 on Math Word Problem Solving on Math23K (weakly-supervised metric)

Math Weakly-supervised Learning

Paper
Code

Weighted Entropy Modification for Soft Actor-Critic

no code implementations • 18 Nov 2020 • Yizhou Zhao, Song-Chun Zhu

We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights, which can be connected with prior knowledge, experience replay, and evolution process of the policy.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code
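For reference, the classical weighted entropy (Guiaşu) attaches a non-negative weight w_i to each outcome, H_w(p) = -Σ_i w_i p_i log p_i, and reduces to Shannon entropy when every weight is 1. The sketch below computes it for a discrete action distribution. Whether the paper uses exactly this form for its state-action weights is an assumption, and the weights in the usage lines are hypothetical.

```python
import math


def weighted_entropy(probs, weights):
    """Weighted entropy H_w(p) = -sum_i w_i * p_i * log(p_i).

    probs   -- a discrete distribution (non-negative entries summing to 1)
    weights -- one non-negative weight per outcome; all-ones gives Shannon entropy
    Outcomes with p_i == 0 contribute nothing (the 0 * log 0 = 0 convention).
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return -sum(w * p * math.log(p) for p, w in zip(probs, weights) if p > 0)


print(weighted_entropy([0.5, 0.5], [1.0, 1.0]))  # log 2, the Shannon entropy
print(weighted_entropy([0.5, 0.5], [1.0, 0.0]))  # 0.5 * log 2
```

With weights `[1, 0]` only the first action's term counts, so the value is half the Shannon entropy of the uniform two-action policy.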

Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation

no code implementations • 12 Nov 2020 • Sirui Xie, Feng Gao, Song-Chun Zhu

Seeing that the proposed generalization problem has not been widely studied yet, we carefully define an evaluation protocol, with which we illustrate the effectiveness of MEIP on two proof-of-concept domains and one challenging task: learning to fold from demonstrations.

Paper
Add Code

A Representational Model of Grid Cells' Path Integration Based on Matrix Lie Algebras

no code implementations • 28 Sep 2020 • Ruiqi Gao, Jianwen Xie, Xue-Xin Wei, Song-Chun Zhu, Ying Nian Wu

The grid cells in the mammalian medial entorhinal cortex exhibit striking hexagon firing patterns when the agent navigates in the open field.

Position

Paper
Add Code

Structured Attention for Unsupervised Dialogue Structure Induction

1 code implementation • EMNLP 2020 • Liang Qiu, Yizhou Zhao, Weiyan Shi, Yuan Liang, Feng Shi, Tao Yuan, Zhou Yu, Song-Chun Zhu

Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics.

Inductive Bias Sentence +1

Paper
Code

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

1 code implementation • ECCV 2020 • Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.

Action Recognition Action Understanding +3

Paper
Code

Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks

no code implementations • 24 Jul 2020 • Xiaofeng Gao, Ran Gong, Yizhou Zhao, Shu Wang, Tianmin Shu, Song-Chun Zhu

Thus, in this paper, we propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations, where the robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communications based on its online Bayesian inference of the user's mental state.

Bayesian Inference Explainable Artificial Intelligence (XAI) +1

Paper
Add Code

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

no code implementations • ECCV 2020 • Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu

Humans can progressively learn visual concepts from easy to hard questions.

Question Answering

Paper
Add Code

On Path Integration of Grid Cells: Group Representation and Isotropic Scaling

1 code implementation • NeurIPS 2021 • Ruiqi Gao, Jianwen Xie, Xue-Xin Wei, Song-Chun Zhu, Ying Nian Wu

In this paper, we conduct theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher dimensional vector, and the 2D self-motion is represented by a general transformation of the vector.

Dimensionality Reduction Position

Paper
Code

Learning Latent Space Energy-Based Prior Model

1 code implementation • NeurIPS 2020 • Bo Pang, Tian Han, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well.

Anomaly Detection Text Generation

Paper
Code
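The snippet above relies on short-run MCMC in a low-dimensional latent space. A minimal 1-D sketch of short-run Langevin dynamics follows, assuming a prior of the form p(z) proportional to exp(f(z)) N(z; 0, 1). The tilt f(z) = A·z, the constant `A`, the step size and the step count are hypothetical choices that make the target exactly N(A, 1), so the output is easy to check.

```python
import random


def short_run_langevin(grad_energy, z0, n_steps, step_size, rng):
    """Run a fixed number of Langevin steps from z0 (1-D for simplicity).

    Each step: z <- z - (s**2 / 2) * dE/dz(z) + s * N(0, 1), with s = step_size.
    The chain is stopped after n_steps whether or not it has converged.
    """
    z = z0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        z = z - 0.5 * step_size ** 2 * grad_energy(z) + step_size * noise
    return z


# Hypothetical tilted prior: p(z) proportional to exp(A * z) * N(z; 0, 1),
# i.e. energy E(z) = z**2 / 2 - A * z, whose exact distribution is N(A, 1).
A = 1.5


def grad_energy(z):
    return z - A  # dE/dz for E(z) = z**2 / 2 - A * z


rng = random.Random(0)
samples = [short_run_langevin(grad_energy, 0.0, n_steps=200, step_size=0.3, rng=rng)
           for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")  # should land near A = 1.5 and near 1
```

Without a Metropolis-Hastings correction the discretized chain is slightly biased: for this quadratic energy its stationary variance is 1 / (1 - s²/4), about 1.023 at s = 0.3.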

MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC

no code implementations • 12 Jun 2020 • Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu

Learning energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm.

Paper
Add Code

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

1 code implementation • ICML 2020 • Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu

In this paper, we address these issues and close the loop of neural-symbolic learning by (1) introducing the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning, and (2) proposing a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error through the symbolic reasoning module efficiently.

Question Answering Reinforcement Learning (RL) +1

Paper
Code

Joint Training of Variational Auto-Encoder and Latent Energy-Based Model

no code implementations • CVPR 2020 • Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, Ying Nian Wu

This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM).

Anomaly Detection

Paper
Add Code

Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models

1 code implementation • ICLR 2021 • Mitch Hill, Jonathan Mitchell, Song-Chun Zhu

Our contributions are 1) an improved method for training EBM's with realistic long-run MCMC samples, 2) an Expectation-Over-Transformation (EOT) defense that resolves theoretical ambiguities for stochastic defenses and from which the EOT attack naturally follows, and 3) state-of-the-art adversarial defense for naturally-trained classifiers and competitive defense compared to adversarially-trained classifiers on Cifar-10, SVHN, and Cifar-100.

Adversarial Defense Robust classification

Paper
Code

Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

1 code implementation • ACL 2020 • Arjun R. Akula, Spandana Gella, Yaser Al-Onaizan, Song-Chun Zhu, Siva Reddy

To measure the true progress of existing models, we split the test set into two sets, one which requires reasoning on linguistic structure and the other which doesn't.

Contrastive Learning Multi-Task Learning +2

Paper
Code

Congestion-aware Evacuation Routing using Augmented Reality Devices

no code implementations • 25 Apr 2020 • Zeyu Zhang, Hangxin Liu, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu

We present a congestion-aware routing solution for indoor evacuation, which produces real-time individual-customized evacuation routes among multiple destinations while keeping track of all evacuees' locations.

Paper
Add Code

Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning

2 code implementations • 25 Apr 2020 • Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu

To endow such a crucial cognitive ability to machine intelligence, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).

Relational Reasoning Visual Reasoning

Paper
Code
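As a toy illustration of how an And-Or grammar produces samples (not the MNS grammar itself; every symbol below is hypothetical), an AND node expands all of its children in order and an OR node chooses exactly one branch. The sketch generates short text arithmetic problems rather than the visual panels used in the dataset.

```python
import random

# Toy And-Or grammar (hypothetical; not the grammar used to build MNS).
# AND nodes expand every child in order; OR nodes pick exactly one child.
# Any symbol that is not a key of GRAMMAR is a terminal token.
GRAMMAR = {
    "Problem": ("AND", ["Expr", "=", "?"]),
    "Expr": ("OR", ["Number", "Binary"]),
    "Binary": ("AND", ["Number", "Op", "Number"]),
    "Op": ("OR", ["+", "-", "*"]),
    "Number": ("OR", [str(d) for d in range(1, 10)]),
}


def sample(symbol, rng):
    """Expand `symbol` top-down into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal
    kind, children = GRAMMAR[symbol]
    if kind == "AND":
        tokens = []
        for child in children:
            tokens.extend(sample(child, rng))
        return tokens
    return sample(rng.choice(children), rng)  # OR: follow one branch


rng = random.Random(7)
for _ in range(3):
    print(" ".join(sample("Problem", rng)))
```

Each call traces one parse tree; the OR choices made along the way are what distinguish one generated problem from another.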

Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

no code implementations • 25 Apr 2020 • Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

Aiming to understand how human (false-)belief, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.

Object Object Tracking

Paper
Add Code

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

no code implementations • 20 Apr 2020 • Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, Song-Chun Zhu

We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning.

Common Sense Reasoning Small Data Image Classification

Paper
Add Code

Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification

1 code implementation • CVPR 2021 • Jianwen Xie, Yifei Xu, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu

We propose a generative model of unordered point sets, such as point clouds, in the form of an energy-based model, where the energy function is parameterized by an input-permutation-invariant bottom-up neural network.

3D Generation General Classification +3

Paper
Code

Emergence of Pragmatics from Referential Game between Theory of Mind Agents

1 code implementation • 21 Jan 2020 • Luyao Yuan, Zipeng Fu, Jingyue Shen, Lu Xu, Junhong Shen, Song-Chun Zhu

Pragmatics studies how context can contribute to language meanings.

Reinforcement Learning (RL)

Paper
Code

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

no code implementations • NeurIPS 2019 • Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistencies between the 2D image plane and the 3D world coordinate.

Ranked #2 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)

Monocular 3D Object Detection Object +1

Paper
Add Code

Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

no code implementations • ECCV 2020 • Erik Nijkamp, Bo Pang, Tian Han, Linqi Zhou, Song-Chun Zhu, Ying Nian Wu

Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables.

Paper
Add Code

Learning Perceptual Inference by Contrasting

1 code implementation • NeurIPS 2019 • Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

"Thinking in pictures," [1] i.e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a significant ability to perform logical induction and a crucial factor in the intellectual history of technology development.

Paper
Code

Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns

no code implementations • 26 Nov 2019 • Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu

To model the motions explicitly, it is natural for the model to be based on the motions or the displacement fields of the pixels.

Disentanglement

Paper
Add Code

Representation Learning: A Statistical Perspective

no code implementations • 26 Nov 2019 • Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

Learning representations of data is an important problem in statistics and machine learning.

BIG-bench Machine Learning Representation Learning

Paper
Add Code

Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning

no code implementations • 25 Nov 2019 • Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment.

Reinforcement Learning (RL) Transfer Learning

Paper
Add Code

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

no code implementations • ICCV 2019 • Yuanlu Xu, Song-Chun Zhu, Tony Tung

We present DenseRaC, a novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image.

Ranked #79 on 3D Human Pose Estimation on MPI-INF-3DHP (using extra training data)

3D Human Pose Estimation

Paper
Add Code

Learning Energy-based Spatial-Temporal Generative ConvNets for Dynamic Patterns

no code implementations • 26 Sep 2019 • Jianwen Xie, Song-Chun Zhu, Ying Nian Wu

We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns.

Paper
Add Code

Twin Graph Convolutional Networks: GCN with Dual Graph Support for Semi-Supervised Learning

no code implementations • 25 Sep 2019 • Feng Shi, Yizhou Zhao, Ziheng Xu, Tianyang Liu, Song-Chun Zhu

Graph Neural Networks as a combination of Graph Signal Processing and Deep Convolutional Networks shows great power in pattern recognition in non-Euclidean domains.

Paper
Add Code

X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust

no code implementations • 15 Sep 2019 • Arjun R. Akula, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu

We present a new explainable AI (XAI) framework aimed at increasing justified human trust and reliance in the AI machine through explanations.

Action Recognition Explainable Artificial Intelligence (XAI) +2

Paper
Add Code

Inducing Hierarchical Compositional Model by Sparsifying Generator Network

no code implementations • CVPR 2020 • Xianglei Xing, Tianfu Wu, Song-Chun Zhu, Ying Nian Wu

To realize this AND-OR hierarchy in image synthesis, we learn a generator network that consists of the following two components: (i) Each layer of the hierarchy is represented by an over-complete set of convolutional basis functions.

Image Generation Image Reconstruction

Paper
Add Code

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

no code implementations • ICCV 2019 • Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction, i.e., 3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation.

3D Human Pose Estimation Human-Object Interaction Detection +1

Paper
Add Code

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning

1 code implementation • ICCV 2019 • Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu

This paper addresses a new problem of understanding human gaze communication in social videos from both atomic-level and event-level, which is significant for studying human social interactions.

Decoder

Paper
Code

HUGE2: a Highly Untangled Generative-model Engine for Edge-computing

no code implementations • 25 Jul 2019 • Feng Shi, Ziheng Xu, Tao Yuan, Song-Chun Zhu

In this work, we propose a Highly Untangled Generative-model Engine for Edge-computing or HUGE2 for accelerating these two special convolutions on the edge-computing platform by decomposing the kernels and untangling these smaller convolutions by performing basic matrix multiplications.

Edge-computing Semantic Segmentation

Paper
Add Code

Learning Pose Grammar for Monocular 3D Pose Estimation

no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2019 • Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Jianwen Xie, Song-Chun Zhu

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image.

Ranked #13 on 3D Human Pose Estimation on HumanEva-I

3D Human Pose Estimation 3D Pose Estimation +1

Paper
Add Code

Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model

no code implementations • NeurIPS 2019 • Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu

We treat this non-convergent short-run MCMC as a learned generator model or a flow model.

valid

Paper
Add Code

Reasoning Visual Dialogs with Structural and Partial Observations

1 code implementation • CVPR 2019 • Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu

The answer to a given question is represented by a node with missing value.

Ranked #14 on Visual Dialog on VisDial v0.9 val

Visual Dialog

Paper
Code

On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models

2 code implementations • 29 Mar 2019 • Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu

On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images.

Anatomy

Paper
Code

VRKitchen: an Interactive 3D Virtual Environment for Task-oriented Learning

1 code implementation • 13 Mar 2019 • Xiaofeng Gao, Ran Gong, Tianmin Shu, Xu Xie, Shu Wang, Song-Chun Zhu

One of the main challenges of advancing task-oriented learning such as visual task planning and reinforcement learning is the lack of realistic and standardized environments for training and testing AI agents.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Natural Language Interaction with Explainable AI Models

no code implementations • 13 Mar 2019 • Arjun R. Akula, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu

This paper presents an explainable AI (XAI) system that provides explanations for its predictions.

Explainable Artificial Intelligence (XAI)

Paper
Add Code

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

no code implementations • CVPR 2019 • Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation.

Object Recognition Question Answering +2

Paper
Add Code

Discourse Parsing in Videos: A Multi-modal Appraoch

1 code implementation • 6 Mar 2019 • Arjun R. Akula, Song-Chun Zhu

We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video.

Discourse Parsing Visual Dialog +1

Paper
Code

Cooperative Training of Fast Thinking Initializer and Slow Thinking Solver for Conditional Learning

no code implementations • 7 Feb 2019 • Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu

This paper studies the problem of learning the conditional distribution of a high-dimensional output given an input, where the output and input may belong to two different domains, e.g., the output is a photo image and the input is a sketch image.

Image-to-Image Translation

Paper
Add Code

Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion

no code implementations • 24 Jan 2019 • Ruiqi Gao, Jianwen Xie, Siyuan Huang, Yufan Ren, Song-Chun Zhu, Ying Nian Wu

This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1).

Optical Flow Estimation

Paper
Add Code

Inducing Sparse Coding and And-Or Grammar from Generator Network

no code implementations • 20 Jan 2019 • Xianglei Xing, Song-Chun Zhu, Ying Nian Wu

We introduce an explainable generative model by applying sparse operation on the feature maps of the generator network.

Paper
Add Code

Explaining AlphaGo: Interpreting Contextual Effects in Neural Networks

no code implementations • 8 Jan 2019 • Zenan Ling, Haotian Ma, Yu Yang, Robert C. Qiu, Song-Chun Zhu, Quanshi Zhang

In this paper, we propose to disentangle and interpret contextual effects that are encoded in a pre-trained deep neural network.

Paper
Add Code

Interpretable CNNs for Object Classification

no code implementations • 8 Jan 2019 • Quanshi Zhang, Xin Wang, Ying Nian Wu, Huilin Zhou, Song-Chun Zhu

This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part.

Classification General Classification +1

Paper
Add Code

Divergence Triangle for Joint Training of Generator Model, Energy-based Model, and Inference Model

1 code implementation • 28 Dec 2018 • Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, Ying Nian Wu

This paper proposes the divergence triangle as a framework for joint training of generator model, energy-based model and inference model.

Paper
Code

Learning Dynamic Generator Model by Alternating Back-Propagation Through Time

no code implementations • 27 Dec 2018 • Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu

The non-linear transformation of this transition model can be parametrized by a feedforward neural network.

Paper
Add Code

Mining Interpretable AOG Representations from Convolutional Networks via Active Question Answering

no code implementations • 18 Dec 2018 • Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu

The AOG associates each object part with certain neural units in feature maps of conv-layers.

Object Question Answering

Paper
Add Code

Explanatory Graphs for CNNs

no code implementations • 18 Dec 2018 • Quanshi Zhang, Xin Wang, Ruiming Cao, Ying Nian Wu, Feng Shi, Song-Chun Zhu

This paper introduces a graphical model, namely an explanatory graph, which reveals the knowledge hierarchy hidden inside conv-layers of a pre-trained CNN.

Object

Paper
Add Code

MetaStyle: Three-Way Trade-Off Among Speed, Flexibility, and Quality in Neural Style Transfer

no code implementations • 13 Dec 2018 • Chi Zhang, Yixin Zhu, Song-Chun Zhu

An unprecedented booming has been witnessed in the research area of artistic style transfer ever since Gatys et al. introduced the neural method.

Bilevel Optimization Style Transfer

Paper
Add Code

Deeper Interpretability of Deep Networks

no code implementations • 19 Nov 2018 • Tian Xu, Jiayu Zhan, Oliver G. B. Garrod, Philip H. S. Torr, Song-Chun Zhu, Robin A. A. Ince, Philippe G. Schyns

However, understanding the information represented and processed in CNNs remains in most cases challenging.

Paper
Add Code

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

1 code implementation • NeurIPS 2018 • Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D.

Ranked #5 on Monocular 3D Object Detection on SUN RGB-D

Monocular 3D Object Detection Object +4

Paper
Code

Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion

1 code implementation • ICLR 2019 • Ruiqi Gao, Jianwen Xie, Song-Chun Zhu, Ying Nian Wu

In this model, the 2D self-position of the agent is represented by a high-dimensional vector, and the 2D self-motion or displacement of the agent is represented by a matrix that transforms the vector.

Position

Paper
Code
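The snippet above represents self-position as a high-dimensional vector v(x) and self-motion as a matrix M(Δx) acting on it, so that v(x + Δx) = M(Δx) v(x). One hand-built instance (an assumption for illustration; the paper learns v and M from data) stacks 2-D blocks (cos k_j·x, sin k_j·x) for a few frequency vectors k_j, with M(Δx) block-diagonal and each 2×2 block a rotation by k_j·Δx. The frequencies `FREQS` are hypothetical.

```python
import math

# Hypothetical frequency vectors k_j, 60 degrees apart; each gives one 2-D block of v(x).
FREQS = [(1.0, 0.0), (0.5, math.sqrt(3) / 2), (-0.5, math.sqrt(3) / 2)]


def position_vector(x):
    """v(x): concatenation of (cos(k_j . x), sin(k_j . x)) over all blocks j."""
    v = []
    for kx, ky in FREQS:
        phase = kx * x[0] + ky * x[1]
        v.extend([math.cos(phase), math.sin(phase)])
    return v


def motion_matrix(dx):
    """M(dx): block-diagonal matrix whose j-th 2x2 block rotates by k_j . dx."""
    n = 2 * len(FREQS)
    m = [[0.0] * n for _ in range(n)]
    for j, (kx, ky) in enumerate(FREQS):
        theta = kx * dx[0] + ky * dx[1]
        c, s = math.cos(theta), math.sin(theta)
        i = 2 * j
        m[i][i], m[i][i + 1] = c, -s
        m[i + 1][i], m[i + 1][i + 1] = s, c
    return m


def matvec(m, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


x, dx = (0.3, -1.2), (0.7, 0.4)
moved = matvec(motion_matrix(dx), position_vector(x))
target = position_vector((x[0] + dx[0], x[1] + dx[1]))
print(max(abs(a - b) for a, b in zip(moved, target)))  # floating-point rounding only
```

Because each block is a rotation whose angle is linear in Δx, applying M(Δx) and then M(Δx') equals applying M(Δx + Δx'), so composing motions corresponds to adding displacements.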

A Tale of Three Probabilistic Families: Discriminative, Descriptive and Generative Models

no code implementations • 9 Oct 2018 • Ying Nian Wu, Ruiqi Gao, Tian Han, Song-Chun Zhu

In this paper, we review three families of probability models, namely, the discriminative models, the descriptive models, and the generative models.

Descriptive

Paper
Add Code

Sparse Winograd Convolutional neural networks on small-scale systolic arrays

no code implementations • 3 Oct 2018 • Feng Shi, Haochen Li, Yuhe Gao, Benjamin Kuschner, Song-Chun Zhu

The reconfigurability, energy-efficiency, and massive parallelism on FPGAs make them one of the best choices for implementing efficient deep learning accelerators.

Layout Design

Paper
Add Code

Interactive Agent Modeling by Learning to Probe

no code implementations • 1 Oct 2018 • Tianmin Shu, Caiming Xiong, Ying Nian Wu, Song-Chun Zhu

In particular, the probing agent (i.e., a learner) learns to interact with the environment and with a target agent (i.e., a demonstrator) to maximize the change in the observed behaviors of that agent.

Imitation Learning

Paper
Add Code

Human-centric Indoor Scene Synthesis Using Stochastic Grammar

1 code implementation • CVPR 2018 • Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth.

Indoor Scene Synthesis

Paper
Code

Learning Human-Object Interactions by Graph Parsing Neural Networks

1 code implementation • ECCV 2018 • Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.

Ranked #32 on Human-Object Interaction Detection on V-COCO

Human-Object Interaction Detection Object

Paper
Code

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

1 code implementation • ECCV 2018 • Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model.

Ranked #4 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)

Monocular 3D Object Detection Object +5

Paper
Code

Deformable Generator Networks: Unsupervised Disentanglement of Appearance and Geometry

2 code implementations • 16 Jun 2018 • Xianglei Xing, Ruiqi Gao, Tian Han, Song-Chun Zhu, Ying Nian Wu

We present a deformable generator model to disentangle the appearance and geometric information for both image and video data in a purely unsupervised manner.

Disentanglement Transfer Learning

Paper
Code

Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction

no code implementations • ICML 2018 • Siyuan Qi, Baoxiong Jia, Song-Chun Zhu

Future predictions on sequence data (e.g., videos or audios) require the algorithms to capture non-Markovian and compositional properties of high-level semantics.

Activity Prediction Future prediction

Paper
Add Code

Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks

no code implementations • CVPR 2018 • Ping Wei, Yang Liu, Tianmin Shu, Nanning Zheng, Song-Chun Zhu

We built a new video dataset of tasks, intentions, and attention.

Paper
Add Code

Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

1 code implementation • CVPR 2018 • Wenguan Wang, Yuanlu Xu, Jianbing Shen, Song-Chun Zhu

This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification.

General Classification

Paper
Code

Inferring Shared Attention in Social Scene Videos

no code implementations • CVPR 2018 • Lifeng Fan, Yixin Chen, Ping Wei, Wenguan Wang, Song-Chun Zhu

We collect a new dataset VideoCoAtt from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study.

Scene Understanding

Paper
Add Code

Unsupervised Learning of Neural Networks to Explain Neural Networks

no code implementations • 18 May 2018 • Quanshi Zhang, Yu Yang, Yuchen Liu, Ying Nian Wu, Song-Chun Zhu

Given feature maps of a certain conv-layer of the CNN, the explainer performs like an auto-encoder, which first disentangles the feature maps into object-part features and then inverts object-part features back to features of higher conv-layers of the CNN.

Disentanglement Object

Paper
Add Code

Learning Descriptor Networks for 3D Shape Synthesis and Analysis

1 code implementation • CVPR 2018 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu

This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.

Object

Paper
Code

Intent-aware Multi-agent Reinforcement Learning

no code implementations • 6 Mar 2018 • Siyuan Qi, Song-Chun Zhu

We evaluate our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Building a Telescope to Look Into High-Dimensional Image Spaces

no code implementations • 2 Mar 2018 • Mitch Hill, Erik Nijkamp, Song-Chun Zhu

However, characterizing a learned probability density to uncover the Hopfield memories of the model, encoded by the structure of the local modes, remains an open challenge.

Vocal Bursts Intensity Prediction

Paper
Add Code

Visual Interpretability for Deep Learning: a Survey

1 code implementation • 2 Feb 2018 • Quanshi Zhang, Song-Chun Zhu

This paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations.

Explainable artificial intelligence

Paper
Code

Examining CNN Representations with respect to Dataset Bias

no code implementations • 29 Oct 2017 • Quanshi Zhang, Wenguan Wang, Song-Chun Zhu

We aim to discover representation flaws caused by potential dataset bias.

Attribute

Paper
Add Code

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

no code implementations • 17 Oct 2017 • Hao-Shu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Song-Chun Zhu

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation.

Ranked #1 on 3D Absolute Human Pose Estimation on Human3.6M (Average MPJPE (mm) metric)

3D Absolute Human Pose Estimation 3D Pose Estimation

Paper
Add Code

Interpretable Convolutional Neural Networks

2 code implementations • CVPR 2018 • Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu

Instead, the interpretable CNN automatically assigns each filter in a high conv-layer to an object part during the learning process.

Ranked #1 on single category classification on ILSVRC Part

Object single category classification


Paper
Code

Jointly Recognizing Object Fluents and Tasks in Egocentric Videos

no code implementations • ICCV 2017 • Yang Liu, Ping Wei, Song-Chun Zhu

Given an egocentric video, a beam search algorithm is applied to jointly recognize the object fluents in each frame and the task of the entire video.

Object

Paper
Add Code

Monocular 3D Human Pose Estimation by Predicting Depth on Joints

no code implementations • ICCV 2017 • Bruce Xiaohan Nie, Ping Wei, Song-Chun Zhu

This paper aims at estimating full-body 3D human poses from monocular images, where the biggest challenge is the inherent ambiguity introduced by lifting the 2D pose into 3D space.

Ranked #113 on 3D Human Pose Estimation on Human3.6M (PA-MPJPE metric)

Depth Estimation Depth Prediction +1

Paper
Add Code

Learning Energy-Based Models as Generative ConvNets via Multi-grid Modeling and Sampling

no code implementations • CVPR 2018 • Ruiqi Gao, Yang Lu, Junpei Zhou, Song-Chun Zhu, Ying Nian Wu

Within each iteration of our learning algorithm, for each observed training image, we generate synthesized images at multiple grids by initializing the finite-step MCMC sampling from a minimal 1 x 1 version of the training image.

Paper
Add Code

Scene-centric Joint Parsing of Cross-view Videos

no code implementations • 16 Sep 2017 • Hang Qi, Yuanlu Xu, Tao Yuan, Tianfu Wu, Song-Chun Zhu

The proposed joint parsing framework represents such correlations and constraints explicitly and generates semantic scene-centric parse graphs.

Video Understanding

Paper
Add Code

A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects

no code implementations • CVPR 2018 • Yuanlu Xu, Lei Qin, Xiaobai Liu, Jianwen Xie, Song-Chun Zhu

We introduce a Causal And-Or Graph (C-AOG) to represent the cause-effect relations between an object's visibility fluent and its activities, and develop a probabilistic graph model to jointly reason about the visibility fluent change (e.g., from visible to invisible) and track humans in videos.

Visual Tracking

Paper
Add Code

Mining Deep And-Or Object Structures via Cost-Sensitive Question-Answer-Based Active Annotations

no code implementations • 13 Aug 2017 • Quanshi Zhang, Ying Nian Wu, Hao Zhang, Song-Chun Zhu

The loss is defined for nodes in all layers of the AOG, including the generative loss (measuring the likelihood of the images) and the discriminative loss (measuring the fitness to human answers).

Question Answering

Paper
Add Code

Interactively Transferring CNN Patterns for Part Localization

no code implementations • 5 Aug 2017 • Quanshi Zhang, Ruiming Cao, Shengming Zhang, Mark Redmonds, Ying Nian Wu, Song-Chun Zhu

In the scenario of one/multi-shot learning, conventional end-to-end learning strategies without sufficient supervision are usually not powerful enough to learn correct patterns from noisy signals.

Paper
Add Code

Interpreting CNN Knowledge via an Explanatory Graph

no code implementations • 5 Aug 2017 • Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, Song-Chun Zhu

Considering that each filter in a conv-layer of a pre-trained CNN usually represents a mixture of object parts, we propose a simple yet efficient method to automatically disentangle different part patterns from each filter and construct an explanatory graph.

Object

Paper
Add Code

Predicting Human Activities Using Stochastic Grammar

no code implementations • ICCV 2017 • Siyuan Qi, Siyuan Huang, Ping Wei, Song-Chun Zhu

This paper presents a novel method to predict future human activities from partially observed RGB-D videos.

Activity Prediction

Paper
Add Code

Generative Hierarchical Learning of Sparse FRAME Models

no code implementations • CVPR 2017 • Jianwen Xie, Yifei Xu, Erik Nijkamp, Ying Nian Wu, Song-Chun Zhu

This paper proposes a method for generative learning of hierarchical random field models.

Clustering object-detection +1

Paper
Add Code

Mining Object Parts from CNNs via Active Question-Answering

no code implementations • CVPR 2017 • Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu

We use an active human-computer communication to incrementally grow such an AOG on the pre-trained CNN as follows.

Active Learning Object +1

Paper
Add Code

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

no code implementations • CVPR 2017 • Tianmin Shu, Sinisa Todorovic, Song-Chun Zhu

This work is about recognizing human activities occurring in videos at distinct semantic levels, including individual actions, interactions, and group activities.

Ranked #11 on Group Activity Recognition on Volleyball

Group Activity Recognition

Paper
Add Code

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars

no code implementations • 1 Apr 2017 • Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms.

Benchmarking Object +2

Paper
Add Code

Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

no code implementations • 1 Mar 2017 • Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu

In this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable a real-time motion inference for human-robot interaction (HRI).

Paper
Add Code

Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning

no code implementations • 14 Nov 2016 • Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu

This paper proposes a learning strategy that extracts object-part concepts from a pre-trained convolutional neural network (CNN), in an attempt to 1) explore explicit semantics hidden in CNN units and 2) gradually grow a semantically interpretable graphical model on the pre-trained CNN for hierarchical object understanding.

Paper
Add Code

Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration

no code implementations • EMNLP 2016 • Changsong Liu, Shaohua Yang, Sari Saba-Sadiya, Nishant Shukla, Yunzhong He, Song-Chun Zhu, Joyce Chai

Paper
Add Code

Cooperative Training of Descriptor and Generator Networks

no code implementations • 29 Sep 2016 • Jianwen Xie, Yang Lu, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu

Specifically, within each iteration of the cooperative learning algorithm, the generator model generates initial synthesized examples to initialize a finite-step MCMC that samples and trains the energy-based descriptor model.

Paper
Add Code

Alternating Back-Propagation for Generator Network

no code implementations • 28 Jun 2016 • Tian Han, Yang Lu, Song-Chun Zhu, Ying Nian Wu

This paper proposes an alternating back-propagation algorithm for learning the generator network model.

Paper
Add Code

Modeling and Inferring Human Intents and Latent Functional Objects for Trajectory Prediction

no code implementations • 24 Jun 2016 • Dan Xie, Tianmin Shu, Sinisa Todorovic, Song-Chun Zhu

This paper is about detecting functional objects and inferring human intentions in surveillance videos of public spaces.

Clustering Trajectory Prediction

Paper
Add Code

Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet

no code implementations • CVPR 2017 • Jianwen Xie, Song-Chun Zhu, Ying Nian Wu

We show that a spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns.

Paper
Add Code

Inferring Forces and Learning Human Utilities From Videos

no code implementations • CVPR 2016 • Yixin Zhu, Chenfanfu Jiang, Yibiao Zhao, Demetri Terzopoulos, Song-Chun Zhu

We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world objects, and introduce a learning framework that incorporates the concept of human utilities, which in our opinion provides a deeper and finer-grained account not only of object affordance but also of people's interaction with objects.

Motion Planning Robot Task Planning

Paper
Add Code

Multi-View People Tracking via Hierarchical Trajectory Composition

no code implementations • CVPR 2016 • Yuanlu Xu, Xiaobai Liu, Yang Liu, Song-Chun Zhu

This paper presents a hierarchical composition approach for multi-view object tracking.

Multi-Object Tracking Object

Paper
Add Code

Grounded Semantic Role Labeling

no code implementations • NAACL 2016 • Shaohua Yang, Qiaozi Gao, Changsong Liu, Caiming Xiong, Song-Chun Zhu, Joyce Y. Chai

Question Answering Semantic Role Labeling

Paper
Add Code

Attribute And-Or Grammar for Joint Parsing of Human Attributes, Part and Pose

no code implementations • 6 May 2016 • Se-Young Park, Bruce Xiaohan Nie, Song-Chun Zhu

The A-AOG model is an amalgamation of three traditional grammar formulations: (i) Phrase structure grammar representing the hierarchical decomposition of the human body from whole to parts; (ii) Dependency grammar modeling the geometric articulation by a kinematic graph of the body pose; and (iii) Attribute grammar accounting for the compatibility relations between different parts in the hierarchy so that their appearances follow a consistent style.

Attribute Human Detection +1

Paper
Add Code

Learning Social Affordance for Human-Robot Interaction

no code implementations • 13 Apr 2016 • Tianmin Shu, M. S. Ryoo, Song-Chun Zhu

In this paper, we present an approach for robot learning of social affordance from human activity videos.

Weakly-supervised Learning

Paper
Add Code

Recognizing Car Fluents from Video

no code implementations • CVPR 2016 • Bo Li, Tianfu Wu, Caiming Xiong, Song-Chun Zhu

Since there is no publicly available related dataset, we collect and annotate a car fluent dataset consisting of car videos with diverse fluents.

Paper
Add Code

A Theory of Generative ConvNet

no code implementations • 10 Feb 2016 • Jianwen Xie, Yang Lu, Song-Chun Zhu, Ying Nian Wu

If we further assume that the non-linearity in the ConvNet is Rectified Linear Unit (ReLU) and the reference distribution is Gaussian white noise, then we obtain a generative ConvNet model that is unique among energy-based models: The model is piecewise Gaussian, and the means of the Gaussian pieces are defined by an auto-encoder, where the filters in the bottom-up encoding become the basis functions in the top-down decoding, and the binary activation variables detected by the filters in the bottom-up convolution process become the coefficients of the basis functions in the top-down deconvolution process.

Paper
Add Code

Joint Image-Text News Topic Detection and Tracking with And-Or Graph Representation

no code implementations • 15 Dec 2015 • Weixin Li, Jungseock Joo, Hang Qi, Song-Chun Zhu

The AOG embeds a context-sensitive grammar that can describe the hierarchical composition of news topics by semantic elements about the people involved, related places, and what happened, and can model contextual relationships between elements in the hierarchy.

Clustering

Paper
Add Code

A Restricted Visual Turing Test for Deep Scene and Event Understanding

no code implementations • 6 Dec 2015 • Hang Qi, Tianfu Wu, Mun-Wai Lee, Song-Chun Zhu

and a sequence of story-line based queries, the task is to provide answers either simply in binary form "true/false" (to a polar query) or in an accurate natural language description (to a non-polar query).

Question Answering Video Captioning +1

Paper
Add Code

Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose

no code implementations • ICCV 2015 • Se-Young Park, Song-Chun Zhu

In this paper, we are interested in developing compositional models that explicitly represent pose, parts and attributes, and in tackling the tasks of attribute recognition, pose estimation and part localization jointly.

Attribute Human Parsing +1

Paper
Add Code

Automated Facial Trait Judgment and Election Outcome Prediction: Social Dimensions of Face

no code implementations • ICCV 2015 • Jungseock Joo, Francis F. Steen, Song-Chun Zhu

Secondly, our model can categorize the political party affiliations of politicians, i.e., Democrats vs. Republicans, with an accuracy of 62.6% (male) and 60.1% (female).

Paper
Add Code

Mining And-Or Graphs for Graph Matching and Object Discovery

no code implementations • ICCV 2015 • Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu

This paper reformulates the theory of graph mining on the technical basis of graph matching, and extends its scope of applications to computer vision.

Graph Matching Graph Mining +1

Paper
Add Code

Learning FRAME Models Using CNN Filters

no code implementations • 28 Sep 2015 • Yang Lu, Song-Chun Zhu, Ying Nian Wu

We explain that each learned model corresponds to a new CNN unit at a layer above the layer of filters employed by the model.

Paper
Add Code

Online Object Tracking, Learning and Parsing with And-Or Graphs

1 code implementation • CVPR 2014 • Tianfu Wu, Yang Lu, Song-Chun Zhu

In the former, our AOGTracker outperforms state-of-the-art tracking algorithms, including two trackers based on deep convolutional networks.

Object Tracking

Paper
Code

Joint Action Recognition and Pose Estimation From Video

no code implementations • CVPR 2015 • Bruce Xiaohan Nie, Caiming Xiong, Song-Chun Zhu

Action recognition and pose estimation from video are closely related tasks for understanding human motion; most methods, however, learn separate models and combine them sequentially.

Action Recognition Pose Estimation +2

Paper
Add Code

Joint Inference of Groups, Events and Human Roles in Aerial Videos

no code implementations • CVPR 2015 • Tianmin Shu, Dan Xie, Brandon Rothrock, Sinisa Todorovic, Song-Chun Zhu

This paper addresses a new problem of parsing low-resolution aerial videos of large spatial areas, in terms of 1) grouping, 2) recognizing events and 3) assigning roles to people engaged in events.

Paper
Add Code

Video Primal Sketch: A Unified Middle-Level Representation for Video

no code implementations • 10 Feb 2015 • Zhi Han, Zongben Xu, Song-Chun Zhu

This paper presents a middle-level video representation named Video Primal Sketch (VPS), which integrates two regimes of models: i) a sparse coding model using static or moving primitives to explicitly represent moving corners, lines, feature points, etc., and ii) a FRAME/MRF model reproducing feature statistics extracted from input video to implicitly represent textured motion, such as water and fire.

Paper
Add Code

Learning And-Or Models to Represent Context and Occlusion for Car Detection and Viewpoint Estimation

no code implementations • 29 Jan 2015 • Tianfu Wu, Bo Li, Song-Chun Zhu

Firstly, the structure of the And-Or model is learned with three components: (a) mining multi-car contextual patterns based on layouts of annotated single car bounding boxes, (b) mining occlusion configurations between single cars, and (c) learning different combinations of part visibility based on car 3D CAD simulation.

Viewpoint Estimation

Paper
Add Code

Mapping Energy Landscapes of Non-Convex Learning Problems

no code implementations • 2 Oct 2014 • Maria Pavlovskaia, Kewei Tu, Song-Chun Zhu

In many statistical learning problems, the target functions to be optimized are highly non-convex in various model spaces and thus are difficult to analyze.

Clustering

Paper
Add Code

Visual Persuasion: Inferring Communicative Intents of Images

no code implementations • CVPR 2014 • Jungseock Joo, Weixin Li, Francis F. Steen, Song-Chun Zhu

In this paper we introduce the novel problem of understanding visual persuasion.

Paper
Add Code

Learning Inhomogeneous FRAME Models for Object Patterns

no code implementations • CVPR 2014 • Jianwen Xie, Wenze Hu, Song-Chun Zhu, Ying Nian Wu

We investigate an inhomogeneous version of the FRAME (Filters, Random field, And Maximum Entropy) model and apply it to modeling object patterns.

Object

Paper
Add Code

Unsupervised Learning of Dictionaries of Hierarchical Compositional Models

no code implementations • CVPR 2014 • Jifeng Dai, Yi Hong, Wenze Hu, Song-Chun Zhu, Ying Nian Wu

Given a set of unannotated training images, a dictionary of such hierarchical templates is learned so that each training image can be represented by a small number of templates that are spatially translated, rotated and scaled versions of the templates in the learned dictionary.

Domain Adaptation Template Matching

Paper
Add Code
