no code implementations • EMNLP 2021 • Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut
One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify if it is the shifts in visual or language features that play a key role.
no code implementations • EMNLP 2021 • Arjun Akula, Spandana Gella, Keze Wang, Song-Chun Zhu, Siva Reddy
Our model outperforms the state-of-the-art NMN model on the CLEVR-Ref+ dataset, with a +8.1% improvement in accuracy on the single-referent test set and +4.3% on the full test set.
no code implementations • SIGDIAL (ACL) 2022 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent’s mental state transition and teach the agent to make decisions guided by its value like a human.
1 code implementation • 6 Nov 2024 • Yizhe Huang, Xingbo Wang, Hao liu, Fanqi Kong, Aoyang Qin, Min Tang, Xiaoxi Wang, Song-Chun Zhu, Mingjie Bi, Siyuan Qi, Xue Feng
As agents progress, the environment adaptively generates new tasks with social structures for agents to undertake.
no code implementations • 10 Oct 2024 • Fanqi Kong, Yizhe Huang, Song-Chun Zhu, Siyuan Qi, Xue Feng
LASE allocates a portion of its rewards to co-players as gifts, with this allocation adapting dynamically based on the social relationship, a metric evaluating the friendliness of co-players estimated by counterfactual reasoning.
no code implementations • 10 Oct 2024 • Xiaojuan Tang, Jiaqi Li, Yitao Liang, Song-Chun Zhu, Muhan Zhang, Zilong Zheng
In this paper, we design Mars, an interactive environment devised for situated inductive reasoning.
no code implementations • 9 Oct 2024 • Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu
We propose M^3Bench, a new benchmark of whole-body motion generation for mobile manipulation tasks.
no code implementations • 16 Jul 2024 • Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li
Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction.
1 code implementation • 24 Jun 2024 • Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, Zilong Zheng
Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs.
no code implementations • 12 Jun 2024 • Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng
To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments.
no code implementations • 3 May 2024 • Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu
Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks.
1 code implementation • 26 Apr 2024 • Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation.
no code implementations • 25 Apr 2024 • Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
In this paper, we introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations.
1 code implementation • 18 Mar 2024 • Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu
Through a series of simulations in a box-packing domain, we quantitatively demonstrate the effectiveness of LLM^3 in solving TAMP problems and the efficiency in selecting action parameters.
no code implementations • 4 Feb 2024 • Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang
To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response in adaptation.
no code implementations • 26 Jan 2024 • Zhenliang Zhang, Zeyu Zhang, Ziyuan Jiao, Yao Su, Hangxin Liu, Wei Wang, Song-Chun Zhu
Artificial intelligence (AI) has revolutionized human cognitive abilities and facilitated the development of new AI entities capable of interacting with humans in both physical and virtual environments.
1 code implementation • 19 Jan 2024 • Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu
Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning.
no code implementations • CVPR 2024 • Zhi Gao, Yuntao Du, Xintong Zhang, Xiaojian Ma, Wenjuan Han, Song-Chun Zhu, Qing Li
However, these methods often overlook the potential for continual learning, typically by freezing the utilized tools, thus limiting their adaptation to environments requiring new knowledge.
no code implementations • 9 Dec 2023 • Zhou Ziheng, YingNian Wu, Song-Chun Zhu, Demetri Terzopoulos
We introduce Aligner, a novel Parameter-Efficient Fine-Tuning (PEFT) method for aligning multi-billion-parameter-sized Large Language Models (LLMs).
1 code implementation • 18 Nov 2023 • Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang
However, several significant challenges remain: (i) most of these models rely on 2D images yet exhibit a limited capacity for 3D input; (ii) these models rarely explore the tasks inherently defined in the 3D world, e.g., 3D grounding, embodied reasoning and acting.
no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen Mcaleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.
1 code implementation • 16 Oct 2023 • Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang
We even conceived a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.
Ranked #1 on Visual Reasoning on Bongard-OpenWorld
1 code implementation • NeurIPS 2023 • Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu
To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
no code implementations • 23 Sep 2023 • Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, LiWei Wang, Zhouchen Lin, Song-Chun Zhu
With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers.
no code implementations • 18 Sep 2023 • Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao
Large Language Models (LLMs) have the capacity to perform complex scheduling in a multi-agent system and can coordinate these agents to complete sophisticated tasks that require extensive collaboration.
no code implementations • 22 Aug 2023 • Ceyao Zhang, Kaijie Yang, Siyi Hu, ZiHao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang
Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems.
1 code implementation • ICCV 2023 • Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu
Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy.
no code implementations • 7 Jul 2023 • Yuxi Ma, Chi Zhang, Song-Chun Zhu
In this perspective paper, we first comprehensively review existing evaluations of Large Language Models (LLMs) using both standardized tests and ability-oriented benchmarks.
no code implementations • 27 Jun 2023 • Shuwen Qiu, Mingdian Liu, Hengli Li, Song-Chun Zhu, Zilong Zheng
Experiments show that models with mind modeling can achieve higher task outcomes when aligning and negotiating common ground.
2 code implementations • 26 May 2023 • Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang
We conduct evaluations with a new auto-metric, value rationality, to represent the ability of LLMs to align with specific values.
1 code implementation • 24 May 2023 • Xiaojuan Tang, Zilong Zheng, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, Muhan Zhang
On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities.
1 code implementation • NeurIPS 2023 • Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao
At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools to execute to generate the final response.
1 code implementation • ICCV 2023 • Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang
To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes.
no code implementations • 10 Mar 2023 • Weiqi Wang, Zihang Zhao, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu, Hangxin Liu
We present an optimization-based framework for rearranging indoor furniture to better accommodate human-robot co-activities.
2 code implementations • CVPR 2023 • Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu
SceneDiffuser provides a unified model for solving scene-conditioned generation, optimization, and planning.
no code implementations • 14 Jan 2023 • Hangxin Liu, Zeyu Zhang, Ziyuan Jiao, Zhenliang Zhang, Minchen Li, Chenfanfu Jiang, Yixin Zhu, Song-Chun Zhu
In this work, we present a reconfigurable data glove design to capture different modes of human hand-object interactions, which are critical in training embodied artificial intelligence (AI) agents for fine manipulation tasks.
1 code implementation • 20 Nov 2022 • Yu-Zhe Shi, Manjie Xu, John E. Hopcroft, Kun He, Joshua B. Tenenbaum, Song-Chun Zhu, Ying Nian Wu, Wenjuan Han, Yixin Zhu
Specifically, at the representational level, we seek to answer how the complexity varies when a visual concept is mapped to the representation space.
1 code implementation • 29 Oct 2022 • Mitch Hill, Erik Nijkamp, Jonathan Mitchell, Bo Pang, Song-Chun Zhu
This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM).
1 code implementation • 24 Oct 2022 • Xiaojuan Tang, Song-Chun Zhu, Yitao Liang, Muhan Zhang
In this paper, we propose a novel and principled framework called RulE (which stands for Rule Embedding) to effectively leverage logical rules to enhance KG reasoning.
1 code implementation • 14 Oct 2022 • Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang
We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D).
Ranked #1 on Referring Expression on SQA3D
1 code implementation • 8 Oct 2022 • Baoxiong Jia, Ting Lei, Song-Chun Zhu, Siyuan Huang
The challenges of such capability lie in the difficulty of generating a detailed understanding of situated actions, their effects on object states (i.e., state changes), and their causal dependencies.
no code implementations • 4 Oct 2022 • Qing Li, Yixin Zhu, Yitao Liang, Ying Nian Wu, Song-Chun Zhu, Siyuan Huang
We evaluate NSR's efficacy across four challenging benchmarks designed to probe systematic generalization capabilities: SCAN for semantic parsing, PCFG for string manipulation, HINT for arithmetic reasoning, and a compositional machine translation task.
2 code implementations • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan
However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #6 on Science Question Answering on ScienceQA
1 code implementation • 10 Jul 2022 • Ziyuan Jiao, Yida Niu, Zeyu Zhang, Song-Chun Zhu, Yixin Zhu, Hangxin Liu
We devise a 3D scene graph representation, contact graph+ (cg+), for efficient sequential task planning.
no code implementations • 30 Jun 2022 • Zeyu Zhang, Ziyuan Jiao, Weiqi Wang, Yixin Zhu, Song-Chun Zhu, Hangxin Liu
We present a robot learning and planning framework that produces an effective tool-use strategy with the least joint effort and can handle objects different from those seen in training.
no code implementations • 23 Jun 2022 • Yizhou Zhao, Steven Gong, Xiaofeng Gao, Wensi Ai, Song-Chun Zhu
With the recent progress of simulations by 3D modeling software and game engines, many researchers have focused on Embodied AI tasks in the virtual environment.
1 code implementation • 17 Jun 2022 • Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang
In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.
2 code implementations • 13 Jun 2022 • Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu
Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling.
1 code implementation • CVPR 2022 • Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar
A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts.
Ranked #1 on Few-Shot Image Classification on Bongard-HOI
1 code implementation • 24 May 2022 • Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu
This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories.
1 code implementation • ICLR 2022 • Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar
This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.
Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO
no code implementations • 9 Mar 2022 • Yizhou Zhao, Liang Qiu, Wensi Ai, Pan Lu, Song-Chun Zhu
We propose a Spatial-Temporal And-Or graph (ST-AOG), a stochastic grammar model, to encode the contextual relationship between motion, emotion, and relation, forming a triangle in a conditional random field.
no code implementations • 28 Feb 2022 • Chao Xu, Yixin Chen, He Wang, Song-Chun Zhu, Yixin Zhu, Siyuan Huang
We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization, without dense supervision.
no code implementations • 26 Jan 2022 • Arjun R Akula, Song-Chun Zhu
Motivated by this, we ask a follow-up question: "Assuming that we only consider the tasks where attention weights correlate well with feature importance, how effective are these attention based explanations in increasing human trust and reliance in the underlying models?".
no code implementations • 17 Jan 2022 • Arjun R Akula, Song-Chun Zhu
We also introduce DisNet, a novel dataset containing the proposed visual discourse annotations of 3000 videos and their paragraphs.
1 code implementation • 12 Dec 2021 • Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu
Current pre-training methods in computer vision focus on natural images in the daily-life context.
no code implementations • 12 Dec 2021 • Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu
To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first to attempt to incorporate a value model into emotionally intelligent dialogue systems.
no code implementations • NeurIPS 2021 • Arjun Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu
Neural module networks (NMN) are a popular approach for solving multi-modal tasks such as visual question answering (VQA) and visual referring expression recognition (REF).
no code implementations • 28 Nov 2021 • Shuwen Qiu, Sirui Xie, Lifeng Fan, Tao Gao, Jungseock Joo, Song-Chun Zhu, Yixin Zhu
Humans communicate with graphical sketches apart from symbolic languages.
no code implementations • 25 Nov 2021 • Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu
Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization.
2 code implementations • NeurIPS 2021 • Peiyu Yu, Sirui Xie, Xiaojian Ma, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background.
1 code implementation • 25 Oct 2021 • Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.
Ranked #1 on Visual Question Answering (VQA) on IconQA
1 code implementation • NeurIPS 2021 • Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu
Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been proved by multiple works.
no code implementations • 30 Sep 2021 • Luyao Yuan, Zipeng Fu, Linqi Zhou, Kexin Yang, Song-Chun Zhu
Currently, in the study of multiagent systems, the intentions of agents are usually ignored.
no code implementations • ICLR 2022 • Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu
However, MCMC sampling of EBMs in high-dimensional data space generally does not mix, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space.
no code implementations • ICCV 2021 • Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang
To the best of our knowledge, this is the first embodied reference dataset that allows us to study referring expressions in daily physical scenes to understand referential behavior, human communication, and human-robot interaction.
1 code implementation • 3 Sep 2021 • Arjun R. Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Chai, Song-Chun Zhu
More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user.
1 code implementation • ICCV 2021 • Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu
To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of 3D scene understanding tasks and their immense variations introduced by camera views, lighting, occlusions, etc.
Ranked #4 on 3D Object Detection on SUN-RGBD
1 code implementation • 15 Jul 2021 • Feng Shi, Chonghan Lee, Liang Qiu, Yizhou Zhao, Tianyi Shen, Shivran Muralidhar, Tian Han, Song-Chun Zhu, Vijaykrishnan Narayanan
The cognitive system for human action and behavior has evolved into a deep learning regime, and especially the advent of Graph Convolution Networks has transformed the field in recent years.
1 code implementation • 15 Jul 2021 • Feng Shi, Chonghan Lee, Mohammad Khairul Bashar, Nikhil Shukla, Song-Chun Zhu, Vijaykrishnan Narayanan
Our model has a scale-free structure that can process instances of varying size.
no code implementations • ACL 2021 • Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, Song-Chun Zhu
Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly.
Ranked #5 on Dialog Relation Extraction on DialogRE
1 code implementation • ACL 2021 • Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu
We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).
Ranked #1 on Mathematical Question Answering on GeoS
no code implementations • 4 May 2021 • Feng Shi, Ahren Yiqiao Jin, Song-Chun Zhu
As GNNs operate on non-Euclidean data, their irregular data access patterns cause considerable computational costs and overhead on conventional architectures, such as GPU and CPU.
1 code implementation • CVPR 2021 • Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu
By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents.
1 code implementation • CVPR 2021 • Yaxuan Zhu, Ruiqi Gao, Siyuan Huang, Song-Chun Zhu, Ying Nian Wu
Specifically, the camera pose and 3D scene are represented as vectors and the local camera movement is represented as a matrix operating on the vector of the camera pose.
1 code implementation • 30 Mar 2021 • Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu
In this paper, we rethink the problem of scene reconstruction from an embodied agent's perspective: While the classic view focuses on the reconstruction accuracy, our new perspective emphasizes the underlying functions and constraints such that the reconstructed scenes provide actionable information for simulating interactions with agents.
1 code implementation • 26 Mar 2021 • Xu Xie, Chi Zhang, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
Predicting agents' future trajectories plays a crucial role in modern AI systems, yet it is challenging due to intricate interactions exhibited in multi-agent systems, especially when it comes to collision avoidance.
no code implementations • CVPR 2021 • Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu
Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data.
no code implementations • CVPR 2021 • Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu
To fill in this gap, we propose a neuro-symbolic Probabilistic Abduction and Execution (PrAE) learner; central to the PrAE learner is the process of probabilistic abduction and execution on a probabilistic scene representation, akin to the mental manipulation of objects.
1 code implementation • ICCV 2021 • Yining Hong, Qing Li, Song-Chun Zhu, Siyuan Huang
In this work, we study grounded grammar induction of vision and language in a joint learning framework.
no code implementations • 12 Mar 2021 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent's mental state transition and teach the agent to make decisions guided by its value like a human.
no code implementations • 7 Mar 2021 • Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu
This paper studies the unsupervised cross-domain translation problem by proposing a generative framework, in which the probability distribution of each domain is represented by a generative cooperative network that consists of an energy-based model and a latent variable model.
no code implementations • 6 Mar 2021 • Xiaofeng Gao, Luyao Yuan, Tianmin Shu, Hongjing Lu, Song-Chun Zhu
Our experiments with human participants demonstrate that a short calibration using REMP can effectively bridge the gap between what a non-expert user thinks a robot can reach and the ground truth.
no code implementations • 2 Mar 2021 • Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
We believe the HINT dataset and the experimental findings are of great interest to the learning community on systematic generalization.
no code implementations • 22 Feb 2021 • Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
Leveraging these concepts, they could understand the internal structure of this task, without seeing all of the problem instances.
no code implementations • 1 Jan 2021 • Chi Zhang, Sirui Xie, Baoxiong Jia, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
We further show that the algebraic representation learned can be decoded by isomorphism and used to generate an answer.
no code implementations • 1 Jan 2021 • Feng Shi, Chen Li, Shijie Bian, Yiqiao Jin, Ziheng Xu, Tian Han, Song-Chun Zhu
The Propositional Satisfiability Problem (SAT), and more generally, the Constraint Satisfaction Problem (CSP), are mathematical questions defined as finding an assignment to a set of objects that satisfies a series of constraints.
no code implementations • 27 Dec 2020 • Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, Song-Chun Zhu
Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability.
no code implementations • 25 Dec 2020 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
3D data that contains rich geometric information about objects and scenes is valuable for understanding the 3D physical world.
1 code implementation • 19 Dec 2020 • Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, Song-Chun Zhu
To generate more diverse solutions, tree regularization is applied to guide the efficient shrinkage and exploration of the solution space, and a memory buffer is designed to track and save the various fixes discovered for each problem.
Ranked #1 on Math Word Problem Solving on Math23K (weakly-supervised metric)
no code implementations • 18 Nov 2020 • Yizhou Zhao, Song-Chun Zhu
We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights, which can be connected with prior knowledge, experience replay, and evolution process of the policy.
no code implementations • 12 Nov 2020 • Sirui Xie, Feng Gao, Song-Chun Zhu
Seeing that the proposed generalization problem has not been widely studied yet, we carefully define an evaluation protocol, with which we illustrate the effectiveness of MEIP on two proof-of-concept domains and one challenging task: learning to fold from demonstrations.
no code implementations • 28 Sep 2020 • Ruiqi Gao, Jianwen Xie, Xue-Xin Wei, Song-Chun Zhu, Ying Nian Wu
The grid cells in the mammalian medial entorhinal cortex exhibit striking hexagon firing patterns when the agent navigates in the open field.
1 code implementation • EMNLP 2020 • Liang Qiu, Yizhou Zhao, Weiyan Shi, Yuan Liang, Feng Shi, Tao Yuan, Zhou Yu, Song-Chun Zhu
Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics.
1 code implementation • ECCV 2020 • Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu
Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.
no code implementations • 24 Jul 2020 • Xiaofeng Gao, Ran Gong, Yizhou Zhao, Shu Wang, Tianmin Shu, Song-Chun Zhu
Thus, in this paper, we propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations, where the robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communications based on its online Bayesian inference of the user's mental state.
no code implementations • ECCV 2020 • Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu
Humans can progressively learn visual concepts from easy to hard questions.
1 code implementation • NeurIPS 2021 • Ruiqi Gao, Jianwen Xie, Xue-Xin Wei, Song-Chun Zhu, Ying Nian Wu
In this paper, we conduct theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher dimensional vector, and the 2D self-motion is represented by a general transformation of the vector.
1 code implementation • NeurIPS 2020 • Bo Pang, Tian Han, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu
Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well.
no code implementations • 12 Jun 2020 • Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu
Learning an energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm.
1 code implementation • ICML 2020 • Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu
In this paper, we address these issues and close the loop of neural-symbolic learning by (1) introducing the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning, and (2) proposing a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error through the symbolic reasoning module efficiently.
no code implementations • CVPR 2020 • Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, Ying Nian Wu
This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM).
1 code implementation • ICLR 2021 • Mitch Hill, Jonathan Mitchell, Song-Chun Zhu
Our contributions are 1) an improved method for training EBMs with realistic long-run MCMC samples, 2) an Expectation-Over-Transformation (EOT) defense that resolves theoretical ambiguities for stochastic defenses and from which the EOT attack naturally follows, and 3) state-of-the-art adversarial defense for naturally-trained classifiers and competitive defense compared to adversarially-trained classifiers on CIFAR-10, SVHN, and CIFAR-100.
1 code implementation • ACL 2020 • Arjun R. Akula, Spandana Gella, Yaser Al-Onaizan, Song-Chun Zhu, Siva Reddy
To measure the true progress of existing models, we split the test set into two sets, one which requires reasoning on linguistic structure and the other which doesn't.
no code implementations • 25 Apr 2020 • Zeyu Zhang, Hangxin Liu, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu
We present a congestion-aware routing solution for indoor evacuation, which produces real-time, individually customized evacuation routes among multiple destinations while keeping track of all evacuees' locations.
2 code implementations • 25 Apr 2020 • Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu
To endow machine intelligence with such a crucial cognitive ability, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).
no code implementations • 25 Apr 2020 • Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu
Aiming to understand how human (false-)belief--a core socio-cognitive ability--would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.
no code implementations • 20 Apr 2020 • Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, Song-Chun Zhu
We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning.
1 code implementation • CVPR 2021 • Jianwen Xie, Yifei Xu, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu
We propose a generative model of unordered point sets, such as point clouds, in the form of an energy-based model, where the energy function is parameterized by an input-permutation-invariant bottom-up neural network.
1 code implementation • 21 Jan 2020 • Luyao Yuan, Zipeng Fu, Jingyue Shen, Lu Xu, Junhong Shen, Song-Chun Zhu
Pragmatics studies how context can contribute to language meanings.
no code implementations • NeurIPS 2019 • Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistencies between the 2D image plane and the 3D world coordinate.
Ranked #2 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)
no code implementations • ECCV 2020 • Erik Nijkamp, Bo Pang, Tian Han, Linqi Zhou, Song-Chun Zhu, Ying Nian Wu
Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables.
1 code implementation • NeurIPS 2019 • Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, Song-Chun Zhu
"Thinking in pictures," [1] i.e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a significant ability to perform logical induction and a crucial factor in the intellectual history of technology development.
no code implementations • 26 Nov 2019 • Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu
Learning representations of data is an important problem in statistics and machine learning.
no code implementations • 26 Nov 2019 • Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu
To model the motions explicitly, it is natural for the model to be based on the motions or the displacement fields of the pixels.
no code implementations • 25 Nov 2019 • Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, Song-Chun Zhu
Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment.
no code implementations • ICCV 2019 • Yuanlu Xu, Song-Chun Zhu, Tony Tung
We present DenseRaC, a novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image.
Ranked #82 on 3D Human Pose Estimation on MPI-INF-3DHP (using extra training data)
no code implementations • 26 Sep 2019 • Jianwen Xie, Song-Chun Zhu, Ying Nian Wu
We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns.
no code implementations • 25 Sep 2019 • Feng Shi, Yizhou Zhao, Ziheng Xu, Tianyang Liu, Song-Chun Zhu
Graph Neural Networks, as a combination of Graph Signal Processing and Deep Convolutional Networks, show great power in pattern recognition in non-Euclidean domains.
no code implementations • 15 Sep 2019 • Arjun R. Akula, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu
We present a new explainable AI (XAI) framework aimed at increasing justified human trust and reliance in the AI machine through explanations.
no code implementations • CVPR 2020 • Xianglei Xing, Tianfu Wu, Song-Chun Zhu, Ying Nian Wu
To realize this AND-OR hierarchy in image synthesis, we learn a generator network that consists of the following two components: (i) Each layer of the hierarchy is represented by an over-complete set of convolutional basis functions.
1 code implementation • ICCV 2019 • Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu
This paper addresses a new problem of understanding human gaze communication in social videos at both the atomic level and the event level, which is significant for studying human social interactions.
no code implementations • ICCV 2019 • Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction, i.e., 3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation.
no code implementations • 25 Jul 2019 • Feng Shi, Ziheng Xu, Tao Yuan, Song-Chun Zhu
In this work, we propose a Highly Untangled Generative-model Engine for Edge-computing or HUGE2 for accelerating these two special convolutions on the edge-computing platform by decomposing the kernels and untangling these smaller convolutions by performing basic matrix multiplications.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2019 • Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Jianwen Xie, Song-Chun Zhu
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image.
Ranked #15 on 3D Human Pose Estimation on HumanEva-I
no code implementations • NeurIPS 2019 • Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu
We treat this non-convergent short-run MCMC as a learned generator model or a flow model.
1 code implementation • CVPR 2019 • Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu
The answer to a given question is represented by a node with a missing value.
Ranked #14 on Visual Dialog on VisDial v0.9 val
3 code implementations • 29 Mar 2019 • Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu
On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images.
1 code implementation • 13 Mar 2019 • Xiaofeng Gao, Ran Gong, Tianmin Shu, Xu Xie, Shu Wang, Song-Chun Zhu
One of the main challenges of advancing task-oriented learning such as visual task planning and reinforcement learning is the lack of realistic and standardized environments for training and testing AI agents.
no code implementations • 13 Mar 2019 • Arjun R. Akula, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu
This paper presents an explainable AI (XAI) system that provides explanations for its predictions.
no code implementations • CVPR 2019 • Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu
In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation.
1 code implementation • 6 Mar 2019 • Arjun R. Akula, Song-Chun Zhu
We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video.
no code implementations • 7 Feb 2019 • Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu
This paper studies the problem of learning the conditional distribution of a high-dimensional output given an input, where the output and input may belong to two different domains, e.g., the output is a photo image and the input is a sketch image.
no code implementations • 24 Jan 2019 • Ruiqi Gao, Jianwen Xie, Siyuan Huang, Yufan Ren, Song-Chun Zhu, Ying Nian Wu
This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1).
no code implementations • 20 Jan 2019 • Xianglei Xing, Song-Chun Zhu, Ying Nian Wu
We introduce an explainable generative model by applying a sparse operation on the feature maps of the generator network.
no code implementations • 8 Jan 2019 • Quanshi Zhang, Xin Wang, Ying Nian Wu, Huilin Zhou, Song-Chun Zhu
This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part.
no code implementations • 8 Jan 2019 • Zenan Ling, Haotian Ma, Yu Yang, Robert C. Qiu, Song-Chun Zhu, Quanshi Zhang
In this paper, we propose to disentangle and interpret contextual effects that are encoded in a pre-trained deep neural network.
1 code implementation • 28 Dec 2018 • Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, Ying Nian Wu
This paper proposes the divergence triangle as a framework for joint training of a generator model, an energy-based model, and an inference model.
no code implementations • 27 Dec 2018 • Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu
The non-linear transformation of this transition model can be parametrized by a feedforward neural network.
no code implementations • 18 Dec 2018 • Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu
The AOG associates each object part with certain neural units in feature maps of conv-layers.
no code implementations • 18 Dec 2018 • Quanshi Zhang, Xin Wang, Ruiming Cao, Ying Nian Wu, Feng Shi, Song-Chun Zhu
This paper introduces a graphical model, namely an explanatory graph, which reveals the knowledge hierarchy hidden inside conv-layers of a pre-trained CNN.
no code implementations • 13 Dec 2018 • Chi Zhang, Yixin Zhu, Song-Chun Zhu
An unprecedented boom has been witnessed in the research area of artistic style transfer ever since Gatys et al. introduced the neural method.
no code implementations • 19 Nov 2018 • Tian Xu, Jiayu Zhan, Oliver G. B. Garrod, Philip H. S. Torr, Song-Chun Zhu, Robin A. A. Ince, Philippe G. Schyns
However, understanding the information represented and processed in CNNs remains in most cases challenging.
1 code implementation • NeurIPS 2018 • Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D.
Ranked #5 on Monocular 3D Object Detection on SUN RGB-D
1 code implementation • ICLR 2019 • Ruiqi Gao, Jianwen Xie, Song-Chun Zhu, Ying Nian Wu
In this model, the 2D self-position of the agent is represented by a high-dimensional vector, and the 2D self-motion or displacement of the agent is represented by a matrix that transforms the vector.
no code implementations • 9 Oct 2018 • Ying Nian Wu, Ruiqi Gao, Tian Han, Song-Chun Zhu
In this paper, we review three families of probability models, namely, the discriminative models, the descriptive models, and the generative models.
no code implementations • 3 Oct 2018 • Feng Shi, Haochen Li, Yuhe Gao, Benjamin Kuschner, Song-Chun Zhu
The reconfigurability, energy-efficiency, and massive parallelism on FPGAs make them one of the best choices for implementing efficient deep learning accelerators.
no code implementations • 1 Oct 2018 • Tianmin Shu, Caiming Xiong, Ying Nian Wu, Song-Chun Zhu
In particular, the probing agent (i.e., a learner) learns to interact with the environment and with a target agent (i.e., a demonstrator) to maximize the change in the observed behaviors of that agent.
1 code implementation • CVPR 2018 • Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu
We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth.
1 code implementation • ECCV 2018 • Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu
For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.
Ranked #32 on Human-Object Interaction Detection on V-COCO
1 code implementation • ECCV 2018 • Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model.
Ranked #4 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)
2 code implementations • 16 Jun 2018 • Xianglei Xing, Ruiqi Gao, Tian Han, Song-Chun Zhu, Ying Nian Wu
We present a deformable generator model to disentangle the appearance and geometric information for both image and video data in a purely unsupervised manner.
no code implementations • ICML 2018 • Siyuan Qi, Baoxiong Jia, Song-Chun Zhu
Future predictions on sequence data (e.g., videos or audio) require the algorithms to capture non-Markovian and compositional properties of high-level semantics.
no code implementations • CVPR 2018 • Ping Wei, Yang Liu, Tianmin Shu, Nanning Zheng, Song-Chun Zhu
We built a new video dataset of tasks, intentions, and attention.
no code implementations • CVPR 2018 • Lifeng Fan, Yixin Chen, Ping Wei, Wenguan Wang, Song-Chun Zhu
We collect a new dataset, VideoCoAtt, from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study.
1 code implementation • CVPR 2018 • Wenguan Wang, Yuanlu Xu, Jianbing Shen, Song-Chun Zhu
This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification.
no code implementations • 18 May 2018 • Quanshi Zhang, Yu Yang, Yuchen Liu, Ying Nian Wu, Song-Chun Zhu
Given feature maps of a certain conv-layer of the CNN, the explainer performs like an auto-encoder, which first disentangles the feature maps into object-part features and then inverts object-part features back to features of higher conv-layers of the CNN.
1 code implementation • CVPR 2018 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.
no code implementations • 6 Mar 2018 • Siyuan Qi, Song-Chun Zhu
We test our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time.