EST: Evaluating Scientific Thinking in Artificial Agents

no code implementations 18 Jun 2022 Manjie Xu, Guangyuan Jiang, Chi Zhang, Song-Chun Zhu, Yixin Zhu

Such inefficacy of learning in scientific thinking calls for future research in building humanlike intelligence.

Causal Discovery Causal Inference

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

1 code implementation 13 Jun 2022 Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu

Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling.

MPI: Evaluating and Inducing Personality in Pre-trained Language Models

no code implementations 20 May 2022 Guangyuan Jiang, Manjie Xu, Song-Chun Zhu, Wenjuan Han, Chi Zhang, Yixin Zhu

Further, given this evaluation framework, (3) how can we induce a certain personality in a fully controllable fashion?

Language Modelling

PartAfford: Part-level Affordance Discovery from 3D Objects

no code implementations 28 Feb 2022 Chao Xu, Yixin Chen, He Wang, Song-Chun Zhu, Yixin Zhu, Siyuan Huang

We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization, without dense supervision.

Emergent Graphical Conventions in a Visual Communication Game

no code implementations 28 Nov 2021 Shuwen Qiu, Sirui Xie, Lifeng Fan, Tao Gao, Song-Chun Zhu, Yixin Zhu

While recent studies of emergent communication primarily focus on symbolic languages, their settings overlook the graphical sketches existing in human communication; they do not account for the evolution process through which symbolic sign systems emerge in the trade-off between iconicity and symbolicity.

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

no code implementations 25 Nov 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization.

Systematic Generalization

Unsupervised Foreground Extraction via Deep Region Competition

1 code implementation NeurIPS 2021 Peiyu Yu, Sirui Xie, Xiaojian Ma, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background.

Inductive Bias Semantic Segmentation

YouRefIt: Embodied Reference Understanding with Language and Gesture

no code implementations ICCV 2021 Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

To the best of our knowledge, this is the first embodied reference dataset that allows us to study referring expressions in daily physical scenes to understand referential behavior, human communication, and human-robot interaction.

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

1 code implementation ICCV 2021 Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of 3D scene understanding tasks and their immense variations introduced by camera views, lighting, occlusions, etc.

3D Object Detection 3D Point Cloud Classification +8

Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

1 code implementation 5 Aug 2021 Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo

In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve the object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.

Individual vs. Joint Perception: a Pragmatic Model of Pointing as Communicative Smithian Helping

no code implementations 3 Jun 2021 Kaiwen Jiang, Stephanie Stacy, Chuyu Wei, Adelpha Chan, Federico Rossano, Yixin Zhu, Tao Gao

We add another agent as a guide who can only help by choosing whether or not to point at an observation the hunter has already perceived, without providing new observations or offering any instrumental help.

Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

1 code implementation CVPR 2021 Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents.

Scene Understanding

Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments

1 code implementation 30 Mar 2021 Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

In this paper, we rethink the problem of scene reconstruction from an embodied agent's perspective: while the classic view focuses on the reconstruction accuracy, our new perspective emphasizes the underlying functions and constraints such that the reconstructed scenes provide actionable information for simulating interactions with agents.

Common Sense Reasoning

ACRE: Abstract Causal REasoning Beyond Covariation

no code implementations CVPR 2021 Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data.

Causal Discovery Visual Reasoning

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

no code implementations CVPR 2021 Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu

To fill in this gap, we propose a neuro-symbolic Probabilistic Abduction and Execution (PrAE) learner; central to the PrAE learner is the process of probabilistic abduction and execution on a probabilistic scene representation, akin to the mental manipulation of objects.

Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance

1 code implementation 26 Mar 2021 Xu Xie, Chi Zhang, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Predicting agents' future trajectories plays a crucial role in modern AI systems, yet it is challenging due to intricate interactions exhibited in multi-agent systems, especially when it comes to collision avoidance.

Trajectory Prediction

A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics

no code implementations 2 Mar 2021 Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Inspired by humans' remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines' capability of learning generalizable concepts at three different levels: perception, syntax, and semantics.

Program Synthesis Systematic Generalization

HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving

no code implementations 22 Feb 2021 Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Leveraging these concepts, they could understand the internal structure of this task, without seeing all of the problem instances.

Incorporating Vision Bias into Click Models for Image-oriented Search Engine

no code implementations 7 Jan 2021 Ningxin Xu, Cheng Yang, Yixin Zhu, Xiaowei Hu, Changhu Wang

Most typical click models, such as PBM and UBM, assume that the probability of a user examining a document depends only on its position.

Learning Algebraic Representation for Abstract Spatial-Temporal Reasoning

no code implementations 1 Jan 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We further show that the algebraic representation learned can be decoded by isomorphism and used to generate an answer.

Systematic Generalization

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

no code implementations ECCV 2020 Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.

Action Recognition Action Understanding +1

Congestion-aware Evacuation Routing using Augmented Reality Devices

no code implementations 25 Apr 2020 Zeyu Zhang, Hangxin Liu, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu

We present a congestion-aware routing solution for indoor evacuation, which produces real-time individual-customized evacuation routes among multiple destinations while keeping track of all evacuees' locations.

Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

no code implementations 25 Apr 2020 Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

Aiming to understand how human (false-)belief--a core socio-cognitive ability--would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.

Object Tracking

Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning

2 code implementations 25 Apr 2020 Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu

To endow machine intelligence with such a crucial cognitive ability, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).

Relational Reasoning Visual Reasoning

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

no code implementations 20 Apr 2020 Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, Song-Chun Zhu

We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning.

Common Sense Reasoning Small Data Image Classification

Lagrangian-Eulerian Multi-Density Topology Optimization with the Material Point Method

2 code implementations 2 Mar 2020 Yue Li, Xuan Li, Minchen Li, Yixin Zhu, Bo Zhu, Chenfanfu Jiang

A quadrature-level connectivity graph-based method is adopted to avoid the artificial checkerboard issues commonly existing in multi-resolution topology optimization methods.

Computational Physics Computational Engineering, Finance, and Science Graphics

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

no code implementations NeurIPS 2019 Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistencies between the 2D image plane and the 3D world coordinate.

Monocular 3D Object Detection object-detection

Learning Perceptual Inference by Contrasting

1 code implementation NeurIPS 2019 Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

"Thinking in pictures," [1] i. e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a significant ability to perform logical induction and a crucial factor in the intellectual history of technology development.

Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning

no code implementations 25 Nov 2019 Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment.

Transfer Learning

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

no code implementations ICCV 2019 Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction (3D estimations of object bounding boxes, camera pose, and room layout), and (ii) 3D human pose estimation.

3D Human Pose Estimation Human-Object Interaction Detection +1

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

no code implementations CVPR 2019 Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation.

Computer Vision Object Recognition +3

MetaStyle: Three-Way Trade-Off Among Speed, Flexibility, and Quality in Neural Style Transfer

no code implementations 13 Dec 2018 Chi Zhang, Yixin Zhu, Song-Chun Zhu

An unprecedented boom has been witnessed in artistic style transfer research ever since Gatys et al. introduced the neural method.

Bilevel Optimization Style Transfer

Human-centric Indoor Scene Synthesis Using Stochastic Grammar

1 code implementation CVPR 2018 Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth.

Indoor Scene Synthesis

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

1 code implementation ECCV 2018 Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model.

Monocular 3D Object Detection object-detection +4

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars

no code implementations 1 Apr 2017 Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms.

Computer Vision Scene Understanding +1

Inferring Forces and Learning Human Utilities From Videos

no code implementations CVPR 2016 Yixin Zhu, Chenfanfu Jiang, Yibiao Zhao, Demetri Terzopoulos, Song-Chun Zhu

We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world objects, and introduce a learning framework that incorporates the concept of human utilities, which in our opinion provides a deeper and finer-grained account not only of object affordance but also of people's interaction with objects.

Motion Planning Robot Task Planning

Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition

no code implementations CVPR 2015 Yixin Zhu, Yibiao Zhao, Song-Chun Zhu

In this paper, we present a new framework, task-oriented modeling, learning and recognition, which aims at understanding the underlying functions, physics and causality in using objects as "tools".

Object Recognition

