Search Results for author: Yixin Zhu

Found 65 papers, 28 papers with code

PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

no code implementations4 Apr 2024 Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

Robotic manipulation of ungraspable objects with two-finger grippers presents significant challenges due to the paucity of graspable features, while traditional pre-grasping techniques, which rely on repositioning objects and leveraging external aids like table edges, lack adaptability across object categories and scenes.

Object

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

1 code implementation26 Mar 2024 Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges.

Motion Synthesis

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

no code implementations19 Mar 2024 Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang

Traditional approaches in physics-based motion generation, centered around imitation learning and reward shaping, often struggle to adapt to new scenarios.

Imitation Learning

Zero-Shot Image Feature Consensus with Deep Functional Maps

no code implementations18 Mar 2024 Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas

We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, with the possibility of better reflecting the knowledge embedded in the large-scale vision models that we are studying.
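
For background, the classical functional map estimation that this line of work builds on can be written as follows; this is the standard formulation from the shape-correspondence literature, not necessarily the exact objective used in the paper. Given descriptor functions expressed in the Laplacian eigenbases of the two domains, with coefficient matrices $A$ and $B$, the functional map $C$ is estimated as

$$C^{*} = \arg\min_{C} \; \lVert C A - B \rVert_F^2 + \lambda \lVert C \Lambda_1 - \Lambda_2 C \rVert_F^2,$$

where $\Lambda_1, \Lambda_2$ are diagonal matrices of Laplacian eigenvalues, the second term encourages approximate commutativity with the Laplacians, and a point-to-point correspondence is then recovered from $C$.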

Scaling Up Dynamic Human-Scene Interaction Modeling

no code implementations13 Mar 2024 Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang

Confronting the challenges of data scarcity and advanced motion synthesis in human-scene interaction modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method.

Motion Synthesis

I-PHYRE: Interactive Physical Reasoning

no code implementations4 Dec 2023 Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

Current evaluation protocols predominantly assess physical reasoning in stationary scenes, creating a gap in evaluating agents' abilities to interact with dynamic events.

Zero-shot Generalization

Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture

no code implementations1 Nov 2023 Yixin Chen, Junfeng Ni, Nan Jiang, Yaowei Zhang, Yixin Zhu, Siyuan Huang

Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which primarily focus on geometric shape recovery, overlooking object appearances and fine shape details.

3D Object Reconstruction 3D Reconstruction +5

SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation

no code implementations25 Oct 2023 Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas

Central to SparseDFF is a feature refinement network, optimized with a contrastive loss between views and a point-pruning mechanism for feature continuity.

One-Shot Learning
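
As a rough illustration of a contrastive loss between views (not the paper's actual network or training code; the point correspondences, feature dimensions, and temperature below are hypothetical), an InfoNCE-style objective over matched per-point features might look like this:

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(feat_a, feat_b, temperature=0.07):
    """InfoNCE-style contrastive loss between two views.

    feat_a, feat_b: (N, D) features of N points assumed to be in
    correspondence across the two views (row i of feat_a matches row i
    of feat_b). Matched pairs are pulled together; all other pairs in
    the batch act as negatives.
    """
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    logits = feat_a @ feat_b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(feat_a.shape[0], device=feat_a.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random features standing in for distilled 3D features.
loss = cross_view_infonce(torch.randn(128, 64), torch.randn(128, 64))
```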

ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

1 code implementation NeurIPS 2023 Xiaoxuan Ma, Stephan P. Kaufhold, Jiajun Su, Wentao Zhu, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang

ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.

Action Detection Pose Estimation

Grasp Multiple Objects with One Hand

1 code implementation24 Oct 2023 Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang

The intricate kinematics of the human hand enable simultaneous grasping and manipulation of multiple objects, essential for tasks such as object transfer and in-hand manipulation.

Object

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

1 code implementation ICCV 2023 Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu

Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy.

Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section

no code implementations13 Jul 2023 Hongyi Zheng, Yixin Zhu, Lavender Yao Jiang, Kyunghyun Cho, Eric Karl Oermann

Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes.

Language Modelling

MEWL: Few-shot multimodal word learning with referential uncertainty

1 code implementation1 Jun 2023 Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu

To fill in this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess how machines learn word meaning in grounded visual scenes.

STRAP: Structured Object Affordance Segmentation with Point Supervision

1 code implementation17 Apr 2023 Leiyao Cui, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yixin Zhu

By label affinity, we refer to affordance segmentation as a multi-label prediction problem: A plate can be both holdable and containable.

Object Scene Understanding
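
To make the multi-label framing concrete (a minimal sketch, not the paper's model; the affordance classes and scores below are placeholders), each pixel or object gets an independent sigmoid score per affordance, so "holdable" and "containable" can both be active at once:

```python
import torch
import torch.nn as nn

AFFORDANCES = ["holdable", "containable", "supportable"]  # hypothetical label set

logits = torch.randn(4, len(AFFORDANCES))       # scores for 4 objects/pixels
targets = torch.tensor([[1., 1., 0.],           # a plate: holdable AND containable
                        [1., 0., 0.],
                        [0., 1., 1.],
                        [0., 0., 1.]])

# Multi-label: independent sigmoid per class (vs. a single softmax over classes).
loss = nn.BCEWithLogitsLoss()(logits, targets)
predictions = torch.sigmoid(logits) > 0.5       # several affordances may fire at once
```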

Rearrange Indoor Scenes for Human-Robot Co-Activity

no code implementations10 Mar 2023 Weiqi Wang, Zihang Zhao, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

We present an optimization-based framework for rearranging indoor furniture to accommodate human-robot co-activities better.

A Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps

no code implementations14 Jan 2023 Hangxin Liu, Zeyu Zhang, Ziyuan Jiao, Zhenliang Zhang, Minchen Li, Chenfanfu Jiang, Yixin Zhu, Song-Chun Zhu

In this work, we present a reconfigurable data glove design to capture different modes of human hand-object interactions, which are critical in training embodied artificial intelligence (AI) agents for fine manipulation tasks.

Full-Body Articulated Human-Object Interaction

1 code implementation ICCV 2023 Nan Jiang, Tengyu Liu, Zhexuan Cao, Jieming Cui, Zhiyuan Zhang, Yixin Chen, He Wang, Yixin Zhu, Siyuan Huang

By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions.

Action Recognition Human-Object Interaction Detection +3

To think inside the box, or to think out of the box? Scientific discovery via the reciprocation of insights and concepts

no code implementations1 Dec 2022 Yu-Zhe Shi, Manjie Xu, Wenjuan Han, Yixin Zhu

If scientific discovery is one of the main driving forces of human progress, insight is the fuel for the engine, which has long attracted behavior-level research to understand and model its underlying cognitive process.

On the Complexity of Bayesian Generalization

1 code implementation20 Nov 2022 Yu-Zhe Shi, Manjie Xu, John E. Hopcroft, Kun He, Joshua B. Tenenbaum, Song-Chun Zhu, Ying Nian Wu, Wenjuan Han, Yixin Zhu

Specifically, at the representational level, we seek to answer how the complexity varies when a visual concept is mapped to the representation space.

Attribute
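
For readers unfamiliar with the setup, the textbook Bayesian concept-generalization rule that work in this area builds on can be written as below (a standard formulation, not necessarily the paper's exact model):

$$P(h \mid x) = \frac{P(x \mid h)\, P(h)}{\sum_{h'} P(x \mid h')\, P(h')}, \qquad P(y \in C \mid x) = \sum_{h} \mathbb{1}[\, y \in h \,]\; P(h \mid x),$$

where $x$ is the observed example(s), $h$ ranges over candidate hypotheses (possible extensions of the concept $C$), and generalization to a new item $y$ averages over the posterior.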

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

1 code implementation18 Oct 2022 Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

Learning to generate diverse scene-aware and goal-oriented human motions in 3D scenes remains challenging due to the mediocre characteristics of the existing datasets on Human-Scene Interaction (HSI); they only have limited scale/quality and lack semantics.

Understanding Embodied Reference with Touch-Line Transformer

1 code implementation11 Oct 2022 Yang Li, Xiaoxue Chen, Hao Zhao, Jiangtao Gong, Guyue Zhou, Federico Rossano, Yixin Zhu

Human studies have revealed that objects referred to or pointed to do not lie on the elbow-wrist line, a common misconception; instead, they lie on the so-called virtual touch line.
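
A minimal geometric sketch of the comparison (assuming the virtual touch line is approximated by the eye-to-fingertip ray; the keypoints and target below are made-up numbers):

```python
import numpy as np

def point_to_line_distance(p, a, b):
    """Distance from point p to the infinite 3D line through a and b."""
    d = (b - a) / np.linalg.norm(b - a)
    return np.linalg.norm(np.cross(p - a, d))

eye, fingertip = np.array([0.0, 1.6, 0.0]), np.array([0.4, 1.4, 0.3])
elbow, wrist   = np.array([0.3, 1.2, 0.1]), np.array([0.4, 1.35, 0.25])
target         = np.array([1.6, 0.8, 1.2])   # hypothetical referred-to object

# The claim tested in such studies: the target lies closer to the
# eye-fingertip ("virtual touch") line than to the elbow-wrist line.
d_touch = point_to_line_distance(target, eye, fingertip)
d_limb  = point_to_line_distance(target, elbow, wrist)
print(f"touch-line dist = {d_touch:.3f}, elbow-wrist dist = {d_limb:.3f}")
```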

On the Learning Mechanisms in Physical Reasoning

no code implementations5 Oct 2022 Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

Taken together, the results on the challenging benchmark of PHYRE show that LfI is, if not better, as good as LfD for dynamics prediction.

Neural-Symbolic Recursive Machine for Systematic Generalization

no code implementations4 Oct 2022 Qing Li, Yixin Zhu, Yitao Liang, Ying Nian Wu, Song-Chun Zhu, Siyuan Huang

In experiments, NSR achieves state-of-the-art performance in three benchmarks from different domains: SCAN for semantic parsing, PCFG for string manipulation, and HINT for arithmetic reasoning.

Arithmetic Reasoning Semantic Parsing +1

GenDexGrasp: Generalizable Dexterous Grasping

1 code implementation3 Oct 2022 Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang

By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands.
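
As an illustration of a contact map serving as a hand-agnostic intermediate representation (a sketch under simplifying assumptions, not GenDexGrasp's definition: here the map is just a distance-based score per object surface point):

```python
import numpy as np

def contact_map(object_points, hand_points, tau=0.01):
    """Per-point contact score on the object surface.

    object_points: (N, 3) sampled object surface points.
    hand_points:   (M, 3) sampled hand surface points (any hand morphology).
    Returns values in (0, 1]; points close to the hand score near 1.
    """
    # Distance from each object point to the nearest hand point.
    d = np.linalg.norm(object_points[:, None, :] - hand_points[None, :, :], axis=-1)
    nearest = d.min(axis=1)
    return np.exp(-nearest / tau)   # hand-agnostic: only distances matter

cmap = contact_map(np.random.rand(2048, 3), np.random.rand(512, 3))
```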

Sequential Manipulation Planning on Scene Graph

1 code implementation10 Jul 2022 Ziyuan Jiao, Yida Niu, Zeyu Zhang, Song-Chun Zhu, Yixin Zhu, Hangxin Liu

We devise a 3D scene graph representation, contact graph+ (cg+), for efficient sequential task planning.

Stochastic Optimization
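
A toy version of a contact-style scene graph for planning (purely illustrative; the node and edge semantics here are assumptions, not the cg+ specification):

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Objects as nodes, supporting/contact relations as directed edges."""
    edges: set = field(default_factory=set)   # (parent, child): parent supports child

    def add_support(self, parent, child):
        self.edges.add((parent, child))

    def supported_by(self, obj):
        return {p for p, c in self.edges if c == obj}

# Initial scene: cup on table, plate in cabinet.
g = SceneGraph()
g.add_support("table", "cup")
g.add_support("cabinet", "plate")

# A goal can be expressed as a desired set of edges; a task planner searches
# for a sequence of pick/place actions that edits the graph into the goal.
goal = {("table", "cup"), ("table", "plate")}
print(goal - g.edges)   # relations that still need to be achieved
```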

Understanding Physical Effects for Effective Tool-use

no code implementations30 Jun 2022 Zeyu Zhang, Ziyuan Jiao, Weiqi Wang, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

We present a robot learning and planning framework that produces an effective tool-use strategy with the least joint efforts, capable of handling objects different from training.

Motion Planning regression +1

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

2 code implementations13 Jun 2022 Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu

Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling.
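
For background, the latent-space energy-based prior that this family of models builds on is typically written as (standard formulation; the notation here is generic rather than the paper's):

$$p_{\alpha}(z) \propto \exp\!\big(f_{\alpha}(z)\big)\, p_{0}(z), \qquad p_{\theta}(x) = \int p_{\beta}(x \mid z)\, p_{\alpha}(z)\, dz,$$

where $p_0(z)$ is a simple reference distribution (e.g., an isotropic Gaussian), $f_{\alpha}$ is a small energy network tilting that reference, and $p_{\beta}(x \mid z)$ is the generator; learning typically requires sampling from both the prior $p_{\alpha}(z)$ and the posterior $p_{\theta}(z \mid x)$.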

PartAfford: Part-level Affordance Discovery from 3D Objects

no code implementations28 Feb 2022 Chao Xu, Yixin Chen, He Wang, Song-Chun Zhu, Yixin Zhu, Siyuan Huang

We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization, without dense supervision.

Object

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

no code implementations25 Nov 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization.

Abstract Algebra Systematic Generalization

Unsupervised Foreground Extraction via Deep Region Competition

2 code implementations NeurIPS 2021 Peiyu Yu, Sirui Xie, Xiaojian Ma, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background.

Image Segmentation Inductive Bias +1

YouRefIt: Embodied Reference Understanding with Language and Gesture

no code implementations ICCV 2021 Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

To the best of our knowledge, this is the first embodied reference dataset that allows us to study referring expressions in daily physical scenes to understand referential behavior, human communication, and human-robot interaction.

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

1 code implementation ICCV 2021 Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to their intricate nature and the immense variations introduced by camera views, lighting, occlusions, etc.

3D Object Detection 3D Point Cloud Classification +8

Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

1 code implementation5 Aug 2021 Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo

In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve the object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.

Individual vs. Joint Perception: a Pragmatic Model of Pointing as Communicative Smithian Helping

no code implementations3 Jun 2021 Kaiwen Jiang, Stephanie Stacy, Chuyu Wei, Adelpha Chan, Federico Rossano, Yixin Zhu, Tao Gao

We add another agent as a guide who can only help by choosing whether to mark, with a pointing gesture, an observation already perceived by the hunter, without providing new observations or offering any instrumental help.

Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

1 code implementation CVPR 2021 Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents.

Scene Understanding

Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments

1 code implementation30 Mar 2021 Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

In this paper, we rethink the problem of scene reconstruction from an embodied agent's perspective: While the classic view focuses on the reconstruction accuracy, our new perspective emphasizes the underlying functions and constraints such that the reconstructed scenes provide actionable information for simulating interactions with agents.

Common Sense Reasoning

ACRE: Abstract Causal REasoning Beyond Covariation

no code implementations CVPR 2021 Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data.

Blocking Causal Discovery +1

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

no code implementations CVPR 2021 Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu

To fill in this gap, we propose a neuro-symbolic Probabilistic Abduction and Execution (PrAE) learner; central to the PrAE learner is the process of probabilistic abduction and execution on a probabilistic scene representation, akin to the mental manipulation of objects.

Attribute Logical Reasoning

Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance

1 code implementation26 Mar 2021 Xu Xie, Chi Zhang, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Predicting agents' future trajectories plays a crucial role in modern AI systems, yet it is challenging due to intricate interactions exhibited in multi-agent systems, especially when it comes to collision avoidance.

Collision Avoidance Trajectory Prediction

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics

no code implementations2 Mar 2021 Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We believe the HINT dataset and the experimental findings are of great interest to the learning community on systematic generalization.

Few-Shot Learning Program Synthesis +1

HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving

no code implementations22 Feb 2021 Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Leveraging these concepts, they could understand the internal structure of this task, without seeing all of the problem instances.

Incorporating Vision Bias into Click Models for Image-oriented Search Engine

no code implementations7 Jan 2021 Ningxin Xu, Cheng Yang, Yixin Zhu, Xiaowei Hu, Changhu Wang

Most typical click models assume that the probability of a document to be examined by users only depends on position, such as PBM and UBM.

Position
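
For reference, the position-based model (PBM) mentioned here factorizes a click into examination and relevance under the examination hypothesis (a standard click-model formulation, independent of this paper's extension):

$$P(C_i = 1) = P(E_i = 1)\, P(R_i = 1), \qquad P(E_i = 1) = \gamma_{r_i}, \quad P(R_i = 1) = \alpha_{q, d_i},$$

where $C_i$, $E_i$, $R_i$ are the click, examination, and relevance of the document at rank $r_i$, $\gamma_{r_i}$ depends only on position, and $\alpha_{q, d_i}$ only on the query-document pair; the paper's point is that in image-oriented search, examination is also biased by visual appearance rather than position alone.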

Learning Algebraic Representation for Abstract Spatial-Temporal Reasoning

no code implementations1 Jan 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We further show that the algebraic representation learned can be decoded by isomorphism and used to generate an answer.

Abstract Algebra Systematic Generalization

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

1 code implementation ECCV 2020 Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.

Action Recognition Action Understanding +3

Congestion-aware Evacuation Routing using Augmented Reality Devices

no code implementations25 Apr 2020 Zeyu Zhang, Hangxin Liu, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu

We present a congestion-aware routing solution for indoor evacuation, which produces real-time individual-customized evacuation routes among multiple destinations while keeping track of all evacuees' locations.
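
A minimal sketch of congestion-aware routing on a graph (illustrative only; the actual system's cost model, AR interface, and evacuee tracking are not shown, and the linear congestion penalty here is a made-up choice):

```python
import heapq

def congestion_aware_route(graph, load, start, exits, alpha=1.0):
    """Dijkstra over edge costs = travel time + alpha * current congestion.

    graph: {node: [(neighbor, travel_time), ...]}
    load:  {(u, v): number of evacuees currently on that edge}
    Returns (cost, path) for the cheapest route from start to any exit.
    """
    best, queue = {start: (0.0, [start])}, [(0.0, start, [start])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in exits:
            return cost, path
        for nxt, t in graph.get(node, []):
            c = cost + t + alpha * load.get((node, nxt), 0)
            if nxt not in best or c < best[nxt][0]:
                best[nxt] = (c, path + [nxt])
                heapq.heappush(queue, (c, nxt, path + [nxt]))
    return None

graph = {"room": [("hall", 1.0)], "hall": [("exitA", 2.0), ("exitB", 3.0)]}
# exitA is congested, so the cheaper route goes through exitB.
print(congestion_aware_route(graph, {("hall", "exitA"): 5}, "room", {"exitA", "exitB"}))
```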

Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning

2 code implementations25 Apr 2020 Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu

To endow such a crucial cognitive ability to machine intelligence, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).

Relational Reasoning Visual Reasoning
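
To give a flavor of And-Or Graph generation (a toy sketch with a made-up grammar, not the MNS generation code): And-nodes expand all of their children, Or-nodes sample exactly one.

```python
import random

# Toy And-Or Graph for arithmetic expressions:
# Or-nodes choose one child; And-nodes expand all children in order.
GRAMMAR = {
    "Expr":  ("or",  ["BinOp", "Number"]),
    "BinOp": ("and", ["Number", "Op", "Expr"]),
    "Op":    ("or",  ["+", "-", "*"]),
}

def sample(symbol):
    if symbol == "Number":
        return str(random.randint(1, 9))
    if symbol not in GRAMMAR:                 # terminal symbol such as "+"
        return symbol
    kind, children = GRAMMAR[symbol]
    if kind == "or":
        return sample(random.choice(children))
    return " ".join(sample(c) for c in children)   # "and" node: expand all children

random.seed(0)
print(sample("Expr"))   # e.g. "7 * 3" -- a randomly generated arithmetic-style problem
```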

Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

no code implementations25 Apr 2020 Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

Aiming to understand how human (false-)belief--a core socio-cognitive ability--would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.

Object Object Tracking

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

no code implementations20 Apr 2020 Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, Song-Chun Zhu

We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning.

Common Sense Reasoning Small Data Image Classification

Lagrangian-Eulerian Multi-Density Topology Optimization with the Material Point Method

2 code implementations2 Mar 2020 Yue Li, Xuan Li, Minchen Li, Yixin Zhu, Bo Zhu, Chenfanfu Jiang

A quadrature-level connectivity graph-based method is adopted to avoid the artificial checkerboard issues commonly existing in multi-resolution topology optimization methods.

Computational Physics Computational Engineering, Finance, and Science Graphics

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

no code implementations NeurIPS 2019 Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistencies between the 2D image plane and the 3D world coordinate.

Ranked #2 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)

Monocular 3D Object Detection Object +1

Learning Perceptual Inference by Contrasting

1 code implementation NeurIPS 2019 Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

"Thinking in pictures," [1] i. e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a significant ability to perform logical induction and a crucial factor in the intellectual history of technology development.

Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning

no code implementations25 Nov 2019 Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment.

Reinforcement Learning (RL) Transfer Learning

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

no code implementations ICCV 2019 Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction---3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation.

3D Human Pose Estimation Human-Object Interaction Detection +1

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

no code implementations CVPR 2019 Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation.

Object Recognition Question Answering +2

MetaStyle: Three-Way Trade-Off Among Speed, Flexibility, and Quality in Neural Style Transfer

no code implementations13 Dec 2018 Chi Zhang, Yixin Zhu, Song-Chun Zhu

The research area of artistic style transfer has witnessed an unprecedented boom ever since Gatys et al. introduced the neural method.

Bilevel Optimization Style Transfer

Human-centric Indoor Scene Synthesis Using Stochastic Grammar

1 code implementation CVPR 2018 Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth.

Indoor Scene Synthesis

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

1 code implementation ECCV 2018 Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model.

Ranked #4 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)

Monocular 3D Object Detection Object +5

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars

no code implementations1 Apr 2017 Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms.

Benchmarking Object +2

Inferring Forces and Learning Human Utilities From Videos

no code implementations CVPR 2016 Yixin Zhu, Chenfanfu Jiang, Yibiao Zhao, Demetri Terzopoulos, Song-Chun Zhu

We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world objects, and introduce a learning framework that incorporates the concept of human utilities, which in our opinion provides a deeper and finer-grained account not only of object affordance but also of people's interaction with objects.

Motion Planning Robot Task Planning

Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition

no code implementations CVPR 2015 Yixin Zhu, Yibiao Zhao, Song Chun Zhu

In this paper, we present a new framework - task-oriented modeling, learning and recognition which aims at understanding the underlying functions, physics and causality in using objects as "tools".

Object Object Recognition
