Infrared Invisible Clothing: Hiding from Infrared Detectors at Multiple Angles in Real World

no code implementations 12 May 2022 Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu

We simulated the process from cloth to clothing in the digital world and then designed the adversarial "QR code" pattern.

PartAfford: Part-level Affordance Discovery from 3D Objects

no code implementations 28 Feb 2022 Chao Xu, Yixin Chen, He Wang, Song-Chun Zhu, Yixin Zhu, Siyuan Huang

We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization, without dense supervision.

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey

no code implementations 6 Feb 2022 Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li

Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers.

Autonomous Driving object-detection +2

YouRefIt: Embodied Reference Understanding with Language and Gesture

no code implementations ICCV 2021 Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

To the best of our knowledge, this is the first embodied reference dataset that allows us to study referring expressions in daily physical scenes to understand referential behavior, human communication, and human-robot interaction.

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

1 code implementation ICCV 2021 Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of 3D scene understanding tasks and their immense variations introduced by camera views, lighting, occlusions, etc.

3D Object Detection 3D Point Cloud Classification +8

Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis

1 code implementation CVPR 2021 Yaxuan Zhu, Ruiqi Gao, Siyuan Huang, Song-Chun Zhu, Ying Nian Wu

Specifically, the camera pose and 3D scene are represented as vectors and the local camera movement is represented as a matrix operating on the vector of the camera pose.

Novel View Synthesis

A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics

no code implementations 2 Mar 2021 Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

Inspired by humans' remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines' capability of learning generalizable concepts at three different levels: perception, syntax, and semantics.

Program Synthesis Systematic Generalization

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar

no code implementations 27 Dec 2020 Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability.

Mathematical Reasoning

Learning by Fixing: Solving Math Word Problems with Weak Supervision

1 code implementation 19 Dec 2020 Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

To generate more diverse solutions, tree regularization is applied to guide the efficient shrinkage and exploration of the solution space, and a memory buffer is designed to track and save the various fixes discovered for each problem.

Ranked #1 on Math Word Problem Solving on Math23K (weakly-supervised metric)

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

no code implementations ECCV 2020 Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.

Action Recognition Action Understanding +1

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

1 code implementation ICML 2020 Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu

In this paper, we address these issues and close the loop of neural-symbolic learning by (1) introducing the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning, and (2) proposing a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error through the symbolic reasoning module efficiently.

Question Answering Visual Question Answering

Memory-efficient training with streaming dimensionality reduction

no code implementations 25 Apr 2020 Siyuan Huang, Brian D. Hoskins, Matthew W. Daniels, Mark D. Stiles, Gina C. Adam

The movement of large quantities of data during the training of a Deep Neural Network presents immense challenges for machine learning workloads.

Dimensionality Reduction

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

no code implementations 20 Apr 2020 Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, Song-Chun Zhu

We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning.

Common Sense Reasoning Small Data Image Classification

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

no code implementations NeurIPS 2019 Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistencies between the 2D image plane and the 3D world coordinate.

Monocular 3D Object Detection object-detection

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning

1 code implementation ICCV 2019 Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu

This paper addresses a new problem of understanding human gaze communication in social videos at both the atomic level and the event level, which is significant for studying human social interactions.

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

no code implementations ICCV 2019 Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction, i.e., 3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation.

3D Human Pose Estimation Human-Object Interaction Detection +1

Streaming Batch Eigenupdates for Hardware Neuromorphic Networks

no code implementations 5 Mar 2019 Brian D. Hoskins, Matthew W. Daniels, Siyuan Huang, Advait Madhavan, Gina C. Adam, Nikolai Zhitenev, Jabez J. McClelland, Mark D. Stiles

Neuromorphic networks based on nanodevices, such as metal oxide memristors, phase change memories, and flash memory cells, have generated considerable interest for their increased energy efficiency and density in comparison to graphics processing units (GPUs) and central processing units (CPUs).

Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion

no code implementations 24 Jan 2019 Ruiqi Gao, Jianwen Xie, Siyuan Huang, Yufan Ren, Song-Chun Zhu, Ying Nian Wu

This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1).

Optical Flow Estimation

Human-centric Indoor Scene Synthesis Using Stochastic Grammar

1 code implementation CVPR 2018 Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth.

Indoor Scene Synthesis

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

1 code implementation ECCV 2018 Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model.

Monocular 3D Object Detection object-detection +4

Predicting Human Activities Using Stochastic Grammar

no code implementations ICCV 2017 Siyuan Qi, Siyuan Huang, Ping Wei, Song-Chun Zhu

This paper presents a novel method to predict future human activities from partially observed RGB-D videos.

Activity Prediction

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars

no code implementations 1 Apr 2017 Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms.

Computer Vision Scene Understanding +1

Nonlinear Local Metric Learning for Person Re-identification

no code implementations 16 Nov 2015 Siyuan Huang, Jiwen Lu, Jie Zhou, Anil K. Jain

In this paper, we propose a nonlinear local metric learning (NLML) method to improve the state-of-the-art performance of person re-identification on public datasets.

Metric Learning Person Re-Identification

Image Set Querying Based Localization

no code implementations 20 Sep 2015 Lei Deng, Siyuan Huang, Yueqi Duan, Baohua Chen, Jie Zhou

Conventional single image based localization methods usually fail to localize a querying image when there exist large variations between the querying image and the pre-built scene.

Image-Based Localization

