Search Results for author: Baoxiong Jia

Found 21 papers, 12 papers with code

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

no code implementations 16 Apr 2024 Peiyuan Zhi, Zhiyuan Zhang, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang

Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback.

Instruction Following Multimodal Reasoning +1

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

no code implementations 15 Apr 2024 Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang

With recent developments in Embodied Artificial Intelligence (EAI) research, there has been a growing demand for high-quality, large-scale interactive scene generation.

Scene Generation

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

1 code implementation 26 Mar 2024 Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges.

Motion Synthesis

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

no code implementations 17 Jan 2024 Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, Siyuan Huang

In comparison to recent advancements in the 2D domain, grounding language in 3D scenes faces several significant challenges: (i) the inherent complexity of 3D scenes due to the diverse object configurations, their rich attributes, and intricate relationships; (ii) the scarcity of paired 3D vision-language data to support grounded learning; and (iii) the absence of a unified learning framework to distill knowledge from grounded 3D data.

Scene Understanding Visual Grounding

An Embodied Generalist Agent in 3D World

1 code implementation 18 Nov 2023 Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

Leveraging massive knowledge and learning schemes from large language models (LLMs), recent machine learning models show notable successes in building generalist agents that exhibit the capability of general-purpose task solving in diverse domains, including natural language processing, computer vision, and robotics.

3D dense captioning Question Answering +3

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

1 code implementation ICCV 2023 Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu

Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy.

Improving Object-centric Learning with Query Optimization

2 code implementations 17 Oct 2022 Baoxiong Jia, Yu Liu, Siyuan Huang

The ability to decompose complex natural scenes into meaningful object-centric abstractions lies at the core of human perception and reasoning.

Image Segmentation Object +3

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

1 code implementation 8 Oct 2022 Baoxiong Jia, Ting Lei, Song-Chun Zhu, Siyuan Huang

The challenges of such capability lie in the difficulty of generating a detailed understanding of situated actions, their effects on object states (i.e., state changes), and their causal dependencies.

Action Localization counterfactual +4

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

2 code implementations 13 Jun 2022 Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu

Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling.

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

no code implementations 25 Nov 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization.

Abstract Algebra Systematic Generalization

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

no code implementations CVPR 2021 Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu

To fill in this gap, we propose a neuro-symbolic Probabilistic Abduction and Execution (PrAE) learner; central to the PrAE learner is the process of probabilistic abduction and execution on a probabilistic scene representation, akin to the mental manipulation of objects.

Attribute Logical Reasoning

ACRE: Abstract Causal REasoning Beyond Covariation

no code implementations CVPR 2021 Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data.

Blocking Causal Discovery +1

Learning Algebraic Representation for Abstract Spatial-Temporal Reasoning

no code implementations 1 Jan 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We further show that the algebraic representation learned can be decoded by isomorphism and used to generate an answer.

Abstract Algebra Systematic Generalization

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

1 code implementation ECCV 2020 Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.

Action Recognition Action Understanding +3

Learning Perceptual Inference by Contrasting

1 code implementation NeurIPS 2019 Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

"Thinking in pictures," [1] i.e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a significant ability to perform logical induction and a crucial factor in the intellectual history of technology development.

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

no code implementations CVPR 2019 Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation.

Object Recognition Question Answering +2

Learning Human-Object Interactions by Graph Parsing Neural Networks

1 code implementation ECCV 2018 Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.

Human-Object Interaction Detection Object

Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction

no code implementations ICML 2018 Siyuan Qi, Baoxiong Jia, Song-Chun Zhu

Future predictions on sequence data (e.g., video or audio) require the algorithms to capture non-Markovian and compositional properties of high-level semantics.

Activity Prediction Future prediction
