1 code implementation • 7 Jul 2025 • Wenyao Zhang, Hongsi Liu, Zekun Qi, Yunnan Wang, Xinqiang Yu, Jiazhao Zhang, Runpei Dong, JiaWei He, He Wang, Zhizheng Zhang, Li Yi, Wenjun Zeng, Xin Jin
However, existing methods are limited to challenging image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial and semantic information.
Ranked #1 on Robot Manipulation on CALVIN
no code implementations • 11 Jun 2025 • Chen Gao, Liankai Jin, Xingyu Peng, Jiazhao Zhang, Yue Deng, Annan Li, He Wang, Si Liu
Thus, we aim to investigate how to achieve thinking-before-action in embodied navigation, to improve the model's reasoning ability toward generalists.
no code implementations • 29 May 2025 • Shaoan Wang, Jiazhao Zhang, Minghan Li, Jiahang Liu, Anqi Li, Kui Wu, Fangwei Zhong, Junzhi Yu, Zhizheng Zhang, He Wang
Embodied visual tracking is a fundamental skill in Embodied AI, enabling an agent to follow a specific target in dynamic environments using only egocentric vision.
no code implementations • 13 May 2025 • Yuhang Huang, Jiazhao Zhang, Shilong Zou, Xinwang Liu, Ruizhen Hu, Kai Xu
To this end, we propose LaDi-WM, a world model that predicts the latent space of future states using diffusion modeling.
no code implementations • CVPR 2025 • Yijie Tang, Jiazhao Zhang, Yuqing Lan, Yulan Guo, Dezun Dong, Chenyang Zhu, Kai Xu
Online zero-shot 3D instance segmentation of a progressively reconstructed scene is both a critical and challenging task for embodied applications.
3D Instance Segmentation
open vocabulary 3d instance segmentation
2 code implementations • 18 Feb 2025 • Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, Xinqiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, JiaWei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi
Spatial intelligence is a critical component of embodied AI, enabling robots to understand and interact with their environments.
Ranked #1 on Spatial Reasoning on EmbSpatial-Bench
1 code implementation • 11 Dec 2024 • Yihan Cao, Jiazhao Zhang, Zhinan Yu, Kai Xu
Consequently, most existing methods tackle this challenge by leveraging non-gradient-based optimization methods. In this work, we present a hybrid camera placement optimization approach that incorporates both gradient-based and non-gradient-based optimization methods.
no code implementations • 11 Dec 2024 • Yihan Cao, Jiazhao Zhang, Zhinan Yu, Shuzhen Liu, Zheng Qin, Qin Zou, Bo Du, Kai Xu
Object goal navigation (ObjectNav) is a fundamental task in embodied AI, requiring an agent to locate a target object in previously unseen environments.
no code implementations • 9 Dec 2024 • Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang, He Wang
A practical navigation agent must be capable of handling a wide range of interaction demands, such as following instructions, searching objects, answering questions, tracking people, and more.
1 code implementation • 27 Nov 2024 • Wenbo Cui, Chengyang Zhao, Songlin Wei, Jiazhao Zhang, Haoran Geng, Yaran Chen, Haoran Li, He Wang
To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses.
1 code implementation • IROS 2024 • Yufei Ding, Haoran Geng, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang
In this work, we pioneer the construction of a benchmark and an approach for table-top Open-instruction 6-DoF Object Rearrangement (Open6DOR).
Ranked #2 on Object Rearrangement on Open6DOR V2
no code implementations • 24 Feb 2024 • Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang
Vision-and-language navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.
no code implementations • 19 Jan 2024 • Jiazhao Zhang, Ying Hung, Chung-Ching Lin, Zicheng Liu
To capture the conditional dependence between branching and nested parameters, a unified Bayesian optimization framework is proposed.
no code implementations • CVPR 2024 • Mi Yan, Jiazhao Zhang, Yan Zhu, He Wang
Through iterative clustering of masks showing high view consensus, we generate a series of clusters, each representing a distinct 3D instance.
3D Instance Segmentation
3D Open-Vocabulary Instance Segmentation
no code implementations • 28 Sep 2023 • Songlin Wei, Jiazhao Zhang, Yang Wang, Fanbo Xiang, Hao Su, He Wang
Existing works rely on the independence assumption of points in the radiance field or the pixels in input views to obtain tractable forms of the probability density function.
no code implementations • 27 Sep 2023 • Jiazhao Zhang, Nandiraju Gireesh, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang
Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community.
no code implementations • 17 Aug 2023 • Yijie Tang, Jiazhao Zhang, Zhinan Yu, He Wang, Kai Xu
For the first time, randomized optimization is made possible in neural tracking with several key designs to the learning process, enabling efficient and robust tracking even under fast camera motions.
no code implementations • CVPR 2023 • Jiazhao Zhang, Liu Dai, Fanpeng Meng, Qingnan Fan, Xuelin Chen, Kai Xu, He Wang
However, leveraging 3D scene representation can be impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost.
1 code implementation • 12 Oct 2022 • Qiyu Dai, Yan Zhu, Yiran Geng, Ciyu Ruan, Jiazhao Zhang, He Wang
In this work, we tackle 6-DoF grasp detection for transparent and specular objects, which is an important yet challenging problem in vision-based robotic systems, due to the failure of depth cameras in sensing their geometry.
no code implementations • 24 Sep 2022 • Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia Weng, Li Yi, Shuran Song, He Wang
For the first time, we propose a point-cloud-based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion.
no code implementations • 12 May 2021 • Jiazhao Zhang, Chenyang Zhu, Lintao Zheng, Kai Xu
We propose to tackle the difficulties of fast-motion camera tracking in the absence of inertial measurements using random optimization, in particular, the Particle Filter Optimization (PFO).
1 code implementation • CVPR 2020 • Jiazhao Zhang, Chenyang Zhu, Lintao Zheng, Kai Xu
Online semantic 3D segmentation alongside real-time RGB-D reconstruction poses special challenges, such as how to perform 3D convolution directly over the progressively fused 3D geometric data and how to fuse information intelligently from frame to frame.
no code implementations • 18 Jun 2019 • Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner, Kai Xu
In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects from the scene.