no code implementations • CVPR 2025 • Yuxing Long, Jiyao Zhang, Mingjie Pan, Tianshu Wu, Taewhan Kim, Hao Dong
Furthermore, we propose the first manual-based manipulation planning model ManualPlan to set up a group of baselines for the CheckManual benchmark.
no code implementations • 6 Jun 2025 • Yan Shen, Ruihai Wu, Yubin Ke, Xinyuan Song, Zeyi Li, Xiaoqi Li, Hongwei Fan, Haoran Lu, Hao Dong
Shape assembly, the process of combining parts into a complete whole, is a crucial robotic skill with broad real-world applications.
no code implementations • 30 May 2025 • Mingxu Zhang, Xiaoqi Li, Jiahui Xu, Kaichen Zhou, Hojin Bae, Yan Shen, Chuyan Xiong, Jiaming Liu, Hao Dong
To address this, leveraging the power of single-view 3D object reconstruction approaches, we propose a training-free framework, SR3D, that enables robotic grasping of transparent and specular objects from a single-view observation.
1 code implementation • 29 May 2025 • Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink
Vision-Language Models (VLMs) have demonstrated strong capabilities in aligning visual and textual modalities, enabling a wide range of applications in multimodal understanding and generation.
no code implementations • 28 May 2025 • Yuanfei Wang, Xinju Huang, Fangwei Zhong, Yaodong Yang, Yizhou Wang, Yuanpei Chen, Hao Dong
The ego agent must interact with this proxy user to infer and adapt to the user's latent desires.
no code implementations • 26 May 2025 • Zhuoheng Gao, Yihao Li, Jiyao Zhang, Rui Zhao, Tong Wu, Hao Tang, Zhaofei Yu, Hao Dong, Guozhang Chen, Tiejun Huang
To address this gap, we propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams.
1 code implementation • 22 May 2025 • Ziyue Qiao, Qianyi Cai, Hao Dong, Jiawei Gu, Pengyang Wang, Meng Xiao, Xiao Luo, Hui Xiong
This paper addresses the challenge of graph domain adaptation on evolving, multiple out-of-distribution (OOD) graphs.
2 code implementations • 22 May 2025 • Moru Liu, Hao Dong, Jessica Kelly, Olga Fink, Mario Trapp
Out-of-distribution (OOD) detection and segmentation are crucial for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery.
no code implementations • 20 May 2025 • Hao Dong, Ziyue Qiao, Zhiyuan Ning, Qi Hao, Yi Du, Pengyang Wang, Yuanchun Zhou
However, they still have limitations: 1) In modeling subgraph semantic evolution, they usually neglect the internal structural interactions between subgraphs, which are actually crucial for encoding TKGs.
1 code implementation • 16 May 2025 • Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, Hao Dong
To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO).
1 code implementation • 13 May 2025 • Dazhong Rong, Hao Dong, Xing Gao, Jiyu Wei, Di Hong, Yaoyao Hao, Qinming He, Yueming Wang
Based on the concept that the ventral visual stream (VVS) mainly functions for object recognition, current unsupervised task-driven methods model the VVS by contrastive learning and have achieved good brain similarity.
no code implementations • 3 May 2025 • Xiaoqi Li, Jiaming Liu, Nuowei Han, Liang Heng, Yandong Guo, Hao Dong, Yang Liu
The 3D weakly-supervised visual grounding task aims to localize oriented 3D boxes in point clouds based on natural language descriptions without requiring annotations to guide model learning.
no code implementations • 10 Apr 2025 • Zen Kit Heng, Zimeng Zhao, Tianhao Wu, Yuanfei Wang, Mingdong Wu, Yangang Wang, Hao Dong
Large Language Models (LLMs) are emerging as promising tools for automated reinforcement learning (RL) reward design, owing to their robust capabilities in commonsense reasoning and code generation.
no code implementations • CVPR 2025 • Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao
As interest grows in world models that predict future states from current observations and actions, accurately modeling part-level dynamics has become increasingly relevant for various applications.
no code implementations • 18 Mar 2025 • Tianshu Wu, Jiyao Zhang, Shiqian Liang, Zhengxiao Han, Hao Dong
Recent learning-based robot pose estimation methods, while advancing online calibration, struggle with cross-robot generalization and require the robot to be fully visible.
no code implementations • CVPR 2025 • Ruihai Wu, Ziyu Zhu, Yuran Wang, Yue Chen, Jiarui Wang, Hao Dong
Cluttered garment manipulation poses significant challenges due to the complex, deformable nature of garments and intricate garment relations.
no code implementations • 16 Feb 2025 • Yuanfei Wang, Xiaojie Zhang, Ruihai Wu, Yu Li, Yan Shen, Mingdong Wu, Zhaofeng He, Yizhou Wang, Hao Dong
To enhance the diversity and complexity of adaptive manipulation mechanisms, we build a novel articulated object manipulation environment and equip it with 9 categories of objects.
no code implementations • 12 Feb 2025 • Yankai Fu, Qiuxuan Feng, Ning Chen, Zichen Zhou, Mengzhen Liu, Mingdong Wu, Tianxing Chen, Shanyu Rong, Jiaming Liu, Hao Dong, Shanghang Zhang
However, obtaining high-quality 3D representations presents two key problems: (1) the quality of point clouds captured by a single-view camera is significantly affected by factors such as camera resolution, positioning, and occlusions caused by the dexterous hand; (2) the global point clouds lack crucial contact information and spatial correspondences, which are necessary for fine-grained dexterous manipulation tasks.
1 code implementation • 5 Feb 2025 • Yuan Tian, Wenqi Zhou, Michele Viscione, Hao Dong, David Kammer, Olga Fink
Symbolic Regression (SR) holds great potential for uncovering underlying mathematical and physical relationships from observed data.
5 code implementations • 30 Jan 2025 • Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink
Besides, the recent advent of large-scale pre-trained multimodal foundation models, such as CLIP, has inspired works that leverage these models to enhance adaptation and generalization performance or adapt them to downstream tasks.
1 code implementation • 25 Jan 2025 • Hao Dong, Zheyuan Shi, Hemeng Zeng, Yongmei Liu
We define a class of domains called proper baggable domains, and show that for such domains, the BQNP problem obtained by our automatic method is a sound and complete abstraction for a generalized planning problem whose instances share the same bags with the given instance but the sizes of the bags might be different.
1 code implementation • 23 Jan 2025 • Hao Dong, Eleni Chatzi, Olga Fink
This task becomes more challenging when multiple modalities are involved.
no code implementations • CVPR 2025 • Han Sun, Yunkang Cao, Hao Dong, Olga Fink
Visual anomaly detection (AD) presents significant challenges due to the scarcity of anomalous data samples.
no code implementations • CVPR 2025 • Xiaoqi Li, Jingyun Xu, Mingxu Zhang, Jiaming Liu, Yan Shen, Iaroslav Ponomarenko, Jiahui Xu, Liang Heng, Siyuan Huang, Shanghang Zhang, Hao Dong
In robotic manipulation, task goals can be conveyed through various modalities, such as language, goal images, and goal videos.
no code implementations • 13 Dec 2024 • Taewhan Kim, Hojin Bae, Zeming Li, Xiaoqi Li, Iaroslav Ponomarenko, Ruihai Wu, Hao Dong
Visual actionable affordance has emerged as a transformative approach in robotics, focusing on perceiving interaction areas prior to manipulation.
no code implementations • 1 Dec 2024 • Hao Dong, Wei Wei
The rule-based approach relies on handcrafted priorities of dependency relations to reorder the context, while the score-based algorithm dynamically regulates the contextual sequence by calculating word position scores using a neural network.
Aspect-Based Sentiment Analysis
Aspect-Based Sentiment Analysis (ABSA)
1 code implementation • 19 Nov 2024 • Ismail Nejjar, Hao Dong, Olga Fink
Open-set Domain Adaptation (OSDA) aims to adapt a model from a labeled source domain to an unlabeled target domain, where novel classes - also referred to as target-private unknown classes - are present.
no code implementations • 15 Nov 2024 • Qi Hao, Runchang Liang, Yue Gao, Hao Dong, Wei Fan, Lu Jiang, Pengyang Wang
Variable Subset Forecasting (VSF) refers to a unique scenario in multivariate time series forecasting, where available variables in the inference phase are only a subset of the variables in the training phase.
2 code implementations • CVPR 2025 • Shawn Li, Huixian Gong, Hao Dong, Tiankai Yang, Zhengzhong Tu, Yue Zhao
Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state-of-the-art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection.
no code implementations • 6 Nov 2024 • Chenrui Tie, Yue Chen, Ruihai Wu, Boxuan Dong, Zeyi Li, Chongkai Gao, Hao Dong
We theoretically extend equivariant Markov kernels and simplify the condition of equivariant diffusion process, thereby significantly improving training efficiency for trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner.
1 code implementation • 2 Nov 2024 • Haoran Lu, Ruihai Wu, Yitong Li, Sijie Li, Ziyu Zhu, Chuanruo Ning, Yan Shen, Longzan Luo, Yuanpei Chen, Hao Dong
Recent successes in reinforcement learning and vision-based methods offer promising avenues for learning garment manipulation.
1 code implementation • 1 Oct 2024 • Kaichen Zhou, Yang Cao, Taewhan Kim, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu
To address this gap, we introduce the Realistic Anomaly Detection (RAD) dataset, the first multi-view RGB-based anomaly detection dataset specifically collected using a real robot arm, providing unique and realistic data scenarios.
no code implementations • 16 Aug 2024 • Shihan Peng, Hanyu Zhou, Hao Dong, Zhiwei Shi, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan
In this work, we introduce hybrid coaxial event-frame devices to build the multimodal system, and propose a coaxial stereo event camera (CoSEC) dataset for autonomous driving.
no code implementations • 4 Aug 2024 • Yue Chen, Chenrui Tie, Ruihai Wu, Hao Dong
Humans perceive and interact with the world with an awareness of equivariance, which helps us manipulate different objects in diverse poses.
no code implementations • 22 Jul 2024 • Kangqi Ma, Hao Dong, Yadong Mu
This paper addresses the challenge of robotic grasping of general objects.
no code implementations • 8 Jul 2024 • Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers
3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost the performance of grasping under occlusion and generalizes to the real world.
1 code implementation • 1 Jul 2024 • Hao Dong, Eleni Chatzi, Olga Fink
In this work, we introduce a novel approach to address Multimodal Open-Set Domain Generalization (MM-OSDG) for the first time, utilizing self-supervision.
1 code implementation • 25 Jun 2024 • Zhuoqun Xu, Yang Liu, Xiaoqi Li, Jiyao Zhang, Hao Dong
This environment also includes autonomous human characters and robots with grasping and mobility capabilities, as well as a large number of interactive items.
1 code implementation • 24 Jun 2024 • Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang
To tackle this challenge, we propose a generic model for enabling the current GNN-based methods to preserve topology-free patterns.
1 code implementation • 19 Jun 2024 • Wenxiao Cai, Iaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao
Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding; however, they still struggle with spatial understanding, which is the foundation of Embodied AI.
Ranked #4 on Spatial Reasoning on 6-DoF SpatialBench
no code implementations • 17 Jun 2024 • Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jeremy Liu, Ruiping Wang, Hao Dong
The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly. However, these methods typically focus on high-level planning corrections using an additional MLLM, with limited utilization of failed samples to correct low-level contact pose errors, which are particularly prone to occur during articulated object manipulation. To address this gap, we propose an Autonomous Interactive Correction (AIC) MLLM, which makes use of previous low-level interaction experiences to correct SE(3) pose predictions for articulated objects.
no code implementations • 9 Jun 2024 • Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen
The difficulty of learning from teacher packing is to capture the complex geometric relationships among packing examples, which include the spatial (position, orientation) relationships of objects, their geometric features, and container boundary conditions.
no code implementations • 7 Jun 2024 • Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, Hao Dong
To reach this goal, we introduce Dynamic Chain-of-Navigation (DCoN) to unify the planning process for different types of navigation instructions.
no code implementations • 6 Jun 2024 • Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong
6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets.
no code implementations • 3 Jun 2024 • Han Sun, Yunkang Cao, Hao Dong, Olga Fink
Visual anomaly detection (AD) presents significant challenges due to the scarcity of anomalous data samples.
1 code implementation • 1 Jun 2024 • Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li
To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively.
1 code implementation • 27 May 2024 • Hao Dong, Yue Zhao, Eleni Chatzi, Olga Fink
Extensive experiments on MultiOOD demonstrate that training with A2D and NP-Mix improves existing OOD detection algorithms by a large margin.
no code implementations • CVPR 2024 • Ruihai Wu, Haoran Lu, Yiyan Wang, YuBo Wang, Hao Dong
Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, yet highly challenging due to the diversity of garment configurations, geometries and deformations.
3 code implementations • CVPR 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.
no code implementations • 4 Apr 2024 • Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Guyue Zhou, Yixin Zhu, Hao Dong, Hao Zhao
Robotic manipulation with two-finger grippers is challenged by objects lacking distinct graspable features.
1 code implementation • 27 Mar 2024 • Yuxuan Wan, Kaichen Zhou, Jinhong Chen, Hao Dong
To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures.
no code implementations • 13 Mar 2024 • Ran Xu, Yan Shen, Xiaoqi Li, Ruihai Wu, Hao Dong
To address these challenges, we introduce a comprehensive benchmark, NrVLM, comprising 15 distinct manipulation tasks, containing over 4500 episodes meticulously annotated with fine-grained language instructions.
no code implementations • 12 Mar 2024 • Hanyu Zhou, Zhiwei Shi, Hao Dong, Shihan Peng, Yi Chang, Luxin Yan
In the spatial reasoning stage, we project the compensated events into the same image coordinates, discretize the event timestamps to obtain a time image that reflects motion confidence, and then segment the moving object by applying an adaptive threshold to the time image.
2 code implementations • 7 Feb 2024 • Yuan Tian, Wenqi Zhou, Hao Dong, David S. Kammer, Olga Fink
Our results demonstrate that Sym-Q excels not only in recovering underlying mathematical structures but also uniquely learns to efficiently refine the output expression based on reward signals, thereby discovering underlying expressions.
no code implementations • 26 Dec 2023 • Kaichen Zhou, Lanqing Hong, Xinhai Chang, Yingji Zhong, Enze Xie, Hao Dong, Zhihao Li, Yongxin Yang, Zhenguo Li, Wei Zhang
A key challenge in fine-grained 3D-based interactive editing is the absence of an efficient representation that balances diverse modifications with high-quality view synthesis under a given memory constraint.
no code implementations • CVPR 2024 • Xiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong
By fine-tuning the injected adapters, we preserve the inherent common sense and reasoning ability of the MLLMs while equipping them with the ability for manipulation.
1 code implementation • 19 Dec 2023 • Ruiyuan Zhang, Jiaxiang Liu, Zexi Li, Hao Dong, Jie Fu, Chao Wu
Therefore, there is a need to develop a scalable framework for geometric fracture assembly without relying on semantic information.
1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.
no code implementations • 21 Nov 2023 • Yushi Du, Ruihai Wu, Yan Shen, Hao Dong
More importantly, while many methods can model only a certain kind of joint motion (such as clockwise revolution), our proposed framework is generic to different kinds of joint motion, because a transformation matrix can model diverse joint motions in space.
1 code implementation • 20 Nov 2023 • Hao Dong, Gaëtan Frusque, Yue Zhao, Eleni Chatzi, Olga Fink
While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised anomaly detection.
1 code implementation • NeurIPS 2023 • Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, Olga Fink
In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions.
no code implementations • 25 Oct 2023 • Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas
Central to SparseDFF is a feature refinement network, optimized with a contrastive loss between views and a point-pruning mechanism for feature continuity.
no code implementations • 18 Oct 2023 • Tianyang Xue, Mingdong Wu, Lin Lu, Haoxuan Wang, Hao Dong, Baoquan Chen
In this work, we delve deeper into a novel machine learning-based approach that formulates the packing problem as conditional generative modeling.
no code implementations • 13 Oct 2023 • Xiaoqi Li, Yanzi Wang, Yan Shen, Ponomarenko Iaroslav, Haoran Lu, Qianxu Wang, Boshi An, Jiaming Liu, Hao Dong
This framework is designed to capture multiple perspectives of the target object and infer depth information to complement its geometry.
no code implementations • 10 Oct 2023 • Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas
However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships.
no code implementations • 20 Sep 2023 • Yuxing Long, Xiaoqi Li, Wenzhe Cai, Hao Dong
Performance on the representative VLN task R2R shows that our method surpasses the leading zero-shot VLN model by a large margin on all metrics.
no code implementations • 19 Sep 2023 • Ziyue Qiao, Xiao Luo, Meng Xiao, Hao Dong, Yuanchun Zhou, Hui Xiong
To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embeddings, so that the node classifier trained on labeled source nodes can be transferred to the target nodes.
Graph Domain Adaptation
Semi-supervised Domain Adaptation
1 code implementation • NeurIPS 2023 • Hongcheng Wang, Andy Guan Hong Chen, Xiaoqi Li, Mingdong Wu, Hao Dong
The task of Visual Object Navigation (VON) involves an agent's ability to locate a particular object within a given scene.
no code implementations • NeurIPS 2023 • Chuanruo Ning, Ruihai Wu, Haoran Lu, Kaichun Mo, Hao Dong
Our framework explicitly estimates the geometric similarity across different categories, identifying local areas that differ from shapes in the training categories for efficient exploration while concurrently transferring affordance knowledge to similar parts of the objects.
1 code implementation • ICCV 2023 • Ruihai Wu, Chenrui Tie, Yushi Du, Yan Zhao, Hao Dong
Shape assembly aims to reassemble parts (or fragments) into a complete object, which is a common task in our daily life.
no code implementations • 12 Sep 2023 • Tianhao Wu, Mingdong Wu, Jiyao Zhang, Yunchong Gan, Hao Dong
In this paper, we propose a novel task called human-assisting dexterous grasping that aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects.
1 code implementation • 8 Sep 2023 • Junfeng Cheng, Mingdong Wu, Ruiyuan Zhang, Guanqi Zhan, Chao Wu, Hao Dong
In this paper, we formulate this task from a novel generative perspective, introducing the Score-based 3D Part Assembly framework (Score-PA) for 3D part assembly.
no code implementations • 6 Sep 2023 • Hao Dong, Pengyang Wang, Meng Xiao, Zhiyuan Ning, Pengfei Wang, Yuanchun Zhou
Subsequently, we utilize the defined query-aware temporal paths on a history temporal graph to model historical path information related to queries for reasoning.
1 code implementation • 29 Aug 2023 • Jingbang Chen, Yian Wang, Xingwei Qu, Shuangjia Zheng, Yaodong Yang, Hao Dong, Jie Fu
Molecular dynamics simulations have emerged as a fundamental instrument for studying biomolecules.
1 code implementation • 24 Aug 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao
However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes.
3D Semantic Segmentation
Few-shot 3D semantic segmentation
no code implementations • 18 Jun 2023 • Jiyao Zhang, Mingdong Wu, Hao Dong
Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings.
Ranked #1 on 6D Pose Estimation using RGBD on REAL275
1 code implementation • 25 May 2023 • Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao
In this paper, we propose MUTR, a Multi-modal Unified Temporal transformer for Referring video object segmentation.
1 code implementation • 18 May 2023 • Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li
This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks.
1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li
Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models.
Ranked #2 on Personalized Segmentation on PerSeg
1 code implementation • 25 Apr 2023 • Hao Dong, Zhiyuan Ning, Pengyang Wang, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou, Yanjie Fu
Temporal knowledge graph (TKG) reasoning aims to predict the future missing facts based on historical information and has gained increasing research interest recently.
no code implementations • 10 Apr 2023 • Zihan Ding, Yuanpei Chen, Allen Z. Ren, Shixiang Shane Gu, Qianxu Wang, Hao Dong, Chi Jin
Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands.
no code implementations • 29 Mar 2023 • Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method to solve Minecraft Tech Tree tasks.
1 code implementation • CVPR 2023 • Haoran Geng, Ziming Li, Yiran Geng, Jiayi Chen, Hao Dong, He Wang
Learning a generalizable object manipulation policy is vital for an embodied agent to work in complex real-world scenes.
no code implementations • ICCV 2023 • Ruihai Wu, Chuanruo Ning, Hao Dong
In this paper, we study deformable object manipulation using dense visual affordance, with generalization towards diverse states, and propose a novel kind of foresightful dense affordance, which avoids local optima by estimating states' values for long-term manipulation.
1 code implementation • 2 Feb 2023 • Sheng Xu, Yanjing Li, Teli Ma, Mingbao Lin, Hao Dong, Baochang Zhang, Peng Gao, Jinhu Lv
In this paper, we introduce a Resilient Binary Neural Network (ReBNN) to mitigate the frequent oscillation for better BNN training.
1 code implementation • CVPR 2023 • Hai Ci, Mingdong Wu, Wentao Zhu, Xiaoxuan Ma, Hao Dong, Fangwei Zhong, Yizhou Wang
During the denoising process, GFPose implicitly incorporates pose priors in gradients and unifies various discriminative and generative tasks in an elegant framework.
1 code implementation • 28 Nov 2022 • Hao Dong, Weihao Gu, Xianjing Zhang, Jintao Xu, Rui Ai, Huimin Lu, Juho Kannala, Xieyuanli Chen
However, current works are based on raw data or network feature-level fusion and only consider short-range HD map generation, limiting their deployment to realistic autonomous driving applications.
1 code implementation • 11 Oct 2022 • Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang
A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues.
Multi-agent Reinforcement Learning
reinforcement-learning
1 code implementation • 27 Sep 2022 • Hao Dong, Xieyuanli Chen, Mihai Dusmanu, Viktor Larsson, Marc Pollefeys, Cyrill Stachniss
A distinctive representation of image patches in the form of features is a key component of many computer vision and robotics tasks, such as image matching, image retrieval, and visual localization.
1 code implementation • 26 Sep 2022 • Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong
Such a contact prediction process then leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks.
no code implementations • 25 Sep 2022 • Renrui Zhang, Bohao Li, Wei Zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao
In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.
no code implementations • 16 Sep 2022 • Meng Xiao, Ziyue Qiao, Yanjie Fu, Hao Dong, Yi Du, Pengyang Wang, Hui Xiong, Yuanchun Zhou
Specifically, we first propose a hierarchical transformer to extract the textual semantic information of proposals.
no code implementations • 13 Sep 2022 • Hao Dong, Yuya Sasaki
Based on the proposed estimator, we construct a formal test on the sub-unity of the marginal propensity to consume out of permanent income (MPCP) under a nonparametric consumption model and a permanent-transitory model of income dynamics with nonparametric distribution.
no code implementations • 2 Sep 2022 • Mingdong Wu, Fangwei Zhong, Yulong Xia, Hao Dong
For object rearrangement, the TarGF can be used in two ways: 1) For model-based planning, we can cast the target gradient into a reference control and output actions with a distributed path planner; 2) For model-free reinforcement learning, the TarGF is not only used for estimating the likelihood-change as a reward but also provides suggested actions in residual policy learning.
1 code implementation • 15 Aug 2022 • Hao Dong, Xieyuanli Chen, Simo Särkkä, Cyrill Stachniss
We further use the extracted poles as pseudo labels to train a deep neural network for online range image-based pole segmentation.
1 code implementation • 7 Aug 2022 • Qiyu Dai, Jiyao Zhang, Qiwei Li, Tianhao Wu, Hao Dong, Ziyuan Liu, Ping Tan, He Wang
Commercial depth sensors usually generate noisy and missing depths, especially on specular and transparent objects, which poses critical issues to downstream depth or point cloud-based tasks.
no code implementations • 2 Aug 2022 • Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang
The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community.
no code implementations • 13 Jul 2022 • Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang
Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications on even simple tasks.
1 code implementation • 8 Jul 2022 • Tong Zhang, Peng Gao, Hao Dong, Yin Zhuang, Guanqun Wang, Wei Zhang, He Chen
Currently, under supervised learning, the dominant knowledge-transfer paradigm is to pretrain a model on a large-scale natural-scene dataset and then fine-tune it on a small amount of task-specific labeled data.
no code implementations • 5 Jul 2022 • Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong
It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments.
1 code implementation • 17 Jun 2022 • Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang
In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.
no code implementations • 7 Mar 2022 • Meng Xiao, Ziyue Qiao, Yanjie Fu, Hao Dong, Yi Du, Pengyang Wang, Dong Li, Yuanchun Zhou
After extracting the semantic and interdisciplinary knowledge, we design a level-wise prediction component to fuse the two types of knowledge representations and detect interdisciplinary topic paths for each proposal.
no code implementations • 4 Mar 2022 • Tianhao Wu, Fangwei Zhong, Yiran Geng, Hongchen Wang, Yongjian Zhu, Yizhou Wang, Hao Dong
We formulate the dynamic grasping problem as a 'move-and-grasp' game, where the robot is to pick up the object on the mover and the adversarial mover is to find a path to escape it.
no code implementations • 19 Dec 2021 • Mingxin Yu, Lin Shao, Zhehuan Chen, Tianhao Wu, Qingnan Fan, Kaichun Mo, Hao Dong
Part assembly is a typical but challenging task in robotics, where robots assemble a set of individual parts into a complete shape.
no code implementations • 10 Dec 2021 • Jiahao Huang, Weiping Ding, Jun Lv, Jingwen Yang, Hao Dong, Javier Del Ser, Jun Xia, Tiaojuan Ren, Stephen Wong, Guang Yang
The dual discriminator design aims to improve the edge information in MRI reconstruction.
no code implementations • 1 Dec 2021 • Yian Wang, Ruihai Wu, Kaichun Mo, Jiaqi Ke, Qingnan Fan, Leonidas Guibas, Hao Dong
Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments.
no code implementations • 29 Sep 2021 • Yuchen Liu, Yali Du, Runji Lin, Hangrui Bi, Mingdong Wu, Jun Wang, Hao Dong
Model-based RL is an effective approach for reducing sample complexity.
Model-based Reinforcement Learning
Reinforcement Learning (RL)
1 code implementation • 26 Aug 2021 • Yixiao Guo, Jiawei Liu, Guo Li, Luo Mai, Hao Dong
When it comes to customising these algorithms for real-world applications, none of the existing libraries can offer both the flexibility of developing custom pose estimation algorithms and the high performance of executing these algorithms on commodity devices.
1 code implementation • ICCV 2021 • Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi
Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences.
Ranked #82 on Semantic Segmentation on NYU Depth v2
no code implementations • ICLR 2022 • Ruihai Wu, Yan Zhao, Kaichun Mo, Zizheng Guo, Yian Wang, Tianhao Wu, Qingnan Fan, Xuelin Chen, Leonidas Guibas, Hao Dong
In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point, in which the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals.
1 code implementation • 19 Apr 2021 • Jie Ren, Yewen Li, Zihan Ding, Wei Pan, Hao Dong
However, grasping distinguishable skills for some tasks with non-unique optima can be essential for further improving its learning efficiency and performance, which may lead to a multimodal policy represented as a mixture-of-experts (MOE).
no code implementations • 29 Mar 2021 • Pan Wang, Zhifeng Gong, Shuo Wang, Hao Dong, Jialu Fan, Ling Li, Peter Childs, Yike Guo
In this work, we propose a deep generative transformation model that uses adversarial learning to modify the design semantics of a given product from personalised brain activity.
no code implementations • 22 Feb 2021 • Zhiyuan Ning, Ziyue Qiao, Hao Dong, Yi Du, Yuanchun Zhou
Knowledge graph embedding (KGE) models learn to project symbolic entities and relations into a continuous vector space based on the observed triplets.
no code implementations • 24 Dec 2020 • Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong
Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks.
1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong
In this paper, a novel variant of transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input.
no code implementations • 30 Sep 2020 • Chu-ran Wang, Jing Li, Fandong Zhang, Xinwei Sun, Hao Dong, Yizhou Yu, Yizhou Wang
Mammogram benign or malignant classification with only image-level labels is challenging due to the absence of lesion annotations.
no code implementations • 21 Sep 2020 • Jian Mei, Hao Dong
DongNiao International Birds 10000 (DIB-10K) is a challenging image dataset which has more than 10 thousand different types of birds.
no code implementations • 20 Sep 2020 • Qingrui Zhang, Hao Dong, Wei Pan
More importantly, the existing multi-agent reinforcement learning (MARL) algorithms cannot ensure the closed-loop stability of a multi-agent system from a control-theoretic perspective, so the learned control policies are highly likely to generate abnormal or dangerous behaviors in real applications.
Deep Reinforcement Learning
Multi-agent Reinforcement Learning
1 code implementation • 18 Sep 2020 • Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Guo Li, Quancheng Guo, Luo Mai, Hao Dong
RLzoo provides developers with (i) high-level yet flexible APIs for prototyping DRL agents, and further customising the agents for best performance, (ii) a model zoo where users can import a wide range of DRL agents and easily compare their performance, and (iii) an algorithm that can automatically construct DRL agents with custom components (which are critical to improve agent's performance in custom applications).
3 code implementations • NeurIPS 2020 • Jialei Huang, Guanqi Zhan, Qingnan Fan, Kaichun Mo, Lin Shao, Baoquan Chen, Leonidas Guibas, Hao Dong
Analogous to buying IKEA furniture, given a set of 3D parts that can assemble a single shape, an intelligent agent needs to perceive the 3D part geometry, reason to propose pose estimations for the input parts, and finally call robotic planning and control routines for actuation.
1 code implementation • ICLR 2020 • Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong
To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data -- and this data is the only medium by which the teacher's knowledge can be demonstrated.
3 code implementations • ECCV 2020 • Yihao Zhao, Ruihai Wu, Hao Dong
Cycle-consistency loss is a widely used constraint for such problems.
1 code implementation • 22 Nov 2019 • Guanqi Zhan, Yihao Zhao, Bingchan Zhao, Haoqi Yuan, Baoquan Chen, Hao Dong
By mapping the discrete label-specific attribute features into a continuous prior distribution, we leverage the advantages of both discrete labels and reference images to achieve image manipulation in a hybrid fashion.
no code implementations • 11 Jun 2018 • Hao Dong, Shuai Li, Dongchang Xu, Yi Ren, Di Zhang
The training of Deep Neural Networks usually needs tremendous computing resources.
no code implementations • 19 May 2018 • Simiao Yu, Hao Dong, Pan Wang, Chao Wu, Yike Guo
Bionic design refers to an approach of generative creativity in which a target object (e.g. a floor lamp) is designed to contain features of biological source objects (e.g. flowers), resulting in creative biologically-inspired design.
no code implementations • 20 Nov 2017 • Hao Dong, Chao Wu, Zhen Wei, Yike Guo
However, the current architecture of deep networks suffers from a privacy issue: users need to give their data to the model (typically hosted on a server or a cluster in the cloud) for training or prediction.
2 code implementations • 26 Jul 2017 • Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo
Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others.
2 code implementations • ICCV 2017 • Hao Dong, Simiao Yu, Chao Wu, Yike Guo
In this paper, we propose a way of synthesizing realistic images directly with natural language description, which has many useful applications, e.g. intelligent image manipulation.
no code implementations • 19 May 2017 • Simiao Yu, Hao Dong, Guang Yang, Greg Slabaugh, Pier Luigi Dragotti, Xujiong Ye, Fangde Liu, Simon Arridge, Jennifer Keegan, David Firmin, Yike Guo
Fast Magnetic Resonance Imaging (MRI) is highly in demand for many clinical applications in order to reduce the scanning cost and improve the patient experience.
no code implementations • 10 May 2017 • Hao Dong, Guang Yang, Fangde Liu, Yuanhan Mo, Yike Guo
In this context, a reliable, fully automatic method for brain tumor segmentation is necessary for an efficient measurement of the tumor extent.
no code implementations • 20 Mar 2017 • Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo
We demonstrate that I2T2I can generate better multi-category images using MSCOCO than the state-of-the-art.
8 code implementations • 12 Mar 2017 • Akara Supratak, Hao Dong, Chao Wu, Yike Guo
This demonstrated that, without changing the model architecture and the training algorithm, our model could automatically learn features for sleep stage scoring from different raw single-channel EEGs from different datasets without utilizing any hand-engineered features.
Ranked #4 on Sleep Stage Detection on MASS SS3
no code implementations • 10 Jan 2017 • Hao Dong, Paarth Neekhara, Chao Wu, Yike Guo
It's useful to automatically transform an image from its original form to some synthetic form (style, partial contents, etc.
no code implementations • 15 Oct 2016 • Hao Dong, Akara Supratak, Wei Pan, Chao Wu, Paul M. Matthews, Yike Guo
Use of this recording configuration with neural network deconvolution promises to make clinically indicated home sleep studies practical.
1 code implementation • 23 Jun 2016 • Wei Pan, Hao Dong, Yike Guo
We proposed regularisers which support a simple mechanism of dropping neurons during a network training process.