no code implementations • 26 Nov 2024 • Guojian Zhan, Qiang Ge, Haoyu Gao, Yuming Yin, Bin Zhao, Shengbo Eben Li
Subsequent to the validation process, we conduct comprehensive simulations comparing our proposed model with both kinematic models and existing dynamic models discretized through the forward Euler method.
no code implementations • 25 Nov 2024 • Zhigang Wang, Yifei Su, Chenhui Li, Dong Wang, Yan Huang, Bin Zhao, Xuelong Li
Open-vocabulary 3D scene understanding is indispensable for embodied agents.
no code implementations • 21 Nov 2024 • Guanzhou Lan, YuQi Yang, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, our method comprises a degradation disentanglement module and a degradation-aware contrastive learning module.
no code implementations • 29 Oct 2024 • Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li
Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints.
no code implementations • 16 Oct 2024 • Guanzhou Lan, Qianli Ma, YuQi Yang, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao
In this paper, we identify two primary factors contributing to performance degradation: fitting errors and the inference gap.
no code implementations • 11 Oct 2024 • Yunpeng Gao, Zhigang Wang, Linglin Jing, Dong Wang, Xuelong Li, Bin Zhao
Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues.
1 code implementation • 25 Sep 2024 • Guanlin Li, Ke Zhang, Ting Wang, Ming Li, Bin Zhao, Xuelong Li
Despite the impressive advancements made in recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further advancements.
1 code implementation • 23 Sep 2024 • Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions for individual robots, where a centralized task assigner makes a task planning proposal to decompose the complex task into subtasks, and then assigns subtasks to robot executors.
no code implementations • 18 Sep 2024 • Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings.
no code implementations • 23 Aug 2024 • Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao
3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics.
no code implementations • 8 Aug 2024 • Ziran Zhang, YuHang Tang, Zhigang Wang, Yueting Chen, Bin Zhao
Infrared imaging and turbulence strength measurements are in widespread demand in many fields.
no code implementations • 6 Aug 2024 • Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li
Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the mechanisms of "how to do".
no code implementations • 26 Jul 2024 • Zhaoqing Chen, Jiawei Sun, Xinyi Ye, Bin Zhao, Xuelong Li, Juergen Czarske
Lensless fiber endomicroscope is an emerging tool for in-vivo microscopic imaging, where quantitative phase imaging (QPI) can be utilized as a label-free method to enhance image contrast.
no code implementations • 15 Jul 2024 • Chunshi Wang, Bin Zhao, Shuxue Ding
Cone beam computed tomography (CBCT) is a common way of diagnosing dental-related diseases.
no code implementations • 23 Jun 2024 • Delin Qu, Qizhi Chen, Pingrui Zhang, Xianqiang Gao, Bin Zhao, Dong Wang, Xuelong Li
This paper scales object-level reconstruction to complex scenes, advancing interactive scene reconstruction.
1 code implementation • 22 Jun 2024 • Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li
We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide accurate and consistent long-term imaginations.
1 code implementation • 1 Jun 2024 • Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li
To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively.
no code implementations • 30 May 2024 • Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li
In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning.
1 code implementation • 1 May 2024 • Bin Zhao, Chunshi Wang, Shuxue Ding
Semi-supervised learning for medical image segmentation presents a unique challenge of efficiently using limited labeled data while leveraging abundant unlabeled data.
1 code implementation • 30 Apr 2024 • Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li
We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing.
no code implementations • 28 Apr 2024 • Zhiyao Zhang, Yunzhou Zhang, Yanmin Wu, Bin Zhao, Xingshuo Wang, Rui Tian
With the emergence of Neural Radiance Fields (NeRF), neural implicit representations have gained widespread applications across various domains, including simultaneous localization and mapping.
7 code implementations • 11 Apr 2024 • Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li
The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.
no code implementations • CVPR 2024 • Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li
In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.
no code implementations • 22 Feb 2024 • Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
In the pre-training stage, we employ a discrete diffusion model with a mask-and-replace diffusion strategy to predict future video tokens in the latent space.
no code implementations • 5 Feb 2024 • Pengfei Han, Fuhua Zhang, Bin Zhao, Xuelong Li
Subsequently, a cross-scale motion structure is presented to estimate and refine intermediate flow maps by the extracted features.
1 code implementation • 1 Feb 2024 • Bin Zhao, Pengfei Han, Xuelong Li
Satellites are capable of capturing high-resolution videos.
no code implementations • CVPR 2024 • Zhaojian Li, Bin Zhao, Yuan Yuan
Visual sounding object localization establishes the correspondence between specific visual objects and sound modalities, which provides object-aware guidance to improve binaural generation performance.
1 code implementation • 12 Dec 2023 • Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li
The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences.
no code implementations • 12 Dec 2023 • Jiawei Sun, Bin Zhao, Dong Wang, Zhigang Wang, Jie Zhang, Nektarios Koukourakis, Juergen W. Czarske, Xuelong Li
Quantitative phase imaging (QPI) through multi-core fibers (MCFs) has been an emerging in vivo label-free endoscopic imaging modality with minimal invasiveness.
no code implementations • CVPR 2024 • Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li
This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesizing a static object as in existing methods.
no code implementations • CVPR 2024 • Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li
To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.
no code implementations • 13 Nov 2023 • Zhaojian Li, Bin Zhao, Yuan Yuan
To this end, a metric to measure the spatial perception of audio is proposed for the first time.
2 code implementations • 6 Nov 2023 • Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li
Generalizable articulated object manipulation is essential for home-assistant robots.
7 code implementations • 4 Oct 2023 • Yiwen Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters.
no code implementations • 11 Jul 2023 • Guanzhou Lan, Bin Zhao, Xuelong Li
Targeting the surveillance scenes, we develop a disentangled representation, which is an auxiliary pretext task that separates surveillance scenes into the foreground and background with contrastive learning.
1 code implementation • NeurIPS 2023 • Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, we propose Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.
no code implementations • 28 May 2023 • Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
1 code implementation • 8 May 2023 • Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li
Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill, which serves as an upper bound of the previous MI objective.
no code implementations • CVPR 2023 • Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image.
1 code implementation • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao
The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks.
1 code implementation • ICCV 2023 • Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li
This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion.
7 code implementations • 29 Mar 2023 • Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.
no code implementations • CVPR 2023 • Yihao Wang, Zhigang Wang, Bin Zhao, Dong Wang, Mulin Chen, Xuelong Li
In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security.
1 code implementation • CVPR 2023 • Haozhe Si, Bin Zhao, Dong Wang, Yunpeng Gao, Mulin Chen, Zhigang Wang, Xuelong Li
We show that our framework circumvents the need for depth and AIF image ground truth and achieves superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world.
no code implementations • ICCV 2023 • Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.
1 code implementation • 5 Aug 2022 • Xuelong Li, Guanlin Li, Bin Zhao
The illumination enhancement branch is adopted to enlighten the low-frequency component with reduced resolution.
no code implementations • 19 Jul 2022 • Shenghua Xu, Xinyue Cai, Bin Zhao, Li Zhang, Hang Xu, Yanwei Fu, Xiangyang Xue
This is because most existing lane detection methods treat lane detection as either a dense prediction or a detection task; few of them consider the unique topologies (Y-shape, Fork-shape, nearly horizontal lane) of the lane markers, which leads to sub-optimal solutions.
3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao
By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% to the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.
Ranked #5 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)
no code implementations • 4 Nov 2021 • Mingao Yuan, Bin Zhao, Xiaofeng Zhao
In practice, a network may have censored (or missing) values and it is shown that censored values have a non-negligible effect on the structural properties of a network.
no code implementations • 22 Sep 2021 • Bin Zhao, Maoguo Gong, Xuelong Li
To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.
Ranked #17 on Supervised Video Summarization on TvSum
no code implementations • 17 Sep 2021 • Hailong Ning, Bin Zhao, Zhanxuan Hu, Lang He, Ercheng Pei
Motivated by this, an audio-visual collaborative representation learning method is proposed for the DSP task, which explores the audio modality to better predict the dynamic saliency map by assisting vision modality.
1 code implementation • 19 Jul 2021 • Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyu Gao, Bin Zhao, Rui Zhang, Jun Hou
Video crowd localization is a crucial yet challenging task, which aims to estimate exact locations of human heads in the given crowded videos.
no code implementations • CVPR 2021 • Tianyi Zhang, Jie Lin, Peng Hu, Bin Zhao, Mohamed M. Sabry Aly
Unlike convolutions which are inherently parallel, the de-facto standard for NMS, namely GreedyNMS, cannot be easily parallelized and thus could be the performance bottleneck in convolutional object detection pipelines.
no code implementations • 17 May 2021 • Bin Zhao, Maoguo Gong, Xuelong Li
Motivated by this, we propose to jointly exploit the audio and visual information for the video summarization task, and develop an AudioVisual Recurrent Network (AVRN) to achieve this.
no code implementations • 17 May 2021 • Bin Zhao, Xuelong Li
Specifically, in the flow estimation stage, three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps, so that the edge-maps are taken as the auxiliary information to provide more guidance to boost the flow accuracy.
no code implementations • 10 May 2021 • Bin Zhao, Haopeng Li, Xiaoqiang Lu, Xuelong Li
Then, the videos are summarized by exploiting both the local and global dependencies among shots.
no code implementations • 9 Mar 2021 • Xuelong Li, Kai Kou, Bin Zhao
To this end, the generator of Weather GAN is composed of an initial translation module, an attention module and a weather-cue segmentation module.
1 code implementation • ICCV 2021 • Bin Zhao, Goutam Bhat, Martin Danelljan, Luc van Gool, Radu Timofte
This effectively limits the performance and generalization capabilities of existing video segmentation methods.
no code implementations • 3 Aug 2020 • Chao Chai, Pengchong Qiao, Bin Zhao, Huiying Wang, Guohua Liu, Hong Wu, E Mark Haacke, Wen Shen, Chen Cao, Xinchen Ye, Zhiyang Liu, Shuang Xia
Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated to various neurodegenerative diseases, which can be measured through the magnetic susceptibility from the quantitative susceptibility mapping (QSM).
no code implementations • 10 Aug 2019 • Bin Zhao, Shuxue Ding, Hong Wu, Guohua Liu, Chen Cao, Song Jin, Zhiyang Liu
By using a large number of weakly labeled subjects and a small number of fully labeled subjects, our proposed method is able to accurately detect and segment the AIS lesions.
no code implementations • 8 Jul 2019 • Wuwei Lan, Yanyan Xu, Bin Zhao
Travel time estimation is a crucial task for not only personal travel scheduling but also city planning.
3 code implementations • 5 Jul 2019 • Junyu Gao, Wei Lin, Bin Zhao, Dong Wang, Chenyu Gao, Jun Wen
This technical report attempts to provide efficient and solid kits addressed on the field of crowd counting, which is denoted as Crowd Counting Code Framework (C$^3$F).
no code implementations • 28 Apr 2019 • Bin Zhao, Xuelong Li, Xiaoqiang Lu
Compared to traditional RNNs, H-RNN is more suitable for video summarization, since it can exploit long temporal dependency among frames while significantly reducing the computation operations.
no code implementations • 24 Apr 2019 • Xuelong Li, Bin Zhao, Xiaoqiang Lu
Besides, the property-weights are learned for edited videos and raw videos, respectively.
no code implementations • 24 Apr 2019 • Bin Zhao, Xuelong Li, Xiaoqiang Lu, Zhigang Wang
To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one label according to the displayed weather conditions.
no code implementations • 4 Jan 2019 • Xue Geng, Jie Fu, Bin Zhao, Jie Lin, Mohamed M. Sabry Aly, Christopher Pal, Vijay Chandrasekhar
This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage.
no code implementations • CVPR 2018 • Bin Zhao, Xuelong Li, Xiaoqiang Lu
Although video summarization has achieved great success in recent years, few approaches have realized the influence of video structure on the summarization results.
no code implementations • CVPR 2014 • Bin Zhao, Eric P. Xing
Curse of dimensionality is a practical and challenging problem in image categorization, especially in cases with a large number of classes.
no code implementations • CVPR 2014 • Bin Zhao, Eric P. Xing
With the widespread availability of video cameras, we are facing an ever-growing enormous collection of unedited and unstructured video data.
no code implementations • CVPR 2013 • Bin Zhao, Eric P. Xing
Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands.
no code implementations • NeurIPS 2011 • Bin Zhao, Fei Li, Eric P. Xing
With the emergence of structured large-scale datasets such as ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, becomes available.