no code implementations • 9 Jun 2025 • Xiangyu Guo, Zhanqian Wu, Kaixin Xiong, Ziyang Xu, Lijun Zhou, Gangwei Xu, Shaoqing Xu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang
We present Genesis, a unified framework for joint generation of multi-view driving videos and LiDAR sequences with spatio-temporal and cross-modal consistency.
no code implementations • 9 Jun 2025 • Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu, Lijun Zhou, Long Chen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang
Recent approaches attempt to address this challenge by leveraging the rich world knowledge of Vision-Language Models (VLMs), but these methods suffer from several limitations: (1) a significant domain gap between the pre-training data of VLMs and real-world driving data, (2) a dimensionality mismatch between the discrete language space and the continuous action space, and (3) imitation learning tends to capture the average behavior present in the dataset, which may be suboptimal or even dangerous.
Ranked #9 on NavSim on OpenScene
no code implementations • 29 May 2025 • Tianhang Wang, Fan Lu, Sanqing Qu, Guo Yu, Shihang Du, Ya Wu, Yuan Huang, Guang Chen
Existing neural rendering-based urban scene reconstruction methods mainly focus on the Interpolated View Synthesis (IVS) setting, which synthesizes views close to the training camera trajectory.
no code implementations • 21 May 2025 • Kangan Qian, Sicong Jiang, Yang Zhong, Ziang Luo, Zilin Huang, Tianze Zhu, Kun Jiang, Mengmeng Yang, Zheng Fu, Jinyu Miao, Yining Shi, He Zhe Lim, Li Liu, Tianbao Zhou, Huang Yu, Yifei Hu, Guang Li, Guang Chen, Hao Ye, Lijun Sun, Diange Yang
Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggle with hallucinations, inefficient reasoning, and limited real-world validation hinders accurate perception and robust step-by-step reasoning.
1 code implementation • 26 Mar 2025 • Dingchen Yang, Bowen Cao, Anran Zhang, Weibo Gu, Winston Hu, Guang Chen
Multi-modal Large Language Models (MLLMs) often process thousands of visual tokens, which consume a significant portion of the context window and impose a substantial computational burden.
no code implementations • 18 Mar 2025 • Qingyao Xu, Siheng Chen, Guang Chen, Yanfeng Wang, Ya Zhang
Traffic scene understanding is essential for intelligent transportation systems and autonomous driving, ensuring safe and efficient vehicle operation.
no code implementations • 14 Mar 2025 • Yixuan Zhang, Qing Chang, Yuxi Wang, Guang Chen, Zhaoxiang Zhang, Junran Peng
Speech-driven 3D facial animation seeks to produce lifelike facial expressions that are synchronized with the speech content and its emotional nuances, finding applications in various multimedia fields.
no code implementations • 17 Feb 2025 • Di Wu, Xian Wei, Guang Chen, Hao Shen, Xiangfeng Wang, Wenhao Li, Bo Jin
Embodied multi-agent systems (EMAS) have attracted growing attention for their potential to address complex, real-world challenges in areas such as logistics and robotics.
1 code implementation • 17 Feb 2025 • Jianyi Peng, Fan Lu, Bin Li, Yuan Huang, Sanqing Qu, Guang Chen
Compared to single-modal VPR, this approach benefits from the widespread availability of RGB cameras and the robustness of point clouds in providing accurate spatial geometry and distance information.
1 code implementation • CVPR 2025 • Shihang Du, Sanqing Qu, Tianhang Wang, Xudong Zhang, Yunwei Zhu, Jian Mao, Fan Lu, Qiao Lin, Guang Chen
Extensive experiments on 10 leading collaborative perception models reveal that, while these models perform well under ideal conditions, they are significantly affected by corruptions.
no code implementations • 3 Dec 2024 • ZhiYuan Chen, Fan Lu, Guo Yu, Bin Li, Sanqing Qu, Yuan Huang, Changhong Fu, Guang Chen
Tracking the 6DoF pose of unknown objects in monocular RGB video sequences is crucial for robotic manipulation.
1 code implementation • 8 Nov 2024 • Jianzhao Huang, Hongzhan Lin, Ziyan Liu, Ziyang Luo, Guang Chen, Jing Ma
The proliferation of Internet memes in the age of social media necessitates effective identification of harmful ones.
no code implementations • 27 Sep 2024 • Huizi Yu, Jiayan Zhou, Lingyao Li, Shan Chen, Jack Gallifant, Anye Shi, Xiang Li, Wenyue Hua, Mingyu Jin, Guang Chen, Yang Zhou, Zhao Li, Trisha Gupte, Ming-Li Chen, Zahra Azizi, Yongfeng Zhang, Themistocles L. Assimes, Xin Ma, Danielle S. Bitterman, Lin Lu, Lizhou Fan
Here, we developed AIPatient, an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone.
2 code implementations • 19 Aug 2024 • Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll
Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks.
no code implementations • 18 Aug 2024 • Guitao Chen, Yunshen Wang, Hongye Sun, Guang Chen
Generative language models (LMs) offer numerous advantages but may produce inappropriate or harmful outputs due to the harmful knowledge acquired during pre-training.
1 code implementation • 17 Jul 2024 • Tianpei Zou, Sanqing Qu, Zhijun Li, Alois Knoll, Lianghua He, Guang Chen, Changjun Jiang
HGL comprises three complementary modules from local, global to temporal learning in a bottom-up manner. Technically, we first construct a local geometry learning module for pseudo-label generation.
1 code implementation • 17 Jul 2024 • Hu Cao, Zehua Zhang, Yan Xia, Xinyi Li, Jiahao Xia, Guang Chen, Alois Knoll
The core concept is the design of the coarse-to-fine fusion module, denoted as the cross-modality adaptive feature refinement (CAFR) module.
Ranked #1 on Object Detection on PKU-DDD17-Car
no code implementations • 8 Jul 2024 • Weiyi Xue, Zehan Zheng, Fan Lu, Haiyun Wei, Guang Chen, Changjun Jiang
Based on this, we propose Geometry guided Neural LiDAR Fields (GeoNLF), a hybrid framework that alternates between global neural reconstruction and pure geometric pose optimization.
1 code implementation • 17 Jun 2024 • Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma
Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning.
1 code implementation • 15 Jun 2024 • Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen
However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects/modifications to the raw video before publishing it on social media platforms.
no code implementations • 13 Jun 2024 • Yong Wu, Yang Wang, Sanqing Qu, Zhijun Li, Guang Chen
Our model only needs a few unlabeled images of a target user for the model adaptation.
no code implementations • 27 May 2024 • Tianhang Wang, Fan Lu, Zehan Zheng, Guang Chen, Changjun Jiang
To address the above problems, we propose RCDN, a Robust Camera-insensitivity collaborative perception with a novel Dynamic feature-based 3D Neural modeling mechanism.
no code implementations • 4 May 2024 • Guang Chen, Zhicong Sun, Yulong Ding, Shuang-Hua Yang
To comprehensively quantify these risks, we propose a framework that considers both the reachability of a system and the risk distribution of a scenario.
1 code implementation • 1 May 2024 • Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen
Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image.
1 code implementation • 10 Apr 2024 • Fan Lu, Kwan-Yee Lin, Yan Xu, Hongsheng Li, Guang Chen, Changjun Jiang
(2) To handle the unbounded nature of urban scenes, we represent the 3D scene with a Scalable Hash Grid structure, incrementally adapting to the growing scale of urban scenes.
1 code implementation • CVPR 2024 • Zehan Zheng, Fan Lu, Weiyi Xue, Guang Chen, Changjun Jiang
In light of this, we propose LiDAR4D, a differentiable LiDAR-only framework for novel space-time LiDAR view synthesis.
1 code implementation • 21 Mar 2024 • Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang
Multi-modal Large Language Models (MLLMs) demonstrate remarkable success across various vision-language tasks.
2 code implementations • 21 Mar 2024 • Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, DaCheng Tao, Changjun Jiang
GLC++ enhances the novel category clustering accuracy of GLC by 4.3% in open-set scenarios on Office-Home.
1 code implementation • CVPR 2024 • Boyang Peng, Sanqing Qu, Yong Wu, Tianpei Zou, Lianghua He, Alois Knoll, Guang Chen, Changjun Jiang
In this paper, we target a practical setting where only a well-trained source model is available and investigate how we can realize IP protection.
2 code implementations • CVPR 2024 • Sanqing Qu, Tianpei Zou, Lianghua He, Florian Röhrbein, Alois Knoll, Guang Chen, Changjun Jiang
Besides, LEAD is also appealing in that it is complementary to most existing methods.
Ranked #2 on Universal Domain Adaptation on VisDA2017
no code implementations • 29 Feb 2024 • Haotian Liu, Sanqing Qu, Fan Lu, Zongtao Bu, Florian Roehrbein, Alois Knoll, Guang Chen
Therefore, existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding.
1 code implementation • 28 Jan 2024 • Liguo Zhou, Yinglei Song, Yichao Gao, Zhou Yu, Michael Sodamin, Hongshen Liu, Liang Ma, Lian Liu, Hao Liu, Yang Liu, Haichuan Li, Guang Chen, Alois Knoll
However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers.
1 code implementation • CVPR 2024 • Jiayi Guan, Li Shen, Ao Zhou, Lusong Li, Han Hu, Xiaodong He, Guang Chen, Changjun Jiang
Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy both cumulative and state-wise costs from offline datasets.
no code implementations • 11 Dec 2023 • Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang
While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively burdensome communication frameworks hinder the attainment of the hardware's limit for final throughput and training effects on a single desktop.
no code implementations • 29 Oct 2023 • Weiyi Xue, Fan Lu, Guang Chen
Specifically, a novel feature consistency enhanced double-soft matching network is introduced to achieve two-stage matching with high flexibility while enlarging the receptive field with high efficiency in a patch-to-patch manner, which significantly improves the registration performance.
no code implementations • 16 Oct 2023 • Liangliang Chen, Hongzhan Lin, Jinshan Ma, Guang Chen
Nevertheless, existing approaches extract each interest independently from its corresponding sub-sequence while ignoring the global correlation of the entire interaction sequence, which may fail to capture the user's inherent preferences for generalizing to potential interests and unavoidably makes the recommended items homogeneous with historical behaviors.
2 code implementations • ICCV 2023 • Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang
To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive (DNMP), and propose to parameterize the entire scene with such primitives.
1 code implementation • CVPR 2023 • Zehan Zheng, Danni Wu, Ruisi Lu, Fan Lu, Guang Chen, Changjun Jiang
In light of these issues, we present NeuralPCI: an end-to-end 4D spatio-temporal Neural field for 3D Point Cloud Interpolation, which implicitly integrates multi-frame information to handle nonlinear large motions for both indoor and outdoor scenarios.
Ranked #1 on 3D Point Cloud Interpolation on NL-Drive
1 code implementation • ICCV 2023 • Tianhang Wang, Guang Chen, Kai Chen, Zhengfa Liu, Bo Zhang, Alois Knoll, Changjun Jiang
To verify our algorithm, we conducted experiments on the V2X-Sim and OPV2V datasets.
no code implementations • CVPR 2023 • Xin Gu, Guang Chen, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
Meanwhile, the internal stream is designed to exploit the multi-modality information in videos (e.g., the appearance of video frames, speech transcripts, and video captions) to ensure the quality of caption results.
Ranked #7 on Video Captioning on YouCook2
1 code implementation • ICCV 2023 • Haotian Liu, Guang Chen, Sanqing Qu, Yanping Zhang, Zhijun Li, Alois Knoll, Changjun Jiang
In this paper, we argue that temporal continuity is a vital element of event-based optical flow and propose a novel Temporal Motion Aggregation (TMA) approach to unlock its potential.
3 code implementations • CVPR 2023 • Sanqing Qu, Tianpei Zou, Florian Roehrbein, Cewu Lu, Guang Chen, DaCheng Tao, Changjun Jiang
We examine the superiority of our GLC on multiple benchmarks with different category shift scenarios, including partial-set, open-set, and open-partial-set DA.
Ranked #3 on Universal Domain Adaptation on VisDA2017
no code implementations • CVPR 2023 • Sanqing Qu, Yingwei Pan, Guang Chen, Ting Yao, Changjun Jiang, Tao Mei
We validate the superiority of our MAD in a variety of single-DG scenarios with different modalities, including recognition on 1D texts, 2D images, 3D point clouds, and semantic segmentation on 2D images.
1 code implementation • 25 Feb 2023 • Jiawei Hou, Qi Chen, Yurong Cheng, Guang Chen, Xiangyang Xue, Taiping Zeng, Jian Pu
However, there is a lack of underground parking scenario datasets with multiple sensors and well-labeled images that support both SLAM tasks and perception tasks, such as semantic segmentation and parking slot detection.
no code implementations • 25 Feb 2023 • Shangding Gu, Alap Kshirsagar, Yali Du, Guang Chen, Jan Peters, Alois Knoll
Deployment of Reinforcement Learning (RL) algorithms for robotics applications in the real world requires ensuring the safety of the robot and its environment.
1 code implementation • 7 Jul 2022 • Xin Gu, Hanhua Ye, Guang Chen, YuFei Wang, Libo Zhang, Longyin Wen
This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition.
1 code implementation • 20 May 2022 • Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Alois Knoll
To establish a good foundation for future safe RL research, in this paper, we provide a review of safe RL from the perspectives of methods, theories, and applications.
1 code implementation • Findings (NAACL) 2022 • Hongzhan Lin, Jing Ma, Liangliang Chen, Zhiwei Yang, Mingfei Cheng, Guang Chen
Massive false rumors emerging along with breaking news or trending topics severely hinder the truth.
1 code implementation • 6 Apr 2022 • Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, DaCheng Tao
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data, which is a much more practical setting due to the data privacy, security, and transmission issues.
2 code implementations • CVPR 2022 • Junjie Ye, Changhong Fu, Guangze Zheng, Danda Pani Paudel, Guang Chen
Previous advances in object tracking mostly reported on favorable illumination circumstances while neglecting performance at nighttime, which significantly impeded the development of related aerial robot applications.
no code implementations • EMNLP 2021 • Hongzhan Lin, Jing Ma, Mingfei Cheng, Zhiwei Yang, Liangliang Chen, Guang Chen
Rumors are rampant in the era of social media.
1 code implementation • ICCV 2021 • Fan Lu, Guang Chen, Yinlong Liu, Lijun Zhang, Sanqing Qu, Shu Liu, Rongqi Gu
Extensive experiments are conducted on two large-scale outdoor LiDAR point cloud datasets to demonstrate the high accuracy and efficiency of the proposed HRegNet.
no code implementations • 27 May 2021 • Guang Chen
Inheriting from the coarse-grained Martini representation of organic molecules and combined with deep learning, DMInet has the potential for more accelerated high-throughput screening in drug discovery across a much larger chemical space than can be explored by physics-based simulations alone.
2 code implementations • 7 Apr 2021 • Sanqing Qu, Guang Chen, Zhijun Li, Lijun Zhang, Fan Lu, Alois Knoll
Traditional methods mainly focus on separating foreground and background frames with only a single attention branch and class activation sequence.
Ranked #5 on Weakly Supervised Action Localization on THUMOS14
1 code implementation • 4 Mar 2021 • Guanghan Ning, Guang Chen, Chaowei Tan, Si Luo, Liefeng Bo, Heng Huang
We propose a new offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
1 code implementation • 10 Feb 2021 • Kai Chen, Guang Chen, Dan Xu, Lijun Zhang, Yuyao Huang, Alois Knoll
Although the Transformer has achieved breakthrough success in many domains, especially Natural Language Processing (NLP), applying it to time series forecasting is still a great challenge.
no code implementations • 25 Jan 2021 • Hu Cao, Guang Chen, Zhijun Li, Jianjie Lin, Alois Knoll
Extensive experiments on two public grasping datasets, Cornell and Jacquard, demonstrate the state-of-the-art performance of our method in balancing accuracy and inference speed.
Ranked #1 on Robotic Grasping on Jacquard dataset
1 code implementation • 18 Dec 2020 • Fan Lu, Guang Chen, Sanqing Qu, Zhijun Li, Yinlong Liu, Alois Knoll
Generally, the frame rates of mechanical LiDAR sensors are 10 to 20 Hz, which is much lower than other commonly used sensors like cameras.
no code implementations • 21 Nov 2020 • Fan Lu, Guang Chen, Yinlong Liu, Zhijun Li, Sanqing Qu, Tianpei Zou
3D point clouds accurately model 3D information of the surrounding environment and are crucial for intelligent vehicles to perceive the scene.
no code implementations • 16 Nov 2020 • Sanqing Qu, Guang Chen, Dan Xu, Jinhu Dong, Fan Lu, Alois Knoll
At each time step, this sampling strategy first estimates the current action progression and then decides which temporal ranges should be used to aggregate the optimal supplementary features.
1 code implementation • NeurIPS 2020 • Fan Lu, Guang Chen, Yinlong Liu, Zhongnan Qu, Alois Knoll
To tackle the information loss of random sampling, we exploit a novel random dilation cluster strategy to enlarge the receptive field of each sampled point and an attention mechanism to aggregate the positions and features of neighbor points.
2 code implementations • 8 Aug 2020 • Can Zhang, Yuexian Zou, Guang Chen, Lei Gan
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
Ranked #2 on Action Recognition on Jester (Gesture Recognition)
no code implementations • 27 May 2020 • Guang Chen, Shiwen Shen, Longyin Wen, Si Luo, Liefeng Bo
Existing methods focus only on pig counting from a single image, and their accuracy is challenged by several factors, including pig movement, occlusion, and overlapping.
1 code implementation • ECCV 2020 • Yuliang Guo, Guang Chen, Peitao Zhao, Weide Zhang, Jinghao Miao, Jingao Wang, Tae Eun Choe
The method, inspired by the latest state-of-the-art 3D-LaneNet, is a unified framework solving image encoding, spatial transform of features and 3D lane prediction in a single network.
Ranked #9 on 3D Lane Detection on Apollo Synthetic 3D Lane
no code implementations • 10 Mar 2020 • Zhenshan Bing, Claus Meschede, Guang Chen, Alois Knoll, Kai Huang
Building spiking neural networks (SNNs) based on biological synaptic plasticities holds a promising potential for accomplishing fast and energy-efficient computing, which is beneficial to mobile robotic applications.
no code implementations • 8 Mar 2020 • Fangyi Zhu, Jenq-Neng Hwang, Zhanyu Ma, Guang Chen, Jun Guo
Thereafter, we construct a new dataset, providing consistent object-sentence pairs, to facilitate effective cross-modal learning.
no code implementations • 5 Sep 2019 • Dekai Zhu, Jinhu Dong, Zhongcong Xu, Canbo Ye, Yinbai Hu, Hang Su, Zhengfa Liu, Guang Chen
The neuromorphic camera is a brand new vision sensor that has emerged in recent years.
Robotics
no code implementations • 1 Jul 2019 • Alexandre Mahrach, Guang Chen, Nuo Li, Carl van Vreeswijk, David Hansel
Networks with three inhibitory populations and V1-like architecture account for the data in ALM layer 2/3.
1 code implementation • 29 Apr 2019 • Yinlong Liu, Alois Knoll, Guang Chen
Accordingly, we propose a vertical direction estimation method by considering the relationship between the vertical frame and horizontal frames.
no code implementations • 25 Mar 2019 • Yinlong Liu, Xuechen Li, Manning Wang, Guang Chen, Zhijian Song, Alois Knoll
In this paper, we consider pairwise constraints and propose a globally optimal algorithm for solving the absolute pose estimation problem.
no code implementations • 18 Oct 2018 • Peng Sun, Guang Chen, Guerdan Luke, Yi Shang
Experimental results show our proposed loss function with the RetinaNet architecture outperformed other state-of-the-art object detection models by at least 4.31 mAP, and outperformed RetinaNet by 2.26 mAP at the same inference speed.
no code implementations • 6 Dec 2017 • Guang Chen, Shu Liu, Kejia Ren, Zhongnan Qu, Changhong Fu, Gereon Hinz, Alois Knoll
However, mobile sensing perception brings new challenges in how to efficiently analyze and intelligently interpret the deluge of IoT data in mission-critical services.
no code implementations • 11 Nov 2015 • Guorui Zhou, Guang Chen
Inspired by these algorithms, in this paper, we propose a novel method named Hierarchical Latent Semantic Mapping (HLSM), which automatically generates topics from a corpus.
no code implementations • CVPR 2014 • Guang Chen, Jianchao Yang, Hailin Jin, Jonathan Brandt, Eli Shechtman, Aseem Agarwala, Tony X. Han
This paper addresses the large-scale visual font recognition (VFR) problem, which aims at automatic identification of the typeface, weight, and slope of the text in an image or photo without any knowledge of content.
Ranked #1 on Font Recognition on VFR-447
no code implementations • CVPR 2013 • Guang Chen, Yuanyuan Ding, Jing Xiao, Tony X. Han
The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector.