Search Results for author: Yilun Chen

Found 25 papers, 19 papers with code

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

1 code implementation • 13 Jun 2024 • Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception has attracted increasing attention due to its connection to the physical world, and it is making rapid progress.

Grounded 3D-LLM with Referent Tokens

1 code implementation • 16 May 2024 • Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang

Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning.

Dense Captioning • Language Modelling +3

3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting

no code implementations • 30 Mar 2024 • Xiaoyang Lyu, Yang-tian Sun, Yi-Hua Huang, Xiuzhe Wu, ZiYi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi

In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS.

3D Reconstruction • Surface Reconstruction

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

1 code implementation • 28 Mar 2024 • Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao

However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes.

3D dense captioning • Dense Captioning

More Than Routing: Joint GPS and Route Modelling for Refined Trajectory Representation Learning

no code implementations • 25 Feb 2024 • Zhipeng Ma, Zheyan Tu, Xinhai Chen, Yan Zhang, Deguo Xia, Guyue Zhou, Yilun Chen, Yu Zheng, Jiangtao Gong

The experimental results demonstrate that JGRM outperforms existing methods in both road segment representation and trajectory representation tasks.

Representation Learning

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

2 code implementations • 23 Feb 2024 • Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint.

3D Object Detection • Autonomous Driving +2

PointLLM: Empowering Large Language Models to Understand Point Clouds

3 code implementations • 31 Aug 2023 • Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

3D Object Captioning • 3D Question Answering (3D-QA) +3

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation • 8 Aug 2023 • Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection • Autonomous Driving +3

VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection

2 code implementations • 20 Mar 2023 • Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

In autonomous driving, Vehicle-Infrastructure Cooperative 3D Object Detection (VIC3D) makes use of multi-view cameras from both vehicles and traffic infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint.

3D Object Detection • Autonomous Driving +2

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation • ICCV 2023 • Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection • Autonomous Driving +3

EfficientNeRF: Efficient Neural Radiance Fields

1 code implementation • 2 Jun 2022 • Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, Jiaya Jia

Neural Radiance Fields (NeRF) has been widely applied to various tasks for its high-quality representation of 3D scenes.


Unifying Voxel-based Representation with Transformer for 3D Object Detection

1 code implementation • 1 Jun 2022 • Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia

To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space.

3D Object Detection • Decoder +4

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

1 code implementation • 6 Apr 2022 • Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia

First, to effectively lift the 2D information to stereo volume, we propose depth-wise plane sweeping (DPS) that allows denser connections and extracts depth-guided features.

3D Object Detection From Stereo Images • Relation

Multi-View Transformer for 3D Visual Grounding

1 code implementation • CVPR 2022 • Shijia Huang, Yilun Chen, Jiaya Jia, Liwei Wang

The multi-view space enables the network to learn a more robust multi-modal representation for 3D visual grounding and eliminates the dependence on specific views.

3D visual grounding

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

1 code implementation • CVPR 2022 • Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, Chiew-Lan Tai

The attention mechanism of the transformer enables our model to adaptively determine where and what information should be taken from the image, leading to a robust and effective fusion strategy.

3D Object Detection • Autonomous Driving +3

EfficientNeRF: Efficient Neural Radiance Fields

no code implementations • CVPR 2022 • Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, Jiaya Jia

Neural Radiance Fields (NeRF) has been widely applied to various tasks for its high-quality representation of 3D scenes.


RoadMap: A Light-Weight Semantic Map for Visual Localization towards Autonomous Driving

1 code implementation • 4 Jun 2021 • Tong Qin, Yuxin Zheng, Tongqing Chen, Yilun Chen, Qing Su

Finally, the semantic map is compressed and distributed to production cars, which use this map for localization.

Autonomous Driving • Visual Localization

AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot

2 code implementations • 3 Jul 2020 • Tong Qin, Tongqing Chen, Yilun Chen, Qing Su

In this paper, we exploit robust semantic features to build the map and localize vehicles in parking lots.

Autonomous Vehicles • Navigate

DSGN: Deep Stereo Geometry Network for 3D Object Detection

1 code implementation • CVPR 2020 • Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

Most state-of-the-art 3D object detectors heavily rely on LiDAR sensors because there is a large performance gap between image-based and LiDAR-based methods.

3D Object Detection From Stereo Images • Object +2

Fast Point R-CNN

no code implementations • ICCV 2019 • Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

We present a unified, efficient and effective framework for point-cloud based 3D object detection.

3D Object Detection • object-detection

Learning On-Road Visual Control for Self-Driving Vehicles with Auxiliary Tasks

no code implementations • 19 Dec 2018 • Yilun Chen, Praveen Palanisamy, Priyantha Mudalige, Katharina Muelling, John M. Dolan

In this paper, we leverage auxiliary information aside from raw images and design a novel network structure, called Auxiliary Task Network (ATN), to help boost the driving performance while maintaining the advantage of minimal training data and an End-to-End training method.

Optical Flow Estimation • Semantic Segmentation +2

Cascaded Pyramid Network for Multi-Person Pose Estimation

5 code implementations • CVPR 2018 • Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun

In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to relieve the difficulty of these "hard" keypoints.

Keypoint Detection • Multi-Person Pose Estimation

Revealing social networks of spammers through spectral clustering

no code implementations • 30 Apr 2013 • Kevin S. Xu, Mark Kliger, Yilun Chen, Peter J. Woolf, Alfred O. Hero III

To date, most studies on spam have focused only on the spamming phase of the spam cycle and have ignored the harvesting phase, which consists of the mass acquisition of email addresses.

Clustering
