Search Results for author: Chenfeng Xu

Found 31 papers, 17 papers with code

Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering

no code implementations 29 May 2024 Ido Sobol, Chenfeng Xu, Or Litany

Generating realistic images from arbitrary views based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences.

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

1 code implementation 24 May 2024 Feng Liang, Akio Kodaira, Chenfeng Xu, Masayoshi Tomizuka, Kurt Keutzer, Diana Marculescu

This paper introduces StreamV2V, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts.

NeRF in Robotics: A Survey

no code implementations 2 May 2024 Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics.

Q-SLAM: Quadric Representations for Monocular SLAM

no code implementations 12 Mar 2024 Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang, Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan

This focus results in a significant disconnect between NeRF applications, i.e., novel-view synthesis, and the requirements of SLAM.

3D Reconstruction, Depth Estimation, +2

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

no code implementations 7 Nov 2023 Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

However, these features are initially trained on paired text and image data, which are not optimized for 3D tasks, and often exhibit a domain gap when applied to the target data.

3D Object Detection, Novel View Synthesis, +2

Human-oriented Representation Learning for Robotic Manipulation

no code implementations 4 Oct 2023 Mingxiao Huo, Mingyu Ding, Chenfeng Xu, Thomas Tian, Xinghao Zhu, Yao Mu, Lingfeng Sun, Masayoshi Tomizuka, Wei Zhan

We introduce the Task Fusion Decoder, a plug-and-play embedding translator that exploits the underlying relationships among these perceptual skills to guide representation learning toward encoding structure that matters for all of them, ultimately empowering the learning of downstream robotic manipulation tasks.

Decoder, Hand Detection, +2

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

no code implementations 4 Oct 2023 Hao Sha, Yao Mu, YuXuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding

Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability.

Autonomous Driving, Decision Making

RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

no code implementations 3 Oct 2023 Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Yintao Wei

This paper addresses the growing demands for safety and comfort in intelligent robot systems, particularly autonomous vehicles, where road conditions play a pivotal role in overall driving performance.

Autonomous Driving, Depth Estimation, +3

HallE-Control: Controlling Object Hallucination in Large Multimodal Models

2 code implementations 3 Oct 2023 Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li

Current Large Multimodal Models (LMMs) achieve remarkable progress, yet significant uncertainty remains regarding their ability to accurately apprehend visual details, that is, to perform detailed captioning.

Attribute, Decoder, +3

DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds

1 code implementation ICCV 2023 Chensheng Peng, Guangming Wang, Xian Wan Lo, Xinrui Wu, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan, Hesheng Wang

Previous methods rarely predict scene flow from the entire point cloud of a scene in a single inference pass, due to the memory inefficiency and heavy overhead of the distance calculation and sorting involved in the farthest point sampling, KNN, and ball query algorithms commonly used for local feature aggregation.

Scene Flow Estimation

Quadric Representations for LiDAR Odometry, Mapping and Localization

no code implementations 27 Apr 2023 Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks.

Autonomous Driving

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

1 code implementation CVPR 2023 Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

In this paper, we address open-vocabulary 3D point-cloud detection by a dividing-and-conquering strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting.

3D Object Detection, 3D Open-Vocabulary Object Detection, +3
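The second step of the strategy described above, connecting textual and point-cloud representations so novel categories can be classified by text prompting, can be illustrated with a minimal CLIP-style sketch. This is an assumed simplification for illustration, not the paper's actual implementation: each localized object embedding is matched to the nearest category text embedding by cosine similarity.

```python
import numpy as np

def cosine_match(object_embeddings, text_embeddings):
    """Classify each detected object by its most similar text embedding
    (cosine similarity), CLIP-style. Shapes: (num_objects, d), (num_classes, d)."""
    obj = object_embeddings / np.linalg.norm(object_embeddings, axis=1, keepdims=True)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    similarity = obj @ txt.T          # (num_objects, num_classes)
    return similarity.argmax(axis=1)  # predicted class index per object

# Toy usage: 2 localized objects scored against 3 candidate category prompts.
rng = np.random.default_rng(0)
objects = rng.normal(size=(2, 8))   # stand-ins for point-cloud object features
texts = rng.normal(size=(3, 8))     # stand-ins for text-prompt embeddings
labels = cosine_match(objects, texts)
print(labels.shape)  # (2,)
```

Because the class set enters only through the text embeddings, new categories can be added at inference time by embedding new prompts, with no retraining of the detector.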

Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection

1 code implementation 5 Oct 2022 Jinhyung Park, Chenfeng Xu, Shijia Yang, Kurt Keutzer, Kris Kitani, Masayoshi Tomizuka, Wei Zhan

While recent camera-only 3D detection methods leverage multiple timesteps, the limited history they use significantly hampers the extent to which temporal fusion can improve object perception.

3D Object Detection, object-detection, +2

Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning

no code implementations 5 Jul 2022 Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

Current point-cloud detection methods have difficulty detecting open-vocabulary objects in the real world, due to their limited generalization capability.

Cloud Detection, Contrastive Learning

PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map

1 code implementation 21 Apr 2022 Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan

It is hard to replicate these approaches in trajectory forecasting due to the lack of adequate trajectory data (e.g., 34K samples in the nuScenes dataset).

Contrastive Learning, Representation Learning, +1
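PreTraM's core idea, connecting trajectory and map representations via contrastive pre-training, is commonly realized with a symmetric InfoNCE objective. The sketch below is an assumed, minimal formulation for illustration (the temperature value and the exact loss details are assumptions, not taken from the paper): matched trajectory/map pairs sit on the diagonal of a similarity matrix and are pulled together while mismatched pairs are pushed apart.

```python
import numpy as np

def info_nce(traj_emb, map_emb, temperature=0.07):
    """Symmetric InfoNCE loss: row i of traj_emb is paired with row i of map_emb."""
    t = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)
    m = map_emb / np.linalg.norm(map_emb, axis=1, keepdims=True)
    logits = (t @ m.T) / temperature  # (batch, batch), matched pairs on diagonal
    # Cross-entropy in both directions (trajectory->map and map->trajectory).
    log_p_tm = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_mt = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.diag(log_p_tm).mean() + np.diag(log_p_mt).mean())

# Toy usage with random stand-in embeddings for a batch of 4 pairs.
rng = np.random.default_rng(1)
loss = info_nce(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
print(float(loss) > 0)  # True: -log of a softmax probability is nonnegative
```

In practice map crops are plentiful even when trajectories are scarce, which is why pairing the two modalities helps under limited trajectory data.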

DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection

1 code implementation 17 Mar 2022 Jinhyung Park, Chenfeng Xu, Yiyang Zhou, Masayoshi Tomizuka, Wei Zhan

While numerous 3D detection works leverage the complementary relationship between RGB images and point clouds, developments in the broader framework of semi-supervised object recognition remain uninfluenced by multi-modal fusion.

object-detection, Object Detection, +2

DCANet: Dense Context-Aware Network for Semantic Segmentation

no code implementations 6 Apr 2021 Yifu Liu, Chenfeng Xu, Xinyu Jin

As the value of context information becomes increasingly evident in advanced semantic segmentation, learning to capture compact context relationships can help in understanding complex scenes.

Segmentation, Semantic Segmentation

A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding

1 code implementation 6 Mar 2021 Di Feng, Yiyang Zhou, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan

Detecting dynamic objects and predicting static road information such as drivable areas and ground heights are crucial for safe autonomous driving.

3D Object Detection, Autonomous Driving, +2

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

6 code implementations CVPR 2021 Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo

In our method, however, a fixed sparse set of learned object proposals, with a total length of $N$, is provided to the object recognition head to perform classification and localization.

Object, object-detection, +2
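The "fixed sparse set of learned proposals" can be sketched minimally as follows. This is an assumed illustration, not Sparse R-CNN's actual code (the feature dimension and box parameterization are assumptions): the $N$ proposal boxes and features are plain learnable parameters, updated by gradient descent along with the detection head, replacing dense anchor enumeration.

```python
import numpy as np

N = 100          # fixed number of learned proposals (Sparse R-CNN's N)
FEAT_DIM = 256   # per-proposal feature dimension (assumed for illustration)

# Learnable parameters, initialized here; during training a framework such as
# PyTorch would update them by backpropagation together with the head.
proposal_boxes = np.tile([0.5, 0.5, 1.0, 1.0], (N, 1))  # (cx, cy, w, h), normalized
proposal_feats = np.random.default_rng(0).normal(size=(N, FEAT_DIM))

# The recognition head consumes exactly these N proposals for classification
# and box refinement; no dense grid of anchors is enumerated or filtered.
print(proposal_boxes.shape, proposal_feats.shape)  # (100, 4) (100, 256)
```

Because the proposal set is fixed and small, the pipeline avoids the many-to-one assignment and post-processing that dense-anchor detectors require.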

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

8 code implementations 5 Jun 2020 Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships.

General Classification, Image Classification, +1

SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation

3 code implementations ECCV 2020 Chenfeng Xu, Bichen Wu, Zining Wang, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

Using standard convolutions to process such LiDAR images is problematic, as convolution filters pick up local features that are only active in specific regions in the image.

3D Semantic Segmentation, Point Cloud Segmentation, +1

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations 20 Dec 2019 Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding different large density values on a small set of pixels.

Crowd Counting, Model Optimization

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

no code implementations ICCV 2019 Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting, Density Estimation
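The density-map formulation used by both crowd-counting papers above, where the predicted count is the integral of a density map over the image, can be demonstrated with a short numeric sketch. The blob placement and sizes here are made up for illustration; each normalized Gaussian blob integrates to one person, so summing the map recovers the count.

```python
import numpy as np

def count_from_density(density_map):
    """Predicted crowd count = integral (discrete sum) of the density map."""
    return float(density_map.sum())

# Toy 32x32 density map with two Gaussian-like blobs, one per person.
h = w = 32
yy, xx = np.mgrid[0:h, 0:w]

def blob(cy, cx, sigma=2.0):
    g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return g / g.sum()  # normalize so each blob integrates to exactly 1

density = blob(10, 10) + blob(20, 22)
print(round(count_from_density(density)))  # 2
```

This also shows the issue raised in the AutoScale snippet: where people are close together, overlapping blobs accumulate into a few pixels with large values, which is what its scaling approach is designed to mitigate.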
