1 code implementation • 24 Jun 2024 • Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
2 code implementations • CVPR 2024 • Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li
In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline.
1 code implementation • CVPR 2024 • Penghao Wu, Saining Xie
However, the lack of this visual search mechanism in current multimodal LLMs (MLLMs) hinders their ability to focus on important visual details, especially when handling high-resolution and visually crowded images.
1 code implementation • 21 Dec 2023 • Penghao Wu, Saining Xie
However, the lack of this visual search mechanism in current multimodal LLMs (MLLMs) hinders their ability to focus on important visual details, especially when handling high-resolution and visually crowded images.
Ranked #122 on Visual Question Answering on MM-Vet
1 code implementation • 29 Jun 2023 • Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li
The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction.
1 code implementation • CVPR 2023 • Xiaosong Jia, Penghao Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi Yan, Hongyang Li
End-to-end autonomous driving has made impressive progress in recent years.
Ranked #4 on CARLA longest6 on CARLA
1 code implementation • 3 Jan 2023 • Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao
Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we ask whether this idea can be adapted in a grab-and-go spirit to mitigate the sample inefficiency of visuomotor driving.
no code implementations • 5 Oct 2022 • Penghao Wu, Li Niu, Jing Liang, Liqing Zhang
Synthetic images created by image editing operations are prevalent, but the color or illumination inconsistency between the manipulated region and background may make it unrealistic.
no code implementations • 5 Oct 2022 • Penghao Wu, Li Niu, Liqing Zhang
Based on the extracted style features, we also propose a novel style voting module to guide the localization of the inharmonious region.
1 code implementation • 30 Sep 2022 • Jing Liang, Li Niu, Penghao Wu, Fengjun Guo, Teng Long
Inharmonious region localization aims to localize the region in a synthetic image that is incompatible with the surrounding background.
1 code implementation • 15 Jul 2022 • Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, DaCheng Tao
In particular, we propose a spatial-temporal feature learning scheme, called ST-P3, that yields more representative features for the perception, prediction, and planning tasks simultaneously.
Ranked #7 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)
no code implementations • 16 Jun 2022 • Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao
Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming increasingly modular in pursuit of safe system design.
1 code implementation • 16 Jun 2022 • Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao
The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step.
Ranked #3 on Autonomous Driving on CARLA Leaderboard
1 code implementation • 30 Apr 2022 • Xiaosong Jia, Penghao Wu, Li Chen, Yu Liu, Hongyang Li, Junchi Yan
Based on these observations, we propose Heterogeneous Driving Graph Transformer (HDGT), a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.