Search Results for author: Bencheng Liao

Found 19 papers, 17 papers with code

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

1 code implementation · 17 Mar 2025 · Yingyue Li, Bencheng Liao, Wenyu Liu, Xinggang Wang

In this work, we present a hybrid model MaTVLM by substituting a portion of the transformer decoder layers in a pre-trained VLM with Mamba-2 layers.

Language Modeling Language Modelling +1

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

1 code implementation · 11 Mar 2025 · Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang

The model fully leverages Mamba-2's high computational and memory efficiency, extending its capabilities from text generation to multimodal generation.

Mamba multimodal generation +2

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

no code implementations · 18 Feb 2025 · Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang

By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error.

3DGS Autonomous Driving +2

Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

1 code implementation · 18 Feb 2025 · Bencheng Liao, Hongyuan Tao, Qian Zhang, Tianheng Cheng, Yingyue Li, Haoran Yin, Wenyu Liu, Xinggang Wang

We propose a seeding strategy to carve Mamba from a trained Transformer and a three-stage distillation recipe, which can effectively transfer the knowledge from Transformer to Mamba while preserving multimodal capabilities.

Decoder Mamba +1

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

1 code implementation · 22 Nov 2024 · Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang

However, the numerous denoising steps in the robotic diffusion policy and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at a real-time speed.

Autonomous Driving Denoising

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

1 code implementation · 28 May 2024 · Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang

In this paper, we aim to incorporate the sub-quadratic modeling capability of Gated Linear Attention (GLA) into the 2D diffusion backbone.

Mamba

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

1 code implementation · 28 May 2024 · Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang

Recently, linear complexity sequence modeling networks have achieved modeling capabilities similar to Vision Transformers on a variety of computer vision tasks, while using fewer FLOPs and less memory.

Representation Learning

MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning

1 code implementation · 13 Mar 2024 · Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang

Learning robust and scalable visual representations from massive multi-view video data remains a challenge in computer vision and autonomous driving.

3D Object Detection Autonomous Driving +3

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

1 code implementation · 20 Feb 2024 · Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging.

Autonomous Driving

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

15 code implementations · 17 Jan 2024 · Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang

The results demonstrate that Vim is capable of overcoming the computation and memory constraints on performing Transformer-style understanding for high-resolution images, and that it has great potential to be the next-generation backbone for vision foundation models.

Image Classification Mamba +6

MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction

2 code implementations · 10 Aug 2023 · Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process.

Autonomous Driving Online Vectorized HD Map Construction

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

1 code implementation · 19 Apr 2023 · Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as a unified point sequence representation, which can be extended to most map elements in the driving scene.

Autonomous Driving

VAD: Vectorized Scene Representation for Efficient Autonomous Driving

2 code implementations · ICCV 2023 · Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation.

Bench2Drive Trajectory Planning

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

1 code implementation · 15 Mar 2023 · Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

Motivated by this, we propose to model the lane graph in a novel path-wise manner, which well preserves the continuity of the lane and encodes traffic information for planning.

Autonomous Driving graph construction +1

MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

1 code implementation · 30 Aug 2022 · Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang

High-definition (HD) maps provide abundant and precise environmental information about the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving systems.

3D Lane Detection Autonomous Driving +1

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

2 code implementations · NeurIPS 2021 · Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu

Can a Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure?

Object object-detection +1
