OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

1 code implementation 2 May 2024 Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning.

Autonomous Driving counterfactual +4

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

1 code implementation 8 Aug 2023 Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An

This paper presents OmniDataComposer, an approach to multimodal data fusion and unlimited data generation that aims to refine and simplify the interplay among diverse data modalities.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

1 code implementation ICCV 2023 Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang

On the standard nuScenes benchmark, it is the first online multi-view method that achieves performance comparable to lidar-based methods (67.6% NDS and 65.3% AMOTA).

3D Multi-Object Tracking 3D Object Detection +2

Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection

no code implementations 11 Dec 2022 Shihao Wang, Xiaohui Jiang, Ying Li

The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence.

3D Object Detection object-detection

Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans

2 code implementations 14 Jan 2021 Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Xin Liu, Chenggang Yan, Jiyong Zhang, Guiguang Ding

The experimental results show that our automatically searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets, achieving higher accuracy with model sizes tens of times smaller.

Benchmarking Medical Diagnosis +1

Logic-based switching finite-time stabilization with applications in mechanical systems

no code implementations 30 Jan 2020 Shiqi Zheng, Shihao Wang, Xiang Chen, Yuanlong Xie

Different from the existing adaptive controllers for structured/parametric uncertainties, a new switching barrier Lyapunov method and supervisory functions are introduced to overcome the obstacles caused by unstructured uncertainties and unknown control directions.

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

no code implementations 1 Jan 2020 Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

Weight pruning of DNNs has been proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate but not hardware-friendly; structured pruning is coarse-grained and hardware-efficient but incurs higher accuracy loss.

Code Generation Model Compression

CNN-MERP: An FPGA-Based Memory-Efficient Reconfigurable Processor for Forward and Backward Propagation of Convolutional Neural Networks

no code implementations 22 Mar 2017 Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura

Under limited DRAM bandwidth, a system throughput of 1244 GFlop/s is achieved at the Vertex UltraScale platform, which is 5.48 times higher than the state-of-the-art FPGA implementations.

Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks

no code implementations 4 Mar 2017 Shihao Wang, Dajiang Zhou, Xushen Han, Takeshi Yoshimura

This achieves a peak throughput of 806.4 GOPS with 567.5 mW and is able to accelerate the five convolutional layers in AlexNet at a frame rate of 326.2 fps.

