Search Results for author: Botian Shi

Found 45 papers, 35 papers with code

LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking

1 code implementation • 14 Jan 2025 • Yukai Ma, Tiantian Wei, Naiting Zhong, Jianbiao Mei, Tao Hu, Licheng Wen, Xuemeng Yang, Botian Shi, Yong liu

The system consists of an Analytic Process (System-II) that accumulates driving experience through logical reasoning and a Heuristic Process (System-I) that refines this knowledge via fine-tuning and few-shot learning.

Autonomous Driving Decision Making +3

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

no code implementations • 7 Jan 2025 • Jiakang Yuan, Xiangchao Yan, Botian Shi, Tao Chen, Wanli Ouyang, Bo Zhang, Lei Bai, Yu Qiao, BoWen Zhou

The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI).

Image Classification

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

2 code implementations • 16 Dec 2024 • Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang

Given the significant differences between geometric diagram-symbol pairs and natural image-text pairs, we introduce unimodal pre-training to develop a diagram encoder and a symbol decoder, enhancing the understanding of geometric images and corpora.

Geometry Problem Solving

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

1 code implementation • 10 Dec 2024 • Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, Jin Shi, Fan Wu, Pei Chu, Minghao Liu, Zhenxiang Li, Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He

Document content extraction is crucial in computer vision, especially for meeting the high-quality data needs of large language models (LLMs) and retrieval-augmented generation (RAG) technologies.

Attribute Benchmarking +2

Chimera: Improving Generalist Model with Domain-Specific Experts

no code implementations • 8 Dec 2024 • Tianshuo Peng, Mingsheng Li, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Conghui He, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue

This results in a versatile model that excels across the chart, table, math, and document domains, achieving state-of-the-art performance on multi-modal reasoning and visual content extraction, both of which are challenging tasks for existing LMMs.

Math

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

1 code implementation • 13 Oct 2024 • Hancheng Ye, Jiakang Yuan, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang

Diffusion models have recently achieved great success in the synthesis of high-quality images and videos.

Denoising

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

1 code implementation • 6 Sep 2024 • Jianbiao Mei, Xuemeng Yang, Licheng Wen, Tao Hu, Yu Yang, Tiantian Wei, Yukai Ma, Min Dou, Botian Shi, Yong liu

Recent advances in diffusion models have improved controllable streetscape generation and supported downstream perception and planning tasks.

Video Generation

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

no code implementations • 23 Jul 2024 • Yukai Ma, Jianbiao Mei, Xuemeng Yang, Licheng Wen, Weihua Xu, Jiangning Zhang, Botian Shi, Yong liu, Xingxing Zuo

In this paper, we focus on the potential of 3D radar in semantic scene completion, pioneering cross-modal refinement techniques for improved robustness against weather and illumination changes, and enhancing SSC performance. Regarding model architecture, we propose a three-stage tight fusion approach on BEV to realize a fusion framework for point clouds and images.

Autonomous Driving

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

1 code implementation • 24 May 2024 • Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong liu, Yu Qiao

Experiments also demonstrate that as the memory bank expands, the Heuristic Process with only 1.8B parameters can inherit the knowledge from a GPT-4 powered Analytic Process and achieve continuous performance improvement.

Autonomous Driving Decision Making +1

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

1 code implementation • 6 May 2024 • Zheng Zhu, XiaoFeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.

Autonomous Driving Decision Making +2

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

1 code implementation • 23 Apr 2024 • Bin Wang, Zhuangcheng Gu, Guang Liang, Chao Xu, Bo Zhang, Botian Shi, Conghui He

To better utilize the UniMER dataset, the paper proposes a Universal Mathematical Expression Recognition Network (UniMERNet), tailored to the characteristics of formula recognition.

Decoder Diversity +1

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

1 code implementation • 6 Feb 2024 • Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi

With the development of implicit rendering technology and in-depth research on using generative models to produce data at scale, we propose OASim, an open and adaptive simulator and autonomous driving data generator based on implicit neural rendering.

Autonomous Driving Neural Rendering +1

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator

1 code implementation • 20 Dec 2023 • Donglin Yang, Zhenfeng Liu, Wentao Jiang, Guohang Yan, Xing Gao, Botian Shi, Si Liu, Xinyu Cai

To this end, we propose a simulator-based physical modeling approach to augment LiDAR data in rainy weather in order to improve the perception performance of LiDAR in this scenario.

Data Augmentation object-detection +1

SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models

no code implementations • 27 Nov 2023 • Zhiming Guo, Xing Gao, Jianlan Zhou, Xinyu Cai, Botian Shi

In this paper, we propose a novel framework based on diffusion models, called SceneDM, to generate joint and consistent future motions of all the agents, including vehicles, bicycles, pedestrians, etc., in a scene.

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

1 code implementation • 9 Nov 2023 • Licheng Wen, Xuemeng Yang, Daocheng Fu, XiaoFeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving.

Autonomous Driving Common Sense Reasoning +6

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

2 code implementations • 28 Sep 2023 • Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao

Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability.

10-shot image generation 1 Image, 2*2 Stitchi +3

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding

3 code implementations • 20 Sep 2023 • Renqiu Xia, Haoyang Peng, Hancheng Ye, Mingsheng Li, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan, Bo Zhang

Specifically, StructChart first reformulates the chart data from the tabular form (linearized CSV) to STR, which helps narrow the task gap between chart perception and reasoning.

Ranked #20 on Chart Question Answering on ChartQA (using extra training data)

Chart Question Answering Chart Understanding +4

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations

1 code implementation • 19 Sep 2023 • Xiangchao Yan, Runjian Chen, Bo Zhang, Hancheng Ye, Renqiu Xia, Jiakang Yuan, Hongbin Zhou, Xinyu Cai, Botian Shi, Wenqi Shao, Ping Luo, Yu Qiao, Tao Chen, Junchi Yan

Annotating 3D LiDAR point clouds for perception tasks is fundamental for many applications, e.g., autonomous driving, yet it remains notoriously labor-intensive.

3D Object Detection Autonomous Driving +3

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

2 code implementations • 11 Sep 2023 • Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

Domain shifts such as changes in sensor type and variations in geographical situation are prevalent in Autonomous Driving (AD), which poses a challenge, since an AD model that relies on knowledge of the previous domain can hardly be deployed directly to a new domain without additional costs.

Autonomous Driving Domain Generalization

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

1 code implementation • 14 Jul 2023 • Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao

In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner and analyze its ability to reason, interpret, and memorize when facing complex scenarios.

Autonomous Driving Common Sense Reasoning +4

StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views

1 code implementation • 8 Jun 2023 • Jianfei Guo, Nianchen Deng, Xinyang Li, Yeqi Bai, Botian Shi, Chiyu Wang, Chenjing Ding, Dongliang Wang, Yikang Li

We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely-used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data.

Autonomous Driving Neural Rendering +2

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset

1 code implementation • NeurIPS 2023 • Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao

It is a long-term vision for the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset to obtain unified representations that achieve promising results on different tasks or benchmarks.

Autonomous Driving Point Cloud Pre-training

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification

2 code implementations • 16 May 2023 • Siyuan Huang, Bo Zhang, Botian Shi, Peng Gao, Yikang Li, Hongsheng Li

In this paper, different from previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that only leverages a single source dataset to alleviate the unforeseen domain differences faced by a well-trained source model.

3D Point Cloud Classification Domain Generalization +2

UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline

1 code implementation • 20 Dec 2022 • Ben Fei, Siyuan Huang, Jiakang Yuan, Botian Shi, Bo Zhang, Weidong Yang, Min Dou, Yikang Li

Unlike previous studies that focus on only a single adaptation task, UniDA3D can tackle several adaptation tasks in the 3D segmentation field by designing a unified source-and-target active sampling strategy, which selects a maximally-informative subset from both source and target domains for effective model adaptation.

3D Semantic Segmentation Domain Generalization +2

LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving

1 code implementation • 7 Dec 2022 • Xiang Li, Junbo Yin, Botian Shi, Yikang Li, Ruigang Yang, Jianbing Shen

In this paper, we present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS), which leverages off-the-shelf 3D data, i.e., point clouds, together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models.

Autonomous Driving Instance Segmentation +5

Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection

no code implementations • 18 Oct 2022 • Xin Li, Botian Shi, Yuenan Hou, Xingjiao Wu, Tianlong Ma, Yikang Li, Liang He

To address these problems, we construct the homogeneous structure between the point cloud and images to avoid projective information loss by transforming the camera features into the LiDAR 3D space.

3D Object Detection Autonomous Driving +1

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey

no code implementations • 6 Feb 2022 • Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li

Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers.

Autonomous Driving object-detection +4

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

3 code implementations • 15 Feb 2020 • Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Taroon Bharti, Ming Zhou

However, most of the existing multimodal models are pre-trained for understanding tasks, leading to a pretrain-finetune discrepancy for generation tasks.

Ranked #2 on Action Segmentation on COIN (using extra training data)

Action Segmentation Decoder +4

Knowledge Aware Semantic Concept Expansion for Image-Text Matching

no code implementations • International Joint Conferences on Artificial Intelligence (IJCAI) 2019 • Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan

In this paper, we develop a Scene Concept Graph (SCG) by aggregating image scene graphs and extracting frequently co-occurring concept pairs as scene common-sense knowledge.

Common Sense Reasoning Content-Based Image Retrieval +4

Dense Procedure Captioning in Narrated Instructional Videos

no code implementations • ACL 2019 • Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, Ming Zhou

Understanding narrated instructional videos is important for both research and real-world web applications.

Dense Captioning
