Search Results for author: Xiangchao Yan

Found 14 papers, 13 papers with code

OmniCaptioner: One Captioner to Rule Them All

1 code implementation9 Apr 2025 Yiting Lu, Jiakang Yuan, Zhen Li, Shitian Zhao, Qi Qin, Xinyue Li, Le Zhuo, Licheng Wen, Dongyang Liu, Yuewen Cao, Xiangchao Yan, Xin Li, Botian Shi, Tao Chen, Zhibo Chen, Lei Bai, Bo Zhang, Peng Gao

We propose OmniCaptioner, a versatile visual captioning framework for generating fine-grained textual descriptions across a wide variety of visual domains.

All Image Captioning +2

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

no code implementations7 Jan 2025 Jiakang Yuan, Xiangchao Yan, Botian Shi, Tao Chen, Wanli Ouyang, Bo Zhang, Lei Bai, Yu Qiao, BoWen Zhou

The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI).

Image Classification

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

2 code implementations16 Dec 2024 Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang

Given the significant differences between geometric diagram-symbol and natural image-text, we introduce unimodal pre-training to develop a diagram encoder and symbol decoder, enhancing the understanding of geometric images and corpora.

Geometry Problem Solving

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding

3 code implementations20 Sep 2023 Renqiu Xia, Haoyang Peng, Hancheng Ye, Mingsheng Li, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan, Bo Zhang

Specifically, StructChart first reformulates the chart data from the tubular form (linearized CSV) to STR, which can friendlily reduce the task gap between chart perception and reasoning.

Ranked #20 on Chart Question Answering on ChartQA (using extra training data)

Chart Question Answering Chart Understanding +4

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations

1 code implementation19 Sep 2023 Xiangchao Yan, Runjian Chen, Bo Zhang, Hancheng Ye, Renqiu Xia, Jiakang Yuan, Hongbin Zhou, Xinyu Cai, Botian Shi, Wenqi Shao, Ping Luo, Yu Qiao, Tao Chen, Junchi Yan

Annotating 3D LiDAR point clouds for perception tasks is fundamental for many applications e. g., autonomous driving, yet it still remains notoriously labor-intensive.

3D Object Detection Autonomous Driving +3

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

2 code implementations11 Sep 2023 Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

Domain shifts such as sensor type changes and geographical situation variations are prevalent in Autonomous Driving (AD), which poses a challenge since AD model relying on the previous domain knowledge can be hardly directly deployed to a new domain without additional costs.

Autonomous Driving Domain Generalization

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset

1 code implementation NeurIPS 2023 Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao

It is a long-term vision for Autonomous Driving (AD) community that the perception models can learn from a large-scale point cloud dataset, to obtain unified representations that can achieve promising results on different tasks or benchmarks.

Autonomous Driving Point Cloud Pre-training

Cannot find the paper you are looking for? You can Submit a new open access paper.