no code implementations • 18 Mar 2024 • Ziying Song, Lei Yang, Shaoqing Xu, Lin Liu, Dongyang Xu, Caiyan Jia, Feiyang Jia, Li Wang
Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features.
no code implementations • 31 Jan 2024 • Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, Yu-Gang Jiang
Multi-modal models have shown appealing performance in visual tasks recently, as instruction-guided training has evoked the ability to understand fine-grained visual content.
no code implementations • 12 Jan 2024 • Ziying Song, Lin Liu, Feiyang Jia, Yadan Luo, Guoxin Zhang, Lei Yang, Li Wang, Caiyan Jia
In the realm of modern autonomous driving, the perception system is indispensable for accurately assessing the state of the surrounding environment, thereby enabling informed prediction and planning.
1 code implementation • 8 Jan 2024 • Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang
To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM.
no code implementations • 5 Jan 2024 • Ziying Song, Guoxin Zhang, Jun Xie, Lin Liu, Caiyan Jia, Shaoqing Xu, Zhepeng Wang
In particular, we propose a voxel-based image pipeline that involves projecting point clouds onto images to obtain both pixel- and patch-level features.
no code implementations • 3 Jan 2024 • Lin Bai, Caiyan Jia, Ziying Song, Chaoqun Cui
Moreover, these methods usually extract visual features only in a basic manner and seldom consider tampering or textual information in images.
no code implementations • ICCV 2023 • Ziying Song, Haiyue Wei, Lin Bai, Lei Yang, Caiyan Jia
Through the projection calibration between the image and point cloud, we project the nearest neighbors of point cloud features onto the image features.
1 code implementation • 23 Jul 2023 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang
We first present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
Ranked #1 on Scene Text Recognition on CUTE80 (using extra training data)
no code implementations • 20 Mar 2023 • Hongyan Ran, Caiyan Jia
Moreover, we use a cross-attention mechanism on a pair of source data and target data with the same labels to learn domain-invariant representations.
1 code implementation • 28 May 2022 • Yimei Zheng, Caiyan Jia, Jian Yu, Xuanya Li
Under the assumption that data in different views are consistent, the cluster structure of an attributed network's topology should be consistent with the cluster structure of its node attributes.
2 code implementations • 30 Apr 2022 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang
Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription.
Ranked #16 on Scene Text Recognition on ICDAR2013
no code implementations • 5 Jun 2021 • Wei Liu, Zhenhai Chang, Caiyan Jia, Yimei Zheng
Exploring meaningful structural regularities embedded in networks is key to understanding and analyzing the structure and function of a network.
no code implementations • 5 Dec 2018 • Guanyu Li, Pengfei Zhang, Caiyan Jia
The attention mechanism has proven effective in natural language processing.
Ranked #1 on Natural Language Inference on Quora Question Pairs