no code implementations • 9 Dec 2024 • Mingliang Zhai, Cheng Li, Zengyuan Guo, Ningrui Yang, Xiameng Qin, Sanyuan Zhao, Junyu Han, Ji Tao, Yuwei Wu, Yunde Jia
Multi-modal Large Language Models (MLLMs) with extensive world knowledge have revitalized autonomous driving, particularly in reasoning tasks within perceivable regions.
1 code implementation • 15 Jul 2024 • ChunLiang Li, Wencheng Han, Junbo Yin, Sanyuan Zhao, Jianbing Shen
Concurrent processing of multiple autonomous driving 3D perception tasks within the same spatiotemporal scene poses a significant challenge, in particular due to the computational inefficiencies and feature competition between tasks when using traditional multi-task learning approaches.
Ranked #4 on 3D Lane Detection on OpenLane
1 code implementation • 5 Feb 2024 • Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue
We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation.
no code implementations • 6 Jun 2023 • Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen
End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.
no code implementations • 8 Feb 2023 • Jiawei Liu, Xingping Dong, Sanyuan Zhao, Jianbing Shen
To achieve simultaneous detection of both common and rare objects, we propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few samples for rare (novel) classes.
1 code implementation • 14 Dec 2021 • JianJian Cao, Xiameng Qin, Sanyuan Zhao, Jianbing Shen
In this paper, we focus on these two problems and propose a Graph Matching Attention (GMA) network.
no code implementations • ICCV 2021 • Xin Hao, Sanyuan Zhao, Mang Ye, Jianbing Shen
Cross-modality person re-identification is a challenging task due to large cross-modality discrepancy and intra-modality variations.
no code implementations • CVPR 2020 • Tao Li, Zhiyuan Liang, Sanyuan Zhao, Jiahao Gong, Jianbing Shen
For the global error, we first transform category-wise features into a high-level graph model with coarse-grained structural information, and then decouple the high-level graph to reconstruct the category features.
1 code implementation • ECCV 2018 • Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam
This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).
Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)
no code implementations • 28 Jul 2017 • Weilin Cong, Sanyuan Zhao, Hui Tian, Jianbing Shen
Real-world face detection and alignment demand an advanced discriminative model to address challenges posed by pose, lighting, and expression.