no code implementations • 26 Mar 2025 • Yanpeng Sun, Shan Zhang, Wei Tang, Aotian Chen, Piotr Koniusz, Kai Zou, Yuan Xue, Anton Van Den Hengel
Diagrams serve as a fundamental form of visual language, representing complex concepts and their inter-relationships through structured symbols, shapes, and spatial arrangements.
1 code implementation • 11 Jan 2025 • Shan Zhang, Aotian Chen, Yanpeng Sun, Jindong Gu, Yi-Yu Zheng, Piotr Koniusz, Kai Zou, Anton Van Den Hengel, Yuan Xue
Current multimodal large language models (MLLMs) often underperform on mathematical problem-solving tasks that require fine-grained visual understanding.
no code implementations • CVPR 2025 • Shan Zhang, Yao Ni, Jinhao Du, Yuan Xue, Philip Torr, Piotr Koniusz, Anton Van Den Hengel
OWOBJ is a flexible plugin that outperforms baselines in Open-World, Few-Shot, and zero-shot Open-Vocabulary Object Detection.
1 code implementation • 28 Oct 2024 • Xun Guo, Shan Zhang, Yongxin He, Ting Zhang, Wanquan Feng, Haibin Huang, Chongyang Ma
Our method is compatible with a range of text encoders.
no code implementations • 22 Oct 2024 • Xinming Du, Shan Zhang, Eric Zou
We show that in-utero exposure to microplastics, particularly during the third trimester of pregnancy, leads to a significant increase in the likelihood of low birth weight.
1 code implementation • 25 Sep 2024 • Yao Ni, Shan Zhang, Piotr Koniusz
Motivated by this connection, we propose reducing gradient norms for enhanced generalization and aligning the fine-tuned model with its pre-trained counterpart to retain knowledge from large-scale pre-training data.
2 code implementations • 5 Jun 2024 • Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang
In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection.
1 code implementation • CVPR 2024 • Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li
In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model.
1 code implementation • 16 Dec 2023 • Kaiyou Song, Shan Zhang, Tong Wang
In this study, inspired by human beings' way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge.
1 code implementation • CVPR 2023 • Kaiyou Song, Jin Xie, Shan Zhang, Zimeng Luo
Different from existing SSL-KD methods that transfer knowledge from a static pre-trained teacher to a student, in MOKD, two different models learn collaboratively in a self-supervised manner.
no code implementations • 14 Mar 2023 • Steven Shaw, Kanishka Tyagi, Shan Zhang
Many radar signal processing methodologies are being developed for critical road safety perception tasks.
no code implementations • ICCV 2023 • Jinhao Du, Shan Zhang, Qiang Chen, Haifeng Le, Yanpeng Sun, Yao Ni, Jian Wang, Bin He, Jingdong Wang
To provide precise information for the query image, the prototype is decoupled into task-specific ones, which provide tailored guidance for 'where to look' and 'what to look for', respectively.
no code implementations • ICCV 2023 • Kaiyou Song, Shan Zhang, Zihao An, Zimeng Luo, Tong Wang, Jin Xie
In contrastive self-supervised learning, the common way to learn discriminative representation is to pull different augmented "views" of the same image closer while pushing all other images further apart, which has been proven to be effective.
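The pull-and-push objective described above is often written as an InfoNCE-style loss. The following is a minimal sketch under that assumption, not this paper's method; the function name `info_nce`, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (illustrative sketch).

    The loss is small when the anchor is close to its positive view
    and far from every negative, measured by cosine similarity.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # Index 0 holds the positive pair; the remaining entries are negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp for the softmax denominator.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy 2-D embeddings (made up for illustration).
anchor = [1.0, 0.0]
aligned_view = [0.9, 0.1]        # augmented view of the same image
other_images = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce(anchor, aligned_view, other_images)
# Swap roles: treat an unrelated image as the "positive".
loss_bad = info_nce(anchor, [0.0, 1.0], [aligned_view, [-1.0, 0.0]])
```

In this toy setup `loss_good` is lower than `loss_bad`, because the true augmented view lies closer to the anchor than the unrelated image does.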
no code implementations • arXiv 2022 • Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shuming Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Objects365, and finally finetuning it on COCO.
Ranked #8 on Object Detection on COCO test-dev
no code implementations • 30 Oct 2022 • Shan Zhang, Naila Murray, Lei Wang, Piotr Koniusz
To address these drawbacks, we propose a Time-rEversed diffusioN tEnsor Transformer (TENET), which i) forms high-order tensor representations that capture multi-way feature occurrences that are highly discriminative, and ii) uses a transformer that dynamically extracts correlations between the query image and the entire support set, instead of a single average-pooled support embedding.
2 code implementations • ICCV 2023 • Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing.
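One-to-one assignment can be illustrated with a small matching example. This sketch assumes a precomputed cost matrix and uses brute-force search over permutations in place of the Hungarian algorithm typically used in practice; the function name `one_to_one_assign` and the cost values are hypothetical.

```python
from itertools import permutations


def one_to_one_assign(cost):
    """Match each ground-truth object to exactly one distinct prediction.

    cost[g][p] is the matching cost between ground truth g and prediction p.
    Returns the (g, p) pairs with minimum total cost. Brute force is used
    here for clarity; it is only practical for very small inputs.
    """
    num_gt = len(cost)
    num_pred = len(cost[0]) if num_gt else 0
    assert num_gt <= num_pred, "need at least one prediction per ground truth"

    best_total, best_perm = float("inf"), ()
    for perm in permutations(range(num_pred), num_gt):
        total = sum(cost[g][p] for g, p in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm))


# Two ground-truth objects, four predictions (made-up costs).
cost = [
    [0.9, 0.1, 0.8, 0.7],
    [0.2, 0.3, 0.9, 0.6],
]
matches = one_to_one_assign(cost)
matched_predictions = {p for _, p in matches}
# Predictions left unmatched would be supervised as "no object".
unmatched = [p for p in range(len(cost[0])) if p not in matched_predictions]
```

Because every ground-truth object claims a different prediction, duplicate detections of the same object are penalized during training, which is why no NMS step is needed afterwards.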
no code implementations • 20 Jun 2022 • Jinghang Lin, Shan Zhang, Qing Lu
Transfer learning has emerged as a powerful technique in many application problems, such as computer vision and natural language processing.
no code implementations • 27 Apr 2022 • Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo
In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer (CATrans) in a hierarchical architecture.
no code implementations • 17 Mar 2022 • Shan Zhang, Pranay Sharma, Baocheng Geng, Pramod K. Varshney
To achieve greater sensor transmission and estimation efficiency, we propose a two-step group-based collaborative distributed estimation scheme, where in the first step, sensors form dependence-driven groups such that sensors in the same group are highly dependent, while sensors from different groups are independent, and perform a copula-based maximum a posteriori probability (MAP) estimation via intragroup collaboration.
1 code implementation • 22 Jan 2022 • Junjie Wang, Feng Gao, Junyu Dong, Shan Zhang, Qian Du
Synthetic aperture radar (SAR) image change detection is a vital yet challenging task in the field of remote sensing image analysis.
no code implementations • CVPR 2022 • Shan Zhang, Lei Wang, Naila Murray, Piotr Koniusz
We design a Kernelized Few-shot Object Detector by leveraging kernelized matrices computed over multiple proposal regions, which yield expressive non-linear representations whose model complexity is learned on the fly.
no code implementations • 25 Jun 2018 • Kush R. Varshney, Prashant Khanduri, Pranay Sharma, Shan Zhang, Pramod K. Varshney
Such arguments, however, fail to acknowledge that the overall decision-making system is composed of two entities: the learned model and a human who fuses together model outputs with his or her own information.