CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

12 May 2023 · Ruixiang Jiang, Lingbo Liu, Changwen Chen

Recent advances in vision-language models have shown remarkable zero-shot text-image matching ability that transfers to downstream tasks such as object detection and segmentation. Adapting these models for object counting, however, remains a formidable challenge. In this study, we first investigate transferring vision-language models (VLMs) to class-agnostic object counting. Specifically, we propose CLIP-Count, the first end-to-end pipeline that estimates density maps for open-vocabulary objects with text guidance in a zero-shot manner. To align the text embedding with dense visual features, we introduce a patch-text contrastive loss that guides the model to learn informative patch-level visual representations for dense prediction. Moreover, we design a hierarchical patch-text interaction module to propagate semantic information across different resolution levels of visual features. By fully exploiting the rich image-text alignment knowledge of pretrained VLMs, our method effectively generates high-quality density maps for objects of interest. Extensive experiments on the FSC-147, CARPK, and ShanghaiTech crowd counting datasets demonstrate the state-of-the-art accuracy and generalizability of the proposed method. Code is available at https://github.com/songrise/CLIP-Count.
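
The abstract does not spell out the loss formulation, so the snippet below is a rough sketch only of one way a patch-text contrastive loss of this kind could look in PyTorch. It assumes L2-normalized patch embeddings from a CLIP-style image encoder, a single text embedding per image, and positive patches selected from the ground-truth density map; the function name, the temperature value, and the InfoNCE-style formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def patch_text_contrastive_loss(patch_emb, text_emb, density, temperature=0.07):
    """Hypothetical sketch of a patch-level text-image contrastive loss.

    patch_emb: (B, N, D) patch embeddings from a CLIP-style image encoder.
    text_emb:  (B, D)    text embedding of the class prompt.
    density:   (B, N)    ground-truth density pooled to patch resolution.
    """
    patch_emb = F.normalize(patch_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cosine similarity between every patch and the text prompt.
    sim = torch.einsum("bnd,bd->bn", patch_emb, text_emb) / temperature

    # Patches overlapping annotated objects act as positives,
    # empty background patches as negatives.
    pos_mask = (density > 0).float()

    # InfoNCE-style objective: maximize the probability mass the text
    # embedding assigns to object patches relative to all patches.
    log_prob = sim.log_softmax(dim=-1)
    pos_log_prob = (log_prob * pos_mask).sum(dim=-1) / pos_mask.sum(dim=-1).clamp(min=1)
    return -pos_log_prob.mean()


# Toy usage with random tensors (2 images, 196 patches, 512-dim CLIP features).
if __name__ == "__main__":
    patches = torch.randn(2, 196, 512)
    text = torch.randn(2, 512)
    gt_density = torch.rand(2, 196)
    print(patch_text_contrastive_loss(patches, text, gt_density))
```

The key design choice such a loss captures is that supervision is applied per patch rather than per image, which encourages the visual encoder to keep spatially localized, text-aligned features suitable for density-map regression.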

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Zero-Shot Counting | FSC147 | CLIP-Count | Val MAE | 18.79 | #3 |
| Zero-Shot Counting | FSC147 | CLIP-Count | Val RMSE | 61.18 | #1 |
| Zero-Shot Counting | FSC147 | CLIP-Count | Test MAE | 17.78 | #3 |
| Zero-Shot Counting | FSC147 | CLIP-Count | Test RMSE | 106.62 | #3 |
| Cross-Part Crowd Counting | ShanghaiTech A | CLIP-Count | MAE | 192.6 | #3 |
| Cross-Part Crowd Counting | ShanghaiTech A | CLIP-Count | RMSE | 308.4 | #1 |
| Cross-Part Crowd Counting | ShanghaiTech B | CLIP-Count | MAE | 45.7 | #3 |
| Cross-Part Crowd Counting | ShanghaiTech B | CLIP-Count | RMSE | 77.4 | #1 |
