Open-vocabulary object detection
65 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.
Open-Vocabulary DETR with Conditional Matching
To this end, we propose a novel open-vocabulary detector based on DETR -- hence the name OV-DETR -- which, once trained, can detect any object given its class name or an exemplar image.
Simple Open-Vocabulary Object Detection with Vision Transformers
Combining simple architectures with large-scale pre-training has led to massive improvements in image classification.
Scaling Open-Vocabulary Object Detection
However, with OWL-ST, we can scale to over 1B examples, yielding a further large improvement: with an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2% to 44.6% (43% relative improvement).
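The 43% figure quoted above is a relative improvement, not an absolute one. A minimal sketch of the arithmetic, using only the two AP values from the snippet; the helper name `relative_improvement` is ours, chosen for illustration:

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100.0


# LVIS rare-class AP values quoted in the snippet above (L/14 architecture).
ap_before, ap_after = 31.2, 44.6

absolute_gain = ap_after - ap_before
relative_gain = relative_improvement(ap_before, ap_after)

print(f"absolute gain: {absolute_gain:.1f} AP points")
print(f"relative gain: {relative_gain:.0f}%")
```

The absolute gain is 13.4 AP points; dividing it by the 31.2 baseline gives roughly 43%.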
YOLO-World: Real-Time Open-Vocabulary Object Detection
The You Only Look Once (YOLO) series of detectors has established itself as an efficient and practical tool.
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.
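One common way to detect categories beyond the training vocabulary is to compare a region embedding against text embeddings of the candidate class names and pick the closest match. The sketch below illustrates that general idea only; it is not the method of any specific paper listed here. The function names and the toy 3-dimensional vectors are hypothetical, and a real system would obtain its embeddings from a pretrained vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_region(region_embedding, text_embeddings):
    """Return the class name whose text embedding is closest to the region."""
    scores = {
        name: cosine(region_embedding, embedding)
        for name, embedding in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy embeddings, for illustration only.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],  # imagine this class had no box annotations
}
region = [0.1, 0.2, 0.9]

label, score = classify_region(region, text_embeddings)
print(label, round(score, 3))
```

Because the vocabulary is just a dictionary of text embeddings, a new class name can be added at query time without retraining the scoring step.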
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) - a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection.
Taming Self-Training for Open-Vocabulary Object Detection
This work identifies two challenges of using self-training in OVD: noisy pseudo labels (PLs) from vision-language models (VLMs) and frequent distribution changes of PLs.
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
We present a new open-vocabulary detection approach based on region-centric image-language pretraining to bridge the gap between image-level pretraining and open-vocabulary object detection.
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.