Object Localization
231 papers with code • 18 benchmarks • 17 datasets
Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and a proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, “object localization” refers to locating a single instance of an object category, whereas “object detection” aims to locate all instances of a category in a given image.
Source: Fast On-Line Kernel Density Estimation for Active Object Localization
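The overlap criterion in the definition above is usually made concrete with intersection-over-union (IoU). As a minimal sketch (the 0.5 threshold is a common convention, not something this page specifies):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_localization(proposal, ground_truth, threshold=0.5):
    """A proposal counts as a correct localization when its IoU with the
    ground-truth box reaches the threshold (0.5 is the usual choice)."""
    return iou(proposal, ground_truth) >= threshold
```

For example, a proposal shifted one unit from a 2×2 ground-truth box overlaps in a 2×1 strip, giving IoU 2 / (4 + 4 − 2) = 1/3, which fails the 0.5 test.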
Latest papers
Realistic Model Selection for Weakly Supervised Object Localization
Our experimental results with several WSOL methods on ILSVRC and CUB-200-2011 datasets show that our noisy boxes allow selecting models with performance close to those selected using ground truth boxes, and better than models selected using only image-class labels.
FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery
Object detection in remotely sensed satellite imagery is fundamental in many fields, such as biophysical and environmental monitoring.
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot and Chain-of-Thought).
Few-shot Object Localization
This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images.
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
The high performance of large kernel CNNs in downstream tasks has been attributed to the large effective receptive field (ERF) produced by large kernels, but this view has not been fully tested.
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
In addition, we design a new benchmark, termed Circular-based Relation Probing Evaluation (CRPE), for comprehensively evaluating the relation comprehension capabilities of MLLMs.
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; this adoption has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3.
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions.
CPR++: Object Localization via Single Coarse Point Supervision
CPR reduces the semantic variance by selecting a semantic center point in a neighborhood region to replace the initial annotated point.
Spatial Structure Constraints for Weakly Supervised Semantic Segmentation
In this paper, we propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation caused by attention expansion.