Object Localization
231 papers with code • 18 benchmarks • 17 datasets
Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and a proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, “object localization” refers to locating a single instance of an object category, whereas “object detection” aims to locate all instances of a category in a given image.
Source: Fast On-Line Kernel Density Estimation for Active Object Localization
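The overlap criterion in the definition above is usually made concrete with intersection-over-union (IoU). As a minimal sketch (the 0.5 threshold is a common convention, not something this page specifies):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_localization(proposal, ground_truth, threshold=0.5):
    """A proposal counts as a correct localization when its IoU with the
    ground-truth box reaches the threshold (0.5 is the usual choice)."""
    return iou(proposal, ground_truth) >= threshold
```

For example, a proposal shifted one unit from a 2×2 ground-truth box overlaps in a 2×1 strip, giving IoU 2 / (4 + 4 − 2) = 1/3, which fails the 0.5 test.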
Latest papers
Realistic Model Selection for Weakly Supervised Object Localization
Our experimental results with several WSOL methods on ILSVRC and CUB-200-2011 datasets show that our noisy boxes allow selecting models with performance close to those selected using ground truth boxes, and better than models selected using only image-class labels.
FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery
Object detection in remotely sensed satellite imagery is fundamental in many fields, such as biophysical and environmental monitoring.
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot and Chain-of-Thought).
Few-shot Object Localization
This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images.
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
The high performance of large kernel CNNs in downstream tasks has been attributed to the large effective receptive field (ERF) produced by large kernels, but this view has not been fully tested.
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
In addition, we design a new benchmark, termed Circular-based Relation Probing Evaluation (CRPE), for comprehensively evaluating the relation comprehension capabilities of MLLMs.
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; this adoption has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3.
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions.
CPR++: Object Localization via Single Coarse Point Supervision
CPR reduces the semantic variance by selecting a semantic center point in a neighborhood region to replace the initial annotated point.
Spatial Structure Constraints for Weakly Supervised Semantic Segmentation
In this paper, we propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation caused by attention expansion.