Keypoint Detection
150 papers with code • 7 benchmarks • 11 datasets
Keypoint detection involves simultaneously detecting people and localizing their keypoints. Keypoints, also called interest points, are spatial locations in an image that stand out and mark what is distinctive about it. A good keypoint is ideally stable under image rotation, scaling, translation, and moderate distortion, so the same point can be found again after the image has been transformed.
(Image credit: PifPaf: Composite Fields for Human Pose Estimation; "Learning to surf" by fotologic, license: CC-BY-2.0)
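To make the idea of an interest point concrete, here is a minimal sketch of a classical Harris-style corner detector in plain Python. It is an illustration only, not the method of any paper listed on this page. The function names, the 3x3 window, the constant k = 0.05, and the threshold are all assumptions chosen for the toy example.

```python
def harris_response(img, k=0.05):
    """Harris corner response for a 2D list of floats (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference image gradients.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum gradient products over a 3x3 window (the structure tensor).
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


def detect_keypoints(img, threshold=0.5):
    """Return (row, col) of strict local maxima of the response above threshold."""
    resp = harris_response(img)
    h, w = len(img), len(img[0])
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    keypoints = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r > threshold and all(r > resp[y + dy][x + dx] for dy, dx in neighbours):
                keypoints.append((y, x))
    return keypoints


def make_square(size, top, left, side):
    """Toy test image (assumption): a bright square on a dark background."""
    return [[1.0 if top <= y < top + side and left <= x < left + side else 0.0
             for x in range(size)] for y in range(size)]
```

On a toy image such as make_square(12, 3, 3, 4), the detector returns the four corners of the square, and moving the square moves the detected keypoints by the same offset. This only demonstrates stability under translation. Robustness to rotation, scale change, or distortion depends on the detector and is not shown here.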
Latest papers
Self-supervised Learning of Contextualized Local Visual Embeddings
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolution-based method that learns representations suited for dense prediction tasks.
EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization
Visual localization is the task of estimating a 6-DoF camera pose of a query image within a provided 3D reference map.
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
Improving the matching of deformable objects by learning to detect keypoints
We propose a novel learned keypoint detection method to increase the number of correct matches for the task of non-rigid image correspondence.
A lightweight 3D dense facial landmark estimation model from position map data
As there is no public dataset available containing dense landmarks, we propose a pipeline to create a dense keypoint training dataset containing 520 keypoints across the whole face from existing facial position map data.
Neural Interactive Keypoint Detection
Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.
DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching
To train a descriptor, we maximize the mutual nearest neighbour objective over the keypoints with a separate network.
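For readers unfamiliar with the term, a mutual nearest neighbour is a pair of descriptors that each pick the other as their closest match. The sketch below shows plain mutual nearest neighbour matching between two descriptor sets. It is not DeDoDe's training objective or code. The function names and the use of squared Euclidean distance are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)),
               key=lambda j: squared_distance(query, candidates[j]))


def mutual_nearest_neighbours(desc_a, desc_b):
    """Return index pairs (i, j) where a[i] and b[j] choose each other."""
    if not desc_a or not desc_b:
        return []
    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    # Keep a pair only when the match agrees in both directions.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

With desc_a = [(0, 0), (1, 0), (5, 5)] and desc_b = [(0.9, 0.1), (0, 0.1), (10, 10), (5.2, 5.1)], the descriptor (10, 10) is left unmatched because its nearest neighbour in desc_a already prefers a closer partner.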
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96.github.io/CoDeF/.
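The lifting idea in this abstract (detect once on the canonical image, then propagate to every frame) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not CoDeF's implementation: the deformation field is modelled as a hypothetical Python callable, and toy_deformation is an invented stand-in that simply shifts points over time, whereas the paper optimizes the field for each video.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical interface: (frame index, canonical point) -> point in that frame.
DeformationField = Callable[[int, Point], Point]


def track_keypoints(canonical_keypoints: List[Point],
                    deformation: DeformationField,
                    num_frames: int) -> List[List[Point]]:
    """Propagate keypoints detected once on the canonical image to every frame."""
    return [[deformation(t, p) for p in canonical_keypoints]
            for t in range(num_frames)]


def toy_deformation(t: int, point: Point) -> Point:
    """Invented stand-in field: content drifts 2 px right and 1 px down per frame."""
    x, y = point
    return (x + 2.0 * t, y + 1.0 * t)
```

Because detection runs only once, every frame's keypoints derive from the same canonical detections, which is the source of the cross-frame consistency the abstract describes.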
2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds
The commonly adopted detect-then-match approach to registration struggles in cross-modality cases because of incompatible keypoint detection and inconsistent feature description.
Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data
We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting.