We propose FFAVOD, a feature fusion architecture for video object detection.
Ranked #1 on Object Detection on UA-DETRAC
The models were trained and evaluated on Cityscapes, KITTI and IDD, and the results are reported on their public benchmarks, where they achieve state-of-the-art performance at real-time speeds.
We propose a method for multi-object tracking and segmentation (MOTS) that does not require fine-tuning or per benchmark hyperparameter selection.
Siamese trackers demonstrated high performance in object tracking due to their balance between accuracy and speed.
Based on this advanced feature representation, our algorithm achieves high tracking accuracy, while outperforming several state-of-the-art trackers, including standard Siamese trackers.
We propose a novel unsupervised approach based on a two-stage object-centric adversarial framework that only needs object regions for detecting frame-level local anomalies in videos.
Commonly used features are color histograms, histograms of oriented gradients, deep features from convolutional neural networks and re-identification (ReID) features.
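As a concrete illustration of the first of these features, a per-channel color histogram can be computed with NumPy alone (a minimal sketch; the bin count and L1 normalisation below are illustrative choices, not taken from any specific paper):

```python
import numpy as np

def color_histogram(patch, bins=16):
    """Concatenated per-channel histogram of an H x W x 3 image patch,
    L1-normalised so patches of different sizes are comparable."""
    hists = []
    for c in range(patch.shape[2]):
        h, _ = np.histogram(patch[..., c], bins=bins, range=(0, 255))
        hists.append(h)
    feat = np.concatenate(hists).astype(float)
    return feat / feat.sum()

# Toy usage: a random 32x32 RGB patch yields a 48-dimensional descriptor.
patch = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
feat = color_histogram(patch)
print(feat.shape)  # (48,)
```

HOG, deep CNN features and ReID embeddings would replace or extend this descriptor in the same pipeline, trading simplicity for discriminative power.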
Our framework jointly trains a ReID network for discriminative feature extraction in a labelled source domain using identity annotations, and adapts the ReID model to an unlabelled target domain by learning disentangled latent representations in that domain.
Multispectral disparity estimation is a difficult task for many reasons: it has all the same challenges as traditional visible-visible disparity estimation (occlusions, repetitive patterns, textureless surfaces), in addition to the two images sharing very little visual information (e.g. color information vs. thermal information).
In this paper, we propose a multiple object tracker, called MF-Tracker, that integrates multiple classical features (spatial distances and colours) and modern features (detection labels and re-identification features) in its tracking framework.
Consecutive frames in a video are highly redundant.
Ranked #2 on Object Detection on UAVDT
Because our focus is on the data association problem, our MOT method only uses simple image features, which are the center position and color of detections for each frame.
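Data association with such simple features can be sketched as a cost matrix combining center distance and color distance, solved with the Hungarian algorithm (a schematic version using SciPy; the weights and cost definition are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, w_pos=1.0, w_col=1.0):
    """Match tracks to detections by combining center distance and
    mean-color distance into one cost matrix, then solving the
    resulting assignment problem. Each track/detection is a
    (center_xy, mean_rgb) pair."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, (tc, tcol) in enumerate(tracks):
        for j, (dc, dcol) in enumerate(detections):
            pos = np.linalg.norm(np.subtract(tc, dc))
            col = np.linalg.norm(np.subtract(tcol, dcol))
            cost[i, j] = w_pos * pos + w_col * col
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Toy usage: two tracks, two detections in swapped order.
tracks = [((10, 10), (200, 0, 0)), ((50, 50), (0, 200, 0))]
dets   = [((52, 49), (0, 205, 0)), ((11, 12), (198, 2, 1))]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```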
We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes, decreasing the signal of non-relevant areas.
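This kind of weighting can be sketched as an element-wise gating of a feature map by a sigmoid-activated segmentation map (a minimal NumPy sketch; shapes and the sigmoid activation are illustrative assumptions):

```python
import numpy as np

def attention_weight(features, seg_logits):
    """Scale a C x H x W feature map by a sigmoid-activated H x W
    segmentation map, suppressing responses in non-relevant areas."""
    attn = 1.0 / (1.0 + np.exp(-seg_logits))   # sigmoid -> (0, 1)
    return features * attn[None, :, :]          # broadcast over channels

feats = np.ones((4, 8, 8))
seg = np.full((8, 8), -10.0)   # strongly "background"
seg[2:6, 2:6] = 10.0           # strongly "object"
out = attention_weight(feats, seg)
print(out[0, 4, 4] > 0.99, out[0, 0, 0] < 0.01)  # True True
```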
Ranked #1 on Object Detection on UAVDT
We propose a method based on binary search trees and a large peer-labelled color name dataset.
To our knowledge, this is the state-of-the-art performance in Parkinson's gait recognition.
In this paper, we propose to combine detections from background subtraction and from a multiclass object detector for multiple object tracking (MOT) in urban traffic scenes.
Two new models, RetinaNet-Double and RetinaNet-Flow, are proposed, based respectively on the concatenation of a target frame with a preceding frame, and the concatenation of the optical flow with the target frame.
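The RetinaNet-Double input described above can be sketched as a channel-wise concatenation (a minimal sketch; the image shapes and preprocessing are illustrative, not the paper's):

```python
import numpy as np

def fuse_frames(target, preceding):
    """Stack a target frame with its preceding frame along the channel
    axis, producing a 6-channel input (RetinaNet-Double style). For
    RetinaNet-Flow, `preceding` would instead be a 2-channel optical
    flow field, yielding a 5-channel input."""
    return np.concatenate([target, preceding], axis=-1)

t = np.zeros((480, 640, 3), dtype=np.float32)
p = np.ones((480, 640, 3), dtype=np.float32)
print(fuse_frames(t, p).shape)  # (480, 640, 6)
```

The network's first convolution then only needs its input-channel count adjusted to accept the fused tensor.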
The segmentation of video sequences into foreground and background regions is a low-level process commonly used in video content analysis and smart surveillance applications.
Multiple object tracking (MOT) in urban traffic aims to produce the trajectories of the different road users that move across the field of view with different directions and speeds and that can have varying appearances and sizes.
In this paper, we focus on the development of a method that detects abnormal trajectories of road users at traffic intersections.
In this paper, we propose a robust object tracking algorithm based on a branch selection mechanism to choose the most efficient object representations from multi-branch siamese networks.
In this paper, we present a new method for detecting road users in an urban environment which leads to an improvement in multiple object tracking.
We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales.
A compact set of synthetic faces is generated that resemble individuals of interest under the capture conditions relevant to the OD.
Compared to tracking with a still camera, the images captured with a PTZ camera are highly dynamic in nature because the camera can perform large motions, resulting in quickly changing capture conditions.
This paper addresses the problem of appearance matching across different challenges while doing visual face tracking in real-world scenarios.
Our findings show that, through non-parametric statistical tests, machine learning methods can extract useful information on the behaviour of latent constructs, which have a strong and significant influence on the choice process.
In this paper, an online adaptive model-free tracker is proposed to track single objects in video sequences to deal with real-world tracking challenges like low-resolution, object deformation, occlusion and motion blur.
In this paper, we investigate how a robust visual tracker like KCF can improve multiple object tracking.
In visual tracking, part-based trackers are attractive since they are robust against occlusion and deformation.
Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years.
To achieve a global affine transformation that maximises the overlapping of infrared and visible foreground pixels, the matched keypoints of each local shape polygon are temporarily stored in a buffer for a few frames.
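The buffering-then-alignment step can be sketched as a least-squares affine fit over a fixed-length buffer of point correspondences (the buffer length and the least-squares solver are illustrative assumptions, not the paper's exact method):

```python
import numpy as np
from collections import deque

BUFFER_LEN = 5                      # keep matches from a few frames
buffer = deque(maxlen=BUFFER_LEN)   # each entry: (src_pts, dst_pts)

def estimate_affine(buffer):
    """Fit a 2x3 affine transform mapping all buffered infrared
    keypoints onto their visible counterparts by least squares."""
    src = np.vstack([s for s, _ in buffer])       # N x 2 source points
    dst = np.vstack([d for _, d in buffer])       # N x 2 target points
    A = np.hstack([src, np.ones((len(src), 1))])  # N x 3, homogeneous
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)   # solve A @ M = dst
    return M.T                                    # 2 x 3 affine matrix

# Toy usage: correspondences related by a pure translation of (5, -3).
src = np.array([[0.0, 0], [10, 0], [0, 10], [10, 10]])
buffer.append((src, src + np.array([5.0, -3.0])))
M = estimate_affine(buffer)
```

Accumulating matches over several frames makes the fit robust to frames where few keypoints are matched.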
This paper proposes a semantic segmentation method for outdoor scenes captured by a surveillance camera.