Differentiable ARchiTecture Search (DARTS) uses a continuous relaxation of the network representation and accelerates Neural Architecture Search (NAS) by a factor of nearly a thousand in GPU-days.
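As a rough illustration of the continuous relaxation, the sketch below replaces a discrete choice among candidate operations with a softmax-weighted sum of all of their outputs, controlled by architecture parameters `alphas`. It assumes toy element-wise operations on plain Python lists. The candidate set, `mixed_op`, and `derive` are hypothetical stand-ins, not the original DARTS code, and the bilevel optimization of weights and `alphas` is omitted.

```python
import math
from typing import Callable, List, Sequence

Op = Callable[[List[float]], List[float]]


def softmax(alphas: Sequence[float]) -> List[float]:
    """Numerically stable softmax over architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: List[float], ops: Sequence[Op], alphas: Sequence[float]) -> List[float]:
    """Continuous relaxation: softmax-weighted sum of every candidate op's output."""
    weights = softmax(alphas)
    out = [0.0] * len(x)
    for w, op in zip(weights, ops):
        y = op(x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical candidate ops standing in for conv / pooling / skip choices.
CANDIDATES: List[Op] = [
    lambda x: list(x),               # skip connection (identity)
    lambda x: [0.0 for _ in x],      # "zero" op (no connection)
    lambda x: [2.0 * v for v in x],  # stand-in for a parametric op
]


def derive(alphas: Sequence[float]) -> int:
    """Discretize after search: keep the op with the largest architecture weight."""
    return max(range(len(alphas)), key=lambda i: alphas[i])


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    print(mixed_op(x, CANDIDATES, [0.0, 0.0, 0.0]))
    print(derive([0.1, -1.0, 0.7]))
```

Because the mixture is differentiable in `alphas`, the architecture can be searched by gradient descent instead of by sampling discrete architectures, which is the source of the speed-up.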
Video-based person re-identification (Re-ID), which aims to associate people across non-overlapping cameras using surveillance video, is a challenging task.
In this paper, we explore effective mechanisms to boost both of them from the perspective of network hierarchy, where a typical network can be hierarchically divided into an output stage, an intermediate stage and an input stage.
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications, whereas data for rare fine-grained categories are very limited.
Furthermore, a novel framework based on a convolutional variational autoencoder and deep Koopman embedding is proposed to approximate the Koopman operators, which are used as dynamical features from the linearized embedding space for cross-view gait recognition.
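To make "approximating the Koopman operator in an embedding space" concrete, the sketch below fits a finite-dimensional Koopman matrix `K` with `z[t+1] ≈ K z[t]` by ordinary least squares over a sequence of embedded states, in the style of dynamic mode decomposition. This is a swapped-in, standard estimator, not the convolutional-VAE framework described above; the embeddings are assumed to be given, and `fit_koopman` and its helpers are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_koopman(states: List[List[float]]) -> Matrix:
    """Least-squares K with z_{t+1} ~ K z_t, i.e. K = Y X^T (X X^T)^{-1}."""
    x = transpose(states[:-1])  # d x (T-1): columns z_0 .. z_{T-2}
    y = transpose(states[1:])   # d x (T-1): columns z_1 .. z_{T-1}
    xt = transpose(x)
    return matmul(matmul(y, xt), inverse(matmul(x, xt)))
```

The fitted `K` (or quantities derived from it, such as its eigenvalues) is one way a sequence can be summarized by its linear dynamics rather than by individual frames; it needs at least `d + 1` states that span the embedding space.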
Missing textures in the incomplete UV map are then filled in by the UV generator.
Existing autoencoder-based face pose editing methods primarily focus on preserving identity during pose synthesis but are less able to preserve image style, i.e., color, brightness, saturation, etc.
In this paper, an effective pipeline for automatic 4D Facial Expression Recognition (4D FER) is proposed.
In addition, we propose a perceptual distortion constraint and add it to the objective function of the adversarial attack to jointly optimize perceptual distortion and attack success rate.
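One generic way to write such a joint objective is a weighted sum, sketched below under stated assumptions: $f$ is the attacked classifier, $y$ the true label, $\delta$ the perturbation, $\mathcal{L}_{\mathrm{adv}}$ a loss that decreases as the attack succeeds (for an untargeted attack, for example, the negative cross-entropy of $y$), $D_p$ a perceptual distortion measure, and $\lambda > 0$ a trade-off weight. The specific form of $D_p$ and of the constraint are not given here, so all of these symbols are illustrative.

```latex
\min_{\delta}\; \mathcal{L}_{\mathrm{adv}}\!\left(f(x+\delta),\, y\right) \;+\; \lambda\, D_{p}\!\left(x,\; x+\delta\right)
```

A larger $\lambda$ favors less visible perturbations at the cost of attack success rate; a smaller $\lambda$ does the opposite.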
Boosting the performance of offline-trained Siamese trackers is becoming harder, since the fixed information in the template cropped from the first frame has been almost thoroughly mined, and these trackers remain poor at handling target appearance changes.
Objects in aerial images usually have arbitrary orientations and are densely located over the ground, making them extremely challenging to detect.
To further improve classification performance, in this work we propose a graph convolutional network (GCN)-based framework for hyperspectral image (HSI) classification that uses two clustering operations to better exploit multi-hop node correlations and to effectively reduce graph size.
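For readers unfamiliar with GCNs, the sketch below implements the widely used renormalized propagation rule $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$ on plain Python lists. It is a generic single layer, not the clustering-based framework described above; node features `h`, adjacency `adj`, and weights `w` are assumed inputs, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: aggregate neighbor features, project, then ReLU."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]
```

Each layer mixes information from 1-hop neighbors, so stacking k layers reaches k-hop neighbors; the cost of every layer grows with the number of nodes, which is why reducing graph size matters for large images.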
LiDAR-based 3D object detection is an important task for autonomous driving, and current approaches suffer from the sparse and partial point clouds of distant and occluded objects.
Ranked #3 on 3D Object Detection on KITTI Cars Easy val
However, since no high-resolution (HR) multispectral (MS) images are available as references for learning, almost all existing methods down-sample the MS and panchromatic (PAN) images and regard the original MS images as targets, forming a supervised setting for training.
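The reduced-resolution supervised setting described above can be sketched as follows, assuming single-band images stored as nested lists and a resolution ratio `r` between PAN and MS (4 is a common choice). Block averaging is used here as a simple stand-in for the sensor-matched low-pass filtering that real pipelines typically apply before decimation; `block_downsample` and `make_training_pair` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[float]]


def block_downsample(img: Image, r: int) -> Image:
    """Average non-overlapping r x r blocks (a crude low-pass filter plus decimation)."""
    h, w = len(img), len(img[0])
    if h % r or w % r:
        raise ValueError("image size must be divisible by the ratio r")
    return [[sum(img[i * r + di][j * r + dj] for di in range(r) for dj in range(r)) / (r * r)
             for j in range(w // r)]
            for i in range(h // r)]


def make_training_pair(ms_bands: List[Image], pan: Image,
                       r: int = 4) -> Tuple[List[Image], Image, List[Image]]:
    """Degrade both inputs by r; the original MS bands serve as the training target."""
    inputs_ms = [block_downsample(band, r) for band in ms_bands]
    inputs_pan = block_downsample(pan, r)  # now at the original MS resolution
    target = ms_bands
    return inputs_ms, inputs_pan, target
```

The drawback this setting implies is that the network learns the mapping at a reduced scale and is then applied at full scale, where no reference exists.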
GCM is designed to be lightweight and low-complexity: it makes the exchange of information across the channels of the feature maps more efficient while guiding the model to select more suitable scales from those generated by PSM.
Experiments on standard datasets show that our ARM brings consistent improvements for both coarse and fine annotations.
Object counting, which aims to estimate the number of objects in a given image, is an important and challenging computational task.
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances, and is useful when manual annotation is time-consuming or data acquisition is limited.
Ranked #6 on Few-Shot Object Detection on MS-COCO (30-shot)
Video-based person re-identification (Re-ID) is an important computer vision task.
Through our analysis, we aim to make reasonable inferences and predictions about the future development of crowd counting; the analysis can also suggest feasible solutions for object counting in other fields.
Thanks to this coarse-to-fine feature adaptation, domain knowledge in foreground regions can be effectively transferred.
Traditional change detection methods usually follow a framework of image differencing, change feature extraction and classification, and their performance is limited by simple image-domain differencing and by hand-crafted features.
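A minimal version of the traditional pipeline is sketched below, assuming two co-registered single-band images of equal size: the absolute difference image plays the role of the change feature, and a fixed threshold acts as the classifier. The function names and the threshold value are hypothetical; practical methods often choose the threshold automatically and use richer hand-crafted features.

```python
from typing import List

Image = List[List[float]]


def difference_image(t1: Image, t2: Image) -> Image:
    """Pixel-wise absolute difference between two co-registered acquisitions."""
    return [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)]


def change_map(t1: Image, t2: Image, threshold: float) -> List[List[int]]:
    """Label a pixel as changed (1) when its difference exceeds the threshold."""
    return [[1 if d > threshold else 0 for d in row] for row in difference_image(t1, t2)]
```

Because every pixel is judged only by its own intensity difference, illumination changes or small misregistration can produce false alarms, which is the limitation the sentence above points to.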
With the development of deep neural networks, fake digital paintings can be generated by various style transfer algorithms. To detect such generated paintings, we analyze generated and real paintings in the Fourier frequency domain and observe statistical differences and artifacts.
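The kind of frequency-domain inspection mentioned above can be sketched with a naive 2D discrete Fourier transform on a small grayscale patch: a log-magnitude spectrum for visual inspection and a simple statistic, the fraction of spectral energy above a cutoff frequency. Both the statistic and the cutoff are illustrative assumptions, not the specific features used in the work above, and the naive transform is only suitable for small patches.

```python
import cmath
import math
from typing import List

Image = List[List[float]]


def dft2(img: Image) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform, O((H*W)^2); fine for small patches."""
    h, w = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / h + v * y / w))
                 for x in range(h) for y in range(w))
             for v in range(w)]
            for u in range(h)]


def log_magnitude(img: Image) -> Image:
    """log(1 + |F(u, v)|), the usual way to visualize a spectrum."""
    return [[math.log1p(abs(c)) for c in row] for row in dft2(img)]


def signed_freq(k: int, n: int) -> int:
    """Map DFT index k in [0, n) to a signed frequency in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n


def high_freq_ratio(img: Image, cutoff: int) -> float:
    """Fraction of spectral energy at frequencies with max(|u|, |v|) >= cutoff."""
    spec = dft2(img)
    h, w = len(spec), len(spec[0])
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            e = abs(spec[u][v]) ** 2
            total += e
            if max(abs(signed_freq(u, h)), abs(signed_freq(v, w))) >= cutoff:
                high += e
    return high / total if total > 0 else 0.0
```

Upsampling layers in generators are known to leave periodic patterns, which tend to show up as unusual peaks or excess energy at high frequencies; comparing such statistics between generated and real images is the general idea.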
Significant efforts have been made to address this problem, with great progress, yet counting ground objects in remote sensing images has barely been studied.
Deep learning-based methods have achieved remarkable progress in Scene Text Recognition (STR), one of the classic problems in computer vision.
Recently, Human Attribute Recognition (HAR) has become a hot topic due to its scientific challenges and application potential; localizing attributes is a crucial stage that is not yet well handled.
Pyramidal feature representation is the common practice to address the challenge of scale variation in object detection.
Ranked #103 on Object Detection on COCO test-dev
In this paper, we are the first to tackle pedestrian attribute recognition with a video-based approach.
The two underlying requirements of face age progression, i.e., aging accuracy and identity permanence, are not well studied in the literature.
Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and the relationships between them.
Group activity recognition plays a fundamental role in a variety of applications, e.g., sports video analysis and intelligent surveillance.
This paper addresses the problem of remote sensing image pan-sharpening from the perspective of generative adversarial learning.
Specifically, instead of learning explicit projections or adding fully-connected mapping layers, the proposed Adversarial Binary Coding (ABC) framework guides the extraction of binary codes implicitly and effectively.
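Binary codes are attractive for retrieval because comparing two codes reduces to an XOR followed by a bit count. The sketch below assumes real-valued feature vectors are binarized by sign and packed into Python integers, then ranks a gallery by Hamming distance. Sign thresholding is only a simple stand-in for how codes are obtained; the adversarial learning of the ABC framework is not shown, and all names are hypothetical.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign bits into an int: bit i is 1 when features[i] > 0."""
    code = 0
    for i, v in enumerate(features):
        if v > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits: XOR, then popcount."""
    return bin(a ^ b).count("1")


def rank_gallery(query_code: int, gallery_codes: Sequence[int]) -> List[int]:
    """Gallery indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(gallery_codes)),
                  key=lambda i: (hamming(query_code, gallery_codes[i]), i))
```

For example, `rank_gallery(binarize([0.5, -1.0, 2.0, -0.1]), [6, 5, 0])` ranks the gallery code `5` first, since it matches the query exactly.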
Road extraction from aerial images has been a hot research topic in the field of remote sensing image analysis.
Ranked #2 on Lung Nodule Segmentation on LUNA
In this paper, we develop a novel convolutional neural network-based approach to extract and aggregate useful information from gait silhouette sequences rather than simply representing the gait process by averaging silhouette images.
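The averaging baseline that the sentence contrasts against is commonly called the gait energy image (GEI). The sketch below assumes size-normalized, aligned binary silhouettes stored as nested lists of 0/1 values; `gait_energy_image` is a hypothetical name, and the CNN-based aggregation proposed above is not shown.

```python
from typing import List, Sequence

Silhouette = List[List[int]]


def gait_energy_image(silhouettes: Sequence[Silhouette]) -> List[List[float]]:
    """Pixel-wise mean of aligned binary silhouettes over a gait sequence."""
    if not silhouettes:
        raise ValueError("need at least one silhouette")
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[i][j] for s in silhouettes) / n for j in range(w)] for i in range(h)]
```

Because the mean is order-independent, shuffling the frames yields the same GEI: the temporal ordering of the gait cycle is discarded, which is one motivation for learning how to aggregate frames instead.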
Sentiment analysis is attracting increasing attention and has become a very hot research topic due to its potential applications in personalized recommendation, opinion mining, etc.
Current top-performing object detectors depend on deep CNN backbones, such as ResNet-101 and Inception, benefiting from their powerful feature representations but suffering from high computational costs.
Unlike previous CNN-based methods that treat pan-sharpening as a super-resolution problem and perform it at the pixel level, the proposed TFNet fuses PAN and MS images at the feature level and reconstructs the pan-sharpened image from the fused features.
Numerous methods have been proposed for person re-identification, but most of them neglect matching efficiency.
Our ZSECOC equips conventional error-correcting output codes (ECOC) with the additional capability of zero-shot action recognition (ZSAR) by addressing the domain shift problem.
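For context on what "conventional ECOC" does at test time, the sketch below decodes a vector of binary classifier outputs to the class whose codeword is nearest in Hamming distance. The codebook and class names are hypothetical; how ZSECOC learns codewords for unseen classes and handles domain shift is not shown.

```python
from typing import Dict, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits: Sequence[int], codebook: Dict[str, Sequence[int]]) -> str:
    """Return the class whose codeword is closest to the predicted bits.

    Ties go to the class listed first in the codebook (dict insertion order).
    """
    return min(codebook, key=lambda c: hamming(bits, codebook[c]))


# Hypothetical 5-bit codebook; each bit is one binary classifier's target.
CODEBOOK = {
    "run":  [1, 1, 0, 0, 1],
    "jump": [0, 1, 1, 0, 0],
    "swim": [1, 0, 1, 1, 0],
}
```

Decoding to the nearest codeword tolerates a few wrong bit predictions; for instance, `ecoc_decode([1, 1, 0, 1, 1], CODEBOOK)` still returns `"run"` despite one flipped bit.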
Ranked #2 on Zero-Shot Action Recognition on Olympics
Extensive experiments on four realistic action datasets across three tasks (i.e., partial action retrieval, recognition and prediction) clearly show the superiority of PRBC over state-of-the-art methods, along with significantly reduced memory load and computational cost at online test time.
Face aging simulation has received increasing attention, yet generating convincing and natural age-progressed face images remains a challenge.