no code implementations • 13 Apr 2021 • Ajian Liu, Chenxu Zhao, Zitong Yu, Jun Wan, Anyang Su, Xing Liu, Zichang Tan, Sergio Escalera, Junliang Xing, Yanyan Liang, Guodong Guo, Zhen Lei, Stan Z. Li, Du Zhang
To bridge the gap to real-world applications, we introduce a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask).
The release of such a large-scale dataset could be a useful initial step in research on tracking UAVs.
Owing to the unremitting efforts of a few institutes, significant progress has recently been made in designing superhuman AIs in No-limit Texas Hold'em (NLTH), the primary testbed for large-scale imperfect-information game research.
To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information.
To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks) jointly.
Ranked #53 on Instance Segmentation on COCO test-dev
Head and human detection have been rapidly improved with the development of deep convolutional neural networks.
Pedestrian detection in crowded scenes is a challenging problem, because occlusion happens frequently among different pedestrians.
Recent deep learning-based face recognition methods have achieved great performance, but it remains challenging to recognize very low-resolution query faces, such as 28x28 pixels, when the CCTV camera is far from the captured subject.
To reduce the domain divergence that arises because the source and target datasets are collected in different environments, we project the DSH feature maps from different domains into a new nominal domain, and propose a novel domain similarity loss based on one-class classification.
Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of the human skeleton data.
Ranked #1 on Skeleton Based Action Recognition on SYSU 3D
In this paper, we study the challenging unconstrained set-based face recognition problem where each subject face is instantiated by a set of media (images and videos) instead of a single image.
Moreover, we propose a new model architecture to perform depth-wise and layer-wise aggregations, which not only further improves the accuracy but also reduces the model size.
Ranked #3 on Visual Object Tracking on VOT2017/18
To learn multi-agent cooperation effectively and tackle the sub-optimality of the demonstration, a self-improving learning method is proposed: on the one hand, the centralized state-action values are initialized by the demonstration and updated by the learned decentralized policy to reduce the sub-optimality.
RAML aims to give the meta-learner the ability to leverage previously learned knowledge to reduce the dimensionality of the original input data by expressing it as high-level representations, which helps the meta-learner perform well.
We propose a video-level 2D feature representation by transforming the convolutional features of all frames to a 2D feature map, referred to as VideoMap.
Ranked #45 on Action Recognition on UCF101
In particular, the SRN consists of two modules: the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module.
Ranked #1 on Face Detection on PASCAL Face
1 code implementation • 2 Sep 2018 • Jian Zhao, Yu Cheng, Yi Cheng, Yang Yang, Haochong Lan, Fang Zhao, Lin Xiong, Yan Xu, Jianshu Li, Sugiri Pranata, ShengMei Shen, Junliang Xing, Hengzhu Liu, Shuicheng Yan, Jiashi Feng
Benchmarking our model on one of the most popular unconstrained face recognition datasets IJB-C additionally verifies the promising generalizability of AIM in recognizing faces in the wild.
Ranked #1 on Age-Invariant Face Recognition on CACDVS
Correlation filter-based trackers rely on a periodic assumption of the search sample to efficiently distinguish the target from the background.
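As a rough illustration of the periodic assumption mentioned in this abstract, the sketch below shows the general idea behind correlation filters, not the method of this particular paper: treating the search sample as periodic lets the response over all cyclic shifts be computed with a single FFT product. The function names and the 1D toy signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def circular_correlation(signal, template):
    """Response of `template` against every cyclic shift of `signal`, via the FFT."""
    spectrum = np.fft.fft(signal) * np.conj(np.fft.fft(template))
    return np.real(np.fft.ifft(spectrum))

def brute_force_correlation(signal, template):
    """Same response computed explicitly, one cyclic shift at a time."""
    n = len(signal)
    return np.array([np.dot(np.roll(signal, -k), template) for k in range(n)])

# Toy 1D example (assumed data): the "target" is the template shifted by 3 samples.
rng = np.random.default_rng(0)
template = rng.standard_normal(16)
signal = np.roll(template, 3)

response = circular_correlation(signal, template)
reference = brute_force_correlation(signal, template)
peak_shift = int(np.argmax(response))
print(peak_shift)
```

The FFT version costs O(n log n) rather than O(n^2), and the peak of the response lands on the cyclic shift that was applied to the template. The same periodicity also introduces boundary effects, because wrapped-around samples are not real image content.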
This paper proposes a novel Pose Partition Network (PPN) to address the challenging multi-person pose estimation problem.
To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties.
The RASNet model reformulates the correlation filter within a Siamese tracking framework, and introduces different kinds of attention mechanisms to adapt the model without updating it online.
Ranked #3 on Visual Object Tracking on OTB-2013
First, a novel cost-sensitive multi-task loss function is designed to learn transferable aging features by training on the source population.
To alleviate the effects of view variations, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints in a learning-based, data-driven manner.
Ranked #1 on Skeleton Based Action Recognition on UWA3D
Our deep architecture contains three networks, a Feature Net, a Temporal Net, and a Spatial Net.
Our deep architecture explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts.
Ranked #77 on Person Re-Identification on Market-1501
We present a two-stage normalization scheme, human body normalization and limb normalization, to make the distribution of the relative joint locations compact, resulting in easier learning of convolutional spatial models and more accurate pose estimation.
At the global stage, given an image with a rough face detection result, the full face region is first re-initialized by a supervised spatial transformer network to a canonical shape state and then trained to regress a coarse landmark estimation.
This paper proposes a new Generative Partition Network (GPN) to address the challenging multi-person pose estimation problem.
Ranked #1 on Multi-Person Pose Estimation on WAF (AP metric)
In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously.
Rather than re-positioning the skeletons based on a human-defined prior criterion, we design a view adaptive recurrent neural network (RNN) with LSTM architecture, which enables the network itself to adapt to the most suitable observation viewpoints from end to end.
Ranked #6 on Skeleton Based Action Recognition on SYSU 3D
Inspired by the social affinity property of moving objects, we propose a Graphical Social Topology (GST) model, which estimates the group dynamics by jointly modeling the group structure and the states of objects using a topological representation.
In this work, we propose an end-to-end spatial and temporal attention model for human action recognition from skeleton data.
Ranked #73 on Skeleton Based Action Recognition on NTU RGB+D
We also propose a semi-supervised attribute learning framework that progressively boosts attribute accuracy using only a limited amount of labeled data.
In this paper, we study the problem of online action detection from streaming skeleton data.
Skeleton-based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions.
To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances.
Comprehensive experiments on the WebFace, Morph II and MultiPIE databases validate the effectiveness of the proposed kernel adaptation method and tree-structured convolutional architecture for facial trait recognition tasks, including identity, age and gender classification.
We inspect the recent advances in various aspects and propose some interesting directions for future research.
During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are respectively modeled by the pose-indexed image features and the shape displacements for the currently estimated landmarks.
In this paper, we model interactions between neighboring targets by pair-wise motion context, and further encode such context into the global association optimization.