no code implementations • ECCV 2020 • Xiaobo Wang, Tianyu Fu, Shengcai Liao, Shuo Wang, Zhen Lei, Tao Mei
Knowledge distillation is an effective tool to compress large pre-trained Convolutional Neural Networks (CNNs) or their ensembles into models applicable to mobile and embedded devices.
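As a rough illustration of the kind of compression objective involved (a generic soft-target distillation loss, not this paper's specific formulation; the temperature and weighting below are illustrative):

```python
# Minimal sketch of standard soft-target knowledge distillation;
# the temperature and loss weight below are illustrative, not this paper's settings.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soften both distributions with temperature T and match them with KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```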
no code implementations • 13 Mar 2023 • Sanqing Qu, Yingwei Pan, Guang Chen, Ting Yao, Changjun Jiang, Tao Mei
We validate the superiority of our MAD in a variety of single-DG scenarios with different modalities, including recognition on 1D texts, 2D images, 3D point clouds, and semantic segmentation on 2D images.
3 code implementations • AAAI 2021 • Yachao Zhang, Zonghao Li, Yuan Xie, Yanyun Qu, Cuihua Li, Tao Mei
First, we construct a pretext task, i.e., point cloud colorization, with self-supervised learning to transfer the prior knowledge learned from a large amount of unlabeled point clouds to a weakly supervised network.
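A minimal sketch of what such a colorization pretext head could look like, assuming a generic per-point backbone feature; the layer sizes and loss are illustrative rather than the paper's design:

```python
# Hypothetical sketch of a point-cloud colorization pretext head: a shared MLP
# regresses per-point RGB from backbone features, supervised only by the colors
# already present in unlabeled colored scans (no semantic labels needed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ColorizationHead(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 3), nn.Sigmoid(),  # predict RGB in [0, 1]
        )

    def forward(self, point_feats):      # (B, N, feat_dim)
        return self.mlp(point_feats)     # (B, N, 3)

def pretext_loss(pred_rgb, gt_rgb):
    # Simple regression objective for the self-supervised colorization task.
    return F.mse_loss(pred_rgb, gt_rgb)
```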
1 code implementation • 6 Dec 2022 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Jianlin Feng, Hongyang Chao, Tao Mei
The rich semantics are further regarded as semantic prior to trigger the learning of Diffusion Transformer, which produces the output sentence in a diffusion process.
no code implementations • 15 Nov 2022 • Yiheng Zhang, Ting Yao, Zhaofan Qiu, Tao Mei
In this paper, we ask the question: how much does each sample in the source domain contribute to the network's predictions on samples from the target domain?
1 code implementation • 15 Nov 2022 • Zhaofan Qiu, Yehao Li, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei
In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named SPE-Net.
1 code implementation • 15 Nov 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Chong-Wah Ngo, Tao Mei
The pre-determined kernel size severely limits the temporal receptive fields, and the fixed weights treat each spatial location across frames equally, resulting in a sub-optimal solution for long-range temporal modeling in natural scenes.
1 code implementation • 15 Nov 2022 • Qi Cai, Yingwei Pan, Ting Yao, Tao Mei
Recent progress on 2D object detection has featured Cascade RCNN, which capitalizes on a sequence of cascade detectors to progressively improve proposal quality, towards high-quality object detection.
1 code implementation • 26 Sep 2022 • Jingyang Lin, Yu Wang, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
Existing works attempt to solve the problem by explicitly imposing uncertainty on classifiers when OOD inputs are exposed to the classifier during training.
2 code implementations • 8 Sep 2022 • ZiCheng Zhang, Yinglu Liu, Congying Han, Tiande Guo, Ting Yao, Tao Mei
While previous works mainly focus on style transfer, we propose a novel and concise framework to address the generalized one-shot adaptation task for both style and entity transfer, in which a reference image and its binary entity mask are provided.
no code implementations • 2 Sep 2022 • Chuanhang Yan, Yu Sun, Qian Bao, Jinhui Pang, Wu Liu, Tao Mei
We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real-time.
no code implementations • 1 Sep 2022 • Xiaodong Chen, Wu Liu, Xinchen Liu, Yongdong Zhang, Jungong Han, Tao Mei
In DestFormer, the spatial and temporal dimensions of the 4D point cloud videos are decoupled to achieve efficient self-attention for learning both long-term and short-term features.
1 code implementation • 27 Jul 2022 • Yiheng Zhang, Ting Yao, Zhaofan Qiu, Tao Mei
In this paper, we thoroughly analyze the design of convolutional blocks (the type of convolutions and the number of channels in convolutions) and the ways of interactions across multiple scales, all from a lightweight standpoint for semantic segmentation.
1 code implementation • 11 Jul 2022 • Ting Yao, Yehao Li, Yingwei Pan, Yu Wang, Xiao-Ping Zhang, Tao Mei
Dual-ViT is hence able to reduce the computational complexity without compromising much accuracy.
2 code implementations • 11 Jul 2022 • Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, Tao Mei
Motivated by wavelet theory, we construct a new Wavelet Vision Transformer (Wave-ViT) that formulates invertible down-sampling with wavelet transforms and self-attention learning in a unified way.
Ranked #184 on Image Classification on ImageNet
no code implementations • 27 Jun 2022 • Jiyang Yu, Jingen Liu, Jing Huang, Wei zhang, Tao Mei
To this end, we propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
1 code implementation • 21 Jun 2022 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text.
1 code implementation • CVPR 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei
In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.
Ranked #7 on Action Recognition on Something-Something V1
1 code implementation • CVPR 2022 • Yehao Li, Yingwei Pan, Ting Yao, Tao Mei
In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that novelly unifies enriched semantic comprehending and learnable semantic ordering processes into a single architecture.
1 code implementation • CVPR 2022 • Yong Zhang, Yingwei Pan, Ting Yao, Rui Huang, Tao Mei, Chang-Wen Chen
Such a design decomposes the process of HOI set prediction into two subsequent phases, i.e., non-parametric interaction proposals are first generated and then transformed into HOI predictions via a structure-aware Transformer.
Ranked #1 on Human-Object Interaction Detection on V-COCO
no code implementations • CVPR 2022 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei
By deriving the novel grouped time mixing (GTM) operations, we equip the basic token-mixing MLP with the ability of temporal modeling.
Ranked #15 on Action Recognition on Something-Something V1
1 code implementation • 13 Jun 2022 • Yingwei Pan, Yehao Li, Yiheng Zhang, Qi Cai, Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in the SAPIEN ManiSkill Challenge 2021. No Interaction Track: this track targets learning policies from pre-collected demonstration trajectories.
no code implementations • 2 Jun 2022 • Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen
To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.
1 code implementation • CVPR 2022 • Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei
Based on Gait3D, we comprehensively compare our method with existing gait recognition approaches, which reflects the superior performance of our framework and the potential of 3D representations for gait recognition in the wild.
Ranked #1 on Gait Recognition on Gait3D
no code implementations • 2 Apr 2022 • Akash Gupta, Jingen Liu, Liefeng Bo, Amit K. Roy-Chowdhury, Tao Mei
To incorporate this ability into intelligent systems, a question worth pondering is: how exactly do we anticipate?
no code implementations • 11 Mar 2022 • Jie Ma, Yalong Bai, Bineng Zhong, Wei zhang, Ting Yao, Tao Mei
Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions.
no code implementations • 9 Mar 2022 • Xiaodong Chen, Xinchen Liu, Wu Liu, Kun Liu, Dong Wu, Yongdong Zhang, Tao Mei
Therefore, researchers start to focus on a new task, Part-level Action Parsing (PAP), which aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
1 code implementation • 4 Mar 2022 • Jing Xu, Wei zhang, Yalong Bai, Qibin Sun, Tao Mei
Motivated by studies in linguistics, we decompose the co-speech motion into two complementary parts: pose modes and rhythmic dynamics.
no code implementations • 18 Jan 2022 • Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo
In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.
1 code implementation • 11 Jan 2022 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei
In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state.
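A toy sketch of such a state-wise schedule; the state values, and the build_loader / train_one_epoch helpers, are hypothetical placeholders rather than the paper's actual recipe:

```python
# Illustrative sketch of decomposing a training path into "states", each with its
# own hyper-parameters (values below are placeholders, not the paper's schedule).
import torch

TRAINING_STATES = [
    {"epochs": 30, "lr": 0.1,   "clip_len": 8},
    {"epochs": 30, "lr": 0.01,  "clip_len": 16},
    {"epochs": 20, "lr": 0.001, "clip_len": 32},
]

def run_schedule(model, build_loader, train_one_epoch):
    # build_loader and train_one_epoch are assumed helpers supplied by the caller.
    for state in TRAINING_STATES:
        loader = build_loader(clip_len=state["clip_len"])   # longer clips in later states
        optimizer = torch.optim.SGD(model.parameters(), lr=state["lr"], momentum=0.9)
        for _ in range(state["epochs"]):
            train_one_epoch(model, loader, optimizer)
```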
no code implementations • CVPR 2021 • Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition.
1 code implementation • ICCV 2021 • Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei
To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning.
no code implementations • 11 Jan 2022 • Yingwei Pan, Yue Chen, Qian Bao, Ning Zhang, Ting Yao, Jingen Liu, Tao Mei
To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events.
no code implementations • 11 Jan 2022 • Yehao Li, Jiahao Fan, Yingwei Pan, Ting Yao, Weiyao Lin, Tao Mei
Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training task to limited-resource downstream tasks.
no code implementations • ICCV 2021 • Zhaofan Qiu, Ting Yao, Yan Shu, Chong-Wah Ngo, Tao Mei
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits off-the-shelf image recognition system on the synthetic frame.
no code implementations • CVPR 2021 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xiao-Ping Zhang, Dong Wu, Tao Mei
Video content is multifaceted, consisting of objects, scenes, interactions or actions.
no code implementations • 27 Dec 2021 • Mohan Zhou, Yalong Bai, Wei zhang, Ting Yao, Tiejun Zhao, Tao Mei
Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots.
1 code implementation • NeurIPS 2021 • Yu Wang, Jingyang Lin, Jingjing Zou, Yingwei Pan, Ting Yao, Tao Mei
Our work reveals a structured shortcoming of the existing mainstream self-supervised learning methods.
3 code implementations • CVPR 2022 • Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black
To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults.
Ranked #1 on 3D Depth Estimation on Relative Human (using extra training data)
no code implementations • 14 Dec 2021 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks.
no code implementations • 14 Dec 2021 • Yang Chen, Yingwei Pan, Yu Wang, Ting Yao, Xinmei Tian, Tao Mei
From this point, we present a particular paradigm of self-supervised learning tailored for domain adaptation, i.e., Transferrable Contrastive Learning (TCL), which links the SSL and the desired cross-domain transferability congruently.
no code implementations • ICCV 2021 • Yang Chen, Yu Wang, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei
Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective in learning useful semantic feature commonalities among domains.
Ranked #24 on Domain Generalization on PACS
1 code implementation • 1 Dec 2021 • Hangtong Wu, Dan Zeng, Yibo Hu, Hailin Shi, Tao Mei
It is hard to predict precise depth values for such noisy samples, which may obstruct the widely used depth-supervised optimization.
no code implementations • 26 Oct 2021 • Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang, Meng Chen, Zhengchen Zhang, Wei zhang, Xiaodong He, Tao Mei
We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers realtime audio-visual responses to instant speech inquiries.
no code implementations • CVPR 2022 • Yalong Bai, Yifan Yang, Wei zhang, Tao Mei
Specifically, we adapt heavy augmentation policies after the views lightly augmented by standard augmentations, to generate harder view (HV).
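One plausible way to compose such a pipeline with torchvision, where RandAugment merely stands in for the heavy policy; the exact policies and parameters are assumptions, not the paper's choices:

```python
# A rough torchvision sketch of producing a harder view (HV) by stacking a heavy
# augmentation policy on top of standard light augmentations.
from torchvision import transforms

light = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
])
heavy = transforms.RandAugment(num_ops=2, magnitude=9)   # stand-in heavy policy
to_tensor = transforms.ToTensor()

def two_views(img):
    easy_view = to_tensor(light(img))            # standard lightly augmented view
    hard_view = to_tensor(heavy(light(img)))     # heavy policy applied after light ops
    return easy_view, hard_view
```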
no code implementations • 7 Oct 2021 • Xiaodong Chen, Xinchen Liu, Kun Liu, Wu Liu, Tao Mei
This technical report introduces our 2nd place solution to Kinetics-TPS Track on Part-level Action Parsing in ICCV DeeperAction Workshop 2021.
1 code implementation • 30 Sep 2021 • Xiao Wang, Jingen Liu, Tao Mei, Jiebo Luo
Unlike the mainstream clustering-based methods, our framework exploits a transformer-based feature reconstruction scheme to detect event boundary by reconstruction errors.
no code implementations • 5 Sep 2021 • Tong Sha, Wei zhang, Tong Shen, Zhoujun Li, Tao Mei
Deep person generation has attracted extensive research attention due to its wide applications in virtual agents, video conferencing, online shopping and art/movie production.
1 code implementation • CVPR 2022 • Jiyang Yu, Jingen Liu, Liefeng Bo, Tao Mei
Those methods achieve limited performance as they suffer from the challenge in spatial frame alignment and the lack of useful information from similar LR neighbor frames.
2 code implementations • 18 Aug 2021 • Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei
Nevertheless, there has not been an open-source codebase in support of training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.
3 code implementations • 11 Aug 2021 • Lingxiao He, Wu Liu, Jian Liang, Kecheng Zheng, Xingyu Liao, Peng Cheng, Tao Mei
Instead, we aim to explore multiple labeled datasets to learn generalized domain-invariant representations for person re-id, which is expected universally effective for each new-coming re-id scenario.
Generalizable Person Re-identification • Knowledge Distillation • +1
no code implementations • 5 Aug 2021 • Yu Wang, Jingyang Lin, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
In this paper, we construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning, referred to as LORAC.
5 code implementations • 26 Jul 2021 • Yehao Li, Ting Yao, Yingwei Pan, Tao Mei
Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation.
Ranked #251 on Image Classification on ImageNet
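A heavily simplified sketch of context-guided attention in the spirit described above: static context is mined from neighboring keys with a 3x3 convolution and, together with the input, produces a dynamic attention map over the values. Channel widths and the additive fusion are illustrative choices, not the paper's exact block:

```python
# Simplified contextual-attention sketch (assumptions throughout, see lead-in).
import torch
import torch.nn as nn

class ContextualAttention2d(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.key_embed = nn.Sequential(            # static context among 3x3 neighbors
            nn.Conv2d(dim, dim, 3, padding=1, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
        )
        self.value_embed = nn.Conv2d(dim, dim, 1, bias=False)
        self.attn = nn.Sequential(                 # attention from [input, static context]
            nn.Conv2d(2 * dim, dim, 1, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 1),
        )

    def forward(self, x):
        k_static = self.key_embed(x)               # (B, C, H, W)
        v = self.value_embed(x)
        w = torch.sigmoid(self.attn(torch.cat([x, k_static], dim=1)))
        k_dynamic = w * v                          # context-guided dynamic aggregation
        return k_static + k_dynamic                # fuse static and dynamic contexts
```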
1 code implementation • 26 Jul 2021 • Yalong Bai, Mohan Zhou, Wei zhang, BoWen Zhou, Tao Mei
Experimental results on ImageNet demonstrate the compatibility and effectiveness on a much wider range of augmentations, while consuming fewer parameters and lower computational costs at inference time.
no code implementations • 7 Jul 2021 • Hanbin Dai, Hailin Shi, Wu Liu, Linfang Wang, Yinglu Liu, Tao Mei
By the experimental analysis, we find that the HR representation leads to a sharp increase of computational cost, while the accuracy improvement remains marginal compared with the low-resolution (LR) representation.
1 code implementation • 10 May 2021 • Yuchi Liu, Hailin Shi, Hang Du, Rui Zhu, Jun Wang, Liang Zheng, Tao Mei
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling.
no code implementations • 10 May 2021 • Hailin Shi, Dan Zeng, Yichun Tai, Hang Du, Yibo Hu, ZiCheng Zhang, Tao Mei
However, unlike the existing public face datasets, in many real-world scenarios of face recognition, the depth of training dataset is shallow, which means only two face images are available for each ID.
no code implementations • CVPR 2021 • Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang
In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank.
Ranked #5 on Weakly Supervised Action Localization on THUMOS14
Weakly Supervised Action Localization • Weakly-supervised Temporal Action Localization • +1
no code implementations • 23 Apr 2021 • Wu Liu, Qian Bao, Yu Sun, Tao Mei
We believe this survey will provide the readers with a deep and insightful understanding of monocular human pose estimation.
no code implementations • 14 Apr 2021 • Hang Du, Hailin Shi, Yinglu Liu, Dan Zeng, Tao Mei
In this paper, we aim to address the challenge of NIR-VIS masked face recognition from the perspectives of training data and training method.
1 code implementation • CVPR 2021 • Jiahui She, Yibo Hu, Hailin Shi, Jun Wang, Qiu Shen, Tao Mei
Due to the subjective annotation and the inherent inter-class similarity of facial expressions, one of the key challenges in Facial Expression Recognition (FER) is the annotation ambiguity.
no code implementations • 1 Apr 2021 • Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei zhang, Xiao-Ping Zhang, Tao Mei
Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image.
1 code implementation • CVPR 2021 • Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha
In this paper, we propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
Domain Adaptive Person Re-Identification • Online Clustering • +3
1 code implementation • ICCV 2021 • Xiaodong Chen, Xinchen Liu, Wu Liu, Xiao-Ping Zhang, Yongdong Zhang, Tao Mei
In this paper, we propose a post-hoc method, named Attribute-guided Metric Distillation (AMD), to explain existing ReID models.
Ranked #42 on Person Re-Identification on DukeMTMC-reID
1 code implementation • 9 Feb 2021 • Jinkai Zheng, Xinchen Liu, Chenggang Yan, Jiyong Zhang, Wu Liu, XiaoPing Zhang, Tao Mei
Despite significant improvement in gait recognition with deep learning, existing studies still neglect a more practical but challenging scenario -- unsupervised cross-domain gait recognition which aims to learn a model on a labeled dataset then adapts it to an unlabeled dataset.
1 code implementation • 27 Jan 2021 • Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei
Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging.
1 code implementation • ICCV 2021 • Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, Ran He
Visible-Infrared person re-identification (VI-ReID) aims to match cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environment.
no code implementations • 18 Jan 2021 • Gusi Te, Wei Hu, Yinglu Liu, Hailin Shi, Tao Mei
Face parsing infers a pixel-wise label to each facial component, which has drawn much attention recently.
Ranked #3 on Face Parsing on CelebAMask-HQ
2 code implementations • 12 Jan 2021 • Jun Wang, Yinglu Liu, Yibo Hu, Hailin Shi, Tao Mei
For example, producing a face representation network requires a modular training scheme that supports the proper choice among various state-of-the-art backbones and training supervisions, subject to real-world face recognition demands; for performance analysis and comparison, a standard and automatic evaluation of a set of models on multiple benchmarks is a desirable tool as well; besides, public groundwork is welcome for deploying face recognition as a holistic pipeline.
no code implementations • 27 Oct 2020 • Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei
To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available.
no code implementations • 10 Oct 2020 • Kecheng Zheng, Wu Liu, Jiawei Liu, Zheng-Jun Zha, Tao Mei
This hard selection strategy is able to fuse the strong-relevant multi-modality features for alleviating the problem of matching redundancy.
Ranked #8 on Text based Person Retrieval on CUHK-PEDES
1 code implementation • NeurIPS 2020 • Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei
This paper explores useful modifications of the recent development in contrastive learning via novel probabilistic modeling.
no code implementations • 28 Sep 2020 • Hang Du, Hailin Shi, Dan Zeng, Xiao-Ping Zhang, Tao Mei
To start with, we present an overview of the end-to-end deep face recognition.
1 code implementation • ECCV 2020 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
In this paper, we introduce a new transfer learning design to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.
1 code implementation • ICCV 2021 • Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei
Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map.
Ranked #1 on 3D Multi-Person Mesh Recovery on Relative Human (using extra training data)
no code implementations • 19 Aug 2020 • Boqiang Xu, Lingxiao He, Xingyu Liao, Wu Liu, Zhenan Sun, Tao Mei
Given the input person image, the ensemble method would focus on the head-shoulder feature by assigning a larger weight if the individual inside the image is in black clothing.
3 code implementations • 3 Aug 2020 • Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei
In this paper, we compose a trilogy of exploring the basic and generic supervision in the sequence from spatial, spatiotemporal and sequential perspectives.
no code implementations • 27 Jul 2020 • Yingwei Pan, Jun Xu, Yehao Li, Ting Yao, Tao Mei
The Pre-training for Video Captioning Challenge 2020 Summary: results and challenge participants' technical reports.
1 code implementation • ECCV 2020 • Gusi Te, Yinglu Liu, Wei Hu, Hailin Shi, Tao Mei
Specifically, we encode a facial image onto a global graph representation where a collection of pixels ("regions") with similar features are projected to each vertex.
Ranked #4 on Face Parsing on CelebAMask-HQ
no code implementations • 20 Jul 2020 • Dan Zeng, Hailin Shi, Hang Du, Jun Wang, Zhen Lei, Tao Mei
However, the correlation between hard positive and hard negative is overlooked, and so is the relation between the margins in positive and negative logits.
1 code implementation • ECCV 2020 • Haoran Wang, Tong Shen, Wei zhang, Ling-Yu Duan, Tao Mei
To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains.
Ranked #15 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
2 code implementations • ECCV 2020 • Hang Du, Hailin Shi, Yuchi Liu, Jun Wang, Zhen Lei, Dan Zeng, Tao Mei
Extensive experiments on various benchmarks of face recognition show the proposed method significantly improves the training, not only in shallow face learning, but also for conventional deep face data.
1 code implementation • ICML 2020 • Xiaobo Wang, Shuo Wang, Cheng Chi, Shifeng Zhang, Tao Mei
In face recognition, designing margin-based (e.g., angular, additive, additive angular margins) softmax loss functions plays an important role in learning discriminative features.
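For reference, a bare-bones sketch of one of the margin-based losses mentioned (an additive angular margin); the scale s and margin m are typical illustrative values, not this paper's searched loss:

```python
# Additive angular margin softmax sketch (generic formulation, not the paper's method).
import torch
import torch.nn.functional as F

def additive_angular_margin_loss(features, weight, labels, s=64.0, m=0.5):
    # Cosine similarity between L2-normalized features and class weight vectors.
    cos_theta = F.normalize(features) @ F.normalize(weight).t()        # (B, num_classes)
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the angular margin m only to the target-class angle.
    target = F.one_hot(labels, num_classes=weight.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + m), cos_theta) * s
    return F.cross_entropy(logits, labels)
```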
1 code implementation • 7 Jul 2020 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei
Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.
no code implementations • 5 Jul 2020 • Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei
In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.
no code implementations • CVPR 2020 • Yiheng Zhang, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Dong Liu, Tao Mei
In view of the extremely expensive expert labeling, recent research has shown that models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images.
Ranked #13 on Domain Adaptation on SYNTHIA-to-Cityscapes
no code implementations • CVPR 2020 • Yingwei Pan, Ting Yao, Yehao Li, Chong-Wah Ngo, Tao Mei
A clustering branch is capitalized on to ensure that the learnt representation preserves such underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample.
1 code implementation • CVPR 2020 • Qi Cai, Yingwei Pan, Yu Wang, Jingen Liu, Ting Yao, Tao Mei
To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it we propose a unified sample weighting network to predict a sample's task weights.
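A hypothetical sketch of the sample-weighting idea: a tiny network maps per-sample statistics (here simply the raw classification and regression losses) to positive weights that rescale each sample's contribution. The inputs and architecture are assumptions, not the paper's network:

```python
# Hypothetical per-sample weighting module for a two-task detection loss.
import torch
import torch.nn as nn

class SampleWeightNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, 16), nn.ReLU(inplace=True),
            nn.Linear(16, 2), nn.Softplus(),    # one positive weight per task
        )

    def forward(self, cls_loss, reg_loss):       # per-sample losses, shape (N,)
        stats = torch.stack([cls_loss, reg_loss], dim=1).detach()  # weight from statistics only
        w = self.mlp(stats)                       # (N, 2)
        weighted = w[:, 0] * cls_loss + w[:, 1] * reg_loss
        return weighted.mean()
```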
no code implementations • 8 Jun 2020 • Zhi Li, Bo Wu, Qi Liu, Likang Wu, Hongke Zhao, Tao Mei
Towards this end, in this paper, we propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents.
2 code implementations • 4 Jun 2020 • Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, Tao Mei
General instance re-identification is a very important task in computer vision that can be widely used in many practical applications, such as person/vehicle re-identification, face recognition, wildlife protection, commodity tracing, and snapshop. To meet the increasing application demand for general instance re-identification, we present FastReID as a widely used software system in JD AI Research.
Ranked #1 on Person Re-Identification on MSMT17-C
no code implementations • 13 May 2020 • Ning Zhang, Jingen Liu, Ke Wang, Dan Zeng, Tao Mei
Inspired by the human "visual tracking" capability which leverages motion cues to distinguish the target from the background, we propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking, which successfully exploits both appearance and motion features for model update.
2 code implementations • 14 Apr 2020 • Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, Tao Mei
This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain.
Ranked #1 on Vehicle Re-Identification on VehicleID
no code implementations • Proceedings of the AAAI Conference on Artificial Intelligence 2020 • Yinglu Liu, Hailin Shi, Hao Shen, Yue Si, Xiaobo Wang, Tao Mei
The dataset is publicly accessible to the community for boosting the advance of face parsing. Second, a simple yet effective Boundary-Attention Semantic Segmentation (BASS) method is proposed for face parsing, which contains a three-branch network with elaborately developed loss functions to fully exploit the boundary information.
Ranked #8 on Face Parsing on LaPa
2 code implementations • CVPR 2020 • Mohan Zhou, Yalong Bai, Wei zhang, Tiejun Zhao, Tao Mei
Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category.
Ranked #1 on Image Recognition on ImageNet
no code implementations • 31 Mar 2020 • Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei
It has been well recognized that modeling human-object or object-object relations would be helpful for detection task.
1 code implementation • CVPR 2020 • Yingwei Pan, Ting Yao, Yehao Li, Tao Mei
Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the second-order interactions across multi-modal inputs.
Ranked #21 on Image Captioning on COCO Captions
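A minimal sketch of vanilla bilinear pooling over two modalities, i.e., the second-order interaction referenced above; the projection dimension and normalization are common illustrative choices, not this paper's X-Linear design:

```python
# Vanilla bilinear pooling sketch over two modality feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearPooling(nn.Module):
    def __init__(self, dim_a, dim_b, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(dim_a * dim_b, out_dim)

    def forward(self, feat_a, feat_b):                      # (B, dim_a), (B, dim_b)
        outer = torch.einsum("bi,bj->bij", feat_a, feat_b)  # all pairwise products
        pooled = outer.flatten(1)                            # (B, dim_a * dim_b)
        # Signed square-root + L2 normalization is a common stabilization step.
        pooled = torch.sign(pooled) * torch.sqrt(pooled.abs() + 1e-8)
        return self.proj(F.normalize(pooled, dim=1))
```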
no code implementations • 8 Mar 2020 • Shuo Yang, Wei Yu, Ying Zheng, Hongxun Yao, Tao Mei
To solve this new problem, we propose a hierarchical adaptive semantic-visual tree (ASVT) to depict the architecture of merchandise categories, which evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously.
no code implementations • 26 Dec 2019 • Tao Mei, Wei zhang, Ting Yao
The real-world deployment or services of vision and language are elaborated as well.
1 code implementation • 13 Dec 2019 • Jiahang Wang, Wei zhang, Weizhong Liu, Tao Mei
However, existing methods can hardly preserve the details in clothing texture and facial identity (face, hair) while fitting novel clothes and poses onto a person.
no code implementations • 12 Dec 2019 • Jui-Hsin Lai, Bo Wu, Xin Wang, Dan Zeng, Tao Mei, Jingen Liu
This model associates themes with the pairwise compatibility via attention, and thus computes the outfit-wise compatibility.
no code implementations • 12 Dec 2019 • Jia Li, Tong Shen, Wei zhang, Hui Ren, Dan Zeng, Tao Mei
The stunning progress in face manipulation methods has made it possible to synthesize realistic fake face images, which poses potential threats to our society.
no code implementations • 26 Nov 2019 • Xiaobo Wang, Shifeng Zhang, Shuo Wang, Tianyu Fu, Hailin Shi, Tao Mei
Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central task of which is how to improve the feature discrimination.
no code implementations • 23 Sep 2019 • Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei
Moreover, we enlarge the search space of SDAS particularly for video recognition by devising several unique operations to encode spatio-temporal dynamics and demonstrate the impact in affecting the architecture search of SDAS.
1 code implementation • CVPR 2019 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
Temporally localizing actions in a video is a fundamental challenge in video understanding.
no code implementations • 9 Sep 2019 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
The problem of distance metric learning is mostly considered from the perspective of learning an embedding space, where the distances between pairs of examples are in correspondence with a similarity metric.
no code implementations • ICCV 2019 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
It is always well believed that parsing an image into constituent visual patterns would be helpful for understanding and representing an image.
no code implementations • 2 Sep 2019 • Hongdong Zheng, Yalong Bai, Wei zhang, Tao Mei
In our framework, a spatial constraint module is designed to fit reasonable scaling and spatial layout of object pairs while considering the relationship between them.
no code implementations • CVPR 2019 • Yiheng Zhang, Zhaofan Qiu, Jingen Liu, Ting Yao, Dong Liu, Tao Mei
As a result, our CAS is able to search an optimized architecture with customized constraints.
no code implementations • 26 Aug 2019 • Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei
Unsupervised image-to-image translation is the task of translating an image from one domain to another in the absence of any paired training examples and tends to be more applicable to practical applications.
2 code implementations • ICCV 2019 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei
In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.
1 code implementation • 16 Aug 2019 • Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei
It is always well believed that Binary Neural Networks (BNNs) could drastically accelerate the inference efficiency by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations.
no code implementations • 1 Aug 2019 • Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei
A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure.
1 code implementation • 29 Jul 2019 • Tong Shen, Dong Gong, Wei zhang, Chunhua Shen, Tao Mei
To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data.
no code implementations • 25 Jul 2019 • Yun Ye, Yixin Li, Bo Wu, Wei zhang, Ling-Yu Duan, Tao Mei
For "hard" attributes with insufficient training data, Deact brings more stable synthetic samples for training and further improve the performance.
no code implementations • CVPR 2019 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei
Diffusions effectively interact two aspects of information, i.e., localized and holistic, for a more powerful way of representation learning.
Ranked #4 on Action Recognition on UCF101
no code implementations • 17 May 2019 • Weiyao Lin, Yuxi Li, Hao Xiao, John See, Junni Zou, Hongkai Xiong, Jingdong Wang, Tao Mei
The task of re-identifying groups of people under different camera views is an important yet less-studied problem. Group re-identification (Re-ID) is a very challenging task since it is not only adversely affected by common issues in traditional single-object Re-ID problems, such as viewpoint and human pose variations, but it also suffers from changes in group layout and group membership.
no code implementations • 13 May 2019 • Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei
Each image is provided with accurate annotation of an 11-category pixel-level label map along with coordinates of 106-point landmarks.
no code implementations • 12 May 2019 • Danlu Chen, Xu-Yao Zhang, Wei zhang, Yao Lu, Xiuli Li, Tao Mei
Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression.
1 code implementation • 3 May 2019 • Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei
Moreover, the inherently recurrent dependency in RNN prevents parallelization within a sequence during training and therefore limits the computations.
no code implementations • CVPR 2019 • Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei
Specifically, we present Transferrable Prototypical Networks (TPN) for adaptation such that the prototypes for each class in source and target domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar.
no code implementations • CVPR 2019 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
Image captioning has received significant attention with remarkable improvements in recent advances.
no code implementations • 20 Apr 2019 • Xinyu Li, Wei zhang, Tong Shen, Tao Mei
Selfie and cartoon are two popular artistic forms that are widely presented in our daily life.
1 code implementation • CVPR 2019 • Sijie Song, Wei zhang, Jiaying Liu, Tao Mei
Firstly, a semantic generative network is proposed to transform between semantic parsing maps, in order to simplify the non-rigid deformation learning.
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
no code implementations • ICCV 2019 • Yuanzhi Liang, Yalong Bai, Wei zhang, Xueming Qian, Li Zhu, Tao Mei
Relationships encode the interactions among individual instances, and play a critical role in deep visual scene understanding.
no code implementations • 20 Jan 2019 • Shifeng Zhang, Rui Zhu, Xiaobo Wang, Hailin Shi, Tianyu Fu, Shuo Wang, Tao Mei, Stan Z. Li
With the availability of the face detection benchmark WIDER FACE dataset, much progress has been made by various algorithms in recent years.
no code implementations • 10 Jan 2019 • Meng Zhang, Xinchen Liu, Wu Liu, Anfu Zhou, Huadong Ma, Tao Mei
To bridge the domain gap, we propose a Multi-Granularity Reasoning framework for social relation recognition from images.
3 code implementations • 29 Dec 2018 • Xiaobo Wang, Shuo Wang, Shifeng Zhang, Tianyu Fu, Hailin Shi, Tao Mei
Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central challenge of which is feature discrimination.
Ranked #1 on Face Identification on Trillion Pairs Dataset
1 code implementation • CVPR 2019 • Rui Zhu, Shifeng Zhang, Xiaobo Wang, Longyin Wen, Hailin Shi, Liefeng Bo, Tao Mei
Taking this advantage, we are able to explore various types of networks for object detection, without suffering from the poor convergence.
no code implementations • 18 Oct 2018 • Peiye Liu, Wu Liu, Huadong Ma, Tao Mei, Mingoo Seok
To transfer the knowledge of intermediate representations, we set high-level teacher feature maps as a target, toward which the student feature maps are trained.
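A hedged sketch of regressing student feature maps toward teacher feature maps; the 1x1 convolutional adapter that reconciles channel widths is an assumption, not necessarily this paper's design:

```python
# Sketch of intermediate-feature transfer: student features are trained toward
# fixed teacher feature-map targets via an MSE regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureHint(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # Assumed adapter so student and teacher channel widths can differ.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # Teacher features are treated as a fixed regression target.
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())
```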
no code implementations • ECCV 2018 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections.
no code implementations • ECCV 2018 • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei
First, we model one of the pairwise interactions (e.g., image and question) by bilinear features, which are further encoded with the third dimension (e.g., answer) to form a triplet by bilinear tensor product.
no code implementations • ECCV 2018 • Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei
The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in next frame in a recurrent manner.
no code implementations • 3 Aug 2018 • Zhi Li, Hongke Zhao, Qi Liu, Zhenya Huang, Tao Mei, Enhong Chen
In this paper, we propose a novel Behavior-Intensive Neural Network (BINN) for next-item recommendation by incorporating both users' historical stable preferences and present consumption motivations.
no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei
Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instances.
no code implementations • 23 May 2018 • Canyi Lu, Jiashi Feng, Zhouchen Lin, Tao Mei, Shuicheng Yan
Second, we observe that many existing methods approximate the block diagonal representation matrix by using different structure priors, e.g., sparsity and low-rankness, which are indirect.
no code implementations • CVPR 2018 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
A valid question is how to temporally localize and then describe events, which is known as "dense video captioning."
no code implementations • CVPR 2018 • Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei
The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets.
no code implementations • 23 Apr 2018 • Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei
Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes, and a classification stream.
no code implementations • 23 Apr 2018 • Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei
In this paper, we present a novel Temporal GANs conditioning on Captions, namely TGANs-C, in which the input to the generator network is a concatenation of a latent noise vector and caption embedding, and then is transformed into a frame sequence with 3D spatio-temporal convolutions.
no code implementations • CVPR 2018 • Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, Tao Mei
In this paper, we introduce the new ideas of augmenting Convolutional Neural Networks (CNNs) with Memory and learning to learn the network parameters for the unlabelled images on the fly in one-shot learning.
no code implementations • ECCV 2018 • Yumin Suh, Jingdong Wang, Siyu Tang, Tao Mei, Kyoung Mu Lee
We propose a novel network that learns a part-aligned representation for person re-identification.
Ranked #4 on Person Re-Identification on UAV-Human
no code implementations • 19 Apr 2018 • Yitian Yuan, Tao Mei, Wenwu Zhu
Then, a multi-modal co-attention mechanism is introduced to generate not only video attention which reflects the global video structure, but also sentence attention which highlights the crucial details for temporal localization.
no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei
Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instance pairs.
no code implementations • EMNLP 2018 • Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo
Most existing approaches adopt the pipeline of representing an image via pre-trained CNNs, and then using the uninterpretable CNN features in conjunction with the question to predict the answer.
no code implementations • 12 Dec 2017 • Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Tao Mei
We evaluate our approach on two large-scale Flickr image datasets with over 1.8 million photos in total, for the task of popularity prediction.
1 code implementation • 12 Dec 2017 • Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei
With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space.
2 code implementations • ICCV 2017 • Zhaofan Qiu, Ting Yao, Tao Mei
In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3×3×3 convolutions with 1×3×3 convolutional filters on the spatial domain (equivalent to 2D CNN) plus 3×1×1 convolutions to construct temporal connections on adjacent feature maps in time.
Ranked #8 on Action Recognition on Sports-1M
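A compact sketch of one such factorization: a 1×3×3 spatial convolution followed by a 3×1×1 temporal convolution approximating a full 3×3×3 kernel (the paper devises several variants and orderings; this shows only one):

```python
# Factorized spatio-temporal convolution sketch: spatial (1x3x3) then temporal (3x1x1).
import torch.nn as nn

class SpatioTemporalConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)   # 2D-like spatial filter
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)  # temporal connections
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                     # x: (B, C, T, H, W)
        return self.relu(self.bn(self.temporal(self.spatial(x))))
```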
no code implementations • 18 Oct 2017 • Feiran Huang, Xiao-Ming Zhang, Zhoujun Li, Tao Mei, Yueying He, Zhonghua Zhao
Extensive experiments are conducted to investigate the effectiveness of our approach in the applications of multi-label classification and cross-modal search.
3 code implementations • ICCV 2017 • Heliang Zheng, Jianlong Fu, Tao Mei, Jiebo Luo
Two losses are proposed to guide the multi-task learning of channel grouping and part classification, which encourages MA-CNN to generate more discriminative parts from feature channels and learn better fine-grained features from parts in a mutual reinforced way.
Ranked #22 on Fine-Grained Image Classification on CUB-200-2011
Fine-Grained Image Classification • Fine-Grained Image Recognition • +2
no code implementations • ICCV 2017 • Ryota Hinami, Tao Mei, Shin'ichi Satoh
Although convolutional neural networks (CNNs) have achieved promising results in learning such concepts, it remains an open question as to how to effectively use CNNs for abnormal event detection, mainly due to the environment-dependent nature of the anomaly detection.
no code implementations • 28 Aug 2017 • Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao
Large scale image dataset and deep convolutional neural network (DCNN) are two primary driving forces for the rapid progress made in generic object recognition tasks in recent years.
no code implementations • CVPR 2017 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
Image captioning often requires a large set of training image-sentence pairs.
no code implementations • CVPR 2017 • Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui
To solve the challenges, we propose a multi-level attention network for visual question answering that can simultaneously reduce the semantic gap by semantic attention and benefit fine-grained spatial inference by visual attention.
no code implementations • CVPR 2017 • Jianlong Fu, Heliang Zheng, Tao Mei
The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN).
Fine-Grained Image Classification • Fine-Grained Image Recognition • +1
no code implementations • CVPR 2017 • Zhaofan Qiu, Ting Yao, Tao Mei
In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative model, by training them in an end-to-end manner.
no code implementations • CVPR 2017 • Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community.
no code implementations • ICCV 2017 • Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing.
no code implementations • 2 Jun 2016 • Yu Liu, Jianlong Fu, Tao Mei, Chang Wen Chen
Second, by using sGRU as basic units, the BMRNN is trained to align the local storylines into the global sequential timeline.
no code implementations • CVPR 2016 • Ting Yao, Tao Mei, Yong Rui
The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life logging first-person videos.
no code implementations • CVPR 2016 • Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei
The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.
no code implementations • CVPR 2016 • Jun Xu, Tao Mei, Ting Yao, Yong Rui
In this paper we present MSR-VTT (standing for "MSR Video to Text"), which is a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text.
no code implementations • 16 Feb 2016 • Weiyao Lin, Yang Mi, Weiyue Wang, Jianxin Wu, Jingdong Wang, Tao Mei
These semantic regions can be used to recognize pre-defined activities in crowd scenes.