Search Results for author: Tao Mei

Found 143 papers, 40 papers with code

Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition

no code implementations ECCV 2020 Xiaobo Wang, Tianyu Fu, Shengcai Liao, Shuo Wang, Zhen Lei, Tao Mei

Knowledge distillation is an effective tool to compress large pre-trained Convolutional Neural Networks (CNNs) or their ensembles into models applicable to mobile and embedded devices.

Face Recognition Knowledge Distillation

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

no code implementations 18 Jan 2022 Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo

In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.

Knowledge Distillation

Optimization Planning for 3D ConvNets

1 code implementation 11 Jan 2022 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state.

Video Recognition

Condensing a Sequence to One Informative Frame for Video Recognition

no code implementations ICCV 2021 Zhaofan Qiu, Ting Yao, Yan Shu, Chong-Wah Ngo, Tao Mei

This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits an off-the-shelf image recognition system on the synthetic frame.

Motion Estimation Video Recognition

Representing Videos as Discriminative Sub-graphs for Action Recognition

no code implementations CVPR 2021 Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition.

Action Recognition Graph Learning +1

Motion-Focused Contrastive Learning of Video Representations

1 code implementation ICCV 2021 Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning.

Contrastive Learning Data Augmentation +2

Smart Director: An Event-Driven Directing System for Live Broadcasting

no code implementations 11 Jan 2022 Yingwei Pan, Yue Chen, Qian Bao, Ning Zhang, Ting Yao, Jingen Liu, Tao Mei

To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events.

Event Detection

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

no code implementations 11 Jan 2022 Yehao Li, Jiahao Fan, Yingwei Pan, Ting Yao, Weiyao Lin, Tao Mei

Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training task to limited-resource downstream tasks.

Image Captioning Language Modelling +2

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

no code implementations 27 Dec 2021 Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei

We define the responsive listening head generation task as the synthesis of a non-verbal head with motions and expressions reacting to the multiple inputs, including the audio and visual signal of the speaker.

Talking Head Generation Translation

Putting People in their Place: Monocular Regression of 3D People in Depth

1 code implementation 15 Dec 2021 Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black

To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults.

A Style and Semantic Memory Mechanism for Domain Generalization

no code implementations ICCV 2021 Yang Chen, Yu Wang, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei

Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective in learning useful semantic feature commonalities among domains.

Domain Generalization

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising

no code implementations 14 Dec 2021 Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks.

Cross-Modal Retrieval Denoising +4

Transferrable Contrastive Learning for Visual Domain Adaptation

no code implementations 14 Dec 2021 Yang Chen, Yingwei Pan, Yu Wang, Ting Yao, Xinmei Tian, Tao Mei

From this point, we present a particular paradigm of self-supervised learning tailored for domain adaptation, i.e., Transferrable Contrastive Learning (TCL), which links the SSL and the desired cross-domain transferability congruently.

Contrastive Learning Domain Adaptation +1

Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning

1 code implementation 1 Dec 2021 Hangtong Wu, Dan Zeng, Yibo Hu, Hailin Shi, Tao Mei

It is hard to predict precise depth values for such noisy samples, which may thus obstruct the widely used depth-supervised optimization.

Face Anti-Spoofing Face Recognition

Directional Self-supervised Learning for Heavy Image Augmentations

no code implementations 26 Oct 2021 Yalong Bai, Yifan Yang, Wei Zhang, Tao Mei

Specifically, we adapt heavy augmentation policies after the views lightly augmented by standard augmentations, to generate harder view (HV).

Representation Learning Self-Supervised Learning

ViDA-MAN: Visual Dialog with Digital Humans

no code implementations 26 Oct 2021 Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang, Meng Chen, Zhengchen Zhang, Wei Zhang, Xiaodong He, Tao Mei

We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers realtime audio-visual responses to instant speech inquiries.

Speech Recognition Video Generation +1

A Baseline Framework for Part-level Action Parsing and Action Recognition

no code implementations 7 Oct 2021 Xiaodong Chen, Xinchen Liu, Kun Liu, Wu Liu, Tao Mei

This technical report introduces our 2nd place solution to Kinetics-TPS Track on Part-level Action Parsing in ICCV DeeperAction Workshop 2021.

Action Parsing Action Recognition +1

CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation

no code implementations 30 Sep 2021 Xiao Wang, Jingen Liu, Tao Mei, Jiebo Luo

Unlike the mainstream clustering-based methods, our framework exploits a transformer-based feature reconstruction scheme to detect event boundary by reconstruction errors.

Boundary Detection Event Segmentation +1

Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis

no code implementations 5 Sep 2021 Tong Sha, Wei Zhang, Tong Shen, Zhoujun Li, Tao Mei

Deep person generation has attracted extensive research attention due to its wide applications in virtual agents, video conferencing, online shopping and art/movie production.

Data Augmentation Talking Head Generation

Memory-Augmented Non-Local Attention for Video Super-Resolution

no code implementations 25 Aug 2021 Jiyang Yu, Jingen Liu, Liefeng Bo, Tao Mei

Those methods achieve limited performance as they suffer from the challenge in spatial frame alignment and the lack of useful information from similar LR neighbor frames.

Video Super-Resolution

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

1 code implementation 18 Aug 2021 Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei

Nevertheless, there has not been an open-source codebase in support of training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.

Cross-Modal Retrieval Image Captioning +4

Semi-Supervised Domain Generalizable Person Re-Identification

3 code implementations 11 Aug 2021 Lingxiao He, Wu Liu, Jian Liang, Kecheng Zheng, Xingyu Liao, Peng Cheng, Tao Mei

Instead, we aim to explore multiple labeled datasets to learn generalized domain-invariant representations for person re-id, which are expected to be universally effective for each new-coming re-id scenario.

Generalizable Person Re-identification Knowledge Distillation

A Low Rank Promoting Prior for Unsupervised Contrastive Learning

no code implementations 5 Aug 2021 Yu Wang, Jingyang Lin, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

In this paper, we construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning, referred to as LORAC.

Contrastive Learning Image Classification +4

Contextual Transformer Networks for Visual Recognition

4 code implementations 26 Jul 2021 Yehao Li, Ting Yao, Yingwei Pan, Tao Mei

Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation.

Instance Segmentation Object Detection +1

Augmentation Pathways Network for Visual Recognition

1 code implementation 26 Jul 2021 Yalong Bai, Mohan Zhou, Yuxiang Chen, Wei Zhang, BoWen Zhou, Tao Mei

Experimental results on ImageNet benchmarks demonstrate the compatibility and effectiveness on a much wider range of augmentations (e.g., Crop, Gray, Grid Shuffle, RandAugment), while consuming fewer parameters and lower computational costs at inference time.

Data Augmentation

FasterPose: A Faster Simple Baseline for Human Pose Estimation

no code implementations 7 Jul 2021 Hanbin Dai, Hailin Shi, Wu Liu, Linfang Wang, Yinglu Liu, Tao Mei

By the experimental analysis, we find that the HR representation leads to a sharp increase of computational cost, while the accuracy improvement remains marginal compared with the low-resolution (LR) representation.

Pose Estimation

Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning

no code implementations 10 May 2021 Hailin Shi, Dan Zeng, Yichun Tai, Hang Du, Yibo Hu, ZiCheng Zhang, Tao Mei

However, unlike the existing public face datasets, in many real-world scenarios of face recognition, the depth of training dataset is shallow, which means only two face images are available for each ID.

Face Recognition

Boosting Semi-Supervised Face Recognition with Noise Robustness

1 code implementation 10 May 2021 Yuchi Liu, Hailin Shi, Hang Du, Rui Zhu, Jun Wang, Liang Zheng, Tao Mei

This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise aroused by the auto-labelling.

Face Recognition

Action Unit Memory Network for Weakly Supervised Temporal Action Localization

no code implementations CVPR 2021 Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang

In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank.

Weakly Supervised Action Localization Weakly-supervised Temporal Action Localization +1

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

no code implementations 23 Apr 2021 Wu Liu, Qian Bao, Yu Sun, Tao Mei

We believe this survey will provide the readers with a deep and insightful understanding of monocular human pose estimation.

3D Human Pose Estimation

Towards NIR-VIS Masked Face Recognition

no code implementations 14 Apr 2021 Hang Du, Hailin Shi, Yinglu Liu, Dan Zeng, Tao Mei

In this paper, we aim to address the challenge of NIR-VIS masked face recognition from the perspectives of training data and training method.

3D Face Reconstruction Face Recognition +1

Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition

1 code implementation CVPR 2021 Jiahui She, Yibo Hu, Hailin Shi, Jun Wang, Qiu Shen, Tao Mei

Due to the subjective annotation and the inherent interclass similarity of facial expressions, one of the key challenges in Facial Expression Recognition (FER) is the annotation ambiguity.

Facial Expression Recognition

Exploiting Relationship for Complex-scene Image Generation

no code implementations 1 Apr 2021 Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei

Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image.

Image Generation Scene Generation

Group-aware Label Transfer for Domain Adaptive Person Re-identification

1 code implementation CVPR 2021 Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha

In this paper, we propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.

Domain Adaptive Person Re-Identification Online Clustering +2

Explainable Person Re-Identification with Attribute-guided Metric Distillation

no code implementations ICCV 2021 Xiaodong Chen, Xinchen Liu, Wu Liu, Xiao-Ping Zhang, Yongdong Zhang, Tao Mei

In this paper, we propose a post-hoc method, named Attribute-guided Metric Distillation (AMD), to explain existing ReID models.

Person Re-Identification

TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition

1 code implementation 9 Feb 2021 Jinkai Zheng, Xinchen Liu, Chenggang Yan, Jiyong Zhang, Wu Liu, XiaoPing Zhang, Tao Mei

Despite significant improvement in gait recognition with deep learning, existing studies still neglect a more practical but challenging scenario -- unsupervised cross-domain gait recognition, which aims to learn a model on a labeled dataset and then adapt it to an unlabeled dataset.

Gait Recognition

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

1 code implementation 27 Jan 2021 Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging.

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification

1 code implementation ICCV 2021 Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, Ran He

Visible-Infrared person re-identification (VI-ReID) aims to match cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environment.

Neural Architecture Search Person Re-Identification

FaceX-Zoo: A PyTorch Toolbox for Face Recognition

2 code implementations 12 Jan 2021 Jun Wang, Yinglu Liu, Yibo Hu, Hailin Shi, Tao Mei

For example, the production of face representation network desires a modular training scheme to consider the proper choice from various candidates of state-of-the-art backbone and training supervision subject to the real-world face recognition demand; for performance analysis and comparison, the standard and automatic evaluation with a bunch of models on multiple benchmarks will be a desired tool as well; besides, a public groundwork is welcomed for deploying the face recognition in the shape of holistic pipeline.

Face Recognition

Synthetic Training for Monocular Human Mesh Recovery

no code implementations 27 Oct 2020 Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei

To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available.

Hierarchical Gumbel Attention Network for Text-based Person Search

no code implementations 10 Oct 2020 Kecheng Zheng, Wu Liu, Jiawei Liu, Zheng-Jun Zha, Tao Mei

This hard selection strategy is able to fuse the strong-relevant multi-modality features for alleviating the problem of matching redundancy.

Image Retrieval Image-to-Text Retrieval +3

Joint Contrastive Learning with Infinite Possibilities

1 code implementation NeurIPS 2020 Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

This paper explores useful modifications of the recent development in contrastive learning via novel probabilistic modeling.

Contrastive Learning

Learning to Localize Actions from Moments

1 code implementation ECCV 2020 Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.

Action Localization Transfer Learning

Monocular, One-stage, Regression of Multiple 3D People

1 code implementation ICCV 2021 Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei

Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map.

3D Multi-Person Mesh Recovery

Black Re-ID: A Head-shoulder Descriptor for the Challenging Problem of Person Re-Identification

no code implementations 19 Aug 2020 Boqiang Xu, Lingxiao He, Xingyu Liao, Wu Liu, Zhenan Sun, Tao Mei

Given the input person image, the ensemble method would focus on the head-shoulder feature by assigning a larger weight if the individual inside the image is in black clothing.

Person Re-Identification

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

3 code implementations 3 Aug 2020 Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

In this paper, we compose a trilogy of exploring the basic and generic supervision in the sequence from spatial, spatiotemporal and sequential perspectives.

Action Recognition Contrastive Learning +3

Pre-training for Video Captioning Challenge 2020 Summary

no code implementations 27 Jul 2020 Yingwei Pan, Jun Xu, Yehao Li, Ting Yao, Tao Mei

The Pre-training for Video Captioning Challenge 2020 Summary: results and challenge participants' technical reports.

Video Captioning

Edge-aware Graph Representation Learning and Reasoning for Face Parsing

1 code implementation ECCV 2020 Gusi Te, Yinglu Liu, Wei Hu, Hailin Shi, Tao Mei

Specifically, we encode a facial image onto a global graph representation where a collection of pixels ("regions") with similar features are projected to each vertex.

Face Parsing Graph Representation Learning

NPCFace: Negative-Positive Collaborative Training for Large-scale Face Recognition

no code implementations 20 Jul 2020 Dan Zeng, Hailin Shi, Hang Du, Jun Wang, Zhen Lei, Tao Mei

However, the correlation between hard positive and hard negative is overlooked, and so is the relation between the margins in positive and negative logits.

Face Recognition

Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation

1 code implementation ECCV 2020 Haoran Wang, Tong Shen, Wei Zhang, Ling-Yu Duan, Tao Mei

To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains.

Domain Adaptation Semantic Segmentation +1

Semi-Siamese Training for Shallow Face Learning

2 code implementations ECCV 2020 Hang Du, Hailin Shi, Yuchi Liu, Jun Wang, Zhen Lei, Dan Zeng, Tao Mei

Extensive experiments on various benchmarks of face recognition show the proposed method significantly improves the training, not only in shallow face learning, but also for conventional deep face data.

Face Recognition

Loss Function Search for Face Recognition

1 code implementation ICML 2020 Xiaobo Wang, Shuo Wang, Cheng Chi, Shifeng Zhang, Tao Mei

In face recognition, designing margin-based (e.g., angular, additive, additive angular margins) softmax loss functions plays an important role in learning discriminative features.

AutoML Face Recognition

Single Shot Video Object Detector

1 code implementation 7 Jul 2020 Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object Detection

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

no code implementations 5 Jul 2020 Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei

In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.

Question Answering Video Captioning +2

Transferring and Regularizing Prediction for Semantic Segmentation

no code implementations CVPR 2020 Yiheng Zhang, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Dong Liu, Tao Mei

In the view of extremely expensive expert labeling, recent research has shown that the models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images.

Domain Adaptation Semantic Segmentation

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation

no code implementations CVPR 2020 Yingwei Pan, Ting Yao, Yehao Li, Chong-Wah Ngo, Tao Mei

A clustering branch is capitalized on to ensure that the learnt representation preserves such underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample.

Unsupervised Domain Adaptation

Learning a Unified Sample Weighting Network for Object Detection

1 code implementation CVPR 2020 Qi Cai, Yingwei Pan, Yu Wang, Jingen Liu, Ting Yao, Tao Mei

To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it we propose a unified sample weighting network to predict a sample's task weights.

General Classification Object Detection

Learning the Compositional Visual Coherence for Complementary Recommendations

no code implementations 8 Jun 2020 Zhi Li, Bo Wu, Qi Liu, Likang Wu, Hongke Zhao, Tao Mei

Towards this end, in this paper, we propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents.

FastReID: A Pytorch Toolbox for General Instance Re-identification

2 code implementations 4 Jun 2020 Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, Tao Mei

General Instance Re-identification is a very important task in computer vision, which can be widely used in many practical applications, such as person/vehicle re-identification, face recognition, wildlife protection, commodity tracing, and snapshop, etc. To meet the increasing application demand for general instance re-identification, we present FastReID as a widely used software system in JD AI Research.

Face Recognition Image Retrieval +2

Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks

no code implementations 13 May 2020 Ning Zhang, Jingen Liu, Ke Wang, Dan Zeng, Tao Mei

Inspired by the human "visual tracking" capability which leverages motion cues to distinguish the target from the background, we propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking, which successfully exploits both appearance and motion features for model update.

Visual Object Tracking Visual Tracking

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

no code implementations 14 Apr 2020 Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, Tao Mei

This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain.

Representation Learning Vehicle Re-Identification

A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing

no code implementations Proceedings of the AAAI Conference on Artificial Intelligence 2020 Yinglu Liu, Hailin Shi, Hao Shen, Yue Si, Xiaobo Wang, Tao Mei

The dataset is publicly accessible to the community for boosting the advance of face parsing. Second, a simple yet effective Boundary-Attention Semantic Segmentation (BASS) method is proposed for face parsing, which contains a three-branch network with elaborately developed loss functions to fully exploit the boundary information.

Face Parsing Image Generation +1

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

2 code implementations CVPR 2020 Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei

Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category.

Fine-Grained Image Classification Image Recognition +5

Long Short-Term Relation Networks for Video Action Detection

no code implementations 31 Mar 2020 Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei

It has been well recognized that modeling human-object or object-object relations would be helpful for detection task.

Action Detection Region Proposal

X-Linear Attention Networks for Image Captioning

1 code implementation CVPR 2020 Yingwei Pan, Ting Yao, Yehao Li, Tao Mei

Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the 2nd-order interactions across multi-modal inputs.

Fine-Grained Visual Recognition Image Captioning +2

Adaptive Semantic-Visual Tree for Hierarchical Embeddings

no code implementations 8 Mar 2020 Shuo Yang, Wei Yu, Ying Zheng, Hongxun Yao, Tao Mei

To solve this new problem, we propose a hierarchical adaptive semantic-visual tree (ASVT) to depict the architecture of merchandise categories, which evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously.

Image Retrieval

Down to the Last Detail: Virtual Try-on with Detail Carving

2 code implementations 13 Dec 2019 Jiahang Wang, Wei Zhang, Weizhong Liu, Tao Mei

However, existing methods can hardly preserve the details in clothing texture and facial identity (face, hair) while fitting novel clothes and poses onto a person.

Virtual Try-on

Zooming into Face Forensics: A Pixel-level Analysis

no code implementations 12 Dec 2019 Jia Li, Tong Shen, Wei Zhang, Hui Ren, Dan Zeng, Tao Mei

The stunning progress in face manipulation methods has made it possible to synthesize realistic fake face images, which poses potential threats to our society.

General Classification

Theme-Matters: Fashion Compatibility Learning via Theme Attention

no code implementations 12 Dec 2019 Jui-Hsin Lai, Bo Wu, Xin Wang, Dan Zeng, Tao Mei, Jingen Liu

This model associates themes with the pairwise compatibility with attention, and thus computes the outfit-wise compatibility.

Fashion Compatibility Learning

Mis-classified Vector Guided Softmax Loss for Face Recognition

no code implementations 26 Nov 2019 Xiaobo Wang, Shifeng Zhang, Shuo Wang, Tianyu Fu, Hailin Shi, Tao Mei

Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central task of which is how to improve the feature discrimination.

Face Recognition

Scheduled Differentiable Architecture Search for Visual Recognition

no code implementations 23 Sep 2019 Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei

Moreover, we enlarge the search space of SDAS particularly for video recognition by devising several unique operations to encode spatio-temporal dynamics and demonstrate the impact in affecting the architecture search of SDAS.

Video Recognition

Deep Metric Learning with Density Adaptivity

no code implementations 9 Sep 2019 Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

The problem of distance metric learning is mostly considered from the perspective of learning an embedding space, where the distances between pairs of examples are in correspondence with a similarity metric.

Metric Learning

Hierarchy Parsing for Image Captioning

no code implementations ICCV 2019 Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

It is always well believed that parsing an image into constituent visual patterns would be helpful for understanding and representing an image.

Image Captioning

Relationship-Aware Spatial Perception Fusion for Realistic Scene Layout Generation

no code implementations 2 Sep 2019 Hongdong Zheng, Yalong Bai, Wei Zhang, Tao Mei

In our framework, a spatial constraint module is designed to fit reasonable scaling and spatial layout of object pairs by considering the relationship between them.

Image Generation

Mocycle-GAN: Unpaired Video-to-Video Translation

no code implementations 26 Aug 2019 Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei

Unsupervised image-to-image translation is the task of translating an image from one domain to another in the absence of any paired training examples and tends to be more applicable to practical applications.

Motion Estimation Translation +1

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices

1 code implementation 16 Aug 2019 Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei

It is always well believed that Binary Neural Networks (BNNs) could drastically accelerate the inference efficiency by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations.

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

no code implementations 1 Aug 2019 Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei

A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure.

Image Paragraph Captioning

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation

1 code implementation 29 Jul 2019 Tong Shen, Dong Gong, Wei Zhang, Chunhua Shen, Tao Mei

To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data.

Semantic Segmentation Unsupervised Domain Adaptation

Hard-Aware Fashion Attribute Classification

no code implementations 25 Jul 2019 Yun Ye, Yixin Li, Bo Wu, Wei Zhang, Ling-Yu Duan, Tao Mei

For "hard" attributes with insufficient training data, Deact brings more stable synthetic samples for training and further improves the performance.

General Classification

Learning Spatio-Temporal Representation with Local and Global Diffusion

no code implementations CVPR 2019 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei

Diffusions effectively interact two aspects of information, i.e., localized and holistic, for a more powerful way of representation learning.

Action Classification Action Detection +3

Group Re-Identification with Multi-grained Matching and Integration

no code implementations 17 May 2019 Weiyao Lin, Yuxi Li, Hao Xiao, John See, Junni Zou, Hongkai Xiong, Jingdong Wang, Tao Mei

The task of re-identifying groups of people under different camera views is an important yet less-studied problem. Group re-identification (Re-ID) is a very challenging task since it is not only adversely affected by common issues in traditional single object Re-ID problems such as viewpoint and human pose variations, but it also suffers from changes in group layout and group membership.

A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark

no code implementations 13 May 2019 Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei

Each image is provided with accurate annotation of a 11-category pixel-level label map along with coordinates of 106-point landmarks.

Face Alignment Face Detection +2

Predictive Ensemble Learning with Application to Scene Text Detection

no code implementations 12 May 2019 Danlu Chen, Xu-Yao Zhang, Wei Zhang, Yao Lu, Xiuli Li, Tao Mei

Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression.

Ensemble Learning General Classification +2

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

1 code implementation 3 May 2019 Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei

Moreover, the inherently recurrent dependency in RNN prevents parallelization within a sequence during training and therefore limits the computations.

Video Captioning

Transferrable Prototypical Networks for Unsupervised Domain Adaptation

no code implementations CVPR 2019 Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei

Specifically, we present Transferrable Prototypical Networks (TPN) for adaptation such that the prototypes for each class in source and target domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar.

Unsupervised Domain Adaptation

Everyone is a Cartoonist: Selfie Cartoonization with Attentive Adversarial Networks

no code implementations 20 Apr 2019 Xinyu Li, Wei Zhang, Tong Shen, Tao Mei

Selfie and cartoon are two popular artistic forms that are widely presented in our daily life.

Translation

Unsupervised Person Image Generation with Semantic Parsing Transformation

1 code implementation CVPR 2019 Sijie Song, Wei Zhang, Jiaying Liu, Tao Mei

Firstly, a semantic generative network is proposed to transform between semantic parsing maps, in order to simplify the non-rigid deformation learning.

Image Generation Image Manipulation +1

VrR-VG: Refocusing Visually-Relevant Relationships

no code implementations ICCV 2019 Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei

Relationships encode the interactions among individual instances, and play a critical role in deep visual scene understanding.

Image Captioning Question Answering +3

Improved Selective Refinement Network for Face Detection

no code implementations 20 Jan 2019 Shifeng Zhang, Rui Zhu, Xiaobo Wang, Hailin Shi, Tianyu Fu, Shuo Wang, Tao Mei, Stan Z. Li

With the availability of the WIDER FACE face detection benchmark dataset, much progress has been made by various algorithms in recent years.

Data Augmentation Face Detection +1

Multi-Granularity Reasoning for Social Relation Recognition from Images

no code implementations 10 Jan 2019 Meng Zhang, Xinchen Liu, Wu Liu, Anfu Zhou, Huadong Ma, Tao Mei

To bridge the domain gap, we propose a Multi-Granularity Reasoning framework for social relation recognition from images.

Support Vector Guided Softmax Loss for Face Recognition

3 code implementations 29 Dec 2018 Xiaobo Wang, Shuo Wang, Shifeng Zhang, Tianyu Fu, Hailin Shi, Tao Mei

Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central challenge of which is feature discrimination.

Face Recognition

ScratchDet: Training Single-Shot Object Detectors from Scratch

1 code implementation CVPR 2019 Rui Zhu, Shifeng Zhang, Xiaobo Wang, Longyin Wen, Hailin Shi, Liefeng Bo, Tao Mei

Taking advantage of this, we are able to explore various types of networks for object detection, without suffering from poor convergence.

General Classification Object Detection

KTAN: Knowledge Transfer Adversarial Network

no code implementations 18 Oct 2018 Peiye Liu, Wu Liu, Huadong Ma, Tao Mei, Mingoo Seok

To transfer the knowledge of intermediate representations, we set high-level teacher feature maps as a target, toward which the student feature maps are trained.

Knowledge Distillation Object Detection +1

Exploring Visual Relationship for Image Captioning

no code implementations ECCV 2018 Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections.

Image Captioning

Deep Attention Neural Tensor Network for Visual Question Answering

no code implementations ECCV 2018 Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei

First, we model one of the pairwise interactions (e.g., image and question) by bilinear features, which is further encoded with the third dimension (e.g., answer) to be a triplet by bilinear tensor product.

Deep Attention Question Answering +1

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

no code implementations ECCV 2018 Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei

The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in the next frame in a recurrent manner.

Action Detection Region Proposal

Learning from History and Present: Next-item Recommendation via Discriminatively Exploiting User Behaviors

no code implementations 3 Aug 2018 Zhi Li, Hongke Zhao, Qi Liu, Zhenya Huang, Tao Mei, Enhong Chen

In this paper, we propose a novel Behavior-Intensive Neural Network (BINN) for next-item recommendation by incorporating both users' historical stable preferences and present consumption motivations.

Session-Based Recommendations

DA-GAN: Instance-Level Image Translation by Deep Attention Generative Adversarial Networks

no code implementations CVPR 2018 Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instances.

Data Augmentation Deep Attention +2

Subspace Clustering by Block Diagonal Representation

no code implementations 23 May 2018 Canyi Lu, Jiashi Feng, Zhouchen Lin, Tao Mei, Shuicheng Yan

Second, we observe that many existing methods approximate the block diagonal representation matrix by using different structure priors, e.g., sparsity and low-rankness, which are indirect.

To Create What You Tell: Generating Videos from Captions

no code implementations 23 Apr 2018 Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei

In this paper, we present a novel Temporal GANs conditioning on Captions, namely TGANs-C, in which the input to the generator network is a concatenation of a latent noise vector and caption embedding, and then is transformed into a frame sequence with 3D spatio-temporal convolutions.

Deep Semantic Hashing with Generative Adversarial Networks

no code implementations 23 Apr 2018 Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei

Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes, and a classification stream.

General Classification Image Retrieval

Jointly Localizing and Describing Events for Dense Video Captioning

no code implementations CVPR 2018 Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

A valid question is how to temporally localize and then describe events, which is known as "dense video captioning."

Dense Video Captioning

Fully Convolutional Adaptation Networks for Semantic Segmentation

no code implementations CVPR 2018 Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets.

Domain Adaptation Semantic Segmentation

Memory Matching Networks for One-Shot Image Recognition

no code implementations CVPR 2018 Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, Tao Mei

In this paper, we introduce the new ideas of augmenting Convolutional Neural Networks (CNNs) with Memory and learning to learn the network parameters for the unlabelled images on the fly in one-shot learning.

One-Shot Learning

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression

no code implementations 19 Apr 2018 Yitian Yuan, Tao Mei, Wenwu Zhu

Then, a multi-modal co-attention mechanism is introduced to generate not only video attention which reflects the global video structure, but also sentence attention which highlights the crucial details for temporal localization.

Temporal Localization

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials)

no code implementations CVPR 2018 Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instance pairs.

Data Augmentation Deep Attention +1

Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

no code implementations EMNLP 2018 Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo

Most existing approaches adopt the pipeline of representing an image via pre-trained CNNs, and then using the uninterpretable CNN features in conjunction with the question to predict the answer.

Image Captioning Question Answering +1

Time Matters: Multi-scale Temporalization of Social Media Popularity

no code implementations 12 Dec 2017 Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Tao Mei

We evaluate our approach on two large-scale Flickr image datasets with over 1.8 million photos in total, for the task of popularity prediction.

Social Media Popularity Prediction

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks

1 code implementation 12 Dec 2017 Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei

With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space.

Social Media Popularity Prediction

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

2 code implementations ICCV 2017 Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on spatial domain (equivalent to 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time.

Action Recognition
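
As a rough illustration of the factorization described in the Pseudo-3D entry above, the sketch below counts convolution weights for a full $3\times3\times3$ kernel versus a $1\times3\times3$ spatial kernel plus a $3\times1\times1$ temporal kernel. It is not the authors' code: the function names are hypothetical, and it assumes equal input and output channel counts and no bias terms, and it ignores the bottleneck layers and block variants the paper mentions.

```python
def conv3d_weights(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Number of weights in a 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def full_3d_weights(channels: int) -> int:
    """Weights of a single 3x3x3 convolution with `channels` in and out."""
    return conv3d_weights(channels, channels, 3, 3, 3)


def pseudo_3d_weights(channels: int) -> int:
    """Weights of a 1x3x3 spatial conv plus a 3x1x1 temporal conv."""
    spatial = conv3d_weights(channels, channels, 1, 3, 3)
    temporal = conv3d_weights(channels, channels, 3, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    c = 64  # assumed channel width, chosen only for illustration
    full = full_3d_weights(c)
    pseudo = pseudo_3d_weights(c)
    print(f"full 3x3x3:      {full} weights")
    print(f"1x3x3 + 3x1x1:   {pseudo} weights")
    print(f"ratio:           {pseudo / full:.3f}")
```

Under these assumptions the factorized pair uses 12/27 of the weights of the full kernel, independent of the channel width.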

Learning Social Image Embedding with Deep Multimodal Attention Networks

no code implementations 18 Oct 2017 Feiran Huang, Xiao-Ming Zhang, Zhoujun Li, Tao Mei, Yueying He, Zhonghua Zhao

Extensive experiments are conducted to investigate the effectiveness of our approach in the applications of multi-label classification and cross-modal search.

General Classification Link Prediction +1

Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition

no code implementations ICCV 2017 Heliang Zheng, Jianlong Fu, Tao Mei, Jiebo Luo

Two losses are proposed to guide the multi-task learning of channel grouping and part classification, which encourages MA-CNN to generate more discriminative parts from feature channels and learn better fine-grained features from parts in a mutual reinforced way.

Fine-Grained Image Recognition General Classification +1

Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge

no code implementations ICCV 2017 Ryota Hinami, Tao Mei, Shin'ichi Satoh

Although convolutional neural networks (CNNs) have achieved promising results in learning such concepts, it remains an open question as to how to effectively use CNNs for abnormal event detection, mainly due to the environment-dependent nature of the anomaly detection.

Anomaly Detection Event Detection

Automatic Dataset Augmentation

no code implementations 28 Aug 2017 Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao

Large-scale image datasets and deep convolutional neural networks (DCNNs) are two primary driving forces for the rapid progress made in generic object recognition tasks in recent years.

Object Recognition

Multi-Level Attention Networks for Visual Question Answering

no code implementations CVPR 2017 Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui

To solve the challenges, we propose a multi-level attention network for visual question answering that can simultaneously reduce the semantic gap by semantic attention and benefit fine-grained spatial inference by visual attention.

Question Answering Visual Question Answering

Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

no code implementations CVPR 2017 Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative model, by training them in an end-to-end manner.

Action Recognition Fine-Grained Image Classification +2

Video Captioning with Transferred Semantic Attributes

no code implementations CVPR 2017 Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community.

Video Captioning

Boosting Image Captioning with Attributes

no code implementations ICCV 2017 Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Automatically describing an image in natural language has been an emerging challenge in both computer vision and natural language processing.

Image Captioning

Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network

no code implementations 2 Jun 2016 Yu Liu, Jianlong Fu, Tao Mei, Chang Wen Chen

Second, by using sGRU as basic units, the BMRNN is trained to align the local storylines into the global sequential timeline.

Video Captioning Visual Storytelling

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

no code implementations CVPR 2016 Jun Xu, Tao Mei, Ting Yao, Yong Rui

In this paper we present MSR-VTT (standing for "MSR-Video to Text") which is a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text.

Image Captioning Video Description +1

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

no code implementations CVPR 2016 Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.

Action Recognition Event Detection

Highlight Detection With Pairwise Deep Ranking for First-Person Video Summarization

no code implementations CVPR 2016 Ting Yao, Tao Mei, Yong Rui

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life logging first-person videos.

Video Summarization

Learning Query and Image Similarities With Ranking Canonical Correlation Analysis

no code implementations ICCV 2015 Ting Yao, Tao Mei, Chong-Wah Ngo

One of the fundamental problems in image search is to learn the ranking functions, i.e., similarity between the query and image.

Image Retrieval

Relaxing From Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging

no code implementations ICCV 2015 Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, Yong Rui

The development of deep learning has empowered machines with a capability comparable to that of human beings in recognizing limited image categories.

Tree-based Visualization and Optimization for Image Collection

no code implementations 17 Jul 2015 Xintong Han, Chongyang Zhang, Weiyao Lin, Mingliang Xu, Bin Sheng, Tao Mei

The visualization of an image collection is the process of displaying a collection of images on a screen under some specific layout requirements.

Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition

no code implementations CVPR 2015 Ting Yao, Yingwei Pan, Chong-Wah Ngo, Houqiang Li, Tao Mei

In many real-world applications, we are often facing the problem of cross domain learning, i.e., to borrow the labeled data or transfer the already learnt knowledge from a source domain to a target domain.

Domain Adaptation Object Recognition

Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection

no code implementations CVPR 2015 Wu Liu, Tao Mei, Yongdong Zhang, Cherry Che, Jiebo Luo

Given the tremendous growth of online videos, the video thumbnail, as the common visualization form of video content, is becoming increasingly important in influencing users' browsing and searching experience.

Multi-Task Learning

Jointly Modeling Embedding and Translation to Bridge Video and Language

no code implementations CVPR 2016 Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui

Our proposed LSTM-E consists of three components: 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.

Translation

Large-scale Online Feature Selection for Ultra-high Dimensional Sparse Data

no code implementations 27 Sep 2014 Yue Wu, Steven C. H. Hoi, Tao Mei, Nenghai Yu

However, unlike many second-order learning methods that often suffer from extra high computational cost, we devise a novel smart algorithm for second-order online feature selection using a MaxHeap-based approach, which is not only more effective than the existing first-order approaches, but also significantly more efficient and scalable for large-scale feature selection with ultra-high dimensional sparse data, as validated from our extensive experiments.
