Search Results for author: Ping Luo

Found 126 papers, 61 papers with code

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

1 code implementation NeurIPS 2021 Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification Object Detection +2

Compressed Video Contrastive Learning

no code implementations NeurIPS 2021 Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.

Contrastive Learning Representation Learning

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

1 code implementation 3 Nov 2021 Zhe Chen, Wenhai Wang, Enze Xie, Zhibo Yang, Tong Lu, Ping Luo

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

Image Classification Scene Text +1

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

no code implementations NeurIPS 2021 Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.

Visual Reasoning

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

4 code implementations arXiv 2021 Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos.

Ranked #1 on Multi-Object Tracking on MOT17 (using extra training data)

Multi-Object Tracking

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

1 code implementation 11 Oct 2021 Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification Object Detection +2

Objects in Semantic Topology

no code implementations 6 Oct 2021 Shuo Yang, Peize Sun, Yi Jiang, Xiaobo Xia, Ruiheng Zhang, Zehuan Yuan, Changhu Wang, Ping Luo, Min Xu

A more realistic object detection paradigm, Open-World Object Detection, has recently attracted increasing research interest in the community.

Incremental Learning Language Modelling +1

Scale-Invariant Teaching for Semi-Supervised Object Detection

no code implementations 29 Sep 2021 Qiushan Guo, Yizhou Yu, Ping Luo

Furthermore, the limited annotations in semi-supervised learning scale up the challenges: large variance of object sizes and class imbalance (i.e., the extreme ratio between background and object), hindering the performance of prior arts.

Towards High-Quality Temporal Action Detection with Sparse Proposals

1 code implementation 18 Sep 2021 Jiannan Wu, Peize Sun, Shoufa Chen, Jiewen Yang, Zihao Qi, Lan Ma, Ping Luo

Towards high-quality temporal action detection, we introduce Sparse Proposals to interact with the hierarchical features.

Action Detection Video Understanding

Panoptic SegFormer

no code implementations 8 Sep 2021 Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Tong Lu, Ping Luo

We present Panoptic SegFormer, a general framework for end-to-end panoptic segmentation with Transformers.

Panoptic Segmentation

Adversarial Robustness for Unsupervised Domain Adaptation

no code implementations ICCV 2021 Muhammad Awais, Fengwei Zhou, Hang Xu, Lanqing Hong, Ping Luo, Sung-Ho Bae, Zhenguo Li

Extensive Unsupervised Domain Adaptation (UDA) studies have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with deep models.

Adversarial Robustness Unsupervised Domain Adaptation

End-to-End Dense Video Captioning with Parallel Decoding

1 code implementation ICCV 2021 Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo

Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.

Dense Video Captioning

CycleMLP: A MLP-like Architecture for Dense Prediction

5 code implementations 21 Jul 2021 Shoufa Chen, Enze Xie, Chongjian Ge, Ding Liang, Ping Luo

We build a family of models that surpass existing MLPs and achieve a comparable accuracy (83.2%) on ImageNet-1K classification compared to the state-of-the-art Transformer such as Swin Transformer (83.3%) but using fewer parameters and FLOPs.

Image Classification Instance Segmentation +2

Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation

1 code implementation 8 Jul 2021 Lingyun Wu, Zhiqiang Hu, Yuanfeng Ji, Ping Luo, Shaoting Zhang

For example, STFT improves the still image baseline FCOS by 10.6% and 20.6% on the comprehensive F1-score of the polyp localization task in CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by 3.6% and 8.0%, respectively.

Multi-Compound Transformer for Accurate Biomedical Image Segmentation

1 code implementation 28 Jun 2021 Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo

The recent vision transformer (i.e., for image classification) learns non-local attentive interaction of different patch tokens.

Image Classification Semantic correspondence +1

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

1 code implementation CVPR 2021 Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo

Last, we proposed an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space, and finds optimal architectures given various tasks and computation resources.

Image Classification Neural Architecture Search +2

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

8 code implementations NeurIPS 2021 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

Semantic Segmentation

Extracting Variable-Depth Logical Document Hierarchy from Long Documents: Method, Evaluation, and Application

no code implementations 14 May 2021 Rongyu Cao, Yixuan Cao, Ganbin Zhou, Ping Luo

In this paper, we study the problem of extracting variable-depth "logical document hierarchy" from long documents, namely organizing the recognized "physical document objects" into hierarchical structures.

Passage Retrieval

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

1 code implementation CVPR 2021 Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo

However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.

Knowledge Distillation Pose Estimation

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening

no code implementations 13 May 2021 Wenqi Shao, Hang Yu, Zhaoyang Zhang, Hang Xu, Zhenguo Li, Ping Luo

To address this problem, we develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which can stochastically discard unimportant channels by modeling the probability of a channel being activated.

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

1 code implementation 5 May 2021 Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo

Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotated text detection and cell segmentation.

Ranked #43 on Instance Segmentation on COCO test-dev (using extra training data)

Cell Segmentation Instance Segmentation +2

Going Deeper Into Face Detection: A Survey

no code implementations 27 Mar 2021 Shervin Minaee, Ping Luo, Zhe Lin, Kevin Bowyer

In this work, we provide a detailed overview of some of the most representative deep learning based face detection methods by grouping them into a few major categories, and present their core architectural designs and accuracies on popular benchmarks.

Face Detection Image Classification

Learning Versatile Neural Architectures by Propagating Network Codes

1 code implementation 24 Mar 2021 Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo

This work explores how to design a single neural network that is capable of adapting to multiple heterogeneous tasks of computer vision, such as image segmentation, 3D detection, and video recognition.

Neural Architecture Search Object Classification +2

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

1 code implementation 22 Mar 2021 Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo

1) We divide the input image into small patches and adopt TIN, successfully transferring image style at arbitrarily high resolution.

Style Transfer

Disentangled Cycle Consistency for Highly-realistic Virtual Try-On

1 code implementation CVPR 2021 Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo

To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.

Virtual Try-on

Unsupervised Pretraining for Object Detection by Patch Reidentification

no code implementations 8 Mar 2021 Jian Ding, Enze Xie, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia

Secondly, patch Re-ID is performed in a deeply unsupervised manner to learn multi-level representations, appealing to object detection.

Object Detection Unsupervised Representation Learning

Parser-Free Virtual Try-on via Distilling Appearance Flows

1 code implementation CVPR 2021 Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo

A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing, where the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.

Human Parsing Knowledge Distillation +1

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

3 code implementations ICCV 2021 Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Image Classification Instance Segmentation +2

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation

1 code implementation 15 Feb 2021 Chaofan Tao, Rui Lin, Quan Chen, Zhaoyang Zhang, Ping Luo, Ngai Wong

Prior arts often discretize the network weights by carefully tuning hyper-parameters of quantization (e.g., non-uniform stepsize and layer-wise bitwidths), which are complicated and sub-optimal because the full-precision and low-precision models have a large discrepancy.

Neural Network Compression Quantization

DetCo: Unsupervised Contrastive Learning for Object Detection

2 code implementations ICCV 2021 Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo

Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.

Contrastive Learning Image Classification +2

Segmenting Transparent Object in the Wild with Transformer

2 code implementations 21 Jan 2021 Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.

Semantic Segmentation

Rethinking the Pruning Criteria for Convolutional Neural Network

no code implementations NeurIPS 2021 Zhongzhan Huang, Xinjiang Wang, Ping Luo

Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), and various pruning criteria have been proposed to remove the redundant filters of CNNs.

Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames

no code implementations ICCV 2021 Wei Shang, Dongwei Ren, Dongqing Zou, Jimmy S. Ren, Ping Luo, WangMeng Zuo

EFM can also be easily incorporated into existing deblurring networks, making event-driven deblurring task benefit from state-of-the-art deblurring methods.

Deblurring

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

no code implementations 1 Jan 2021 Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo

With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.

Representation Learning

TransTrack: Multiple Object Tracking with Transformer

3 code implementations 31 Dec 2020 Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo

In this work, we propose TransTrack, a simple but efficient scheme to solve the multiple object tracking problems.

Multiple Object Tracking Object Detection

What Makes for End-to-End Object Detection?

1 code implementation 10 Dec 2020 Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo

We identify that classification cost in matching cost is the main ingredient: (1) previous detectors only consider location cost, (2) by additionally introducing classification cost, previous detectors immediately produce one-to-one prediction during inference.

Classification General Classification +1

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

1 code implementation 26 Nov 2020 Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Guan Pang, Zhen Li, Hong Zhou, Ping Luo

Although a polygon is a more accurate representation than an upright bounding box for text detection, the annotations of polygons are extremely expensive and challenging.

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

4 code implementations CVPR 2021 Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei LI, Zehuan Yuan, Changhu Wang, Ping Luo

In our method, however, a fixed sparse set of learned object proposals, with a total length of $N$, is provided to the object recognition head to perform classification and localization.

Object Detection Object Recognition

Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs

1 code implementation ICLR 2021 Xingang Pan, Bo Dai, Ziwei Liu, Chen Change Loy, Ping Luo

Through our investigation, we found that such a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover 3D shape from a single 2D image in an unsupervised manner.

3D Shape Reconstruction

UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation

no code implementations 16 Sep 2020 Yuanfeng Ji, Ruimao Zhang, Zhen Li, Jiamin Ren, Shaoting Zhang, Ping Luo

Unlike the recent neural architecture search (NAS) methods that typically searched the optimal operators in each network layer, but missed a good strategy to search for feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies as well as the block-wise operators in the encoder-decoder network.

Neural Architecture Search Volumetric Medical Image Segmentation

RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning

2 code implementations 14 Sep 2020 Hao Tan, Ran Cheng, Shihua Huang, Cheng He, Changxiao Qiu, Fan Yang, Ping Luo

Despite the remarkable successes of Convolutional Neural Networks (CNNs) in computer vision, it is time-consuming and error-prone to manually design a CNN.

Keypoint Detection Neural Architecture Search +2

Compensation Tracker: Reprocessing for Lost Object

no code implementations 27 Aug 2020 Zhibo Zou, Jun-Jie Huang, Ping Luo

Although the detection-based tracking framework can achieve good results, it is very dependent on the performance of the detector.

Motion Compensation Multi-Object Tracking

Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction

no code implementations ECCV 2020 Chaofan Tao, Qinhong Jiang, Lixin Duan, Ping Luo

Existing work addressed this challenge by either learning social spatial interactions represented by the positions of a group of pedestrians, while ignoring their temporal coherence (i.e., dependencies between different long trajectories), or by understanding the complicated scene layout (e.g., scene segmentation) to ensure safe navigation.

motion prediction Trajectory Prediction

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations ECCV 2020 Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Scene Text +1

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations ECCV 2020 Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

Graph Clustering Human Detection +1

Whole-Body Human Pose Estimation in the Wild

2 code implementations ECCV 2020 Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.

Facial Landmark Detection Hand Pose Estimation +1

3D Human Mesh Regression with Dense Correspondence

1 code implementation CVPR 2020 Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang

This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).

Human robot interaction

Learning a Reinforced Agent for Flexible Exposure Bracketing Selection

1 code implementation CVPR 2020 Zhouxia Wang, Jiawei Zhang, Mude Lin, Jiong Wang, Ping Luo, Jimmy Ren

Automatically selecting exposure bracketing (images exposed differently) is important to obtain a high dynamic range image by using multi-exposure fusion.

Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning

no code implementations 24 Apr 2020 Zhongzhan Huang, Wenqi Shao, Xinjiang Wang, Liang Lin, Ping Luo

Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), where various pruning criteria have been proposed to remove the redundant filters.

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

1 code implementation 21 Apr 2020 Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo

Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem of Adam by analyzing its performance in a simple non-convex synthetic problem, showing that Adam's fast convergence would possibly lead the algorithm to local minima.

Segmenting Transparent Objects in the Wild

1 code implementation ECCV 2020 Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets.

Semantic Segmentation

Domain-Adaptive Few-Shot Learning

1 code implementation 19 Mar 2020 An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo

Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.

Domain Adaptation Few-Shot Learning

Exemplar Normalization for Learning Deep Representation

no code implementations CVPR 2020 Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo

This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.

Semantic Segmentation

Towards Photo-Realistic Virtual Try-On by Adaptively Generating$\leftrightarrow$Preserving Image Content

1 code implementation 12 Mar 2020 Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, WangMeng Zuo, Ping Luo

First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on.

Semantic Segmentation Virtual Try-on

Channel Equilibrium Networks for Learning Deep Representation

1 code implementation ICML 2020 Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.

How Does BN Increase Collapsed Neural Network Filters?

no code implementations 30 Jan 2020 Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei zhang

This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result.

Object Detection

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

no code implementations 28 Nov 2019 Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo

Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.

Optical Flow Estimation Semantic Segmentation +2

Vision-Infused Deep Audio Inpainting

no code implementations ICCV 2019 Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang

Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.

Audio inpainting Image Inpainting

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations CVPR 2020 Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Instance Segmentation Object Detection +1

Channel Equilibrium Networks

no code implementations 25 Sep 2019 Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability.

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation 16 Sep 2019 Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.

Scene Text Scene Text Recognition +1

PDA: Progressive Data Augmentation for General Robustness of Deep Neural Networks

no code implementations 11 Sep 2019 Hang Yu, Aishan Liu, Xianglong Liu, Gengchao Li, Ping Luo, Ran Cheng, Jichen Yang, Chongzhi Zhang

In other words, DNNs trained with PDA are able to obtain more robustness against both adversarial attacks as well as common corruptions than the recent state-of-the-art methods.

Data Augmentation

Scale Calibrated Training: Improving Generalization of Deep Networks via Scale-Specific Normalization

no code implementations 31 Aug 2019 Zhuoran Yu, Aojun Zhou, Yukun Ma, Yudian Li, Xiaohan Zhang, Ping Luo

Experimental results show that SCT improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing on image sizes of 224 and 128, respectively.

Data Augmentation Image Classification +1

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations ICCV 2019 Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.

Image Retrieval

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks

no code implementations ICCV 2019 Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo

ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once

no code implementations ICCV 2019 Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang

Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.

Classification General Classification

Deep Self-Learning From Noisy Labels

no code implementations ICCV 2019 Jiangfan Han, Ping Luo, Xiaogang Wang

Unlike previous works constrained by many conditions, making them infeasible for real noisy cases, this work presents a novel deep self-learning framework to train a robust network on real noisy datasets without extra supervision.

Learning with noisy labels

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

6 code implementations CVPR 2020 Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo

To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation.

Image Manipulation

Switchable Normalization for Learning-to-Normalize Deep Representation

no code implementations 22 Jul 2019 Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li

Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer?

Atom Responding Machine for Dialog Generation

no code implementations 14 May 2019 Ganbin Zhou, Ping Luo, Jingwu Chen, Fen Lin, Leyu Lin, Qing He

To enrich the generated responses, ARM introduces a large number of molecule-mechanisms as various responding styles, which are conducted by taking different combinations from a few atom-mechanisms.

Switchable Whitening for Deep Representation Learning

1 code implementation ICCV 2019 Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo

Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods.

Domain Adaptation Image Classification +3

SSN: Learning Sparse Switchable Normalization via SparsestMax

1 code implementation CVPR 2019 Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis

no code implementations 4 Dec 2018 Yujun Shen, Bolei Zhou, Ping Luo, Xiaoou Tang

In the second stage, they compete in the image domain to render photo-realistic images that contain high diversity but preserve identity.

Face Generation

Kalman Normalization: Normalizing Internal Representations Across Network Layers

no code implementations NeurIPS 2018 Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly under the context of micro-batches.

Object Detection

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?

no code implementations 19 Nov 2018 Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang

Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.

Towards Understanding Regularization in Batch Normalization

1 code implementation ICLR 2019 Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng

Batch Normalization (BN) improves both convergence and generalization in training neural networks.

Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents

no code implementations 22 Aug 2018 Ganbin Zhou, Rongyu Cao, Xiang Ao, Ping Luo, Fen Lin, Leyu Lin, Qing He

Additionally, a "low-level sharing, high-level splitting" structure of CNN is designed to handle the documents from different content domains.

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos

no code implementations 15 Aug 2018 Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei zhang

Secondly, TSD significantly reduces the computations to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracies.

Action Recognition Action Recognition In Videos +1

Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net

13 code implementations ECCV 2018 Xingang Pan, Ping Luo, Jianping Shi, Xiaoou Tang

IBN-Net carefully integrates Instance Normalization (IN) and Batch Normalization (BN) as building blocks, and can be wrapped into many advanced deep networks to improve their performance.

Domain Adaptation

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

1 code implementation 20 Jul 2018 Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.

Lip Reading Talking Face Generation +1

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

no code implementations 16 Jul 2018 Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).

Video-Based Person Re-Identification

Differentiable Learning-to-Normalize via Switchable Normalization

3 code implementations ICLR 2019 Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li

We hope SN will help ease the use and understanding of normalization techniques in deep learning.

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

no code implementations CVPR 2018 Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang

Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.

Face Generation

Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches

no code implementations 9 Feb 2018 Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer.

Image Classification

Spatial As Deep: Spatial CNN for Traffic Scene Understanding

6 code implementations 17 Dec 2017 Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang

Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.

Ranked #6 on Lane Detection on TuSimple (using extra training data)

Lane Detection Scene Understanding

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

no code implementations 2 Dec 2017 Xiaohang Zhan, Ziwei Liu, Ping Luo, Xiaoou Tang, Chen Change Loy

The key to this new form of learning is to design a proxy task (e.g., image colorization), from which a discriminative loss can be formulated on unlabeled data.

Colorization Fine-tuning +1

Deep Dual Learning for Semantic Image Segmentation

no code implementations ICCV 2017 Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang

The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance.

Semantic Segmentation

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

2 code implementations 7 Aug 2017 Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor testing.

Learning Deep Architectures via Generalized Whitened Neural Networks

no code implementations ICML 2017 Ping Luo

Whitened Neural Network (WNN) is a recent advanced deep architecture, which improves convergence and generalization of canonical neural networks by whitening their internal hidden representation.

Video Object Segmentation with Re-identification

3 code implementations 1 Aug 2017 Xiaoxiao Li, Yuankai Qi, Zhe Wang, Kai Chen, Ziwei Liu, Jianping Shi, Ping Luo, Xiaoou Tang, Chen Change Loy

Specifically, our Video Object Segmentation with Re-identification (VS-ReID) model includes a mask propagation module and a ReID module.

Semantic Segmentation Video Object Segmentation +2

Learning Object Interactions and Descriptions for Semantic Image Segmentation

no code implementations CVPR 2017 Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang

This work significantly increases segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset.

Image Captioning Semantic Segmentation

Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation

no code implementations 30 Apr 2017 Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing He

Then, with a proposed tree-structured search method, the model is able to generate the most probable responses in the form of dependency trees, which are finally flattened into sequences as the system output.

Faceness-Net: Face Detection through Deep Facial Part Responses

no code implementations 29 Jan 2017 Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang

We propose a deep convolutional neural network (CNN) for face detection that leverages facial attribute-based supervision.

Face Detection

From Facial Expression Recognition to Interpersonal Relation Prediction

no code implementations 21 Sep 2016 Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang

Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data.

Facial Expression Recognition

Fashion Landmark Detection in the Wild

4 code implementations 10 Aug 2016 Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, Xiaoou Tang

Fashion landmark is also compared to clothing bounding boxes and human joints in two applications, fashion attribute prediction and clothes retrieval, showing that fashion landmark is a more discriminative representation to understand fashion images.

Pose Estimation

DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations

no code implementations CVPR 2016 Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

To demonstrate the advantages of DeepFashion, we propose a new deep model, namely FashionNet, which learns clothing features by jointly predicting clothing attributes and landmarks.

Deep Learning Strong Parts for Pedestrian Detection

no code implementations ICCV 2015 Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Third, each part detector in DeepParts is a strong detector that can detect a pedestrian by observing only a part of a proposal.

Occlusion Handling Pedestrian Detection

From Facial Parts Responses to Face Detection: A Deep Learning Approach

2 code implementations ICCV 2015 Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang

In this paper, we propose a novel deep convolutional network (DCN) that achieves outstanding performance on FDDB, PASCAL Face, and AFW.

Face Detection

Learning Social Relation Traits from Face Images

no code implementations ICCV 2015 Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang

Social relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people.

Semantic Image Segmentation via Deep Parsing Network

no code implementations ICCV 2015 Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang

This paper addresses semantic image segmentation by incorporating rich information into a Markov Random Field (MRF), including high-order relations and a mixture of label contexts.

Semantic Segmentation

Clothing Co-Parsing by Joint Image Segmentation and Labeling

no code implementations CVPR 2014 Wei Yang, Ping Luo, Liang Lin

This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations.

Semantic Segmentation

Learning to Recognize Pedestrian Attribute

no code implementations 5 Jan 2015 Yubin Deng, Ping Luo, Chen Change Loy, Xiaoou Tang

Learning to recognize pedestrian attributes at a far distance is a challenging problem in visual surveillance, since close-up shots of the face and body are rarely available; instead, only far-view image frames of pedestrians are given.

Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations

no code implementations NeurIPS 2014 Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Intriguingly, even without access to 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.

Face Recognition

Pedestrian Detection aided by Deep Learning Semantic Tasks

no code implementations CVPR 2015 Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources.

Pedestrian Detection Scene Segmentation

Deep Learning Face Attributes in the Wild

1 code implementation ICCV 2015 Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang

LNet is pre-trained on massive general object categories for face localization, while ANet is pre-trained on massive face identities for attribute prediction.

Fine-tuning

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations 11 Sep 2014 Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Object Detection

Deep Learning Multi-View Representation for Face Recognition

no code implementations 26 Jun 2014 Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Intriguingly, even without access to 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.

Face Recognition

Recover Canonical-View Faces in the Wild with Deep Neural Networks

no code implementations 14 Apr 2014 Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.

Face Reconstruction Face Verification
