Search Results for author: Liang Lin

Found 196 papers, 62 papers with code

Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift

1 code implementation 22 Aug 2021 Jiefeng Peng, Jiqi Zhang, Changlin Li, Guangrun Wang, Xiaodan Liang, Liang Lin

We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.

Neural Architecture Search

Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

no code implementations 12 Aug 2021 Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei, Pengxu Wei, Liang Lin, Guanbin Li

Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations 9 Aug 2021 Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation 23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

no code implementations ACL 2021 Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Prototypical Graph Contrastive Learning

no code implementations 17 Jun 2021 Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation.

Contrastive Learning Unsupervised Representation Learning

Towards Quantifiable Dialogue Coherence Evaluation

1 code implementation ACL 2021 Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards.

Coherence Evaluation Knowledge Distillation

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

1 code implementation 30 May 2021 Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.

Question Answering

Towards Solving Inefficiency of Self-supervised Representation Learning

1 code implementation 18 Apr 2021 Guangrun Wang, Keze Wang, Guangcong Wang, Philip H. S. Torr, Liang Lin

In this paper, we discover two contradictory phenomena in contrastive learning that we call under-clustering and over-clustering problems, which are major obstacles to learning efficiency.

Contrastive Learning Representation Learning +3

Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition

no code implementations 31 Mar 2021 Guangrun Wang, Liang Lin, Rongcong Chen, Guangcong Wang, Jiqi Zhang

In this work, we prove that dynamically adapting network architectures tailored for each domain task, along with weight finetuning, improves both efficiency and effectiveness compared to the existing image recognition pipeline that only tunes the weights regardless of the architecture.

Age Estimation Image Classification +4

Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

2 code implementations 26 Jan 2021 Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang

Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted to other scenarios (e.g., sharing discrepant label granularity) without extensive re-training.

Graph Representation Learning Human Parsing +2

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

no code implementations 9 Jan 2021 Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.

Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval

no code implementations 4 Jan 2021 Yang Liu, Keze Wang, Haoyuan Lan, Liang Lin

To model multi-scale temporal dependencies, our TCGL integrates the prior knowledge about the frame and snippet orders into graph structures, i.e., the intra-/inter-snippet temporal contrastive graphs.

Action Recognition Contrastive Learning +3

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations 1 Jan 2021 Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

Towards a Reliable and Robust Dialogue System for Medical Automatic Diagnosis

no code implementations 1 Jan 2021 Junfan Lin, Lin Xu, Ziliang Chen, Liang Lin

To this end, we propose a novel DSMAD agent, INS-DS (Introspective Diagnosis System), comprising two separate yet cooperative modules, i.e., an inquiry module for proposing symptom-inquiries and an introspective module for deciding when to inform a disease.

Decision Making

CAT-SAC: Soft Actor-Critic with Curiosity-Aware Entropy Temperature

no code implementations 1 Jan 2021 Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin

The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.

Adversarial Training using Contrastive Divergence

no code implementations 1 Jan 2021 Hongjun Wang, Guanbin Li, Liang Lin

To protect the security of machine learning models against adversarial examples, adversarial training has become the most popular and powerful strategy against various adversarial attacks by injecting adversarial examples into training data.

AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

1 code implementation 29 Dec 2020 Tao Pu, Tianshui Chen, Yuan Xie, Hefeng Wu, Liang Lin

In this work, we explore the correlations among the action units and facial expressions, and devise an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition.

Facial Expression Recognition Representation Learning

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

no code implementations 24 Dec 2020 Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin

It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA system's performance.

Question Answering

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

1 code implementation 22 Dec 2020 Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.

Dialogue Generation Meta-Learning

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

no code implementations 14 Dec 2020 Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Specifically, we generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs to disentangle the knowledge from other biases.

Question Answering Visual Question Answering

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

1 code implementation 30 Nov 2020 Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it's still challenging to apply it to real-world tasks, due to the poor sample efficiency.

Continuous Control

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

2 code implementations NeurIPS 2020 Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.

Instance Segmentation Panoptic Segmentation +1

A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning

no code implementations 15 Oct 2020 Hongjun Wang, Guanbin Li, Xiaobai Liu, Liang Lin

Although deep convolutional neural networks (CNNs) have demonstrated remarkable performance on multiple computer vision tasks, research on adversarial learning has shown that deep models are vulnerable to adversarial examples, which are crafted by adding visually imperceptible perturbations to the input images.

Adversarial Attack

Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems

1 code implementation EMNLP 2020 Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, Liang Lin

A practical automatic solver for textual math word problems (MWPs) should be able to solve various textual MWPs, while most existing works only focus on one-unknown linear MWPs.

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

1 code implementation EMNLP 2020 Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang

Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.

Dialogue Evaluation

Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition

no code implementations 20 Sep 2020 Tianshui Chen, Liang Lin, Riquan Chen, Xiaolu Hui, Hefeng Wu

The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency on training samples.

Few-Shot Learning

Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos

no code implementations 18 Sep 2020 Jie Wu, Guanbin Li, Xiaoguang Han, Liang Lin

Temporal grounding of natural language in untrimmed videos is a fundamental yet challenging multimedia task facilitating cross-media visual content retrieval.

Temporal Localization

Online Alternate Generator against Adversarial Attacks

no code implementations 17 Sep 2020 Haofeng Li, Yirui Zeng, Guanbin Li, Liang Lin, Yizhou Yu

The field of computer vision has witnessed phenomenal progress in recent years partially due to the development of deep convolutional neural networks.

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

1 code implementation 1 Sep 2020 Yang Liu, Keze Wang, Guanbin Li, Liang Lin

In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors.

Action Recognition Image Generation +2

Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View

no code implementations 23 Aug 2020 Junpeng Tan, Yukai Shi, Zhijing Yang, Caizhen Wen, Liang Lin

To ensure that we achieve effective sparse representation and clustering performance on the original data matrix, adaptive graph regularization and unsupervised clustering constraints are also incorporated in the proposed model to preserve the internal structural features of the data.

Component Divide-and-Conquer for Real-World Image Super-Resolution

1 code implementation ECCV 2020 Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, WangMeng Zuo, Liang Lin

An SR model learned with a conventional pixel-wise loss is usually dominated by flat regions and edges, and fails to infer realistic details of complex textures.

Image Super-Resolution

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition

1 code implementation 3 Aug 2020 Yuan Xie, Tianshui Chen, Tao Pu, Hefeng Wu, Liang Lin

However, most of these works focus on holistic feature adaptation, and they ignore local features that are more transferable across different datasets.

Facial Expression Recognition

Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning

1 code implementation 3 Aug 2020 Tianshui Chen, Tao Pu, Hefeng Wu, Yuan Xie, Lingbo Liu, Liang Lin

Although each claims to achieve superior performance, fair comparisons are lacking due to the inconsistent choices of the source/target datasets and feature extractors.

Domain Adaptation Facial Expression Recognition +2

Fine-Grained Image Captioning with Global-Local Discriminative Objective

1 code implementation 21 Jul 2020 Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

This is primarily due to (i) the conservative characteristic of traditional training objectives that drives the model to generate correct but hardly discriminative captions for similar images and (ii) the uneven word distribution of the ground-truth captions, which encourages generating highly frequent words/phrases while suppressing the less frequent but more concrete ones.

Image Captioning

EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning

1 code implementation ECCV 2020 Bailin Li, Bowen Wu, Jiang Su, Guangrun Wang, Liang Lin

Many algorithms try to predict model performance of the pruned sub-nets by introducing various evaluation methods.

Network Pruning

Bidirectional Graph Reasoning Network for Panoptic Segmentation

no code implementations CVPR 2020 Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.

Instance Segmentation Panoptic Segmentation

Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking

1 code implementation CVPR 2020 Hongjun Wang, Guangrun Wang, Ya Li, Dongyu Zhang, Liang Lin

Examining the robustness of ReID systems is important because the insecurity of ReID systems may cause severe losses, e.g., criminals may use adversarial perturbations to cheat CCTV systems.

Adversarial Attack Person Re-Identification

Linguistically Driven Graph Capsule Network for Visual Question Reasoning

no code implementations 23 Mar 2020 Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin

Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.

Question Answering Visual Question Answering

Efficient Crowd Counting via Structured Knowledge Transfer

2 code implementations 23 Mar 2020 Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin

Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.

Crowd Counting Transfer Learning

Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis

no code implementations 14 Mar 2020 Junfan Lin, Ziliang Chen, Xiaodan Liang, Keze Wang, Liang Lin

To address this problem, this paper presents a propensity-based patient simulator (PBPS), which is capable of facilitating the training of MAD agents by generating informative counterfactual answers along with the disease diagnosis.

DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution

1 code implementation 25 Feb 2020 Yukai Shi, Haoyu Zhong, Zhijing Yang, Xiaojun Yang, Liang Lin

Previous image SR methods fail to exhibit similar performance on Real-SR as the image data is not aligned inherently.

Image Super-Resolution

Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread

no code implementations 22 Jan 2020 Haofeng Li, Guanbin Li, Binbin Yang, Guanqi Chen, Liang Lin, Yizhou Yu

The proposed algorithm for the first time achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.

Image Classification RGB Salient Object Detection +2

Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction

2 code implementations 14 Jan 2020 Lingbo Liu, Jingwen Chen, Hefeng Wu, Jiajie Zhen, Guanbin Li, Liang Lin

To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs.

Representation Learning

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

no code implementations 18 Dec 2019 Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin

In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature space adversarial perturbations.

Semantic Segmentation Unsupervised Domain Adaptation

Blockwisely Supervised Neural Architecture Search with Knowledge Distillation

1 code implementation 29 Nov 2019 Changlin Li, Jiefeng Peng, Liuchun Yuan, Guangrun Wang, Xiaodan Liang, Liang Lin, Xiaojun Chang

Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture.

Knowledge Distillation Neural Architecture Search

Knowledge Graph Transfer Network for Few-Shot Recognition

1 code implementation 21 Nov 2019 Riquan Chen, Tianshui Chen, Xiaolu Hui, Hefeng Wu, Guanbin Li, Liang Lin

In this work, we represent the semantic correlations in the form of a structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN).

Few-Shot Learning

Generalizing Energy-based Generative ConvNets from Particle Evolution Perspective

no code implementations 31 Oct 2019 Yang Wu, Xu Cai, Pengxu Wei, Guanbin Li, Liang Lin

Compared with Generative Adversarial Networks (GANs), Energy-Based generative Models (EBMs) possess two appealing properties: i) they can be directly optimized without requiring an auxiliary network during learning and synthesizing; ii) they can better approximate the underlying distribution of the observed data by explicitly learning potential functions.

Layout-Graph Reasoning for Fashion Landmark Detection

no code implementations CVPR 2019 Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin

Each Layout-Graph Reasoning (LGR) layer maps feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over structural graph nodes to achieve global layout coherency via a layout-graph reasoning module, and then maps graph nodes back to enhance feature representations via a Node-to-Map module.

Graph Clustering Hierarchical structure

Meta R-CNN: Towards General Solver for Instance-level Few-shot Learning

no code implementations 28 Sep 2019 Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin

Resembling the rapid learning capability of humans, few-shot learning empowers vision systems to understand new concepts by training with few samples.

Few-Shot Learning Few-Shot Object Detection +1

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

no code implementations 23 Sep 2019 Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity (e.g., what is the dog that is near the girl playing with?).

Question Answering Visual Question Answering

Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction

2 code implementations 2 Sep 2019 Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, Liang Lin

Specifically, the first ConvLSTM unit takes normal traffic flow features as input and generates a hidden state at each time-step, which is further fed into the connected convolutional layer for spatial attention map inference.

Representation Learning Traffic Prediction

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations ICCV 2019 Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.

Image Retrieval

Crowd Counting with Deep Structured Scale Integration Network

no code implementations ICCV 2019 Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin

Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task and one major difficulty stems from the huge scale variation of people.

Crowd Counting Representation Learning

Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition

2 code implementations ICCV 2019 Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, Liang Lin

Recognizing multiple labels of images is a practical and challenging task, and significant progress has been made by searching semantic-aware regions and modeling label dependency.

Graph Representation Learning Multi-Label Classification

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

1 code implementation ICCV 2019 Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin

Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module.

Ranked #1 on Video Salient Object Detection on VOS-T (using extra training data)

Salient Object Detection Unsupervised Video Object Segmentation +1

Learning Compact Target-Oriented Feature Representations for Visual Tracking

no code implementations 5 Aug 2019 Chenglong Li, Yan Huang, Liang Wang, Jin Tang, Liang Lin

Many state-of-the-art trackers resort to a pretrained convolutional neural network (CNN) model for correlation filtering, in which deep features can be redundant, noisy and less discriminative for certain instances, and the tracking performance might thus be affected.

Visual Tracking

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

1 code implementation 8 Jul 2019 Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin

A broad range of cross-$m$-domain generation studies boil down to matching a joint distribution by deep generative models (DGMs).

Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks

1 code implementation CVPR 2019 Ziliang Chen, Jingyu Zhuang, Xiaodan Liang, Liang Lin

(Unsupervised) Domain Adaptation (DA) seeks to classify target instances when provided solely with source labeled and target unlabeled examples for training.

Multi-target Domain Adaptation Transfer Learning +1

Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction

no code implementations 15 May 2019 Lingbo Liu, Zhilin Qiu, Guanbin Li, Qing Wang, Wanli Ouyang, Liang Lin

Finally, a GCC module is applied to model the correlation between all regions by computing a global correlation feature as a weighted sum of all regional features, with the weights being calculated as the similarity between the corresponding region pairs.
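
The weighted-sum step described in this snippet can be illustrated with a minimal sketch. This is not the authors' GCC implementation: the dot-product similarity, the softmax normalization of the weights, and the function name `global_correlation` are all assumptions made for illustration only.

```python
import math


def global_correlation(features):
    """For each region i, return sum_j w_ij * f_j.

    Assumption: w_ij is a softmax over j of the dot product <f_i, f_j>.
    The source only states that the weights are pairwise similarities.
    """
    out = []
    for f_i in features:
        # Similarity between region i and every region j (dot product, assumed).
        sims = [sum(a * b for a, b in zip(f_i, f_j)) for f_j in features]
        # Numerically stable softmax so the weights sum to 1 (assumed normalization).
        m = max(sims)
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all regional features.
        dim = len(f_i)
        out.append(
            [sum(w * f_j[d] for w, f_j in zip(weights, features)) for d in range(dim)]
        )
    return out


# Hypothetical toy input: two regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0]]
global_features = global_correlation(regions)
```

Each output vector mixes all regional features, with regions more similar to the query region contributing more.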

Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning

no code implementations 4 May 2019 Yukai Shi, Guanbin Li, Qingxing Cao, Keze Wang, Liang Lin

Face hallucination is a domain-specific super-resolution problem that aims to generate a high-resolution (HR) face image from a low-resolution (LR) input.

Face Hallucination Super-Resolution

Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

no code implementations 22 Apr 2019 Guanbin Li, Xin Zhu, Yirui Zeng, Qing Wang, Liang Lin

Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) in a multi-scale CNN framework to propagate node information through the graph for generating enhanced AU representations.

Facial Action Unit Detection Representation Learning

Graphonomy: Universal Human Parsing via Graph Transfer Learning

1 code implementation CVPR 2019 Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin

By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity.

Human Parsing Transfer Learning

Adaptively Connected Neural Networks

1 code implementation CVPR 2019 Guangrun Wang, Keze Wang, Liang Lin

This paper presents a novel adaptively connected neural network (ACNet) to improve the traditional convolutional neural networks (CNNs) in two aspects.

Document Classification Image Classification +1

Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

no code implementations CVPR 2019 Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Xiaogang Wang, Liang Lin

Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale indoor 3D datasets and sophisticated network architectures.

3D Human Pose Estimation

Knowledge-Embedded Routing Network for Scene Graph Generation

3 code implementations CVPR 2019 Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin

More specifically, we show that the statistical correlations between objects appearing in images and their relationships can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions.

Graph Generation Scene Graph Generation

End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

no code implementations 30 Jan 2019 Lin Xu, Qixian Zhou, Ke Gong, Xiaodan Liang, Jianheng Tang, Liang Lin

Besides the challenges for conversational dialogue systems (e.g., topic transition coherency and question understanding), automatic medical diagnosis poses more critical requirements for dialogue rationality in the context of medical knowledge and symptom-disease relations.

Decision Making Dialogue Management +4

3D Human Pose Machines with Self-supervised Learning

2 code implementations arXiv.org 2019 Keze Wang, Liang Lin, Chenhan Jiang, Chen Qian, Pengxu Wei

Driven by recent computer vision and robotic applications, recovering 3D human poses has become increasingly important and attracted growing interest.

3D Human Pose Estimation Self-Supervised Learning

SNAS: Stochastic Neural Architecture Search

2 code implementations ICLR 2019 Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin

In experiments on CIFAR-10, SNAS takes fewer epochs to find a cell architecture with state-of-the-art accuracy than non-differentiable evolution-based and reinforcement-learning-based NAS, which is also transferable to ImageNet.

Neural Architecture Search

Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning

no code implementations 10 Dec 2018 Lingbo Liu, Guanbin Li, Yuan Xie, Yizhou Yu, Qing Wang, Liang Lin

In this paper, we propose a novel cascaded backbone-branches fully convolutional neural network (BB-FCN) for rapidly and accurately localizing facial landmarks in unconstrained and cluttered settings.

Face Alignment Face Detection +2

FRAME Revisited: An Interpretation View Based on Particle Evolution

no code implementations 4 Dec 2018 Xu Cai, Yang Wu, Guanbin Li, Ziliang Chen, Liang Lin

FRAME (Filters, Random fields, And Maximum Entropy) is an energy-based descriptive model that synthesizes visual realism by capturing mutual patterns from structural input signals.

Symbolic Graph Reasoning Meets Convolutions

1 code implementation NeurIPS 2018 Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P. Xing

To cooperate with local convolutions, each SGR is constituted by three modules: a) a primal local-to-semantic voting module where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module that propagates information over the knowledge graph to achieve global semantic coherency; c) a dual semantic-to-local mapping module that learns new associations of the evolved symbolic nodes with local representations, and accordingly enhances local features.

Image Classification Semantic Segmentation

Kalman Normalization: Normalizing Internal Representations Across Network Layers

no code implementations NeurIPS 2018 Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly under the context of micro-batches.

Object Detection

Hybrid Knowledge Routed Modules for Large-scale Object Detection

1 code implementation NeurIPS 2018 Chenhan Jiang, Hang Xu, Xiaodan Liang, Liang Lin

The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene.

Object Detection

Cross-Modal Attentional Context Learning for RGB-D Object Detection

no code implementations 30 Oct 2018 Guanbin Li, Yukang Gan, Hejun Wu, Nong Xiao, Liang Lin

In this paper, we address this problem by developing a Cross-Modal Attentional Context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data.

Autonomous Driving Object Detection

Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview

no code implementations 10 Oct 2018 Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin

Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision.

Representation Learning Semantic Segmentation

Interpretable Visual Question Answering by Reasoning on Dependency Trees

no code implementations 6 Sep 2018 Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems.

Question Answering Visual Question Answering

Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks

1 code implementation 3 Sep 2018 Yuan Yuan, Siyuan Liu, Jiawei Zhang, Yongbing Zhang, Chao Dong, Liang Lin

We consider the single image super-resolution problem in a more general case that the low-/high-resolution pairs and the down-sampling process are unavailable.

Image Super-Resolution Image-to-Image Translation

Generative Semantic Manipulation with Mask-Contrasting GAN

no code implementations ECCV 2018 Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing

Despite the promising results on paired/unpaired image-to-image translation achieved by Generative Adversarial Networks (GANs), prior works often only transfer the low-level information (e.g., color or texture changes), but fail to manipulate high-level semantic meanings (e.g., geometric structure or content) of different object regions.

Image-to-Image Translation

Attentive Crowd Flow Machines

no code implementations 1 Sep 2018 Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin

Traffic flow prediction is crucial for urban traffic management and public safety.

Monocular Depth Estimation with Affinity, Vertical Pooling, and Label Enhancement

no code implementations ECCV 2018 Yukang Gan, Xiangyu Xu, Wenxiu Sun, Liang Lin

While significant progress has been made in monocular depth estimation with Convolutional Neural Networks (CNNs) extracting absolute features, such as edges and textures, the depth constraint of neighboring pixels, namely relative features, has been mostly ignored by recent methods.

Monocular Depth Estimation Stereo Matching +1

Neural Task Planning with And-Or Graph Representations

no code implementations 25 Aug 2018 Tianshui Chen, Riquan Chen, Lin Nie, Xiaonan Luo, Xiaobai Liu, Liang Lin

This paper focuses on semantic task planning, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research.

Common Sense Reasoning

Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

1 code implementation 14 Aug 2018 Tianshui Chen, Wenxi Wu, Yuefang Gao, Le Dong, Xiaonan Luo, Liang Lin

In this work, we investigate simultaneously predicting categories of different levels in the hierarchy and integrating this structured correlation information into the deep neural network by developing a novel Hierarchical Semantic Embedding (HSE) framework.

Fine-Grained Image Classification Fine-Grained Image Recognition +1

Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining

no code implementations 4 Aug 2018 Guanbin Li, Xiang He, Wei Zhang, Huiyou Chang, Le Dong, Liang Lin

Single image rain streaks removal has recently witnessed substantial progress due to the development of deep convolutional neural networks.

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

1 code implementation 2 Aug 2018 Qixian Zhou, Xiaodan Liang, Ke Gong, Liang Lin

Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e.g., head, leg, dress).

Human Parsing Semantic Segmentation +3

Instance-level Human Parsing via Part Grouping Network

1 code implementation ECCV 2018 Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin

Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass.

Edge Detection Human Parsing +2

Toward Characteristic-Preserving Image-based Virtual Try-On Network

3 code implementations ECCV 2018 Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, Meng Yang

Second, to alleviate boundary artifacts of warped clothes and make the results more realistic, we employ a Try-On Module that learns a composition mask to integrate the warped clothes and the rendered image to ensure smoothness.

Geometric Matching Virtual Try-on

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

no code implementations 16 Jul 2018 Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).

Video-Based Person Re-Identification

Crowd Counting using Deep Recurrent Spatial-Aware Network

no code implementations 2 Jul 2018 Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin

Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera's perspective that causes huge appearance variations in people's scales and rotations.

Crowd Counting

Deep Reasoning with Knowledge Graph for Social Relationship Understanding

1 code implementation 2 Jul 2018 Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin

This structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node messages through the graph to explore the interaction between persons of interest and the contextual objects.

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria

1 code implementation 30 Jun 2018 Keze Wang, Liang Lin, Xiaopeng Yan, Ziliang Chen, Dongyu Zhang, Lei Zhang

The proposed process can be compatible with mini-batch based training (i.e., using a batch of unlabeled or partially labeled data as a one-time input) for object detection.

Active Learning Object Detection

Interpretable Video Captioning via Trajectory Structured Localization

no code implementations CVPR 2018 Xian Wu, Guanbin Li, Qingxing Cao, Qingge Ji, Liang Lin

Automatically describing open-domain videos with natural language is attracting increasing interest in the field of artificial intelligence.

Image Captioning Video Captioning +1

DRPose3D: Depth Ranking in 3D Human Pose Estimation

no code implementations 23 May 2018 Min Wang, Xipeng Chen, Wentao Liu, Chen Qian, Liang Lin, Lizhuang Ma

In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation.

3D Human Pose Estimation 3D Pose Estimation

Multi-level Wavelet-CNN for Image Restoration

4 code implementations 18 May 2018 Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, WangMeng Zuo

With the modified U-Net architecture, wavelet transform is introduced to reduce the size of feature maps in the contracting subnetwork.

Image Denoising Image Super-Resolution +1

Learning Warped Guidance for Blind Face Restoration

1 code implementation ECCV 2018 Xiaoming Li, Ming Liu, Yuting Ye, WangMeng Zuo, Liang Lin, Ruigang Yang

For better recovery of fine facial details, we modify the problem setting by taking both the degraded observation and a high-quality guided image of the same identity as input to our guided face restoration network (GFRNet).

Blind Face Restoration

Look into Person: Joint Body Parsing & Pose Estimation Network and A New Benchmark

3 code implementations 5 Apr 2018 Xiaodan Liang, Ke Gong, Xiaohui Shen, Liang Lin

To further explore and take advantage of the semantic correlation of these two tasks, we propose a novel joint human parsing and pose estimation network to explore efficient context modeling, which can simultaneously predict parsing and pose with extremely high quality.

Human Parsing Pose Estimation +1

Visual Question Reasoning on General Dependency Tree

no code implementations CVPR 2018 Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

This network comprises two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence.

Question Answering Visual Question Answering

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

no code implementations CVPR 2018 Keze Wang, Xiaopeng Yan, Dongyu Zhang, Lei Zhang, Liang Lin

Though quite challenging, leveraging large-scale unlabeled or partially labeled images in a cost-effective way has attracted increasing interest for its great importance to computer vision.

Active Learning Object Detection

Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

1 code implementation CVPR 2018 Jiahao Pang, Wenxiu Sun, Chengxi Yang, Jimmy Ren, Ruichao Xiao, Jin Zeng, Liang Lin

By feeding real stereo pairs of different domains to stereo models pre-trained with synthetic data, we see that: i) a pre-trained model does not generalize well to the new domain, producing artifacts at boundaries and ill-posed regions; however, ii) feeding an up-sampled stereo pair leads to a disparity map with extra details.

Stereo Matching Stereo Matching Hand

Weakly Supervised Salient Object Detection Using Image Labels

no code implementations 17 Mar 2018 Guanbin Li, Yuan Xie, Liang Lin

Our algorithm is based on alternately exploiting a graphical model and training a fully convolutional network for model updating.

RGB Salient Object Detection Saliency Detection +1

Single View Stereo Matching

1 code implementation CVPR 2018 Yue Luo, Jimmy Ren, Mude Lin, Jiahao Pang, Wenxiu Sun, Hongsheng Li, Liang Lin

The resulting model outperforms all the previous monocular depth estimation methods as well as the stereo block matching method in the challenging KITTI dataset by only using a small number of real training data.

Ranked #10 on Monocular Depth Estimation on KITTI Eigen split (using extra training data)

Monocular Depth Estimation Stereo Matching +1

Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift

no code implementations CVPR 2018 Ruijia Xu, Ziliang Chen, WangMeng Zuo, Junjie Yan, Liang Lin

Motivated by the theoretical results in Mansour et al. (2009), the target distribution can be represented as a weighted combination of source distributions, and the multi-source unsupervised domain adaptation via DCTN is then performed as two alternating steps: i) it deploys multi-way adversarial learning to minimize the discrepancy between the target and each of the multiple source domains, which also obtains the source-specific perplexity scores to denote the possibilities that a target sample belongs to different source domains.

Multi-Source Unsupervised Domain Adaptation Unsupervised Domain Adaptation

Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches

no code implementations 9 Feb 2018 Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer.

Image Classification

Structured Inhomogeneous Density Map Learning for Crowd Counting

no code implementations 20 Jan 2018 Hanhui Li, Xiangjian He, Hefeng Wu, Saeed Amirgholipour Kasmani, Ruomei Wang, Xiaonan Luo, Liang Lin

In this paper, we aim at tackling the problem of crowd counting in extremely high-density scenes, which contain hundreds, or even thousands of people.

Crowd Counting

Context-Aware Semantic Inpainting

no code implementations 21 Dec 2017 Haofeng Li, Guanbin Li, Liang Lin, Yizhou Yu

Our proposed GAN-based framework consists of a fully convolutional design for the generator which helps to better preserve spatial structures and a joint loss function with a revised perceptual loss to capture high-level semantics in the context.

Image Inpainting

Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition

no code implementations 20 Dec 2017 Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin

Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks.

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

2 code implementations 20 Dec 2017 Tianshui Chen, Liang Lin, WangMeng Zuo, Xiaonan Luo, Lei Zhang

In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training.

Classification General Classification +1

LSTM Pose Machines

1 code implementation CVPR 2018 Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun, Jinshan Pan, Jianbo Liu, Jiahao Pang, Liang Lin

Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency and handle severe image quality degradation (e.g., motion blur and occlusion), as well as the inability to capture the temporal correlation among video frames.

Pose Estimation

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

no code implementations ICCV 2017 Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding.

General Classification Multi-Label Image Classification +1

Learning to Segment Human by Watching YouTube

no code implementations 4 Oct 2017 Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan

An intuition on human segmentation is that when a human is moving in a video, the video-context (e.g., appearance and motion clues) may potentially infer reasonable mask information for the whole human body.

Human Detection Semantic Segmentation +2

Visual Tracking via Dynamic Graph Learning

no code implementations 4 Oct 2017 Chenglong Li, Liang Lin, WangMeng Zuo, Jin Tang, Ming-Hsuan Yang

First, the graph is initialized by assigning binary weights of some image patches to indicate the object and background patches according to the predicted bounding box.

Graph Learning Object Tracking +1

Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning

no code implementations 4 Oct 2017 Dongyu Zhang, Liang Lin, Tianshui Chen, Xian Wu, Wenwei Tan, Ebroul Izquierdo

Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement.

Representation Learning

Deep Dual Learning for Semantic Image Segmentation

no code implementations ICCV 2017 Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang

The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance.

Semantic Segmentation

Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions

no code implementations 27 Sep 2017 Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, WangMeng Zuo

Rather than relying on elaborate annotations (e.g., manually labeled semantic maps and relations), we train our deep model in a weakly-supervised learning manner by leveraging the descriptive sentences of the training images.

Scene Labeling Scene Understanding

Attention-Aware Face Hallucination via Deep Reinforcement Learning

no code implementations CVPR 2017 Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li

Face hallucination is a domain-specific super-resolution problem with the goal of generating high-resolution (HR) faces from low-resolution (LR) input images.

Face Hallucination Super-Resolution

Recurrent 3D Pose Sequence Machines

no code implementations CVPR 2017 Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng

3D human articulated pose recovery from monocular image sequences is very challenging due to the diverse appearances, viewpoints, and occlusions, and because the human 3D pose is inherently ambiguous from monocular imagery.

3D Pose Estimation

Deep Co-Space: Sample Mining Across Feature Transformation for Semi-Supervised Learning

no code implementations 28 Jul 2017 Ziliang Chen, Keze Wang, Xiao Wang, Pai Peng, Ebroul Izquierdo, Liang Lin

Aiming at improving performance of visual classification in a cost-effective manner, this paper proposes an incremental semi-supervised learning paradigm called Deep Co-Space (DCS).

Classification General Classification +1

Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning

no code implementations 26 Jul 2017 Yukai Shi, Keze Wang, Chongyu Chen, Li Xu, Liang Lin

Single image super resolution (SR), which refers to reconstructing a higher-resolution (HR) image from the observed low-resolution (LR) image, has received substantial attention due to its tremendous application potential.

Image Restoration Image Super-Resolution +1

Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction

no code implementations 15 Jul 2017 Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng

This paper aims at task-oriented action prediction, i.e., predicting a sequence of actions towards accomplishing a specific task under a certain scene, which is a new problem in computer vision research.

Common Sense Reasoning

Learning Object Interactions and Descriptions for Semantic Image Segmentation

no code implementations CVPR 2017 Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang

This work significantly increases segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset.

Image Captioning Semantic Segmentation

Interpretable Structure-Evolving LSTM

no code implementations CVPR 2017 Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing

Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization.

Small Data Image Classification

Progressively Diffused Networks for Semantic Image Segmentation

no code implementations 20 Feb 2017 Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin

This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.

Semantic Segmentation

Cost-Effective Active Learning for Deep Image Classification

1 code implementation 13 Jan 2017 Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, Liang Lin

In this paper, we propose a novel active learning framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner.

Active Learning Classification +4

Active Self-Paced Learning for Cost-Effective and Progressive Face Identification

no code implementations 13 Jan 2017 Liang Lin, Keze Wang, Deyu Meng, WangMeng Zuo, Lei Zhang

By naturally combining two recently rising techniques: active learning (AL) and self-paced learning (SPL), our framework is capable of automatically annotating new instances and incorporating them into training under weak expert re-certification.

Active Learning Face Identification

Learning to Segment Object Candidates via Recursive Neural Networks

no code implementations 4 Dec 2016 Tianshui Chen, Liang Lin, Xian Wu, Nong Xiao, Xiaonan Luo

To avoid the exhaustive search over locations and scales, current state-of-the-art object detection systems usually involve a crucial component generating a batch of candidate object proposals from images.

Object Proposal Generation

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

no code implementations 13 Aug 2016 Keze Wang, Shengfu Zhai, Hui Cheng, Xiaodan Liang, Liang Lin

In this paper, we propose a novel inference-embedded multi-task learning framework for predicting human pose from still depth images, which is implemented with a deep architecture of neural networks.

Multi-Task Learning Pose Estimation +1

Local- and Holistic- Structure Preserving Image Super Resolution via Deep Joint Component Learning

no code implementations 25 Jul 2016 Yukai Shi, Keze Wang, Li Xu, Liang Lin

Recently, machine learning based single image super resolution (SR) approaches focus on jointly learning representations for high-resolution (HR) and low-resolution (LR) image patch pairs to improve the quality of the super-resolved images.

Image Super-Resolution Representation Learning

Joint Learning of Single-Image and Cross-Image Representations for Person Re-Identification

no code implementations CVPR 2016 Faqiang Wang, WangMeng Zuo, Liang Lin, David Zhang, Lei Zhang

Person re-identification has been usually solved as either the matching of single-image representation (SIR) or the classification of cross-image representation (CIR).

Person Re-Identification

Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning

no code implementations 13 May 2016 Liang Lin, Guangrun Wang, WangMeng Zuo, Xiangchu Feng, Lei Zhang

Cross-domain visual data matching is one of the fundamental problems in many real-world vision tasks, e.g., matching persons across ID photos and surveillance videos.

Face Verification Person Re-Identification +1

LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling

1 code implementation 18 Apr 2016 Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, Liang Lin

Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts.

Scene Labeling

DARI: Distance metric And Representation Integration for Person Verification

no code implementations 15 Apr 2016 Guangrun Wang, Liang Lin, Shengyong Ding, Ya Li, Qing Wang

The past decade has witnessed the rapid development of feature representation learning and distance metric learning, whereas the two steps are often discussed separately.

Ranked #8 on Person Re-Identification on SYSU-30k (using extra training data)

Metric Learning Person Re-Identification +1

Deep Structured Scene Parsing by Learning with Image Descriptions

no code implementations CVPR 2016 Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo

This paper addresses a fundamental problem of scene understanding: how to parse the scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.

Scene Labeling Scene Understanding

Geometric Scene Parsing with Hierarchical LSTM

no code implementations 7 Apr 2016 Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

This paper addresses the problem of geometric scene parsing, i.e., simultaneously labeling geometric surfaces (e.g., sky, ground and vertical plane) and determining the interaction relations (e.g., layering, supporting, siding and affinity) between main regions.

3D Reconstruction Scene Labeling

Semantic Object Parsing with Graph LSTM

no code implementations 23 Mar 2016 Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan

By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.

Character Proposal Network for Robust Text Extraction

no code implementations 13 Feb 2016 Shuye Zhang, Mude Lin, Tianshui Chen, Lianwen Jin, Liang Lin

Maximally stable extremal regions (MSER), which is a popular method to generate character proposals/candidates, has shown superior performance in scene text detection.

Scene Text Scene Text Detection

Learning Support Correlation Filters for Visual Tracking

no code implementations 22 Jan 2016 Wangmeng Zuo, Xiaohe Wu, Liang Lin, Lei Zhang, Ming-Hsuan Yang

Sampling and budgeting training examples are two essential factors in tracking algorithms based on support vector machines (SVMs) as a trade-off between accuracy and efficiency.

Visual Tracking

Deep Feature Learning with Relative Distance Comparison for Person Re-identification

no code implementations 11 Dec 2015 Shengyong Ding, Liang Lin, Guangrun Wang, Hongyang Chao

Identifying the same individual across different scenes is an important yet difficult task in intelligent video surveillance.

Ranked #10 on Person Re-Identification on SYSU-30k (using extra training data)

Person Re-Identification

Human Parsing With Contextualized Convolutional Neural Network

no code implementations ICCV 2015 Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Semantic Object Parsing with Local-Global Long Short-Term Memory

no code implementations CVPR 2016 Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, Shuicheng Yan

The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference benefiting from the memorization of previous dependencies in all positions along all dimensions.

Reversible Recursive Instance-level Object Segmentation

no code implementations CVPR 2016 Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan

By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.

Denoising Semantic Segmentation