Search Results for author: Xilin Chen

Found 146 papers, 64 papers with code

Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection

no code implementations 18 Nov 2024 Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen

The primary goal of out-of-distribution (OOD) detection tasks is to identify inputs with semantic shifts, i.e., if samples from novel classes are absent in the in-distribution (ID) dataset used for training, we should reject these OOD samples rather than misclassifying them into existing ID classes.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models

no code implementations 11 Nov 2024 Jiachen Liang, Ruibing Hou, Minyang Hu, Hong Chang, Shiguang Shan, Xilin Chen

Under this unsupervised multi-domain setting, we have identified inherent model bias within CLIP, notably in its visual and text encoders.

Test-time Adaptation Transductive Learning

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

1 code implementation 12 Oct 2024 Yifeng Xu, Zhenliang He, Shiguang Shan, Xilin Chen

However, for every single condition type, ControlNet requires independent training on millions of data pairs with hundreds of GPU hours, which is quite expensive and makes it challenging for ordinary users to explore and develop new types of conditions.

Conditional Image Generation

HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding

no code implementations 9 Oct 2024 Keliang Li, Zaifei Yang, Jiahe Zhao, Hongze Shen, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen

The significant advancements in visual understanding and instruction following from Multimodal Large Language Models (MLLMs) have opened up more possibilities for broader applications in diverse and universal human-centric scenarios.

Benchmarking Instruction Following

Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models

no code implementations 3 Sep 2024 Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen

Categorization, a core cognitive ability in humans that organizes objects based on common features, is essential to cognitive science as well as computer vision.

Question Answering Visual Question Answering

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

1 code implementation 5 Jul 2024 Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen

In this paper, for the first time, we propose a comprehensive defense method named T2IShield to detect, localize, and mitigate such attacks.

Backdoor Attack

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

1 code implementation 27 Jun 2024 Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and noisy scenarios unexplored.

Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

1 code implementation 24 Jun 2024 Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen

Furthermore, based on the results of our quality measurement, we construct a High-Quality Hallucination Benchmark (HQH) for LVLMs, which demonstrates superior reliability and validity under our HQM framework.

Hallucination

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

1 code implementation 20 Jun 2024 Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence.

Language Modelling

Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox

1 code implementation 14 Jun 2024 Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen

In this paper, we construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue, in which we divide the test samples into subsets with different semantic and covariate shift degrees relative to the ID dataset.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation

no code implementations NeurIPS 2023 Jiachen Liang, Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

In this paper, we propose a novel SSL setting, where unlabeled samples are drawn from a mixed distribution that deviates from the feature distribution of labeled samples.

Pseudo Label

M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models

1 code implementation 24 May 2024 Hongyu Wang, Jiayu Xu, Senwei Xie, Ruiping Wang, Jialin Li, Zhaojie Xie, Bin Zhang, Chuyan Xiong, Xilin Chen

In this work, we introduce M4U, a novel and challenging benchmark for assessing the capability of multi-discipline multilingual multimodal understanding and reasoning.

Multimodal Reasoning

Task-adaptive Q-Face

no code implementations 15 May 2024 Haomiao Sun, Mingjie He, Shiguang Shan, Hu Han, Xilin Chen

Although face analysis has achieved remarkable improvements in the past few years, designing a multi-task face analysis model is still challenging.

Action Unit Detection Age Estimation +2

Quantitative Investment Diversification Strategies via Various Risk Models

no code implementations 27 Apr 2024 Maysam Khodayari Gharanchaei, Prabhu Prasad Panda, Xilin Chen

This paper focuses on developing high-dimensional risk models to construct portfolios of securities in the US stock exchange.

Application of Deep Learning for Factor Timing in Asset Management

no code implementations 27 Apr 2024 Prabhu Prasad Panda, Maysam Khodayari Gharanchaei, Xilin Chen, Haoshu Lyu

The paper examines the performance of regression models (OLS linear regression, Ridge regression, Random Forest, and Fully-connected Neural Network) on the prediction of CMA (Conservative Minus Aggressive) factor premium and the performance of factor timing investment with them.

Asset Management Deep Learning +1

Clothes-Changing Person Re-Identification with Feasibility-Aware Intermediary Matching

no code implementations 15 Apr 2024 Jiahe Zhao, Ruibing Hou, Hong Chang, Xinqian Gu, Bingpeng Ma, Shiguang Shan, Xilin Chen

Current clothes-changing person re-identification (re-id) approaches usually perform retrieval based on clothes-irrelevant features, while neglecting the potential of clothes-relevant features.

Clothes Changing Person Re-Identification Retrieval

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

1 code implementation CVPR 2024 Xiaolong Tang, Meina Kan, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen

The proposed Historical Prediction Attention together with the Agent Attention and Mode Attention is further formulated as the Triple Factorized Attention module, serving as the core design of HPNet. Experiments on the Argoverse and INTERACTION datasets show that HPNet achieves state-of-the-art performance, and generates accurate and stable future trajectories.

Autonomous Driving Trajectory Forecasting

Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications

1 code implementation 6 Mar 2024 Minyang Hu, Hong Chang, Zong Guo, Bingpeng Ma, Shiguang Shan, Xilin Chen

In this paper, we try to understand FSL by delving into two key questions: (1) How to quantify the relationship between training and novel tasks?

Attribute Data Augmentation +1

Progressive Conservative Adaptation for Evolving Target Domains

no code implementations 7 Feb 2024 Gangming Zhao, Chaoqi Chen, Wenhao He, Chengwei Pan, Chaowei Fang, Jinpeng Li, Xilin Chen, Yizhou Yu

Moreover, as adjusting to the most recent target domain can interfere with the features learned from previous target domains, we develop a conservative sparse attention mechanism.

Domain Adaptation Meta-Learning +1

ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations

no code implementations CVPR 2024 Yuanhang Zhang, Shuang Yang, Shiguang Shan, Xilin Chen

While many recent approaches for this task primarily rely on guiding the learning process using the audio modality alone to capture information shared between audio and video, we reframe the problem as the acquisition of shared, unique (modality-specific), and synergistic speech information to address the inherent asymmetry between the modalities.

Audio-Visual Speech Recognition Lipreading +2

Cooperative Dual Attention for Audio-Visual Speech Enhancement with Facial Cues

no code implementations 24 Nov 2023 Feixiang Wang, Shuang Yang, Shiguang Shan, Xilin Chen

By integrating cooperative dual attention in the visual encoder and audio-visual fusion strategy, our model effectively extracts beneficial speech information from both audio and visual cues for AVSE.

Speech Enhancement

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

1 code implementation 8 Oct 2023 Songtao Luo, Shuang Yang, Shiguang Shan, Xilin Chen

For deep layers where both the speaker's features and the speech content features are all expressed well, we introduce the speaker-adaptive features to learn for suppressing the speech content irrelevant noise for robust lip reading.

Lip Reading

Dual Compensation Residual Networks for Class Imbalanced Learning

no code implementations 25 Aug 2023 Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

Learning generalizable representation and classifier for class-imbalanced data is challenging for data-driven deep models.

BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

1 code implementation 19 Jun 2023 Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng

To minimize human workload, we propose to transfer the capabilities of language generation and instruction following from English to other languages through an interactive translation task.

Instruction Following Text Generation +1

Function-Consistent Feature Distillation

1 code implementation 24 Apr 2023 Dongyang Liu, Meina Kan, Shiguang Shan, Xilin Chen

The core idea of FCFD is to make teacher and student features not only numerically similar, but more importantly produce similar outputs when fed to the later part of the same network.

Image Classification object-detection +1

Diversity-Measurable Anomaly Detection

1 code implementation CVPR 2023 Wenrui Liu, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

In this paper, to better handle the tradeoff problem, we propose the Diversity-Measurable Anomaly Detection (DMAD) framework to enhance reconstruction diversity while avoiding undesired generalization on anomalies.

Anomaly Detection In Surveillance Videos Defect Detection +2

Source-Free Adaptive Gaze Estimation by Uncertainty Reduction

1 code implementation CVPR 2023 Xin Cai, Jiabei Zeng, Shiguang Shan, Xilin Chen

In light of this, we present an unsupervised source-free domain adaptation approach for gaze estimation, which adapts a source-trained gaze estimator to unlabeled target domains without source data.

Gaze Estimation Source-Free Domain Adaptation

DandelionNet: Domain Composition with Instance Adaptive Classification for Domain Generalization

no code implementations ICCV 2023 Lanqing Hu, Meina Kan, Shiguang Shan, Xilin Chen

Domain generalization (DG) attempts to learn a model on source domains that can well generalize to unseen but different domains.

Domain Generalization

CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition

no code implementations ICCV 2023 Peiqi Jiao, Yuecong Min, Yanan Li, Xiaotao Wang, Lei Lei, Xilin Chen

The co-occurrence signals (e.g., hand shape, facial expression, and lip pattern) play a critical role in Continuous Sign Language Recognition (CSLR).

Sign Language Recognition Visual Grounding

DISC: Learning From Noisy Labels via Dynamic Instance-Specific Selection and Correction

1 code implementation CVPR 2023 YiFan Li, Hu Han, Shiguang Shan, Xilin Chen

Then we propose a dynamic threshold strategy for each instance, based on the momentum of each instance's memorization strength in previous epochs to select and correct noisy labeled data.

Learning with noisy labels Memorization

Clothes-Changing Person Re-identification with RGB Modality Only

1 code implementation CVPR 2022 Xinqian Gu, Hong Chang, Bingpeng Ma, Shutao Bai, Shiguang Shan, Xilin Chen

In this paper, we propose a Clothes-based Adversarial Loss (CAL) to mine clothes-irrelevant features from the original RGB images by penalizing the predictive power of re-id model w.r.t.

Clothes Changing Person Re-Identification Multiview Gait Recognition

Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework

1 code implementation 22 Mar 2022 Botao Ye, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

The current popular two-stream, two-stage tracking framework extracts the template and the search region features separately and then performs relation modeling, thus the extracted features lack the awareness of the target and have limited target-background discriminability.

Relation Visual Object Tracking +1

Salient-to-Broad Transition for Video Person Re-Identification

1 code implementation CVPR 2022 Shutao Bai, Bingpeng Ma, Hong Chang, Rui Huang, Xilin Chen

To further improve SBM, an Integration-and-Distribution Module (IDM) is introduced to enhance frame-level representations.

Video-Based Person Re-Identification

Enhancing Face Recognition With Self-Supervised 3D Reconstruction

no code implementations CVPR 2022 Mingjie He, Jie Zhang, Shiguang Shan, Xilin Chen

In this paper, we propose to enhance face recognition with a bypass of self-supervised 3D reconstruction, which enforces the neural backbone to focus on the identity-related depth and albedo information while neglecting the identity-irrelevant pose and illumination information.

3D Face Reconstruction 3D Reconstruction +3

HRFormer: High-Resolution Vision Transformer for Dense Prediction

2 code implementations NeurIPS 2021 Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Pose Estimation Semantic Segmentation +1

SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot Learning

1 code implementation 8 Nov 2021 Fengyuan Yang, Ruiping Wang, Xilin Chen

However, humans can learn new classes quickly even given few samples, since humans can tell which discriminative features should be focused on for each category based on both visual and semantic prior knowledge.

feature selection Few-Shot Learning

HRFormer: High-Resolution Transformer for Dense Prediction

1 code implementation 18 Oct 2021 Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Image Classification Multi-Person Pose Estimation +2

UniCon: Unified Context Network for Robust Active Speaker Detection

no code implementations 5 Aug 2021 Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen

Our solution is a novel, unified framework that focuses on jointly modeling multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties.

Active Speaker Detection Audio-Visual Active Speaker Detection

Hierarchical Context-aware Network for Dense Video Event Captioning

1 code implementation ACL 2021 Lei Ji, Xianglin Guo, Haoyang Huang, Xilin Chen

Dense video event captioning aims to generate a sequence of descriptive captions for each event in a long untrimmed video.

Descriptive

Locality-aware Channel-wise Dropout for Occluded Face Recognition

no code implementations 20 Jul 2021 Mingjie He, Jie Zhang, Shiguang Shan, Xiao Liu, Zhongqin Wu, Xilin Chen

Furthermore, by randomly dropping out several feature channels, our method can well simulate the occlusion of larger area.

Face Recognition

Feature Completion for Occluded Person Re-Identification

1 code implementation 24 Jun 2021 Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen

Our method significantly outperforms existing methods on the occlusion datasets, while retaining top, or even superior, performance on holistic datasets.

Decoder Occluded Person Re-Identification

FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation

no code implementations CVPR 2021 Sijin Wang, Ziwei Yao, Ruiping Wang, Zhongqin Wu, Xilin Chen

Then for evaluating the adequacy of the candidate caption, it highlights the image gist on the visual scene graph under the guidance of the reference captions.

Image Captioning

Continuity-Discrimination Convolutional Neural Network for Visual Object Tracking

no code implementations 18 Apr 2021 Shen Li, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen

This paper proposes a novel model, named Continuity-Discrimination Convolutional Neural Network (CD-CNN), for visual object tracking.

Object Visual Object Tracking

Visual Alignment Constraint for Continuous Sign Language Recognition

2 code implementations ICCV 2021 Yuecong Min, Aiming Hao, Xiujuan Chai, Xilin Chen

Specifically, the proposed VAC comprises two auxiliary losses: one focuses on visual features only, and the other enforces prediction alignment between the feature extractor and the alignment module.

Sign Language Recognition

Topic Scene Graph Generation by Attention Distillation From Caption

no code implementations ICCV 2021 Wenbin Wang, Ruiping Wang, Xilin Chen

To this end, we let the scene graph borrow the ability from the image caption so that it can be a specialist on the basis of remaining all-around, resulting in the so-called Topic Scene Graph.

Caption Generation Graph Generation +1

Self-Mutual Distillation Learning for Continuous Sign Language Recognition

1 code implementation ICCV 2021 Aiming Hao, Yuecong Min, Xilin Chen

Currently, a typical network combination for CSLR includes a visual module, which focuses on spatial and short-temporal information, followed by a contextual module, which focuses on long-temporal information, and the Connectionist Temporal Classification (CTC) loss is adopted to train the network.

Knowledge Distillation Sign Language Recognition

Holistic Pose Graph: Modeling Geometric Structure Among Objects in a Scene Using Graph Inference for 3D Object Prediction

no code implementations ICCV 2021 Jiwei Xiao, Ruiping Wang, Xilin Chen

The inference of the HPG uses GRU to encode the pose features from their corresponding regions in a single RGB image, and passes messages along the graph structure iteratively to improve the predicted poses.

Object Pose Estimation

Cross-Encoder for Unsupervised Gaze Representation Learning

1 code implementation ICCV 2021 Yunjia Sun, Jiabei Zeng, Shiguang Shan, Xilin Chen

To address the issue that the feature of gaze is always intertwined with the appearance of the eye, Cross-Encoder disentangles the features using a latent-code-swapping mechanism on eye-consistent image pairs and gaze-similar ones.

Gaze Estimation Representation Learning

Learn an Effective Lip Reading Model without Pains

1 code implementation 15 Nov 2020 Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen

Considering the non-negligible effects of these strategies and the existing tough status to train an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.

Lipreading Lip Reading +2

IAUnet: Global Context-Aware Feature Learning for Person Re-Identification

1 code implementation 2 Sep 2020 Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen

Furthermore, a Channel IAU (CIAU) module is designed to model the semantic contextual interactions between channel features to enhance the feature representation, especially for small-scale visual cues and body parts.

Object Categorization Person Re-Identification

Temporal Complementary Learning for Video Person Re-Identification

2 code implementations ECCV 2020 Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

This paper proposes a Temporal Complementary Learning Network that extracts complementary features of consecutive video frames for video person re-identification.

Video-Based Person Re-Identification

Appearance-Preserving 3D Convolution for Video-based Person Re-identification

1 code implementation ECCV 2020 Xinqian Gu, Hong Chang, Bingpeng Ma, Hongkai Zhang, Xilin Chen

Due to the imperfect person detection results and posture changes, temporal appearance misalignment is unavoidable in video-based person re-identification (ReID).

Human Detection Video-Based Person Re-Identification

SegFix: Model-Agnostic Boundary Refinement for Segmentation

4 code implementations ECCV 2020 Yuhui Yuan, Jingyi Xie, Xilin Chen, Jingdong Wang

We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model.

Segmentation

Synchronous Bidirectional Learning for Multilingual Lip Reading

1 code implementation 8 May 2020 Mingshuang Luo, Shuang Yang, Xilin Chen, Zitao Liu, Shiguang Shan

Based on this idea, we try to explore the synergized learning of multilingual lip reading in this paper, and further propose a synchronous bidirectional learning (SBL) framework for effective synergy of multilingual lip reading.

Lip Reading

Single-Side Domain Generalization for Face Anti-Spoofing

1 code implementation CVPR 2020 Yunpei Jia, Jie Zhang, Shiguang Shan, Xilin Chen

In this work, we propose an end-to-end single-side domain generalization framework (SSDG) to improve the generalization ability of face anti-spoofing.

Domain Generalization Face Anti-Spoofing +1

Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training

3 code implementations ECCV 2020 Hongkai Zhang, Hong Chang, Bingpeng Ma, Naiyan Wang, Xilin Chen

For example, the fixed label assignment strategy and regression loss function cannot fit the distribution change of proposals and thus are harmful to training high quality detectors.

Ranked #88 on Object Detection on COCO test-dev (using extra training data)

object-detection Object Detection +2

Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

2 code implementations CVPR 2020 Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen

Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation, whose pixel-level labels take the same spatial transformation as the input images during data augmentation.

Data Augmentation Weakly supervised Semantic Segmentation +1

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

1 code implementation CVPR 2020 Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen

Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes.

Graph Neural Network Question Answering +1

Mutual Information Maximization for Effective Lip Reading

1 code implementation 13 Mar 2020 Xing Zhao, Shuang Yang, Shiguang Shan, Xilin Chen

By combining these two advantages together, the proposed method is expected to be both discriminative and robust for effective lip reading.

Lipreading Lip Reading

Deformation Flow Based Two-Stream Network for Lip Reading

1 code implementation 12 Mar 2020 Jing-Yun Xiao, Shuang Yang, Yuan-Hang Zhang, Shiguang Shan, Xilin Chen

Observing on the continuity in adjacent frames in the speaking process, and the consistency of the motion patterns among different speakers when they pronounce the same phoneme, we model the lip movements in the speaking process as a sequence of apparent deformations in the lip region.

Knowledge Distillation Lipreading +2

Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading

no code implementations 9 Mar 2020 Mingshuang Luo, Shuang Yang, Shiguang Shan, Xilin Chen

On the one hand, we introduce the evaluation metric (refers to the character error rate in this paper) as a form of reward to optimize the model together with the original discriminative target.

Lipreading Lip Reading +1

Emotion Recognition for In-the-wild Videos

no code implementations 13 Feb 2020 Hanyu Liu, Jiabei Zeng, Shiguang Shan, Xilin Chen

This paper is a brief introduction to our submission to the seven basic expression classification track of Affective Behavior Analysis in-the-wild Competition held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020.

Emotion Recognition General Classification +1

$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild

1 code implementation 7 Feb 2020 Yuan-Hang Zhang, Rulin Huang, Jiabei Zeng, Shiguang Shan, Xilin Chen

This report describes a multi-modal multi-task ($M^3$T) approach underlying our submission to the valence-arousal estimation track of the Affective Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020.

Arousal Estimation Gesture Recognition

Deep Heterogeneous Hashing for Face Video Retrieval

no code implementations 4 Nov 2019 Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen

To tackle the key challenge of hashing on the manifold, a well-studied Riemannian kernel mapping is employed to project data (i.e., covariance matrices) into Euclidean space, which makes it possible to embed the two heterogeneous representations into a common Hamming space, where both intra-space discriminability and inter-space compatibility are considered.

Retrieval Video Retrieval

FCSR-GAN: Joint Face Completion and Super-resolution via Multi-task Learning

1 code implementation 4 Nov 2019 Jiancheng Cai, Hu Han, Shiguang Shan, Xilin Chen

Combined variations containing low resolution and occlusion are often present in face images in the wild, e.g., under the scenario of video surveillance.

Face Identification Facial Inpainting +3

Learning-based Real-time Detection of Intrinsic Reflectional Symmetry

no code implementations 1 Nov 2019 Yi-Ling Qiao, Lin Gao, Shu-Zhi Liu, Ligang Liu, Yu-Kun Lai, Xilin Chen

In this paper, we propose a learning-based approach to intrinsic reflectional symmetry detection.

Symmetry Detection

RhythmNet: End-to-end Heart Rate Estimation from Face via Spatial-temporal Representation

no code implementations 25 Oct 2019 Xuesong Niu, Shiguang Shan, Hu Han, Xilin Chen

Recently, some methods have been proposed for remote HR estimation from face videos; however, most of them focus on well-controlled scenarios, and their generalization ability to less-constrained scenarios (e.g., with head movement and bad illumination) is not known.

Heart rate estimation

Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition

1 code implementation NeurIPS 2019 Xuesong Niu, Hu Han, Shiguang Shan, Xilin Chen

In this work, we propose a semi-supervised approach for AU recognition utilizing a large number of web face images without AU labels and a relatively small face dataset with AU annotations inspired by the co-training methods.

Emotion Recognition Facial Action Unit Detection

Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval

1 code implementation 11 Oct 2019 Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, Xilin Chen

In the light of recent success of scene graph in many CV and NLP tasks for describing complex natural scenes, we propose to represent image and text with two kinds of scene graphs: visual scene graph (VSG) and textual scene graph (TSG), each of which is exploited to jointly characterize objects and relationships in the corresponding modality.

Graph Matching Image-text Retrieval +1

Hierarchical Disentangle Network for Object Representation Learning

no code implementations 25 Sep 2019 Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen

In this paper, we propose the hierarchical disentangle network (HDN) to exploit the rich hierarchical characteristics among categories to divide the disentangling process in a coarse-to-fine manner, such that each level only focuses on learning the specific representations in its granularity and finally the common and unique representations in all granularities jointly constitute the raw object.

Decoder Disentanglement +2

Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

11 code implementations ECCV 2020 Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang

We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.

Decoder Object +2

Transferable Contrastive Network for Generalized Zero-Shot Learning

no code implementations ICCV 2019 Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen

Zero-shot learning (ZSL) is a challenging problem that aims to recognize the target categories without seen data, where semantic information is leveraged to transfer knowledge from some source classes.

Generalized Zero-Shot Learning Transfer Learning

CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense

no code implementations 8 Aug 2019 Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen

To comprehensively evaluate such abilities, we propose a VQA benchmark, CRIC, which introduces new types of questions about Compositional Reasoning on vIsion and Commonsense, and an evaluation metric integrating the correctness of answering and commonsense grounding.

Question Answering Visual Question Answering (VQA)

Interaction-and-Aggregation Network for Person Re-identification

1 code implementation CVPR 2019 Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen

Person re-identification (reID) benefits greatly from deep convolutional neural networks (CNNs) which learn robust feature embeddings.

Person Re-Identification

VRSTC: Occlusion-Free Video Person Re-Identification

no code implementations CVPR 2019 Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen

For one thing, the spatial structure of a pedestrian frame can be used to predict the occluded body parts from the unoccluded body parts of this frame.

Video-Based Person Re-Identification

Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection

no code implementations 16 Jul 2019 Hongkai Zhang, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

Recent studies attempt to improve the detection performance by adopting the idea of cascade for single-stage detectors.

General Classification Object +2

Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation

2 code implementations ACL 2019 Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Xilin Chen, Jie Zhou

Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer model through discarding the autoregressive mechanism and generating target words independently, which fails to exploit the target sequential information.

Decoder Machine Translation +2

Fully Learnable Group Convolution for Acceleration of Deep Neural Networks

no code implementations CVPR 2019 Xijun Wang, Meina Kan, Shiguang Shan, Xilin Chen

Benefiting from its great success on many tasks, deep learning is increasingly used on low-computational-cost devices, e.g., smartphones and embedded devices.

Tattoo Image Search at Scale: Joint Detection and Compact Representation Learning

no code implementations 1 Nov 2018 Hu Han, Jie Li, Anil K. Jain, Shiguang Shan, Xilin Chen

To close the gap, we propose an efficient tattoo search approach that is able to learn tattoo detection and compact representation jointly in a single convolutional neural network (CNN) via multi-task learning.

Image Retrieval Multi-Task Learning +3

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

2 code implementations 16 Oct 2018 Shuang Yang, Yuan-Hang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jing-Yun Xiao, Keyu Long, Shiguang Shan, Xilin Chen

The benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.

Lipreading Lip Reading +2

VIPL-HR: A Multi-modal Database for Pulse Estimation from Less-constrained Face Video

1 code implementation 11 Oct 2018 Xuesong Niu, Hu Han, Shiguang Shan, Xilin Chen

We also learn a deep HR estimator (named RhythmNet) with the proposed spatial-temporal representation, which achieves promising results on both the public-domain and our VIPL-HR HR estimation databases.

Representation Learning

Meta-Learning with Individualized Feature Space for Few-Shot Classification

no code implementations 27 Sep 2018 Chunrui Han, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen

Specifically, we introduce a kernel generator as a meta-learner that learns to construct feature embeddings for query images.

Classification Meta-Learning +1

Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation

1 code implementation EMNLP 2018 Chenze Shao, Yang Feng, Xilin Chen

Neural machine translation (NMT) models are usually trained with the word-level loss using the teacher forcing algorithm, which not only evaluates the translation improperly but also suffers from exposure bias.

Machine Translation NMT +1

OCNet: Object Context Network for Scene Parsing

8 code implementations 4 Sep 2018 Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018).

Object Relation +2

Face Recognition with Contrastive Convolution

no code implementations ECCV 2018 Chunrui Han, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen

In current face recognition approaches based on convolutional neural networks (CNNs), the two faces to be compared are fed independently into the CNN for feature extraction.

Face Recognition Face Verification

Facial Expression Recognition with Inconsistently Annotated Datasets

no code implementations ECCV 2018 Jiabei Zeng, Shiguang Shan, Xilin Chen

To address the inconsistency, we propose an Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework to train a FER model from multiple inconsistently labeled datasets and large-scale unlabeled data.

Facial Expression Recognition Facial Expression Recognition (FER)

Generative Adversarial Network with Spatial Attention for Face Attribute Editing

1 code implementation ECCV 2018 Gang Zhang, Meina Kan, Shiguang Shan, Xilin Chen

The generator contains an attribute manipulation network (AMN) to edit the face image, and a spatial attention network (SAN) to localize the attribute-specific region, which restricts the alteration made by the AMN to this region.

Attribute Data Augmentation +2

Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition

no code implementations ECCV 2018 Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen

Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of specific classes, which is achieved by exploiting the semantic information and auxiliary datasets.

Dictionary Learning Zero-Shot Learning

Duplex Generative Adversarial Network for Unsupervised Domain Adaptation

no code implementations CVPR 2018 Lanqing Hu, Meina Kan, Shiguang Shan, Xilin Chen

Following an idea similar to that of GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve domain-invariant representation and domain transformation.

Generative Adversarial Network Object Recognition +1

Mean-Variance Loss for Deep Age Estimation From a Face

no code implementations CVPR 2018 Hongyu Pan, Hu Han, Shiguang Shan, Xilin Chen

Age estimation has broad application prospects in many fields, such as video surveillance, social networking, and human-computer interaction.

Age Estimation MORPH

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

1 code implementation CVPR 2018 Xuepeng Shi, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen

Rotation-invariant face detection, i.e., detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but remains a challenging task due to the large variations of face appearances.

Binary Classification Face Detection

Texture Classification in Extreme Scale Variations using GANet

no code implementations 13 Feb 2018 Li Liu, Jie Chen, Guoying Zhao, Paul Fieguth, Xilin Chen, Matti Pietikäinen

Because extreme scale variations are not necessarily present in most standard texture databases, to support the proposed extreme-scale aspects of texture understanding we are developing a new dataset, the Extreme Scale Variation Textures (ESVaT), to test the performance of our framework.

Classification General Classification +1

AttGAN: Facial Attribute Editing by Only Changing What You Want

10 code implementations 29 Nov 2017 Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan, Xilin Chen

Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of the given face conditioned on the desired attributes.

Attribute Decoder

Catadioptric HyperSpectral Light Field Imaging

no code implementations ICCV 2017 Yujia Xue, Kang Zhu, Qiang Fu, Xilin Chen, Jingyi Yu

In this paper, we present a single camera hyperspectral light field imaging solution that we call Snapshot Plenoptic Imager (SPI).

Learning Discriminative Latent Attributes for Zero-Shot Classification

no code implementations ICCV 2017 Huajie Jiang, Ruiping Wang, Shiguang Shan, Yi Yang, Xilin Chen

Zero-shot learning (ZSL) aims to transfer knowledge from observed classes to unseen classes, based on the assumption that both the seen and unseen classes share a common semantic space, among which attributes enjoy great popularity.

Attribute Classification +3

Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition

no code implementations ICCV 2017 Wanglong Wu, Meina Kan, Xin Liu, Yi Yang, Shiguang Shan, Xilin Chen

The designed ReST has an intrinsic recursive structure and is capable of progressively aligning faces to a canonical one, even those with large variations.

Face Alignment Face Recognition

Discriminative Covariance Oriented Representation Learning for Face Recognition With Image Sets

no code implementations CVPR 2017 Wen Wang, Ruiping Wang, Shiguang Shan, Xilin Chen

For face recognition with image sets, most existing works focus on building robust set models with hand-crafted features, while learning better image representations that closely match the subsequent image set modeling and classification remains a research gap.

Face Recognition General Classification +2

Learning Multifunctional Binary Codes for Both Category and Attribute Oriented Retrieval Tasks

no code implementations CVPR 2017 Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

In this paper we propose a unified framework to address multiple realistic image retrieval tasks concerning both category and attributes.

Attribute Image Retrieval +1

Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach

no code implementations 3 Jun 2017 Hu Han, Anil K. Jain, Fang Wang, Shiguang Shan, Xilin Chen

In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes, and category-specific feature learning for heterogeneous attributes.

Attribute Facial Attribute Classification +4

Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness

no code implementations 23 Sep 2016 Shuzhe Wu, Meina Kan, Zhenliang He, Shiguang Shan, Xilin Chen

On the other hand, by using a unified MLP cascade to examine proposals of all views in a centralized style, it provides a favorable solution for multi-view face detection with high accuracy and low time-cost.

Face Alignment Face Detection

VIPLFaceNet: An Open Source Deep Face Recognition SDK

no code implementations 13 Sep 2016 Xin Liu, Meina Kan, Wanglong Wu, Shiguang Shan, Xilin Chen

Robust face representation is imperative to highly accurate face recognition.

Face Recognition

Geometry-aware Similarity Learning on SPD Manifolds for Visual Recognition

no code implementations 17 Aug 2016 Zhiwu Huang, Ruiping Wang, Xianqiu Li, Wenxian Liu, Shiguang Shan, Luc van Gool, Xilin Chen

Specifically, by exploiting the Riemannian geometry of the manifold of fixed-rank Positive Semidefinite (PSD) matrices, we present a new solution to reduce optimizing over the space of column full-rank transformation matrices to optimizing on the PSD manifold which has a well-established Riemannian structure.

Cross Euclidean-to-Riemannian Metric Learning with Application to Face Recognition from Video

no code implementations 15 Aug 2016 Zhiwu Huang, Ruiping Wang, Shiguang Shan, Luc van Gool, Xilin Chen

With this mapping, the problem of learning a cross-view metric between the two source heterogeneous spaces can be expressed as learning a single-view Euclidean distance metric in the target common Euclidean space.

Face Recognition Metric Learning

Dual Purpose Hashing

no code implementations 19 Jul 2016 Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

Recent years have seen more and more demand for a unified framework to address multiple realistic image retrieval tasks concerning both category and attributes.

Attribute Image Retrieval +1

Deep Supervised Hashing for Fast Image Retrieval

1 code implementation CVPR 2016 Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

In this paper, we present a new hashing method to learn compact binary codes for highly efficient image retrieval on large-scale datasets.

Image Retrieval Retrieval

Multi-View Deep Network for Cross-View Classification

no code implementations CVPR 2016 Meina Kan, Shiguang Shan, Xilin Chen

As a result, the representation from the topmost layers of the MvDN network is robust to view discrepancy, and also discriminative.

Classification Face Recognition +1

Occlusion-Free Face Alignment: Deep Regression Networks Coupled With De-Corrupt AutoEncoders

no code implementations CVPR 2016 Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen

Face alignment, or facial landmark detection, plays an important role in many computer vision applications, e.g., face recognition, facial expression recognition, and face animation.

Face Alignment Face Recognition +4

Bi-Shifting Auto-Encoder for Unsupervised Domain Adaptation

no code implementations ICCV 2015 Meina Kan, Shiguang Shan, Xilin Chen

To alleviate the discrepancy between source and target domains, we propose a domain adaptation method named Bi-shifting Auto-Encoder network (BAE).

Face Recognition Unsupervised Domain Adaptation

Leveraging Datasets With Varying Annotations for Face Alignment via Deep Regression Network

no code implementations ICCV 2015 Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen

Facial landmark detection, as a vital topic in computer vision, has been studied for many decades and lots of datasets have been collected for evaluation.

Face Alignment Facial Landmark Detection +1

Two Birds, One Stone: Jointly Learning Binary Code for Large-Scale Face Image Retrieval and Attributes Prediction

no code implementations ICCV 2015 Yan Li, Ruiping Wang, Haomiao Liu, Huajie Jiang, Shiguang Shan, Xilin Chen

In this way, the learned binary codes can be applied to not only fine-grained face image retrieval, but also facial attributes prediction, which is the very innovation of this work, just like killing two birds with one stone.

Face Image Retrieval Retrieval

Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition

no code implementations 16 Nov 2015 Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen

3) the local modes on each STM can be instantiated by fitting to UMM, and the corresponding expressionlet is constructed by modeling the variations in each local mode.

Dynamic Facial Expression Recognition Facial Expression Recognition +1

Learning Mid-level Words on Riemannian Manifold for Action Recognition

no code implementations 16 Nov 2015 Mengyi Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

Human action recognition remains a challenging task due to the various sources of video data and large intra-class variations.

Action Recognition Clustering +1

AgeNet: Deeply Learned Regressor and Classifier for Robust Apparent Age Estimation

no code implementations ICCV Workshop 2015 Xin Liu, Shaoxin Li, Meina Kan, Jie Zhang, Shuzhe Wu, Wenxian Liu, Hu Han, Shiguang Shan, Xilin Chen

Another key feature of the proposed AgeNet is that, to avoid the problem of over-fitting on small apparent age training set, we exploit a general-to-specific transfer learning scheme.

Age Estimation Transfer Learning

Deep Trans-layer Unsupervised Networks for Representation Learning

no code implementations 27 Sep 2015 Wentao Zhu, Jun Miao, Laiyun Qing, Xilin Chen

Compared to traditional deep learning methods, the implemented feature learning method has far fewer parameters and is validated in several typical experiments, such as digit recognition on MNIST and MNIST variations, object recognition on the Caltech 101 dataset, and face verification on the LFW dataset.

Face Verification Object Recognition +1

Cross-pose Face Recognition by Canonical Correlation Analysis

no code implementations 29 Jul 2015 Annan Li, Shiguang Shan, Xilin Chen, Bingpeng Ma, Shuicheng Yan, Wen Gao

We argue that one of the difficulties in this problem is the severe misalignment in face images or feature vectors with different poses.

Face Recognition

Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets

no code implementations CVPR 2015 Wen Wang, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen

This paper presents a method named Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG) to solve the problem of face recognition with image sets.

Face Identification Face Recognition +1

Projection Metric Learning on Grassmann Manifold With Application to Video Based Face Recognition

no code implementations CVPR 2015 Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xilin Chen

In video-based face recognition, great success has been achieved by representing videos as linear subspaces, which typically lie in a special type of non-Euclidean space known as the Grassmann manifold.

Dimensionality Reduction Face Recognition +1

Face Video Retrieval With Image Query via Hashing Across Euclidean Space and Riemannian Manifold

no code implementations CVPR 2015 Yan Li, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen

Retrieving videos of a specific person given his/her face image as query becomes more and more appealing for applications like smart movie fast-forwards and suspect searching.

Retrieval Video Retrieval

Generalized Unsupervised Manifold Alignment

no code implementations NeurIPS 2014 Zhen Cui, Hong Chang, Shiguang Shan, Xilin Chen

In this paper, we propose a generalized Unsupervised Manifold Alignment (GUMA) method to build the connections between different but correlated datasets without any known correspondences.

Learning Euclidean-to-Riemannian Metric for Point-to-Set Classification

no code implementations CVPR 2014 Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xilin Chen

Since the points commonly lie in Euclidean space while the sets are typically modeled as elements on a Riemannian manifold, they can be treated as Euclidean points and Riemannian points, respectively.

Classification General Classification +1

Stacked Progressive Auto-Encoders (SPAE) for Face Recognition Across Poses

no code implementations CVPR 2014 Meina Kan, Shiguang Shan, Hong Chang, Xilin Chen

Identifying subjects with variations caused by poses is one of the most challenging tasks in face recognition, since the difference in appearances caused by poses may be even larger than the difference due to identity.

Face Recognition Pose Estimation

Deeply Coupled Auto-encoder Networks for Cross-view Classification

no code implementations 10 Feb 2014 Wen Wang, Zhen Cui, Hong Chang, Shiguang Shan, Xilin Chen

In this paper, we propose a simple but effective coupled neural network, called Deeply Coupled Autoencoder Networks (DCAN), which seeks to build two deep neural networks coupled with each other at every corresponding layer.

Classification Denoising +2
