Reinforced Structured State-Evolution for Vision-Language Navigation

no code implementations 20 Apr 2022 Jinyu Chen, Chen Gao, Erli Meng, Qiong Zhang, Si Liu

However, the crucial navigation clues (i.e., the object-level environment layout) for the embodied navigation task are discarded, since the maintained vector is essentially unstructured.

Vision and Language Navigation Vision-Language Navigation

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

no code implementations 13 Apr 2022 Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu

3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.

Visual Grounding

TR-MOT: Multi-Object Tracking by Reference

no code implementations 30 Mar 2022 Mingfei Chen, Yue Liao, Si Liu, Fei Wang, Jenq-Neng Hwang

RS takes previous detected results as references to aggregate the corresponding features from the combined features of the adjacent frames and makes a one-to-one track state prediction for each reference in parallel.

Frame Multi-Object Tracking

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

no code implementations 15 Mar 2022 Zitian Wang, Xuecheng Nie, Xiaochao Qu, Yunpeng Chen, Si Liu

In this paper, we present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.

3D Pose Estimation

Improved Pillar with Fine-grained Feature for 3D Object Detection

no code implementations 12 Oct 2021 Jiahui Fu, Guanghui Ren, Yunpeng Chen, Si Liu

In contrast, 2D grid-based methods, such as PointPillar, can easily achieve stable and efficient speed using simple 2D convolutions, but they struggle to reach competitive accuracy because of their coarse-grained point cloud representation.

3D Object Detection Autonomous Driving

TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

no code implementations 5 Aug 2021 Dailan He, Yusheng Zhao, Junyu Luo, Tianrui Hui, Shaofei Huang, Aixi Zhang, Si Liu

Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making it difficult for the model to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents.

Visual Grounding

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

1 code implementation CVPR 2021 Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu

The Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction.

Referring Expression

Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

1 code implementation 8 Jun 2021 MingJie Sun, Jimin Xiao, Eng Gee Lim, Si Liu, John Y. Goulermas

In this paper, we tackle the weakly-supervised referring expression grounding task, which localizes a referent object in an image according to a query sentence, where the mapping between image regions and queries is not available during the training stage.

Referring Expression

PSGAN++: Robust Detail-Preserving Makeup Transfer and Removal

1 code implementation 26 May 2021 Si Liu, Wentao Jiang, Chen Gao, Ran He, Jiashi Feng, Bo Li, Shuicheng Yan

In this paper, we address the makeup transfer and removal tasks simultaneously, which aim to transfer the makeup from a reference image to a source image and to remove the makeup from the with-makeup image, respectively.

Style Transfer

Human-centric Relation Segmentation: Dataset and Solution

no code implementations 24 May 2021 Si Liu, Zitian Wang, Yulu Gao, Lejian Ren, Yue Liao, Guanghui Ren, Bo Li, Shuicheng Yan

For the above exemplar case, our HRS task produces results in the form of relation triplets <girl [left hand], hold, book> and exact segmentation masks of the book, with which the robot can easily accomplish the grabbing task.

Cross-Modal Progressive Comprehension for Referring Segmentation

1 code implementation 15 May 2021 Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.

Referring Expression Segmentation Semantic Segmentation +2

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations CVPR 2021 Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

Frame Referring Expression Segmentation

Reformulating HOI Detection as Adaptive Set Prediction

1 code implementation CVPR 2021 Mingfei Chen, Yue Liao, Si Liu, ZhiYuan Chen, Fei Wang, Chen Qian

To attain this, we map a trainable interaction query set to an interaction prediction set with a transformer.

Ranked #12 on Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Video Relation Detection with Trajectory-aware Multi-modal Features

no code implementations 20 Jan 2021 Wentao Xie, Guanghui Ren, Si Liu

Considering the complexity of doing visual relation detection in videos, we decompose this task into three sub-tasks: object detection, trajectory proposal and relation prediction.

Object Detection

ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

no code implementations 11 Jan 2021 Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan

Our ORDNet is able to extract more comprehensive context information and adapt well to complex spatial variance in scene images.

Scene Parsing

Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism

no code implementations ICCV 2021 Wentao Jiang, Ning Xu, Jiayun Wang, Chen Gao, Jing Shi, Zhe Lin, Si Liu

Given the cycle, we propose several free augmentation strategies to help our model understand various editing requests given the imbalanced dataset.

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

1 code implementation 7 Dec 2020 Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Our CNMT consists of reading, reasoning, and generation modules, in which the Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens.

Image Captioning Optical Character Recognition

Referring Image Segmentation via Cross-Modal Progressive Comprehension

1 code implementation CVPR 2020 Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.

Referring Expression Segmentation Semantic Segmentation

Recapture as You Want

no code implementations 2 Jun 2020 Chen Gao, Si Liu, Ran He, Shuicheng Yan, Bo Li

The LGR module utilizes body skeleton knowledge to construct a layout graph that connects all relevant part features, where a graph reasoning mechanism is used to propagate information among part nodes to mine their relations.

AdversarialNAS: Adversarial Neural Architecture Search for GANs

1 code implementation CVPR 2020 Chen Gao, Yunpeng Chen, Si Liu, Zhenxiong Tan, Shuicheng Yan

In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation.

Image Generation Neural Architecture Search +1

Rule-Guided Compositional Representation Learning on Knowledge Graphs

1 code implementation 20 Nov 2019 Guanglin Niu, Yongfei Zhang, Bo Li, Peng Cui, Si Liu, Jingyang Li, Xiaowei Zhang

Representation learning on a knowledge graph (KG) aims to embed the entities and relations of a KG into low-dimensional continuous vector spaces.

Knowledge Graphs Representation Learning

Learning to Recognize the Unseen Visual Predicates

no code implementations 25 Sep 2019 Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo

Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.

Question Answering Visual Question Answering +1

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

1 code implementation CVPR 2020 Wentao Jiang, Si Liu, Chen Gao, Jie Cao, Ran He, Jiashi Feng, Shuicheng Yan

In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image.

UGAN: Untraceable GAN for Multi-Domain Face Translation

no code implementations 26 Jul 2019 Defa Zhu, Si Liu, Wentao Jiang, Chen Gao, Tianyi Wu, Qaingchang Wang, Guodong Guo

To address this issue, we propose a method called Untraceable GAN, which has a novel source classifier to differentiate which domain an image is translated from, and determines whether the translated image still retains the characteristics of the source domain.

Image-to-Image Translation Translation

Accurate facial image parsing at real-time speed

no code implementations IEEE Transactions on Image Processing 2019 Zhen Wei, Si Liu, Yao Sun, Hefei Ling

In this paper, we propose a design scheme for deep learning networks in the face parsing task with promising accuracy and real-time inference speed.

Face Parsing

Open Category Detection with PAC Guarantees

1 code implementation ICML 2018 Si Liu, Risheek Garrepalli, Thomas G. Dietterich, Alan Fern, Dan Hendrycks

Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates.

Face Aging with Contextual Generative Adversarial Nets

no code implementations 1 Feb 2018 Si Liu, Yao Sun, Defa Zhu, Renda Bao, Wei Wang, Xiangbo Shu, Shuicheng Yan

The age discriminative network guides the synthesized face to fit the real conditional distribution.

Face Verification

Cross-domain Human Parsing via Adversarial Feature and Label Adaptation

no code implementations 4 Jan 2018 Si Liu, Yao Sun, Defa Zhu, Guanghui Ren, Yu Chen, Jiashi Feng, Jizhong Han

Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences.

Human Parsing

Fast Deep Matting for Portrait Animation on Mobile Phone

1 code implementation 26 Jul 2017 Bingke Zhu, Yingying Chen, Jinqiao Wang, Si Liu, Bo Zhang, Ming Tang

Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it requires no user interaction and achieves real-time matting at 15 fps.

Image Matting

Learning Adaptive Receptive Fields for Deep Image Parsing Network

no code implementations CVPR 2017 Zhen Wei, Yao Sun, Jinqiao Wang, Hanjiang Lai, Si Liu

In this paper, we introduce a novel approach to automatically regulate the receptive field in a deep image parsing network.

Face Parsing

Surveillance Video Parsing with Single Frame Supervision

no code implementations CVPR 2017 Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao

In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage.

Frame Optical Flow Estimation

Structural Correlation Filter for Robust Visual Tracking

no code implementations CVPR 2016 Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu

In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking.

Visual Tracking

SketchNet: Sketch Classification With Web Images

no code implementations CVPR 2016 Hua Zhang, Si Liu, Changqing Zhang, Wenqi Ren, Rui Wang, Xiaochun Cao

In this study, we present a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images.

Classification General Classification

Makeup like a superstar: Deep Localized Makeup Transfer Network

no code implementations 25 Apr 2016 Si Liu, Xinyu Ou, Ruihe Qian, Wei Wang, Xiaochun Cao

In this paper, we propose a novel Deep Localized Makeup Transfer Network to automatically recommend the most suitable makeup for a female and synthesize the makeup on her face.

Low-Rank Tensor Constrained Multiview Subspace Clustering

no code implementations ICCV 2015 Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, Xiaochun Cao

We introduce a low-rank tensor constraint to explore the complementary information from multiple views and, accordingly, establish a novel method called Low-rank Tensor constrained Multiview Subspace Clustering (LT-MSC).

Human Parsing With Contextualized Convolutional Neural Network

no code implementations ICCV 2015 Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Diversity-Induced Multi-View Subspace Clustering

no code implementations CVPR 2015 Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, Hua Zhang

In this paper, we focus on how to boost multi-view clustering by exploring the complementary information among multi-view features.

Face Clustering Multi-view Subspace Clustering

Structural Sparse Tracking

no code implementations CVPR 2015 Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, Ming-Hsuan Yang

Sparse representation has been applied to visual tracking by using target templates to find the target candidate with minimal reconstruction error.

Visual Tracking

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

no code implementations CVPR 2015 Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan

Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.

Human Parsing

Deep Human Parsing with Active Template Regression

1 code implementation 9 Mar 2015 Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan

The first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second CNN omits max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters.

Human Parsing

Computational Baby Learning

no code implementations 11 Nov 2014 Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan

Then the concept detector can be fine-tuned based on these new instances.

Object Detection

