Search Results for author: Si Liu

Found 86 papers, 42 papers with code

Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation

no code implementations 18 Sep 2023 Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu

Explicit object-level semantic correspondence between audio and visual modalities is established by gathering object information from visual features with predefined audio queries.

Semantic correspondence

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

1 code implementation 11 Sep 2023 Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

Domain shifts such as sensor type changes and geographical situation variations are prevalent in Autonomous Driving (AD), which poses a challenge since an AD model relying on previous-domain knowledge can hardly be deployed directly to a new domain without additional costs.

Autonomous Driving Domain Generalization

Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception

no code implementations 31 Aug 2023 Si Liu, Chen Gao, Yuan Chen, Xingyu Peng, Xianghao Kong, Kun Wang, Runsheng Xu, Wentao Jiang, Hao Xiang, Jiaqi Ma, Miao Wang

Specifically, we analyze the performance changes of different methods under different bandwidths, providing a deep insight into the performance-bandwidth trade-off issue.

Autonomous Driving

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

no code implementations 20 Aug 2023 Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang

CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.

Decision Making Transfer Learning +1

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

no code implementations 5 Aug 2023 Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan

To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation.

Representation Learning Super-Resolution

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

no code implementations 29 Jun 2023 Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Language Modelling Large Language Model +2

A Tale of Two Approximations: Tightening Over-Approximation for DNN Robustness Verification via Under-Approximation

no code implementations 26 May 2023 Zhiyi Xue, Si Liu, Zhaodi Zhang, Yiting Wu, Min Zhang

In this paper, we study existing approaches and identify a dominant factor in defining tight approximation, namely the approximation domain of the activation function.

DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection

no code implementations CVPR 2023 Zongheng Tang, Yifan Sun, Si Liu, Yi Yang

Second, through our design, the object queries and the foreground query in the decoder share consensus on the class semantics, therefore making the strong and weak supervision mutually benefit each other for domain alignment.

object-detection Weakly Supervised Object Detection

Sparse Dense Fusion for 3D Object Detection

no code implementations 9 Apr 2023 Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li

With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection.

3D Object Detection object-detection

Boosting Verified Training for Robust Image Classifications via Abstraction

1 code implementation CVPR 2023 Zhaodi Zhang, Zhiyi Xue, Yang Chen, Si Liu, Yueling Zhang, Jing Liu, Min Zhang

Via abstraction, all perturbed images are mapped into intervals before feeding into neural networks for training.

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

1 code implementation CVPR 2023 Luting Wang, Yi Liu, Penghui Du, Zihan Ding, Yue Liao, Qiaosong Qi, Biaolong Chen, Si Liu

When extracting object knowledge from PVLMs, the former adaptively transforms object proposals and adopts object-aware mask attention to obtain precise and complete knowledge of objects.

Open Vocabulary Object Detection

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

1 code implementation CVPR 2023 Shaofei Huang, Zhenwei Shen, Zehao Huang, Zi-han Ding, Jiao Dai, Jizhong Han, Naiyan Wang, Si Liu

An attempt has been made to get rid of BEV and predict 3D lanes from FV representations directly, while it still underperforms other BEV-based methods given its lack of structured representation for 3D lanes.

3D Lane Detection

Object as Query: Lifting any 2D Object Detector to 3D Detection

no code implementations 6 Jan 2023 Zitian Wang, Zehao Huang, Jiahui Fu, Naiyan Wang, Si Liu

For the generated queries, we design a sparse cross attention module to force them to focus on the features of specific objects, which suppresses interference from noises.

3D Object Detection object-detection

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

1 code implementation CVPR 2023 Chen Gao, Xingyu Peng, Mi Yan, He Wang, Lirong Yang, Haibing Ren, Hongsheng Li, Si Liu

In this paper, we propose an Adaptive Zone-aware Hierarchical Planner (AZHP) to explicitly divide the navigation process into two heterogeneous phases, i.e., sub-goal setting via zone partition/selection (high-level action) and sub-goal executing (low-level action), for hierarchical planning.

Vision-Language Navigation

Bridging Search Region Interaction With Template for RGB-T Tracking

1 code implementation CVPR 2023 Tianrui Hui, Zizheng Xun, Fengguang Peng, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts.

Rgb-T Tracking Template Matching

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

1 code implementation 2 Dec 2022 Fangxun Shu, Biaolong Chen, Yue Liao, Shuwen Xiao, Wenyu Sun, Xiaobo Li, Yousong Zhu, Jinqiao Wang, Si Liu

Our MAC aims to reduce video representation's spatial and temporal redundancy in the VidLP model by a mask sampling mechanism to improve pre-training efficiency.

Ranked #31 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Retrieval Text Retrieval +1

Analyzing Infrastructure LiDAR Placement with Realistic LiDAR Simulation Library

1 code implementation 29 Nov 2022 Xinyu Cai, Wentao Jiang, Runsheng Xu, Wenquan Zhao, Jiaqi Ma, Si Liu, Yikang Li

Through simulating point cloud data in different LiDAR placements, we can evaluate the perception accuracy of these placements using multiple detection models.

Teach-DETR: Better Training DETR with Teachers

1 code implementation 22 Nov 2022 Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li

In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors.

Video Background Music Generation: Dataset, Method and Evaluation

1 code implementation 21 Nov 2022 Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation.

Music Generation Representation Learning +1

DualApp: Tight Over-Approximation for Neural Network Robustness Verification via Under-Approximation

no code implementations 21 Nov 2022 Yiting Wu, Zhaodi Zhang, Zhiyi Xue, Si Liu, Min Zhang

We observe that existing approaches only rely on overestimated domains, while the corresponding tight approximation may not necessarily be tight on its actual domain.

Cross-Modality Domain Adaptation for Freespace Detection: A Simple yet Effective Baseline

no code implementations 6 Oct 2022 Yuanbin Wang, Leyan Zhu, Shaofei Huang, Tianrui Hui, Xiaojie Li, Fei Wang, Si Liu

To better bridge the domain gap between source domain (synthetic data) and target domain (real-world data), we also propose a Selective Feature Alignment (SFA) module which only aligns the features of consistent foreground area between the two domains, thus realizing inter-domain intra-modality adaptation.

Autonomous Driving Semantic Segmentation +1

Multi-view Human Body Mesh Translator

no code implementations 4 Oct 2022 Xiangjian Jiang, Xuecheng Nie, Zitian Wang, Luoqi Liu, Si Liu

Existing methods for human mesh recovery mainly focus on single-view frameworks, but they often fail to produce accurate results due to the ill-posed setup.

Human Mesh Recovery

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations 12 Sep 2022 Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks

no code implementations 21 Aug 2022 Zhaodi Zhang, Yiting Wu, Si Liu, Jing Liu, Min Zhang

Considerable efforts have been devoted to finding the so-called tighter approximations to obtain more precise verification results.

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

1 code implementation 11 Aug 2022 Zihan Ding, Zi-han Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Si Liu

To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination.

Panoptic Segmentation Semantic correspondence

Target-Driven Structured Transformer Planner for Vision-Language Navigation

1 code implementation 19 Jul 2022 Yusheng Zhao, Jinyu Chen, Chen Gao, Wenguan Wang, Lirong Yang, Haibing Ren, Huaxia Xia, Si Liu

Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.

Navigate Vision-Language Navigation

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

1 code implementation 12 Jul 2022 Luting Wang, Xiaojie Li, Yue Liao, Zeren Jiang, Jianlong Wu, Fei Wang, Chen Qian, Si Liu

We observe that the core difficulty for heterogeneous KD (hetero-KD) is the significant semantic gap between the backbone features of heterogeneous detectors due to the different optimization manners.

Knowledge Distillation object-detection +2

Reinforced Structured State-Evolution for Vision-Language Navigation

1 code implementation CVPR 2022 Jinyu Chen, Chen Gao, Erli Meng, Qiong Zhang, Si Liu

However, the crucial navigation clues (i.e., object-level environment layout) for the embodied navigation task are discarded since the maintained vector is essentially unstructured.

Navigate Vision and Language Navigation +1

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

1 code implementation CVPR 2022 Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu

3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.

Visual Grounding

TR-MOT: Multi-Object Tracking by Reference

no code implementations 30 Mar 2022 Mingfei Chen, Yue Liao, Si Liu, Fei Wang, Jenq-Neng Hwang

RS takes previous detected results as references to aggregate the corresponding features from the combined features of the adjacent frames and makes a one-to-one track state prediction for each reference in parallel.

Multi-Object Tracking

Improved Pillar with Fine-grained Feature for 3D Object Detection

no code implementations 12 Oct 2021 Jiahui Fu, Guanghui Ren, Yunpeng Chen, Si Liu

In contrast, 2D grid-based methods, such as PointPillar, can easily achieve a stable and efficient speed based on simple 2D convolution, but they struggle to reach competitive accuracy because of their coarse-grained point cloud representation.

3D Object Detection Autonomous Driving +1

TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

no code implementations 5 Aug 2021 Dailan He, Yusheng Zhao, Junyu Luo, Tianrui Hui, Shaofei Huang, Aixi Zhang, Si Liu

Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making it difficult for the model to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents.

Visual Grounding

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

1 code implementation CVPR 2021 Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu

The Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction.

Instruction Following Navigate +1

Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

1 code implementation 8 Jun 2021 MingJie Sun, Jimin Xiao, Eng Gee Lim, Si Liu, John Y. Goulermas

In this paper, we tackle the weakly-supervised referring expression grounding task, for the localization of a referent object in an image according to a query sentence, where the mapping between image regions and queries is not available during the training stage.

Referring Expression

PSGAN++: Robust Detail-Preserving Makeup Transfer and Removal

1 code implementation 26 May 2021 Si Liu, Wentao Jiang, Chen Gao, Ran He, Jiashi Feng, Bo Li, Shuicheng Yan

In this paper, we address the makeup transfer and removal tasks simultaneously, which aim to transfer the makeup from a reference image to a source image and remove the makeup from the with-makeup image respectively.

Style Transfer

Human-centric Relation Segmentation: Dataset and Solution

no code implementations 24 May 2021 Si Liu, Zitian Wang, Yulu Gao, Lejian Ren, Yue Liao, Guanghui Ren, Bo Li, Shuicheng Yan

For the above exemplar case, our HRS task produces results in the form of relation triplets <girl [left hand], hold, book> and exact segmentation masks of the book, with which the robot can easily accomplish the grabbing task.

Cross-Modal Progressive Comprehension for Referring Segmentation

1 code implementation 15 May 2021 Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.

Image Segmentation Referring Expression Segmentation +3

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations CVPR 2021 Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

feature selection Referring Expression Segmentation

Reformulating HOI Detection as Adaptive Set Prediction

1 code implementation CVPR 2021 Mingfei Chen, Yue Liao, Si Liu, ZhiYuan Chen, Fei Wang, Chen Qian

To attain this, we map a trainable interaction query set to an interaction prediction set with a transformer.

Ranked #27 on Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Video Relation Detection with Trajectory-aware Multi-modal Features

no code implementations 20 Jan 2021 Wentao Xie, Guanghui Ren, Si Liu

Considering the complexity of doing visual relation detection in videos, we decompose this task into three sub-tasks: object detection, trajectory proposal and relation prediction.

object-detection Object Detection +1

ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

no code implementations 11 Jan 2021 Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan

Our ORDNet is able to extract more comprehensive context information and well adapt to complex spatial variance in scene images.

Scene Parsing

Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism

no code implementations ICCV 2021 Wentao Jiang, Ning Xu, Jiayun Wang, Chen Gao, Jing Shi, Zhe Lin, Si Liu

Given the cycle, we propose several free augmentation strategies to help our model understand various editing requests given the imbalanced dataset.

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

1 code implementation 7 Dec 2020 Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Our CNMT consists of reading, reasoning, and generation modules, in which the Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens.

Image Captioning Optical Character Recognition +1

Referring Image Segmentation via Cross-Modal Progressive Comprehension

1 code implementation CVPR 2020 Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.

Image Segmentation Referring Expression Segmentation +1

Recapture as You Want

no code implementations 2 Jun 2020 Chen Gao, Si Liu, Ran He, Shuicheng Yan, Bo Li

The LGR module utilizes body skeleton knowledge to construct a layout graph that connects all relevant part features, where a graph reasoning mechanism is used to propagate information among part nodes to mine their relations.

AdversarialNAS: Adversarial Neural Architecture Search for GANs

1 code implementation CVPR 2020 Chen Gao, Yunpeng Chen, Si Liu, Zhenxiong Tan, Shuicheng Yan

In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation.

Image Generation Neural Architecture Search +1

Rule-Guided Compositional Representation Learning on Knowledge Graphs

1 code implementation 20 Nov 2019 Guanglin Niu, Yongfei Zhang, Bo Li, Peng Cui, Si Liu, Jingyang Li, Xiaowei Zhang

Representation learning on a knowledge graph (KG) is to embed entities and relations of a KG into low-dimensional continuous vector spaces.

Knowledge Graphs Representation Learning

Learning to Recognize the Unseen Visual Predicates

no code implementations 25 Sep 2019 Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo

Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.

Question Answering Visual Question Answering +1

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

1 code implementation CVPR 2020 Wentao Jiang, Si Liu, Chen Gao, Jie Cao, Ran He, Jiashi Feng, Shuicheng Yan

In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image.

UGAN: Untraceable GAN for Multi-Domain Face Translation

no code implementations 26 Jul 2019 Defa Zhu, Si Liu, Wentao Jiang, Chen Gao, Tianyi Wu, Qiangchang Wang, Guodong Guo

To address this issue, we propose a method called Untraceable GAN, which has a novel source classifier to differentiate which domain an image is translated from, and determines whether the translated image still retains the characteristics of the source domain.

Image-to-Image Translation Translation

Accurate facial image parsing at real-time speed

no code implementations IEEE Transactions on Image Processing 2019 Zhen Wei, Si Liu, Yao Sun, Hefei Ling

In this paper, we propose a design scheme for deep learning networks in the face parsing task with promising accuracy and real-time inference speed.

Face Parsing

Open Category Detection with PAC Guarantees

1 code implementation ICML 2018 Si Liu, Risheek Garrepalli, Thomas G. Dietterich, Alan Fern, Dan Hendrycks

Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates.

Face Aging with Contextual Generative Adversarial Nets

no code implementations 1 Feb 2018 Si Liu, Yao Sun, Defa Zhu, Renda Bao, Wei Wang, Xiangbo Shu, Shuicheng Yan

The age discriminative network guides the synthesized face to fit the real conditional distribution.

Face Verification

Cross-domain Human Parsing via Adversarial Feature and Label Adaptation

no code implementations 4 Jan 2018 Si Liu, Yao Sun, Defa Zhu, Guanghui Ren, Yu Chen, Jiashi Feng, Jizhong Han

Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences.

Human Parsing

Fast Deep Matting for Portrait Animation on Mobile Phone

1 code implementation 26 Jul 2017 Bingke Zhu, Yingying Chen, Jinqiao Wang, Si Liu, Bo Zhang, Ming Tang

Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices, which does not need any interaction and can realize real-time matting at 15 fps.

Image Matting Video Editing

Learning Adaptive Receptive Fields for Deep Image Parsing Network

no code implementations CVPR 2017 Zhen Wei, Yao Sun, Jinqiao Wang, Hanjiang Lai, Si Liu

In this paper, we introduce a novel approach to automatically regulate the receptive field in a deep image parsing network.

Face Parsing

Surveillance Video Parsing with Single Frame Supervision

no code implementations CVPR 2017 Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao

In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage.

Optical Flow Estimation

SketchNet: Sketch Classification With Web Images

no code implementations CVPR 2016 Hua Zhang, Si Liu, Changqing Zhang, Wenqi Ren, Rui Wang, Xiaochun Cao

In this study, we present a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images.

Classification General Classification

Structural Correlation Filter for Robust Visual Tracking

no code implementations CVPR 2016 Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu

In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking.

Visual Tracking

Makeup like a superstar: Deep Localized Makeup Transfer Network

no code implementations 25 Apr 2016 Si Liu, Xinyu Ou, Ruihe Qian, Wei Wang, Xiaochun Cao

In this paper, we propose a novel Deep Localized Makeup Transfer Network to automatically recommend the most suitable makeup for a female and synthesize the makeup on her face.

Low-Rank Tensor Constrained Multiview Subspace Clustering

no code implementations ICCV 2015 Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, Xiaochun Cao

We introduce a low-rank tensor constraint to explore the complementary information from multiple views and, accordingly, establish a novel method called Low-rank Tensor constrained Multiview Subspace Clustering (LT-MSC).

Human Parsing With Contextualized Convolutional Neural Network

no code implementations ICCV 2015 Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Diversity-Induced Multi-View Subspace Clustering

no code implementations CVPR 2015 Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, Hua Zhang

In this paper, we focus on how to boost the multi-view clustering by exploring the complementary information among multi-view features.

Clustering Face Clustering +1

Structural Sparse Tracking

no code implementations CVPR 2015 Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, Ming-Hsuan Yang

Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates.

Visual Tracking

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

no code implementations CVPR 2015 Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan

Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.

Human Parsing

Deep Human Parsing with Active Template Regression

1 code implementation 9 Mar 2015 Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan

The first CNN network is with max-pooling, and designed to predict the template coefficients for each label mask, while the second CNN network is without max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters.

Human Parsing regression
