Search Results for author: Si Liu

Found 101 papers, 51 papers with code

Computational Baby Learning

no code implementations • 11 Nov 2014 • Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan

Then the concept detector can be fine-tuned based on these new instances.

Deep Human Parsing with Active Template Regression

1 code implementation • 9 Mar 2015 • Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan

The first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second CNN omits max-pooling to preserve sensitivity to label mask position and to accurately predict the active shape parameters.

Human Parsing Position +1

153

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan

Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.

Human Parsing

Diversity-Induced Multi-View Subspace Clustering

no code implementations • CVPR 2015 • Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, Hua Zhang

In this paper, we focus on how to boost multi-view clustering by exploring the complementary information among multi-view features.

Clustering Face Clustering +1

Structural Sparse Tracking

no code implementations • CVPR 2015 • Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, Ming-Hsuan Yang

Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates.

Visual Tracking

Human Parsing With Contextualized Convolutional Neural Network

no code implementations • ICCV 2015 • Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Low-Rank Tensor Constrained Multiview Subspace Clustering

no code implementations • ICCV 2015 • Changqing Zhang, Huazhu Fu, Si Liu, Guangcan Liu, Xiaochun Cao

We introduce a low-rank tensor constraint to explore the complementary information from multiple views and, accordingly, establish a novel method called Low-rank Tensor constrained Multiview Subspace Clustering (LT-MSC).

Clustering

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

no code implementations • ICCV 2015 • Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan

Then the concept detector can be fine-tuned based on these new instances.

object-detection Weakly Supervised Object Detection

Makeup like a superstar: Deep Localized Makeup Transfer Network

no code implementations • 25 Apr 2016 • Si Liu, Xinyu Ou, Ruihe Qian, Wei Wang, Xiaochun Cao

In this paper, we propose a novel Deep Localized Makeup Transfer Network to automatically recommend the most suitable makeup for a female and synthesize the makeup on her face.

Structural Correlation Filter for Robust Visual Tracking

no code implementations • CVPR 2016 • Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu

In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking.

Computational Efficiency Visual Tracking

SketchNet: Sketch Classification With Web Images

no code implementations • CVPR 2016 • Hua Zhang, Si Liu, Changqing Zhang, Wenqi Ren, Rui Wang, Xiaochun Cao

In this study, we present a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images.

Classification General Classification

Surveillance Video Parsing with Single Frame Supervision

no code implementations • CVPR 2017 • Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao

In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage.

Optical Flow Estimation

Learning Adaptive Receptive Fields for Deep Image Parsing Network

no code implementations • CVPR 2017 • Zhen Wei, Yao Sun, Jinqiao Wang, Hanjiang Lai, Si Liu

In this paper, we introduce a novel approach to automatically regulate the receptive field in a deep image parsing network.

Face Parsing

Fast Deep Matting for Portrait Animation on Mobile Phone

1 code implementation • 26 Jul 2017 • Bingke Zhu, Yingying Chen, Jinqiao Wang, Si Liu, Bo Zhang, Ming Tang

Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices, which requires no interaction and achieves real-time matting at 15 fps.

Image Matting Video Editing

Cross-domain Human Parsing via Adversarial Feature and Label Adaptation

no code implementations • 4 Jan 2018 • Si Liu, Yao Sun, Defa Zhu, Guanghui Ren, Yu Chen, Jiashi Feng, Jizhong Han

Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences.

Human Parsing

Face Aging with Contextual Generative Adversarial Nets

no code implementations • 1 Feb 2018 • Si Liu, Yao Sun, Defa Zhu, Renda Bao, Wei Wang, Xiangbo Shu, Shuicheng Yan

The age discriminative network guides the synthesized face to fit the real conditional distribution.

Face Verification

Ensemble Soft-Margin Softmax Loss for Image Classification

no code implementations • 10 May 2018 • Xiaobo Wang, Shifeng Zhang, Zhen Lei, Si Liu, Xiaojie Guo, Stan Z. Li

On the other hand, the learned classifier of softmax loss is weak.

Classification General Classification +1

Open Category Detection with PAC Guarantees

1 code implementation • ICML 2018 • Si Liu, Risheek Garrepalli, Thomas G. Dietterich, Alan Fern, Dan Hendrycks

Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates.

Accurate facial image parsing at real-time speed

no code implementations • IEEE Transactions on Image Processing 2019 • Zhen Wei, Si Liu, Yao Sun, Hefei Ling

In this paper, we propose a design scheme for deep learning networks in the face parsing task with promising accuracy and real-time inference speed.

Ranked #6 on Face Parsing on CelebAMask-HQ

Face Parsing

UGAN: Untraceable GAN for Multi-Domain Face Translation

no code implementations • 26 Jul 2019 • Defa Zhu, Si Liu, Wentao Jiang, Chen Gao, Tianyi Wu, Qiangchang Wang, Guodong Guo

To address this issue, we propose a method called Untraceable GAN, which has a novel source classifier to differentiate which domain an image is translated from, and determines whether the translated image still retains the characteristics of the source domain.

Image-to-Image Translation Translation

A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

no code implementations • CVPR 2020 • Yue Liao, Si Liu, Guanbin Li, Fei Wang, Yanjie Chen, Chen Qian, Bo Li

RCCF reformulates the referring expression comprehension as a correlation filtering process.

Referring Expression Referring Expression Comprehension

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

1 code implementation • CVPR 2020 • Wentao Jiang, Si Liu, Chen Gao, Jie Cao, Ran He, Jiashi Feng, Shuicheng Yan

In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image.

699

Learning to Recognize the Unseen Visual Predicates

no code implementations • 25 Sep 2019 • Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo

Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.

Question Answering Visual Question Answering +1

RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment

1 code implementation • ICCV 2019 • Guan'an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zeng-Guang Hou

First, it can exploit pixel alignment and feature alignment jointly.

Cross-Modality Person Re-identification Generative Adversarial Network +2

117

Rule-Guided Compositional Representation Learning on Knowledge Graphs

1 code implementation • 20 Nov 2019 • Guanglin Niu, Yongfei Zhang, Bo Li, Peng Cui, Si Liu, Jingyang Li, Xiaowei Zhang

Representation learning on a knowledge graph (KG) is to embed entities and relations of a KG into low-dimensional continuous vector spaces.

Knowledge Graphs Representation Learning

AdversarialNAS: Adversarial Neural Architecture Search for GANs

1 code implementation • CVPR 2020 • Chen Gao, Yunpeng Chen, Si Liu, Zhenxiong Tan, Shuicheng Yan

In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation.

Image Generation Neural Architecture Search +1

PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection

1 code implementation • CVPR 2020 • Yue Liao, Si Liu, Fei Wang, Yanjie Chen, Chen Qian, Jiashi Feng

Human and object points are the center of the detection boxes, and the interaction point is the midpoint of the human and object points.
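
The point definitions in the sentence above can be illustrated with a minimal sketch. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for illustration; they are not taken from the PPDM code.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Return the center of a detection box (the human or object point)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def interaction_point(human_box: Box, object_box: Box) -> Point:
    """Return the midpoint between the human point and the object point."""
    hx, hy = box_center(human_box)
    ox, oy = box_center(object_box)
    return ((hx + ox) / 2.0, (hy + oy) / 2.0)


# Hypothetical boxes, chosen only to make the arithmetic easy to follow.
human = (10.0, 20.0, 50.0, 120.0)   # center (30, 70)
obj = (60.0, 80.0, 100.0, 100.0)    # center (80, 90)
print(interaction_point(human, obj))
```

This covers only the geometric definition of the three points; how the points are detected and matched is not described in the snippet and is not sketched here.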

Ranked #25 on Human-Object Interaction Detection on V-COCO

Human-Object Interaction Detection Object +2

213

Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

1 code implementation • 18 Jan 2020 • Jie Wu, Guanbin Li, Si Liu, Liang Lin

Temporally language grounding in untrimmed videos is a newly-raised task in video understanding.

Decision Making reinforcement-learning +2

Recapture as You Want

no code implementations • 2 Jun 2020 • Chen Gao, Si Liu, Ran He, Shuicheng Yan, Bo Li

The LGR module utilizes body skeleton knowledge to construct a layout graph that connects all relevant part features, where a graph reasoning mechanism is used to propagate information among part nodes to mine their relations.

Referring Image Segmentation via Cross-Modal Progressive Comprehension

1 code implementation • CVPR 2020 • Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.

Ranked #14 on Referring Expression Segmentation on RefCOCO testB

Attribute Image Segmentation +2

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

1 code implementation • ECCV 2020 • Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, Jizhong Han

Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence.

Dependency Parsing Image Segmentation +3

Human-centric Spatio-Temporal Video Grounding With Visual Transformers

1 code implementation • 10 Nov 2020 • Zongheng Tang, Yue Liao, Si Liu, Guanbin Li, Xiaojie Jin, Hongxu Jiang, Qian Yu, Dong Xu

HC-STVG is a video grounding task that requires both spatial (where) and temporal (when) localization.

Referring Expression Sentence +3

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

1 code implementation • 7 Dec 2020 • Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Our CNMT consists of reading, reasoning, and generation modules, in which the Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens.

Image Captioning Optical Character Recognition +1

Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism

no code implementations • ICCV 2021 • Wentao Jiang, Ning Xu, Jiayun Wang, Chen Gao, Jing Shi, Zhe Lin, Si Liu

Given the cycle, we propose several free augmentation strategies to help our model understand various editing requests given the imbalanced dataset.

ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

no code implementations • 11 Jan 2021 • Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan

Our ORDNet is able to extract more comprehensive context information and adapt well to complex spatial variance in scene images.

Scene Parsing

Video Relation Detection with Trajectory-aware Multi-modal Features

no code implementations • 20 Jan 2021 • Wentao Xie, Guanghui Ren, Si Liu

Considering the complexity of doing visual relation detection in videos, we decompose this task into three sub-tasks: object detection, trajectory proposal and relation prediction.

Object object-detection +3

General Instance Distillation for Object Detection

1 code implementation • CVPR 2021 • Xing Dai, Zeren Jiang, Zhao Wu, Yiping Bao, Zhicheng Wang, Si Liu, Erjin Zhou

In recent years, knowledge distillation has been proved to be an effective solution for model compression.

Knowledge Distillation Model Compression +4

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation • CVPR 2021 • Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Reformulating HOI Detection as Adaptive Set Prediction

1 code implementation • CVPR 2021 • Mingfei Chen, Yue Liao, Si Liu, ZhiYuan Chen, Fei Wang, Chen Qian

To attain this, we map a trainable interaction query set to an interaction prediction set with a transformer.

Ranked #29 on Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations • CVPR 2021 • Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

Ranked #8 on Referring Expression Segmentation on J-HMDB

feature selection Referring Expression Segmentation

Cross-Modal Progressive Comprehension for Referring Segmentation

1 code implementation • 15 May 2021 • Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.

Ranked #7 on Referring Expression Segmentation on J-HMDB

Attribute Image Segmentation +5

Human-centric Relation Segmentation: Dataset and Solution

no code implementations • 24 May 2021 • Si Liu, Zitian Wang, Yulu Gao, Lejian Ren, Yue Liao, Guanghui Ren, Bo Li, Shuicheng Yan

For the above exemplar case, our HRS task produces results in the form of relation triplets <girl [left hand], hold, book> and exact segmentation masks of the book, with which the robot can easily accomplish the grabbing task.

Relation Segmentation

PSGAN++: Robust Detail-Preserving Makeup Transfer and Removal

1 code implementation • 26 May 2021 • Si Liu, Wentao Jiang, Chen Gao, Ran He, Jiashi Feng, Bo Li, Shuicheng Yan

In this paper, we address the makeup transfer and removal tasks simultaneously, which aim to transfer the makeup from a reference image to a source image and remove the makeup from the with-makeup image respectively.

Style Transfer

699

Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

1 code implementation • 8 Jun 2021 • MingJie Sun, Jimin Xiao, Eng Gee Lim, Si Liu, John Y. Goulermas

In this paper, we tackle the weakly-supervised referring expression grounding task, which localizes a referent object in an image according to a query sentence, where the mapping between image regions and queries is not available during the training stage.

Referring Expression Sentence

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

1 code implementation • CVPR 2021 • Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu

The Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction.

Instruction Following Navigate +2

TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

no code implementations • 5 Aug 2021 • Dailan He, Yusheng Zhao, Junyu Luo, Tianrui Hui, Shaofei Huang, Aixi Zhang, Si Liu

Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making it difficult for the model to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents.

Relation Sentence +1

Mining the Benefits of Two-stage and One-stage HOI Detection

1 code implementation • NeurIPS 2021 • Aixi Zhang, Yue Liao, Si Liu, Miao Lu, Yongliang Wang, Chen Gao, Xiaobo Li

To this end, we propose a novel one-stage framework with disentangling human-object detection and interaction classification in a cascade manner.

Ranked #7 on Human-Object Interaction Detection on V-COCO

Classification Human-Object Interaction Detection +5

Improved Pillar with Fine-grained Feature for 3D Object Detection

no code implementations • 12 Oct 2021 • Jiahui Fu, Guanghui Ren, Yunpeng Chen, Si Liu

In contrast, 2D grid-based methods, such as PointPillar, can easily achieve stable and efficient speed based on simple 2D convolution, but struggle to reach competitive accuracy because of their coarse-grained point cloud representation.

3D Object Detection Autonomous Driving +1

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

1 code implementation • CVPR 2022 • Zitian Wang, Xuecheng Nie, Xiaochao Qu, Yunpeng Chen, Si Liu

In this paper, we present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.

Ranked #8 on 3D Multi-Person Pose Estimation (absolute) on MuPoTS-3D

3D Multi-Person Pose Estimation (absolute) 3D Multi-Person Pose Estimation (root-relative) +2

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

1 code implementation • CVPR 2022 • Yue Liao, Aixi Zhang, Miao Lu, Yongliang Wang, Xiaobo Li, Si Liu

In this paper, we reveal and address the disadvantages of conventional query-driven HOI detectors from two aspects.

Ranked #12 on Human-Object Interaction Detection on HICO-DET

Human-Object Interaction Detection Position +1

TR-MOT: Multi-Object Tracking by Reference

no code implementations • 30 Mar 2022 • Mingfei Chen, Yue Liao, Si Liu, Fei Wang, Jenq-Neng Hwang

RS takes previous detected results as references to aggregate the corresponding features from the combined features of the adjacent frames and makes a one-to-one track state prediction for each reference in parallel.

Multi-Object Tracking Object

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

1 code implementation • CVPR 2022 • Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu

3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.

Visual Grounding

Reinforced Structured State-Evolution for Vision-Language Navigation

1 code implementation • CVPR 2022 • Jinyu Chen, Chen Gao, Erli Meng, Qiong Zhang, Si Liu

However, the crucial navigation clues (i.e., object-level environment layout) for the embodied navigation task are discarded since the maintained vector is essentially unstructured.

Navigate Vision and Language Navigation +1

Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

1 code implementation • CVPR 2022 • Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Jizhong Han, Si Liu

Referring video object segmentation aims to predict foreground labels for objects referred by natural language expressions in videos.

Ranked #6 on Referring Video Object Segmentation on MeViS

Denoising Referring Video Object Segmentation +2

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

1 code implementation • 12 Jul 2022 • Luting Wang, Xiaojie Li, Yue Liao, Zeren Jiang, Jianlong Wu, Fei Wang, Chen Qian, Si Liu

We observe that the core difficulty for heterogeneous KD (hetero-KD) is the significant semantic gap between the backbone features of heterogeneous detectors due to the different optimization manners.

Knowledge Distillation Object +3

Target-Driven Structured Transformer Planner for Vision-Language Navigation

1 code implementation • 19 Jul 2022 • Yusheng Zhao, Jinyu Chen, Chen Gao, Wenguan Wang, Lirong Yang, Haibing Ren, Huaxia Xia, Si Liu

Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.

Navigate Vision-Language Navigation

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

1 code implementation • 11 Aug 2022 • Zihan Ding, Zi-han Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Si Liu

To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination.

Panoptic Segmentation Segmentation +1

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

1 code implementation • 16 Aug 2022 • Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu

Human pose estimation aims to accurately estimate a wide variety of human poses.

Data Augmentation Pose Estimation

Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks

no code implementations • 21 Aug 2022 • Zhaodi Zhang, Yiting Wu, Si Liu, Jing Liu, Min Zhang

Considerable efforts have been devoted to finding the so-called tighter approximations to obtain more precise verification results.

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

2,866

Multi-view Human Body Mesh Translator

no code implementations • 4 Oct 2022 • Xiangjian Jiang, Xuecheng Nie, Zitian Wang, Luoqi Liu, Si Liu

Existing methods for human mesh recovery mainly focus on single-view frameworks, but they often fail to produce accurate results due to the ill-posed setup.

Human Mesh Recovery

Cross-Modality Domain Adaptation for Freespace Detection: A Simple yet Effective Baseline

no code implementations • 6 Oct 2022 • Yuanbin Wang, Leyan Zhu, Shaofei Huang, Tianrui Hui, Xiaojie Li, Fei Wang, Si Liu

To better bridge the domain gap between source domain (synthetic data) and target domain (real-world data), we also propose a Selective Feature Alignment (SFA) module which only aligns the features of consistent foreground area between the two domains, thus realizing inter-domain intra-modality adaptation.

Autonomous Driving Semantic Segmentation +1

DualApp: Tight Over-Approximation for Neural Network Robustness Verification via Under-Approximation

no code implementations • 21 Nov 2022 • Yiting Wu, Zhaodi Zhang, Zhiyi Xue, Si Liu, Min Zhang

We observe that existing approaches only rely on overestimated domains, while the corresponding tight approximation may not necessarily be tight on its actual domain.

Taming Reachability Analysis of DNN-Controlled Systems via Abstraction-Based Training

no code implementations • 21 Nov 2022 • Jiaxu Tian, Dapeng Zhi, Si Liu, Peixin Wang, Guy Katz, Min Zhang

The experimental results on a wide range of benchmarks show that the DNNs trained by using our approach exhibit comparable performance, while the reachability analysis of the corresponding systems becomes more amenable with significant tightness and efficiency improvement over the state-of-the-art white-box approaches.

Decision Making Reinforcement Learning (RL)

Video Background Music Generation: Dataset, Method and Evaluation

1 code implementation • ICCV 2023 • Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation.

Music Generation Representation Learning +1

Teach-DETR: Better Training DETR with Teachers

1 code implementation • 22 Nov 2022 • Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li

In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors.

Analyzing Infrastructure LiDAR Placement with Realistic LiDAR Simulation Library

2 code implementations • 29 Nov 2022 • Xinyu Cai, Wentao Jiang, Runsheng Xu, Wenquan Zhao, Jiaqi Ma, Si Liu, Yikang Li

Through simulating point cloud data in different LiDAR placements, we can evaluate the perception accuracy of these placements using multiple detection models.

185

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

1 code implementation • 2 Dec 2022 • Fangxun Shu, Biaolong Chen, Yue Liao, Shuwen Xiao, Wenyu Sun, Xiaobo Li, Yousong Zhu, Jinqiao Wang, Si Liu

Our MAC aims to reduce video representation's spatial and temporal redundancy in the VidLP model by a mask sampling mechanism to improve pre-training efficiency.

Ranked #37 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Retrieval Text Retrieval +1

Bridging Search Region Interaction With Template for RGB-T Tracking

1 code implementation • CVPR 2023 • Tianrui Hui, Zizheng Xun, Fengguang Peng, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts.

Ranked #2 on Rgb-T Tracking on RGBT210

Rgb-T Tracking Template Matching

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

1 code implementation • CVPR 2023 • Chen Gao, Xingyu Peng, Mi Yan, He Wang, Lirong Yang, Haibing Ren, Hongsheng Li, Si Liu

In this paper, we propose an Adaptive Zone-aware Hierarchical Planner (AZHP) that explicitly divides the navigation process into two heterogeneous phases, i.e., sub-goal setting via zone partition/selection (high-level action) and sub-goal execution (low-level action), for hierarchical planning.

Vision-Language Navigation

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

1 code implementation • CVPR 2023 • Shaofei Huang, Zhenwei Shen, Zehao Huang, Zi-han Ding, Jiao Dai, Jizhong Han, Naiyan Wang, Si Liu

An attempt has been made to get rid of BEV and predict 3D lanes from FV representations directly, while it still underperforms other BEV-based methods given its lack of structured representation for 3D lanes.

Ranked #3 on 3D Lane Detection on Apollo Synthetic 3D Lane

3D Lane Detection

126

Object as Query: Lifting any 2D Object Detector to 3D Detection

1 code implementation • ICCV 2023 • Zitian Wang, Zehao Huang, Jiahui Fu, Naiyan Wang, Si Liu

Existing methods mainly establish 3D representations from multi-view images and adopt a dense detection head for object detection, or employ object queries distributed in 3D space to localize objects.

3D Object Detection Object +1

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

1 code implementation • 2 Mar 2023 • Rongyao Fang, Peng Gao, Aojun Zhou, Yingjie Cai, Si Liu, Jifeng Dai, Hongsheng Li

The first method is One-to-many Matching via Data Augmentation (denoted as DataAug-DETR).

Data Augmentation object-detection +1

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

1 code implementation • CVPR 2023 • Luting Wang, Yi Liu, Penghui Du, Zihan Ding, Yue Liao, Qiaosong Qi, Biaolong Chen, Si Liu

When extracting object knowledge from PVLMs, the former adaptively transforms object proposals and adopts object-aware mask attention to obtain precise and complete knowledge of objects.

Ranked #14 on Open Vocabulary Object Detection on MSCOCO

Object Open Vocabulary Object Detection

Paper
Code

Boosting Verified Training for Robust Image Classifications via Abstraction

1 code implementation • CVPR 2023 • Zhaodi Zhang, Zhiyi Xue, Yang Chen, Si Liu, Yueling Zhang, Jing Liu, Min Zhang

Via abstraction, all perturbed images are mapped into intervals before feeding into neural networks for training.

Sparse Dense Fusion for 3D Object Detection

no code implementations • 9 Apr 2023 • Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li

With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection.

3D Object Detection Object +1

DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection

no code implementations • CVPR 2023 • Zongheng Tang, Yifan Sun, Si Liu, Yi Yang

Second, through our design, the object queries and the foreground query in the decoder share consensus on the class semantics, therefore making the strong and weak supervision mutually benefit each other for domain alignment.

object-detection Weakly Supervised Object Detection

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels

1 code implementation • CVPR 2023 • Jingqiu Zhou, Linjiang Huang, Liang Wang, Si Liu, Hongsheng Li

Besides, the generated pseudo-labels can be fluctuating and inaccurate at the early stage of training.

Pseudo Label Weakly-supervised Temporal Action Localization +1

Paper
Code

A Tale of Two Approximations: Tightening Over-Approximation for DNN Robustness Verification via Under-Approximation

no code implementations • 26 May 2023 • Zhiyi Xue, Si Liu, Zhaodi Zhang, Yiting Wu, Min Zhang

In this paper, we study existing approaches and identify a dominant factor in defining tight approximation, namely the approximation domain of the activation function.

Paper
Add Code

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

1 code implementation • NeurIPS 2023 • Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark.

Image Generation Information Retrieval +1

Paper
Code

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Automatic Lyrics Transcription Language Modelling +3

Paper
Code

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

no code implementations • 5 Aug 2023 • Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan

To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation.

Representation Learning Super-Resolution

Paper
Add Code

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

no code implementations • ICCV 2023 • Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang

CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.

Decision Making Transfer Learning +1

Paper
Add Code

Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception

no code implementations • 31 Aug 2023 • Si Liu, Chen Gao, Yuan Chen, Xingyu Peng, Xianghao Kong, Kun Wang, Runsheng Xu, Wentao Jiang, Hao Xiang, Jiaqi Ma, Miao Wang

Specifically, we analyze the performance changes of different methods under different bandwidths, providing a deep insight into the performance-bandwidth trade-off issue.

Autonomous Driving

Paper
Add Code

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

2 code implementations • 11 Sep 2023 • Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

Domain shifts such as sensor type changes and geographical situation variations are prevalent in Autonomous Driving (AD), which poses a challenge since an AD model relying on previous domain knowledge can hardly be deployed directly to a new domain without additional costs.

Autonomous Driving Domain Generalization

567

Paper
Code

Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation

no code implementations • 18 Sep 2023 • Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu

Explicit object-level semantic correspondence between audio and visual modalities is established by gathering object information from visual features with predefined audio queries.

Object Semantic correspondence

Paper
Add Code

Optimizing the Placement of Roadside LiDARs for Autonomous Driving

no code implementations • ICCV 2023 • Wentao Jiang, Hao Xiang, Xinyu Cai, Runsheng Xu, Jiaqi Ma, Yikang Li, Gim Hee Lee, Si Liu

We define perceptual gain as the increased perceptual capability when a new LiDAR is placed.

Autonomous Driving

Paper
Add Code

DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception

1 code implementation • 12 Oct 2023 • Xianghao Kong, Wentao Jiang, Jinrang Jia, Yifeng Shi, Runsheng Xu, Si Liu

To take full advantage of simulated data, we present a new unsupervised sim2real domain adaptation method for V2X collaborative detection named Decoupled Unsupervised Sim2Real Adaptation (DUSA).

Autonomous Driving Domain Adaptation

Paper
Code

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

no code implementations • 2 Nov 2023 • Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption.

Object

Paper
Add Code

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

1 code implementation • 5 Nov 2023 • Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).

Zero-shot Generalization

264

Paper
Code

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions

no code implementations • 20 Nov 2023 • Songhao Han, Le Zhuo, Yue Liao, Si Liu

We attribute this to two primary factors: 1) the reliance on single-turn textual interactions with LLMs, leading to a mismatch between generated text and visual concepts for VLMs; 2) the oversight of the inter-class relationships, resulting in descriptors that fail to differentiate similar classes effectively.

Attribute Classification +1

Paper
Add Code

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

no code implementations • 4 Dec 2023 • Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt.

3D scene Editing

Paper
Add Code

Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

no code implementations • 12 Dec 2023 • Yuanbin Wang, Shaofei Huang, Yulu Gao, Zhen Wang, Rui Wang, Kehua Sheng, Bo Zhang, Si Liu

In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline to transfer the visual-linguistic knowledge implied in CLIP to the point cloud encoder at both feature and output levels.

3D Semantic Segmentation Point Cloud Segmentation +2

Paper
Add Code

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator

1 code implementation • 20 Dec 2023 • Donglin Yang, Zhenfeng Liu, Wentao Jiang, Guohang Yan, Xing Gao, Botian Shi, Si Liu, Xinyu Cai

To this end, we propose a simulator-based physical modeling approach to augment LiDAR data in rainy weather in order to improve the perception performance of LiDAR in this scenario.

Data Augmentation object-detection +1

185

Paper
Code

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

no code implementations • 9 Mar 2024 • Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning.

Lesion Segmentation Segmentation +1

Paper
Add Code

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

1 code implementation • 9 Mar 2024 • Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu

LiDAR-based 3D object detection plays an essential role in autonomous driving.

3D Object Detection Autonomous Driving +2

Paper
Code

Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection

no code implementations • 12 Mar 2024 • Jiahui Fu, Chen Gao, Zitian Wang, Lirong Yang, Xiaofei Wang, Beipeng Mu, Si Liu

Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird's-eye view (BEV) representation space.

3D Object Detection object-detection

Paper
Add Code

Data Augmentation in Human-Centric Vision

no code implementations • 13 Mar 2024 • Wentao Jiang, Yige Zhang, Shaozhong Zheng, Si Liu, Shuicheng Yan

This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks, a first of its kind in the field.

Data Augmentation Human Parsing +2

Paper
Add Code

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

1 code implementation • 19 Mar 2024 • Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.

Text-to-Image Generation

Paper
Code

V2X-PC: Vehicle-to-everything Collaborative Perception via Point Cluster

no code implementations • 25 Mar 2024 • Si Liu, Zihan Ding, Jiahui Fu, Hongyu Li, Siheng Chen, Shifeng Zhang, Xu Zhou

The point cluster inherently preserves object information while packing messages, with weak relevance to the collaboration range, and supports explicit structure modeling.

Object

Paper
Add Code

Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems

no code implementations • 2 Apr 2024 • Dapeng Zhi, Peixin Wang, Si Liu, Luke Ong, Min Zhang

We also devise a simulation-guided approach for training NBCs, aiming to achieve tightness in computing precise certified lower and upper bounds.

Paper
Add Code
