Search Results for author: Guanbin Li

Found 158 papers, 74 papers with code

Propagating Over Phrase Relations for One-Stage Visual Grounding

no code implementations ECCV 2020 Sibei Yang, Guanbin Li, Yizhou Yu

Phrase level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence.

Phrase Grounding Relational Reasoning +2

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

1 code implementation CVPR 2024 Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li

In this paper, we propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context through reinforcement learning.

reinforcement-learning Segmentation

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

no code implementations CVPR 2024 Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li

Specifically, we devise three modules: Background Category-specific Prompt, Background Object Discovery, and Inference Probability Rectification, to empower the detector to discover, represent, and leverage implicit object knowledge explored from background proposals.

Knowledge Distillation Object +3

Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction

no code implementations 24 Apr 2024 Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin

Specifically, our ODMixer has a double-branch structure and involves the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner.


UniFL: Improve Stable Diffusion via Unified Feedback Learning

no code implementations 8 Apr 2024 Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Min Zheng, Lean Fu, Guanbin Li

Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications.

Image Generation

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

no code implementations CVPR 2024 Ganlong Zhao, Guanbin Li, Weikai Chen, Yizhou Yu

Recent advances in Iterative Vision-and-Language Navigation (IVLN) introduce a more meaningful and practical paradigm of VLN by maintaining the agent's memory across tours of scenes.

Vision and Language Navigation

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

no code implementations CVPR 2024 Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Additionally, we present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients.

Monocular 3D Object Detection object-detection +1

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

1 code implementation ICCV 2023 Jiaming Li, Xiangru Lin, Wei zhang, Xiao Tan, YingYing Li, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

To tackle the confirmation bias from incorrect pseudo labels of minority classes, the class-rebalancing sampling module resamples unlabeled data following the guidance of the gradient-based reweighting module.

object-detection Object Detection +1

Annotation-Efficient Polyp Segmentation via Active Learning

no code implementations 21 Mar 2024 Duojun Huang, Xinyu Xiong, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li

To minimize annotation costs, we propose a deep active learning framework for annotation-efficient polyp segmentation.

Active Learning Segmentation

Semi- and Weakly-Supervised Learning for Mammogram Mass Segmentation with Limited Annotations

no code implementations 14 Mar 2024 Xinyu Xiong, Churan Wang, Wenxue Li, Guanbin Li

Accurate identification of breast masses is crucial in diagnosing breast cancer; however, it can be challenging due to their small size and being camouflaged in surrounding normal glands.

Segmentation Weakly-supervised Learning

Large Multimodal Agents: A Survey

no code implementations 23 Feb 2024 Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li

In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short).

Decision Making

Cell Graph Transformer for Nuclei Classification

1 code implementation 20 Feb 2024 Wei Lou, Guanbin Li, Xiang Wan, Haofeng Li

Nuclei classification is a critical step in computer-aided diagnosis with histopathology images.

Classification Nuclei Classification

UniCell: Universal Cell Nucleus Classification via Prompt Learning

1 code implementation 20 Feb 2024 Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li

The recognition of multi-class cell nuclei can significantly facilitate the process of histopathological diagnosis.


MEIA: Towards Realistic Multimodal Interaction and Manipulation for Embodied Robots

1 code implementation 1 Feb 2024 Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.

Embodied Question Answering Language Modelling +3

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

no code implementations 26 Jan 2024 Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region.

3D scene Editing

Adaptive Betweenness Clustering for Semi-Supervised Domain Adaptation

no code implementations 21 Jan 2024 Jichang Li, Guanbin Li, Yizhou Yu

Once the graph has been refined, Adaptive Betweenness Clustering is introduced to facilitate semantic transfer by using across-domain betweenness clustering and within-domain betweenness clustering, thereby propagating semantic label information from labeled samples across domains to unlabeled target data.

Clustering Semi-supervised Domain Adaptation +1

Inter-Domain Mixup for Semi-Supervised Domain Adaptation

no code implementations 21 Jan 2024 Jichang Li, Guanbin Li, Yizhou Yu

However, existing SSDA work fails to make full use of label information from both source and target domains for feature alignment across domains, resulting in label mismatch in the label space during model testing.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

Credible Teacher for Semi-Supervised Object Detection in Open Scene

no code implementations 1 Jan 2024 Jingyu Zhuang, Kuo Wang, Liang Lin, Guanbin Li

Credible Teacher adopts an interactive teaching mechanism using flexible labels to prevent uncertain pseudo labels from misleading the model and gradually reduces its uncertainty through the guidance of other credible pseudo labels.

object-detection Object Detection +1

Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal

no code implementations 22 Dec 2023 Yicheng Leng, Chaowei Fang, Gen Li, Yixiang Fang, Guanbin Li

Visible watermarks, while instrumental in protecting image copyrights, frequently distort the underlying content, complicating tasks like scene interpretation and image editing.

Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation

no code implementations 22 Dec 2023 Chaowei Fang, Ziyin Zhou, Junye Chen, Hanjing Su, Qingyao Wu, Guanbin Li

We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.

Image Segmentation Segmentation +1

FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

1 code implementation 19 Dec 2023 Jichang Li, Guanbin Li, Hui Cheng, Zicheng Liao, Yizhou Yu

However, these prior methods do not learn noise filters by exploiting knowledge across all clients, leading to sub-optimal and inferior noise filtering performance and thus damaging training stability.

Federated Learning Learning with noisy labels +1

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

no code implementations 12 Dec 2023 Haiming Zhang, Zhihao Yuan, Chaoda Zheng, Xu Yan, Baoyuan Wang, Guanbin Li, Song Wu, Shuguang Cui, Zhen Li

Our proposed GSmoothFace model mainly consists of the Audio to Expression Prediction (A2EP) module and the Target Adaptive Face Translation (TAFT) module.

Face Model Talking Face Generation

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

no code implementations CVPR 2024 Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt.

3D scene Editing

Multi-stream Cell Segmentation with Low-level Cues for Multi-modality Images

1 code implementation 22 Oct 2023 Wei Lou, Xinyi Yu, Chenyu Liu, Xiang Wan, Guanbin Li, SiQi Liu, Haofeng Li

Afterward, we train a separate segmentation model for each category using the images in the corresponding category.

Cell Segmentation Segmentation

Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection

1 code implementation ICCV 2023 Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li

Multi-class cell nuclei detection is a fundamental prerequisite in the diagnosis of histopathology.

Diffusion-based Data Augmentation for Nuclei Image Segmentation

1 code implementation 22 Oct 2023 Xinyi Yu, Guanbin Li, Wei Lou, SiQi Liu, Xiang Wan, Yan Chen, Haofeng Li

Therefore, augmenting a dataset with only a few labeled images to improve the segmentation performance is of significant research and application value.

Data Augmentation Image Generation +3

Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment

no code implementations 9 Oct 2023 Guanqi Chen, Guanbin Li

Cardiac function assessment aims at predicting left ventricular ejection fraction (LVEF) given an echocardiogram video, which requires models to focus on the changes in the left ventricle during the cardiac cycle.

Auxiliary Learning regression +1

ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models

1 code implementation 3 Sep 2023 Yuhao Du, Yuncheng Jiang, Shuangyi Tan, Xusheng Wu, Qi Dou, Zhen Li, Guanbin Li, Xiang Wan

Colonoscopy analysis, particularly automatic polyp segmentation and detection, is essential for assisting clinical diagnosis and treatment.


MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization

1 code implementation 22 Aug 2023 Tao Chen, Ze Lin, Hui Li, Jiayi Ji, Yiyi Zhou, Guanbin Li, Rongrong Ji

Furthermore, we model product attributes based on both text and image modalities so that multi-modal product characteristics can be manifested in the generated summaries.


WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning

no code implementations 20 Aug 2023 Dongjian Huo, Zehong Zhang, Hanjing Su, Guanbin Li, Chaowei Fang, Qingyao Wu

Existing watermark removal methods mainly rely on UNet with task-specific decoder branches--one for watermark localization and the other for background image restoration.

Decoder Image Restoration +1

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

1 code implementation ICCV 2023 Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li

Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities.

Decoder Image Segmentation +3

Advancing Visual Grounding with Scene Knowledge: Benchmark and Method

1 code implementation CVPR 2023 Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li

Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge.

Image-text matching Text Matching +1

Divide and Adapt: Active Domain Adaptation via Customized Learning

1 code implementation CVPR 2023 Duojun Huang, Jichang Li, Weikai Chen, Junshi Huang, Zhenhua Chai, Guanbin Li

To accommodate active learning and domain adaptation, two naturally different tasks, in a collaborative framework, we advocate that a customized learning strategy for the target data is the key to the success of ADA solutions.

Active Learning Informativeness +3

Improved Distribution Matching for Dataset Condensation

2 code implementations CVPR 2023 Ganlong Zhao, Guanbin Li, Yipeng Qin, Yizhou Yu

In this paper, we propose a novel dataset condensation method based on distribution matching, which is more efficient and promising.

Dataset Condensation Model Optimization

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

1 code implementation ICCV 2023 Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin

Moreover, these methods ignore how to utilize the fine-grained dependencies among different skeleton joints to pre-train an efficient skeleton sequence learning model that can generalize well across different datasets.

Action Recognition Decoder +2

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

3 code implementations CVPR 2023 Jiacheng Zhang, Xiangru Lin, Wei zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage.

Object object-detection +3

Universal Semi-supervised Model Adaptation via Collaborative Consistency Training

no code implementations 7 Jul 2023 Zizheng Yan, Yushuang Wu, Yipeng Qin, Xiaoguang Han, Shuguang Cui, Guanbin Li

In this paper, we introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA), which i) requires only a pre-trained source model, ii) allows the source and target domain to have different label sets, i.e., they share a common label set and hold their own private label set, and iii) requires only a few labeled samples in each class of the target domain.

Domain Adaptation

CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning

2 code implementations 30 Jun 2023 Yang Liu, Weixing Chen, Guanbin Li, Liang Lin

We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, model generalization and robustness, etc.

Causal Inference Medical Report Generation +2

Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning

no code implementations 30 Jun 2023 Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai, Xiaolin Wei, Liang Lin, Yizhou Yu

In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.

DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

1 code implementation 23 Jun 2023 Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li

Neural fields have achieved impressive advancements in view synthesis and scene reconstruction.

3D scene Editing

Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

no code implementations CVPR 2023 Ricong Huang, Peiwen Lai, Yipeng Qin, Guanbin Li

In this work, we break these trade-offs with our novel parametric implicit face representation and propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads.

Data Augmentation Image Inpainting

DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback

1 code implementation 13 Jun 2023 Junfan Lin, Yuying Zhu, Lingbo Liu, Yang Liu, Guanbin Li, Liang Lin

1) The travel time of a vehicle is delayed feedback on the effectiveness of TSC policy at each traffic intersection since it is obtained after the vehicle has left the road network.

Reinforcement Learning (RL)

YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection

no code implementations 6 Jun 2023 Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li

YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.

Contrastive Learning

Long-term Wind Power Forecasting with Hierarchical Spatial-Temporal Transformer

no code implementations 30 May 2023 Yang Zhang, Lingbo Liu, Xinyu Xiong, Guanbin Li, Guoli Wang, Liang Lin

In this work, we propose a novel end-to-end wind power forecasting model named Hierarchical Spatial-Temporal Transformer Network (HSTTN) to address the long-term WPF problems.


Identity-Preserving Talking Face Generation with Landmark and Appearance Priors

1 code implementation CVPR 2023 Weizhi Zhong, Chaowei Fang, Yinqi Cai, Pengxu Wei, Gangming Zhao, Liang Lin, Guanbin Li

Prior landmark characteristics of the speaker's face are employed to make the generated landmarks coincide with the facial outline of the speaker.

Talking Face Generation

Visual Causal Scene Refinement for Video Question Answering

2 code implementations 7 May 2023 Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin

Our VCSR involves two essential modules: i) the Question-Guided Refiner (QGR) module, which refines consecutive video frames guided by the question semantics to obtain more representative segment features for causal front-door intervention; ii) the Causal Scene Separator (CSS) module, which discovers a collection of visual causal and non-causal scenes based on the visual-linguistic causal relevance and estimates the causal effect of the scene-separating intervention in a contrastive learning manner.

Contrastive Learning Question Answering +2

Urban Regional Function Guided Traffic Flow Prediction

no code implementations 17 Mar 2023 Kuo Wang, Lingbo Liu, Yang Liu, Guanbin Li, Fan Zhou, Liang Lin

The prediction of traffic flow is a challenging yet crucial problem in spatial-temporal analysis, which has recently gained increasing interest.

Cross-Modal Causal Intervention for Medical Report Generation

2 code implementations 16 Mar 2023 Weixing Chen, Yang Liu, Ce Wang, Jiarui Zhu, Shen Zhao, Guanbin Li, Cheng-Lin Liu, Liang Lin

Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance, which can relieve the heavy burden of radiologists by automatically generating the corresponding medical reports according to the given radiology image.

Medical Report Generation object-detection +1

Structure Embedded Nucleus Classification for Histopathology Images

no code implementations 22 Feb 2023 Wei Lou, Xiang Wan, Guanbin Li, Xiaoying Lou, Chenghang Li, Feng Gao, Haofeng Li

Next, we convert a histopathology image into a graph structure with nuclei as nodes, and build a graph neural network to embed the spatial distribution of nuclei into their representations.

Classification Graph Neural Network +2

Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

1 code implementation ICCV 2023 Zhihong Chen, Shizhe Diao, Benyou Wang, Guanbin Li, Xiang Wan

Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its applicability to extracting generic representations from medical images and texts.

Image Retrieval Image-text Classification +7

Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation

1 code implementation 12 Jan 2023 Ruifei Zhang, Sishuo Liu, Yizhou Yu, Guanbin Li

Since the two tasks rely on similar feature information, the unlabeled data effectively enhances the representation of the network to the lesion regions and further improves the segmentation performance.

Image Segmentation Medical Image Segmentation +3

Lesion-aware Dynamic Kernel for Polyp Segmentation

1 code implementation 12 Jan 2023 Ruifei Zhang, Peiwen Lai, Xiang Wan, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li

Automatic and accurate polyp segmentation plays an essential role in early colorectal cancer diagnosis.

Decoder Segmentation

Adaptive Context Selection for Polyp Segmentation

1 code implementation 12 Jan 2023 Ruifei Zhang, Guanbin Li, Zhen Li, Shuguang Cui, Dahong Qian, Yizhou Yu

To tackle these issues, we propose an adaptive context selection based encoder-decoder framework which is composed of Local Context Attention (LCA) module, Global Context Module (GCM) and Adaptive Selection Module (ASM).

Decoder Segmentation

Enhanced Soft Label for Semi-Supervised Semantic Segmentation

no code implementations ICCV 2023 Jie Ma, Chuan Wang, Yang Liu, Liang Lin, Guanbin Li

As a mainstream framework in the field of semi-supervised learning (SSL), self-training via pseudo labeling and its variants have witnessed impressive progress in semi-supervised semantic segmentation with the recent advance of deep neural networks.

Contrastive Learning Pseudo Label +1

RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels

no code implementations ICCV 2023 Ziyi Zhang, Weikai Chen, Chaowei Fang, Zhen Li, Lechao Chen, Liang Lin, Guanbin Li

Confidence-wise, we propose a novel sample selection strategy based on confidence representation voting instead of the widely-used small-loss criterion.

Learning with noisy labels Representation Learning +1

Which Pixel to Annotate: a Label-Efficient Nuclei Segmentation Framework

1 code implementation 20 Dec 2022 Wei Lou, Haofeng Li, Guanbin Li, Xiaoguang Han, Xiang Wan

Recently, deep neural networks, which require a large amount of annotated samples, have been widely applied in nuclei instance segmentation of H&E stained pathology images.

Instance Segmentation Segmentation +1

BoxPolyp: Boost Generalized Polyp Segmentation Using Extra Coarse Bounding Box Annotations

1 code implementation 7 Dec 2022 Jun Wei, Yiwen Hu, Guanbin Li, Shuguang Cui, S Kevin Zhou, Zhen Li

In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, which generate fine-grained polyp area through the iterative boosted segmentation model.


Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

2 code implementations 12 Nov 2022 Ziyi Zhang, Weikai Chen, Hui Cheng, Zhen Li, Siyuan Li, Liang Lin, Guanbin Li

We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source-pretrained model is adapted to the target domain without access to the source data.

Contrastive Learning Source-Free Domain Adaptation

Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training

1 code implementation CVPR 2023 Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion.

Language Modelling Zero-Shot Learning

View-Disentangled Transformer for Brain Lesion Detection

1 code implementation 20 Sep 2022 Haofeng Li, Junjia Huang, Guanbin Li, Zhou Liu, Yihong Zhong, Yingying Chen, Yunfei Wang, Xiang Wan

Deep neural networks (DNNs) have been widely adopted in brain lesion detection and segmentation.

Lesion Detection

Attentive Symmetric Autoencoder for Brain MRI Segmentation

1 code implementation 19 Sep 2022 Junjia Huang, Haofeng Li, Guanbin Li, Xiang Wan

Self-supervised learning methods based on image patch reconstruction have witnessed great success in training auto-encoders, whose pre-trained weights can be transferred to fine-tune other downstream tasks of image understanding.

Image Segmentation MRI segmentation +3

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

1 code implementation 15 Sep 2022 Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang

Besides, we conduct further analysis to better verify the effectiveness of different components of our approach and various settings of pre-training.

Self-Supervised Learning

Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

1 code implementation 15 Sep 2022 Zhihong Chen, Guanbin Li, Xiang Wan

Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP.

Neighborhood Collective Estimation for Noisy Label Identification and Correction

1 code implementation 5 Aug 2022 Jichang Li, Guanbin Li, Feng Liu, Yizhou Yu

Specifically, our method is divided into two steps: 1) Neighborhood Collective Noise Verification to separate all training samples into a clean or noisy subset, 2) Neighborhood Collective Label Correction to relabel noisy samples, and then auxiliary techniques are used to assist further model optimization.

Learning with noisy labels Model Optimization

Robust Real-World Image Super-Resolution against Adversarial Attacks

1 code implementation 31 Jul 2022 Jiutao Yue, Haofeng Li, Pengxu Wei, Guanbin Li, Liang Lin

Since the frequency masking may not only destroy the adversarial perturbations but also affect the sharp details in a clean image, we further develop an adversarial sample classifier based on the frequency domain of images to determine whether to apply the proposed mask module.

Image Super-Resolution

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

1 code implementation 29 Jul 2022 Ganlong Zhao, Guanbin Li, Yipeng Qin, Feng Liu, Yizhou Yu

In this paper, we propose a two-stage clean samples identification method to address the aforementioned challenge.

Ranked #3 on Image Classification on Clothing1M (using extra training data)

Image Classification

Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering

2 code implementations 26 Jul 2022 Yang Liu, Guanbin Li, Liang Lin

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video.

Causal Inference Question Answering +2

Less is More: Adaptive Curriculum Learning for Thyroid Nodule Diagnosis

1 code implementation 2 Jul 2022 Haifan Gong, Hui Cheng, Yifan Xie, Shuangyi Tan, Guanqi Chen, Fei Chen, Guanbin Li

Thyroid nodule classification aims at determining whether the nodule is benign or malignant based on a given ultrasound image.


BronchusNet: Region and Structure Prior Embedded Representation Learning for Bronchus Segmentation and Classification

no code implementations 14 May 2022 Wenhao Huang, Haifan Gong, huan zhang, Yu Wang, Haofeng Li, Guanbin Li, Hong Shen

CT-based bronchial tree analysis plays an important role in the computer-aided diagnosis for respiratory diseases, as it could provide structured information for clinicians.

Classification Graph Learning +3

Multi-level Consistency Learning for Semi-supervised Domain Adaptation

1 code implementation 9 May 2022 Zizheng Yan, Yushuang Wu, Guanbin Li, Yipeng Qin, Xiaoguang Han, Shuguang Cui

Semi-supervised domain adaptation (SSDA) aims to apply knowledge learned from a fully labeled source domain to a scarcely labeled target domain.

Domain Adaptation Semi-supervised Domain Adaptation

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

1 code implementation CVPR 2022 Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

To address this issue, we propose an unsupervised domain adaptation mechanism for real-world SR, named Dual ADversarial Adaptation (DADA), which only requires LR images in the target domain with available real paired data from a source camera.

Image Super-Resolution Unsupervised Domain Adaptation

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

no code implementations 26 Apr 2022 Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing.

Benchmarking Out-of-Distribution Generalization +2

Open Set Domain Adaptation By Novel Class Discovery

no code implementations 7 Mar 2022 Jingyu Zhuang, Ziliang Chen, Pengxu Wei, Guanbin Li, Liang Lin

In Open Set Domain Adaptation (OSDA), large amounts of target samples are drawn from the implicit categories that never appear in the source domain.

Domain Adaptation Novel Class Discovery

Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

1 code implementation 26 Feb 2022 Pengxiang Yan, Ziyi Wu, Mengmeng Liu, Kun Zeng, Liang Lin, Guanbin Li

To relieve the burden of labor-intensive labeling, deep unsupervised SOD methods have been proposed to exploit noisy labels generated by handcrafted saliency methods.

object-detection Object Detection +2

PointMatch: A Consistency Training Framework for Weakly Supervised Semantic Segmentation of 3D Point Clouds

no code implementations 22 Feb 2022 Yushuang Wu, Zizheng Yan, Shengcai Cai, Guanbin Li, Yizhou Yu, Xiaoguang Han, Shuguang Cui

Semantic segmentation of point clouds usually relies on dense annotation, which is exhausting and costly, so solutions for the weakly supervised scheme with only sparse points annotated have attracted wide attention.

Representation Learning Weakly supervised Semantic Segmentation +1

Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation

1 code implementation 8 Feb 2022 Xinkai Zhao, Chaowei Fang, De-Jun Fan, Xutao Lin, Feng Gao, Guanbin Li

Semi-supervised learning (SSL), which aims at leveraging a few labeled images and a large number of unlabeled images for network training, is beneficial for relieving the burden of data annotation in medical image segmentation.

Contrastive Learning Image Segmentation +5

Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation

no code implementations 16 Oct 2021 Yang Wu, Shirui Feng, Guanbin Li, Liang Lin

PEMR includes a "looking ahead" process, i.e., a visual feature extractor module that estimates feasible paths for gathering 3D navigational information, which mimics the human sense of direction.

Common Sense Reasoning Embodied Question Answering +1

Road Network Guided Fine-Grained Urban Traffic Flow Inference

1 code implementation 29 Sep 2021 Lingbo Liu, Mengmeng Liu, Guanbin Li, Ziyi Wu, Junfan Lin, Liang Lin

Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture.

Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

no code implementations ICCV 2021 Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei, Pengxu Wei, Liang Lin, Guanbin Li

Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.

Binary Classification

Towards Interpretable Deep Networks for Monocular Depth Estimation

1 code implementation ICCV 2021 Zunzhi You, Yi-Hsuan Tsai, Wei-Chen Chiu, Guanbin Li

Based on our observations, we quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.

Monocular Depth Estimation

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations 9 Aug 2021 Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment

no code implementations 5 Aug 2021 Qin Wang, Hui Che, Weizhen Ding, Li Xiang, Guanbin Li, Zhen Li, Shuguang Cui

Thus, we propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification (CPC) that directly uses white-light (WL) colonoscopy images in the examination.

Contrastive Learning

Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation

1 code implementation 2 Jul 2021 Lingbo Liu, Yuying Zhu, Guanbin Li, Ziyi Wu, Lei Bai, Liang Lin

In this work, we propose a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership.

Time Series Analysis

Bottom-Up Shift and Reasoning for Referring Image Segmentation

1 code implementation CVPR 2021 Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, Yizhou Yu

In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed novel Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules.

Image Segmentation Segmentation +2

Cross-Modal Progressive Comprehension for Referring Segmentation

1 code implementation 15 May 2021 Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.

Attribute Image Segmentation +5

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations CVPR 2021 Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

Decoder feature selection +1

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

2 code implementations CVPR 2021 Jichang Li, Guanbin Li, Yemin Shi, Yizhou Yu

Pseudo labeling expands the number of "labeled" samples in each class in the target domain, and thus produces a more robust and powerful cluster core for each class to facilitate adversarial learning.

Clustering Domain Adaptation +1

Deep Transformers for Fast Small Intestine Grounding in Capsule Endoscope Video

no code implementations 7 Apr 2021 Xinkai Zhao, Chaowei Fang, Feng Gao, De-Jun Fan, Xutao Lin, Guanbin Li

In this paper, we propose a deep model to ground the shooting range of the small intestine in a capsule endoscope video that lasts tens of hours.

Scene-Intuitive Agent for Remote Embodied Visual Grounding

no code implementations CVPR 2021 Xiangru Lin, Guanbin Li, Yizhou Yu

Intuitively, we comprehend the semantics of the instruction to form an overview of where a bathroom is and what a blue towel is in mind; then, we navigate to the target location by consistently matching the bathroom appearance in mind with the current scene.

Navigate Referring Expression +1

Adversarial Training using Contrastive Divergence

no code implementations 1 Jan 2021 Hongjun Wang, Guanbin Li, Liang Lin

To protect the security of machine learning models against adversarial examples, adversarial training has become the most popular and powerful strategy against various adversarial attacks by injecting adversarial examples into training data.

A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning

no code implementations 15 Oct 2020 Hongjun Wang, Guanbin Li, Xiaobai Liu, Liang Lin

Although deep convolutional neural networks (CNNs) have demonstrated remarkable performance on multiple computer vision tasks, research on adversarial learning has shown that deep models are vulnerable to adversarial examples, which are crafted by adding visually imperceptible perturbations to the input images.

Adversarial Attack

Contralaterally Enhanced Networks for Thoracic Disease Detection

no code implementations 9 Oct 2020 Gangming Zhao, Chaowei Fang, Guanbin Li, Licheng Jiao, Yizhou Yu

Aimed at improving the performance of existing detection methods, we propose a deep end-to-end module to exploit the contralateral context information for enhancing feature representations of disease proposals.

Referring Image Segmentation via Cross-Modal Progressive Comprehension

1 code implementation CVPR 2020 Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.

Attribute Image Segmentation +2

Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos

no code implementations 18 Sep 2020 Jie Wu, Guanbin Li, Xiaoguang Han, Liang Lin

Temporal grounding of natural language in untrimmed videos is a fundamental yet challenging multimedia task facilitating cross-media visual content retrieval.

reinforcement-learning Reinforcement Learning (RL) +2

Online Alternate Generator against Adversarial Attacks

no code implementations 17 Sep 2020 Haofeng Li, Yirui Zeng, Guanbin Li, Liang Lin, Yizhou Yu

The field of computer vision has witnessed phenomenal progress in recent years partially due to the development of deep convolutional neural networks.

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

1 code implementation 1 Sep 2020 Yang Liu, Keze Wang, Guanbin Li, Liang Lin

In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors.

Action Recognition Image Generation +3

Graph-Structured Referring Expression Reasoning in The Wild

1 code implementation CVPR 2020 Sibei Yang, Guanbin Li, Yizhou Yu

The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression.

Referring Expression

Efficient Crowd Counting via Structured Knowledge Transfer

2 code implementations 23 Mar 2020 Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin

Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.

Crowd Counting Transfer Learning

Peeking into occluded joints: A novel framework for crowd pose estimation

1 code implementation ECCV 2020 Lingteng Qiu, Xuanye Zhang, Yan-ran Li, Guanbin Li, Xiao-Jun Wu, Zixiang Xiong, Xiaoguang Han, Shuguang Cui

Although occlusion widely exists in nature and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer serious degradation on occlusions.

Pose Estimation

Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread

no code implementations 22 Jan 2020 Haofeng Li, Guanbin Li, BinBin Yang, Guanqi Chen, Liang Lin, Yizhou Yu

The proposed algorithm for the first time achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.

Image Classification Object +4

Physical-Virtual Collaboration Modeling for Intra-and Inter-Station Metro Ridership Prediction

2 code implementations 14 Jan 2020 Lingbo Liu, Jingwen Chen, Hefeng Wu, Jiajie Zhen, Guanbin Li, Liang Lin

To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs.

Representation Learning

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

no code implementations 18 Dec 2019 Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin

In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature space adversarial perturbations.

Position Segmentation +2

Self-Enhanced Convolutional Network for Facial Video Hallucination

no code implementations 23 Nov 2019 Chaowei Fang, Guanbin Li, Xiaoguang Han, Yizhou Yu

It further recurrently exploits the reconstructed results and intermediate features of a sequence of preceding frames to improve the initial super-resolution of the current frame by modelling the coherence of structural facial features across frames.

Hallucination Video Super-Resolution

Globally Guided Progressive Fusion Network for 3D Pancreas Segmentation

no code implementations 23 Nov 2019 Chaowei Fang, Guanbin Li, Chengwei Pan, Yiming Li, Yizhou Yu

3D volumetric organ segmentation has recently attracted much research interest in medical image analysis due to its significance in computer-aided diagnosis.

Organ Segmentation Pancreas Segmentation +1

Knowledge Graph Transfer Network for Few-Shot Recognition

1 code implementation 21 Nov 2019 Riquan Chen, Tianshui Chen, Xiaolu Hui, Hefeng Wu, Guanbin Li, Liang Lin

In this work, we represent the semantic correlations in the form of a structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN).

Few-Shot Image Classification Few-Shot Learning +2

Learning to Recognize the Unseen Visual Predicates

no code implementations 25 Sep 2019 Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo

Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.

Question Answering Visual Question Answering +1

Dynamic Graph Attention for Referring Expression Comprehension

no code implementations ICCV 2019 Sibei Yang, Guanbin Li, Yizhou Yu

In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression.

Graph Attention Referring Expression +2

Motion Guided Attention for Video Salient Object Detection

2 code implementations ICCV 2019 Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu

In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images.

Object object-detection +4

Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction

2 code implementations 2 Sep 2019 Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, Liang Lin

Specifically, the first ConvLSTM unit takes normal traffic flow features as input and generates a hidden state at each time-step, which is further fed into the connected convolutional layer for spatial attention map inference.

Representation Learning Traffic Prediction

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations ICCV 2019 Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.

Image Retrieval Retrieval

Crowd Counting with Deep Structured Scale Integration Network

no code implementations ICCV 2019 Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin

Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task and one major difficulty stems from the huge scale variation of people.

Crowd Counting Representation Learning

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

1 code implementation ICCV 2019 Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin

Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module.

Ranked #1 on Video Salient Object Detection on VOS-T (using extra training data)

object-detection Salient Object Detection +2

Semi-supervised Skin Detection by Network with Mutual Guidance

no code implementations ICCV 2019 Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang

In this paper we present a new data-driven method for robust skin detection from a single human portrait image.


Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

1 code implementation 8 Jul 2019 Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin

A broad range of cross-$m$-domain generation research boils down to matching a joint distribution by deep generative models (DGMs).

Relationship-Embedded Representation Learning for Grounding Referring Expressions

1 code implementation CVPR 2019 Sibei Yang, Guanbin Li, Yizhou Yu

Unfortunately, existing work on grounding referring expressions fails to accurately extract multi-order relationships from the referring expression and associate them with the objects and their related contexts in the image.

Referring Expression Representation Learning

Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction

no code implementations 15 May 2019 Lingbo Liu, Zhilin Qiu, Guanbin Li, Qing Wang, Wanli Ouyang, Liang Lin

Finally, a GCC module is applied to model the correlation between all regions by computing a global correlation feature as a weighted sum of all regional features, with the weights being calculated as the similarity between the corresponding region pairs.

ROSA: Robust Salient Object Detection against Adversarial Attacks

no code implementations 9 May 2019 Haofeng Li, Guanbin Li, Yizhou Yu

To our knowledge, this paper is the first one that mounts successful adversarial attacks on salient object detection models and verifies that adversarial samples are effective on a wide range of existing methods.

Object object-detection +2

Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning

no code implementations 4 May 2019 Yukai Shi, Guanbin Li, Qingxing Cao, Keze Wang, Liang Lin

Face hallucination is a domain-specific super-resolution problem that aims to generate a high-resolution (HR) face image from a low-resolution (LR) input.

Face Hallucination Hallucination +3

Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

no code implementations 22 Apr 2019 Guanbin Li, Xin Zhu, Yirui Zeng, Qing Wang, Liang Lin

Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) in a multi-scale CNN framework to propagate node information through the graph for generating enhanced AU representations.

Facial Action Unit Detection Graph Neural Network +1

Harvesting Visual Objects from Internet Images via Deep Learning Based Objectness Assessment

no code implementations 1 Apr 2019 Kan Wu, Guanbin Li, Haofeng Li, Jianjun Zhang, Yizhou Yu

As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method, and has been successfully used in various data-driven image applications.

Image Generation Object +1

Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions

no code implementations 11 Dec 2018 Weikai Chen, Xiaoguang Han, Guanbin Li, Chao Chen, Jun Xing, Yajie Zhao, Hao Li

Three-dimensional object recognition has recently achieved great progress thanks to the development of effective point cloud-based learning frameworks, such as PointNet and its extensions.

3D Object Recognition

Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning

no code implementations 10 Dec 2018 Lingbo Liu, Guanbin Li, Yuan Xie, Yizhou Yu, Qing Wang, Liang Lin

In this paper, we propose a novel cascaded backbone-branches fully convolutional neural network (BB-FCN) for rapidly and accurately localizing facial landmarks in unconstrained and cluttered settings.

Face Alignment Face Detection +2

FRAME Revisited: An Interpretation View Based on Particle Evolution

no code implementations 4 Dec 2018 Xu Cai, Yang Wu, Guanbin Li, Ziliang Chen, Liang Lin

FRAME (Filters, Random fields, And Maximum Entropy) is an energy-based descriptive model that synthesizes visual realism by capturing mutual patterns from structural input signals.


Cross-Modal Attentional Context Learning for RGB-D Object Detection

no code implementations 30 Oct 2018 Guanbin Li, Yukang Gan, Hejun Wu, Nong Xiao, Liang Lin

In this paper, we address this problem by developing a Cross-Modal Attentional Context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data.

Autonomous Driving Object +2

Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview

no code implementations 10 Oct 2018 Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin

Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision.

Representation Learning Segmentation +1

Attentive Crowd Flow Machines

no code implementations 1 Sep 2018 Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin

Traffic flow prediction is crucial for urban traffic management and public safety.


Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining

no code implementations 4 Aug 2018 Guanbin Li, Xiang He, Wei Zhang, Huiyou Chang, Le Dong, Liang Lin

Single image rain streaks removal has recently witnessed substantial progress due to the development of deep convolutional neural networks.


Crowd Counting using Deep Recurrent Spatial-Aware Network

no code implementations 2 Jul 2018 Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin

Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera's perspective that causes huge appearance variations in people's scales and rotations.

Crowd Counting Management

Interpretable Video Captioning via Trajectory Structured Localization

no code implementations CVPR 2018 Xian Wu, Guanbin Li, Qingxing Cao, Qingge Ji, Liang Lin

Automatically describing open-domain videos with natural language is attracting increasing interest in the field of artificial intelligence.

Decoder Image Captioning +3

Visual Question Reasoning on General Dependency Tree

no code implementations CVPR 2018 Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

This network comprises two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence.

Question Answering Visual Question Answering

Contrast-Oriented Deep Neural Networks for Salient Object Detection

no code implementations 30 Mar 2018 Guanbin Li, Yizhou Yu

In this paper, we develop hybrid contrast-oriented deep neural networks to overcome the aforementioned limitations.

Object object-detection +2

Weakly Supervised Salient Object Detection Using Image Labels

no code implementations 17 Mar 2018 Guanbin Li, Yuan Xie, Liang Lin

Our algorithm is based on alternately exploiting a graphical model and training a fully convolutional network for model updating.

Object object-detection +3

Context-Aware Semantic Inpainting

no code implementations 21 Dec 2017 Haofeng Li, Guanbin Li, Liang Lin, Yizhou Yu

Our proposed GAN-based framework consists of a fully convolutional design for the generator which helps to better preserve spatial structures and a joint loss function with a revised perceptual loss to capture high-level semantics in the context.

Generative Adversarial Network Image Inpainting

Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition

no code implementations 20 Dec 2017 Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin

Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks.

reinforcement-learning Reinforcement Learning (RL)

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

no code implementations ICCV 2017 Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding.

General Classification Multi-Label Image Classification +1

Attention-Aware Face Hallucination via Deep Reinforcement Learning

no code implementations CVPR 2017 Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li

Face hallucination is a domain-specific super-resolution problem with the goal to generate high-resolution (HR) faces from low-resolution (LR) input images.

Face Hallucination Hallucination +3

Instance-Level Salient Object Segmentation

no code implementations CVPR 2017 Guanbin Li, Yuan Xie, Liang Lin, Yizhou Yu

Image saliency detection has recently witnessed rapid progress due to deep convolutional neural networks.

Ranked #17 on RGB Salient Object Detection on DUTS-TE (max F-measure metric)

Instance Segmentation Object +3

Visual Saliency Detection Based on Multiscale Deep CNN Features

2 code implementations 7 Sep 2016 Guanbin Li, Yizhou Yu

The penultimate layer of our neural network has been confirmed to be a discriminative high-level feature vector for saliency detection, which we call deep contrast feature.

Saliency Detection

Deep Contrast Learning for Salient Object Detection

no code implementations CVPR 2016 Guanbin Li, Yizhou Yu

Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream.

Ranked #21 on RGB Salient Object Detection on DUTS-TE (max F-measure metric)

Object object-detection +2

Visual Saliency Based on Multiscale Deep Features

no code implementations CVPR 2015 Guanbin Li, Yizhou Yu

Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision.

Image Segmentation Semantic Segmentation
