Search Results for author: Liang Lin

Found 339 papers, 140 papers with code

SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks

no code implementations 17 Jun 2025 Zijian Song, Xiaoxin Lin, Qiuming Huang, Guangrun Wang, Liang Lin

Large Language Models (LLMs) are experiencing rapid advancements in complex reasoning, exhibiting remarkable generalization in mathematics and programming.

Math Spatial Reasoning

DART: Differentiable Dynamic Adaptive Region Tokenizer for Vision Transformer and Mamba

1 code implementation 12 Jun 2025 Shicheng Yin, Kaixuan Yin, Yang Liu, Weixing Chen, Liang Lin

Recently, non-convolutional models such as the Vision Transformer (ViT) and Vision Mamba (Vim) have achieved remarkable performance in computer vision tasks.

Mamba

UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models

1 code implementation 21 May 2025 Miao Yu, Liang Lin, Guibin Zhang, Xinfeng Li, Junfeng Fang, Ningyu Zhang, Kun Wang, Yang Wang

Large language models require iterative updates to address challenges such as knowledge conflicts and outdated information (e.g., incorrect, private, or illegal content).

Machine Unlearning Model Editing +1

DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once

1 code implementation 7 May 2025 Qi Zhou, Yukai Shi, Xiaojun Yang, Xiaoyu Xian, Lunjia Liao, Ruimao Zhang, Liang Lin

Visible and infrared image fusion is one of the most crucial tasks in the field of image fusion, aiming to generate fused images with clear structural information and high-quality texture features for high-level vision tasks.

All Autonomous Driving +1

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

no code implementations 22 Apr 2025 Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu

Existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and lack a comprehensive understanding of the entire "lifechain" of LLMs.

Model Editing

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians

1 code implementation 15 Apr 2025 Zeming Wei, Junyi Lin, Yang Liu, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

Building upon this dataset, we introduce AffordSplatNet, a novel model specifically designed for affordance reasoning using 3DGS representations.

3DGS Affordance Recognition

DreamFuse: Adaptive Image Fusion with Diffusion Transformer

no code implementations 11 Apr 2025 Junjia Huang, Pengxiang Yan, Jiyang Liu, Jie Wu, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li

Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images.

Attribute Style Transfer

Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation

no code implementations 8 Apr 2025 Tianshui Chen, Jianman Lin, Zhijing Yang, Chumei Qing, Yukai Shi, Liang Lin

In this work, we propose to learn content and emotion priors as guidance augmented with contrastive learning to learn decoupled content and emotion representation via an innovative Contrastive Decoupled Representation Learning (CDRL) algorithm.

Contrastive Learning Representation Learning

VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction

no code implementations CVPR 2025 Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, Guanbin Li

Virtual Try-On (VTON) is a transformative technology in e-commerce and fashion design, enabling realistic digital visualization of clothing on individuals.

Virtual Try-on

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

no code implementations 14 Mar 2025 Kaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin

Embodied Question Answering (EQA) is a challenging task in embodied intelligence that requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions.

Embodied Question Answering Question Answering

RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs

1 code implementation 8 Mar 2025 Zhongzhan Huang, Guoming Ling, Yupei Lin, Yandong Chen, Shanshan Zhong, Hefeng Wu, Liang Lin

This improvement can even surpass the performance of the best single model in the pool and of many existing strong LLMs, confirming that it is a highly promising paradigm.

Instruction Following Mathematical Reasoning

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

1 code implementation CVPR 2025 Jingzhou Luo, Yang Liu, Weixing Chen, Zhen Li, YaoWei Wang, Guanbin Li, Liang Lin

In this paper, we propose a Dual-vision Scene Perception Network (DSPNet) that comprehensively integrates multi-view and point cloud features to improve robustness in 3D QA.

3D Question Answering (3D-QA) Question Answering

Cross-modal Causal Relation Alignment for Video Question Grounding

1 code implementation CVPR 2025 Weixing Chen, Yang Liu, Binglin Chen, Jiandong Su, Yongsen Zheng, Liang Lin

Video question grounding (VideoQG) requires models to answer the questions and simultaneously infer the relevant video segments to support the answers.

Contrastive Learning cross-modal alignment +2

AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay

1 code implementation 24 Feb 2025 Ziyi Tang, Zechuan Chen, Jiarui Yang, Jiayao Mai, Yongsen Zheng, Keze Wang, Jinrui Chen, Liang Lin

Alpha mining, a critical component in quantitative investment, focuses on discovering predictive signals for future asset returns in increasingly complex financial markets.

CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond

no code implementations 20 Feb 2025 Yukai Shi, Cidan Shi, Zhipeng Weng, Yin Tian, Xiaoyu Xian, Liang Lin

Unlike existing research, our focus is on the challenges posed by OOD data in real-world applications and on enhancing the robustness and generalization of models.

Autonomous Driving Data Augmentation +2

Randomness of Low-Layer Parameters Determines Confusing Samples in Terms of Interaction Representations of a DNN

no code implementations 12 Feb 2025 Junpeng Zhang, Lei Cheng, Qing Li, Liang Lin, Quanshi Zhang

In this paper, we find that the complexity of interactions encoded by a deep neural network (DNN) can explain its generalization power.

Pre-Trained Video Generative Models as World Simulators

no code implementations 10 Feb 2025 Haoran He, Yang Zhang, Liang Lin, Zhongwen Xu, Ling Pan

Video generative models pre-trained on large-scale internet datasets have achieved remarkable success, excelling at producing realistic synthetic videos.

Model-based Reinforcement Learning

Decoder-Only LLMs are Better Controllers for Diffusion Models

no code implementations 6 Feb 2025 Ziyi Dong, Yao Xiao, Pengxu Wei, Liang Lin

Groundbreaking advancements in text-to-image generation have recently been achieved with the emergence of diffusion models.

Decoder Text to Image Generation +1

A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models

no code implementations 25 Jan 2025 Zhongzhan Huang, Shanshan Zhong, Pan Zhou, ShangHua Gao, Marinka Zitnik, Liang Lin

This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity.

Logical Reasoning

SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning Tasks

1 code implementation 20 Jan 2025 Wentao Wan, Zhuojie Yang, Yongcan Chen, Chenglin Luo, Ruilin Wang, Kehao Cai, Nan Kang, Liang Lin, Keze Wang

Finally, it guides LLMs to use the previously generated major and minor premises to perform syllogistic deductive reasoning to derive the answer to the original question.

Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation

1 code implementation CVPR 2025 Rong Qin, Xingyu Liu, Jinglei Shi, Liang Lin, Jufeng Yang

Over the last decade, significant efforts have been dedicated to designing efficient models for the challenge of ultra-high resolution (UHR) semantic segmentation.

Semantic Segmentation

PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention

1 code implementation CVPR 2025 Weicheng Wang, Guoli Jia, Zhongqi Zhang, Liang Lin, Jufeng Yang

The effect of the former is estimated through intrinsic image decomposition, and the region of the latter is predicted in an additional background effect control branch.

Intrinsic Image Decomposition

No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition

1 code implementation CVPR 2025 Rong Qin, Xin Liu, Xingyu Liu, Jiaxuan Liu, Jinglei Shi, Liang Lin, Jufeng Yang

Over the last decade, many notable methods have emerged to tackle the computational resource challenge of high-resolution image recognition (HRIR).

Multiple Instance Learning

Reproducible Vision-Language Models Meet Concepts Out of Pre-Training

no code implementations CVPR 2025 Ziliang Chen, Xin Huang, Xiaoxuan Fan, Keze Wang, Yuyu Zhou, Quanlong Guan, Liang Lin

We propose the LAION-Beyond benchmark to isolate the evaluation of OOP concepts from pre-training knowledge, with regard to OpenCLIP and its reproducible variants derived from LAION datasets.

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

no code implementations CVPR 2025 Xinshuai Song, Weixing Chen, Yang Liu, Vincent Chan, Guanbin Li, Liang Lin

Existing Vision-Language Navigation (VLN) methods primarily focus on single-stage navigation, limiting their effectiveness in multi-stage and long-horizon tasks within complex and dynamic environments.

Vision-Language Navigation

Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition

no code implementations 9 Dec 2024 Haijing Liu, Tao Pu, Hefeng Wu, Keze Wang, Liang Lin

The proposed framework consists of two complementary modules, i.e., an intra-category semantic refinement (ISR) module and an inter-category semantic transfer (IST) module.

Patch Matching

Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review

no code implementations 15 Nov 2024 Hossein Hassani, Roozbeh Razavi-Far, Mehrdad Saif, Liang Lin

Our findings indicate that the majority of recent research works have dealt with the aforementioned challenges by utilizing human-in-the-loop and sim-to-real strategies for the efficient transfer of knowledge from source domains to the target domain under the transfer learning scheme.

Reinforcement Learning (RL) Sequential Decision Making +1

Integration of Communication and Computational Imaging

no code implementations 25 Oct 2024 Zhenming Yu, Liming Cheng, Hongyu Huang, Wei Zhang, Liang Lin, Kun Xu

Herein, we propose a novel framework that integrates communication and computational imaging (ICCI) to break through the inherent isolation between communication and computational imaging for remote perception.

Data Compression

Style-Preserving Lip Sync via Audio-Aware Style Reference

no code implementations 10 Aug 2024 Weizhi Zhong, Jichang Li, Yinqi Cai, Ming Li, Feng Gao, Liang Lin, Guanbin Li

Specifically, we first develop an advanced Transformer-based model adept at predicting lip motion corresponding to the input audio, augmented by the style information aggregated through cross-attention layers from a style reference video.

Improving Network Interpretability via Explanation Consistency Evaluation

no code implementations 8 Aug 2024 Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

The pursuit of greater interpretability in neural networks often results in a degradation of their original performance.

Adversarial Attack

MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection

1 code implementation 31 Jul 2024 Kuo Wang, Lechao Cheng, Weikai Chen, Pingping Zhang, Liang Lin, Fan Zhou, Guanbin Li

Learning from pseudo-labels generated with Vision-Language Models (VLMs) has been shown in recent studies to be a promising way to assist open-vocabulary detection (OVD).

Language Modelling Object +4

Cool-Fusion: Fuse Large Language Models without Training

no code implementations 29 Jul 2024 Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen

We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to exploit their complementary strengths.

Combinatorial Optimization GSM8K +1

Scaling Up Single Image Dehazing Algorithm by Cross-Data Vision Alignment for Richer Representation Learning and Beyond

1 code implementation 20 Jul 2024 Yukai Shi, Zhipeng Weng, Yupei Lin, Cidan Shi, Xiaojun Yang, Liang Lin

Ignoring the domain gap between different data, former dehazing methods simply adopt multiple datasets for explicit large-scale training, which often undermines the methods themselves.

Data Augmentation Image Dehazing +2

WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

1 code implementation 15 Jul 2024 Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.

Virtual Try-on

Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram

no code implementations 10 Jul 2024 Ming-Liang Zhang, Zhong-Zhi Li, Fei Yin, Liang Lin, Cheng-Lin Liu

In modal fusion, we leverage textual clauses to express the fine-grained structural and semantic content of the geometry diagram, and fuse the diagram with the textual problem efficiently through structural-semantic pre-training.

Decoder Geometry Problem Solving +1

Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

1 code implementation 9 Jul 2024 Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization.

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

1 code implementation 9 Jul 2024 Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guanbin Li, Wen Gao, Liang Lin

In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI.

Survey

Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

1 code implementation 5 Jun 2024 Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin

(b) A knowledge association module that fuses linguistic and biomedical knowledge into gene priors by transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations.

Binary Classification Graph Representation Learning +3

Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior

1 code implementation 2 Jun 2024 Yukai Shi, Yupei Lin, Pengxu Wei, Xiaoyu Xian, Tianshui Chen, Liang Lin

Large-scale trained diffusion models have a strong generative prior that enables real-world modeling of images to generate diverse and realistic images.

Data Augmentation Diversity +1

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

no code implementations CVPR 2024 Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li

Specifically, we devise three modules: Background Category-specific Prompt, Background Object Discovery, and Inference Probability Rectification, to empower the detector to discover, represent, and leverage implicit object knowledge explored from background proposals.

Knowledge Distillation Object +4

Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs

no code implementations 20 May 2024 Siyu Lou, Yuntian Chen, Xiaodan Liang, Liang Lin, Quanshi Zhang

In this study, we propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by the large language model (LLM) for language generation.

Disentanglement Language Modeling +4

ODMixer: Fine-grained Spatial-temporal MLP for Metro Origin-Destination Prediction

1 code implementation 24 Apr 2024 Yang Liu, Binglin Chen, Yongsen Zheng, Lechao Cheng, Guanbin Li, Liang Lin

Metro Origin-Destination (OD) prediction is a crucial yet challenging spatial-temporal prediction task in urban computing, which aims to accurately forecast cross-station ridership for optimizing metro scheduling and enhancing overall transport efficiency.

Prediction Scheduling

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

no code implementations 22 Apr 2024 Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin

To further enhance the robustness of DBP models, we introduce Adversarial Denoising Diffusion Training (ADDT), which incorporates classifier-guided adversarial perturbations into diffusion training, thereby strengthening the DBP models' ability to purify adversarial perturbations.

Adversarial Purification Denoising

AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment

no code implementations 7 Apr 2024 Yuanfeng Xu, Yuhao Chen, Zhongzhan Huang, Zijian He, Guangrun Wang, Philip Torr, Liang Lin

In this paper, we present AnimateZoo, a zero-shot diffusion-based video generator to address this challenging cross-species animation issue, aiming to accurately produce animal animations while preserving the background.

Video Editing Video Generation

IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images

1 code implementation 18 Mar 2024 Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin

IDF-CR consists of a pixel space cloud removal module (Pixel-CR) and a latent space iterative noise diffusion network (IND).

Cloud Removal Image Generation +1

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

no code implementations 9 Mar 2024 Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

To encourage the agent to capture the difference brought by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings with their perturbation-based counterparts.

Contrastive Learning Navigate +1

DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions

1 code implementation 2 Mar 2024 Guangrun Wang, Changlin Li, Liuchun Yuan, Jiefeng Peng, Xiaoyu Xian, Xiaodan Liang, Xiaojun Chang, Liang Lin

Addressing this problem, we modularize a large search space into blocks with small search spaces and develop a family of models with the distilling neural architecture (DNA) techniques.

Neural Architecture Search

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

1 code implementation CVPR 2024 Tao Tang, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin, Kaicheng Yu, Xiaodan Liang

Through extensive experiments across various datasets and scenes, we demonstrate the effectiveness of our approach in facilitating better interaction between LiDAR and camera modalities within a unified neural field.

Novel View Synthesis

Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima

1 code implementation 17 Feb 2024 Shanshan Zhong, Zhongzhan Huang, Daifeng Li, Wushao Wen, Jinghui Qin, Liang Lin

This strategy can implicitly enhance the model's robustness during the optimization process, mitigating instability risks arising from multimodal information inputs.

Multimodal Recommendation

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments

1 code implementation 1 Feb 2024 Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.

Embodied Question Answering Language Modeling +4

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

no code implementations 26 Jan 2024 Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region.

3D scene Editing

Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition

1 code implementation 20 Jan 2024 Yuefang Gao, Yuhao Xie, Zeke Zexi Hu, Tianshui Chen, Liang Lin

Specifically, the framework consists of separate global-local adversarial learning modules that learn domain-invariant global and local features independently.

Cross-Domain Facial Expression Recognition Model Optimization +2

Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition

no code implementations 13 Jan 2024 Hefeng Wu, Guangzhi Ye, Ziyang Zhou, Ling Tian, Qing Wang, Liang Lin

Specifically, an instance-view data hallucination module hallucinates each sample of a novel class to generate new data by employing local semantic correlated attention and global semantic feature fusion derived from base classes.

Hallucination Novel Concepts +1

MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond

no code implementations 6 Jan 2024 Yupei Lin, Xiaoyu Xian, Yukai Shi, Liang Lin

By using a target text prompt for domain adaptation, the diffusion model is able to perform zero-shot image-to-image translation effectively.

Denoising Domain Adaptation +3

Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation

1 code implementation CVPR 2024 Tianshui Chen, Jianman Lin, Zhijing Yang, Chunmei Qing, Liang Lin

To capitalize on this insight, we propose a novel adaptive spatial coherent correlation learning (ASCCL) algorithm, which models the aforementioned correlation as an explicit metric and integrates the metric to supervise facial expression manipulation while better preserving the facial animation of spoken content.

Credible Teacher for Semi-Supervised Object Detection in Open Scene

no code implementations 1 Jan 2024 Jingyu Zhuang, Kuo Wang, Liang Lin, Guanbin Li

Credible Teacher adopts an interactive teaching mechanism using flexible labels to prevent uncertain pseudo labels from misleading the model and gradually reduces its uncertainty through the guidance of other credible pseudo labels.

object-detection Object Detection +1

Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach

no code implementations 15 Dec 2023 Ziliang Chen, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, Liang Lin

Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization.

feature selection Representation Learning

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

1 code implementation CVPR 2024 Shanshan Zhong, Zhongzhan Huang, ShangHua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou

To this end, we study LLMs on the popular Oogiri game, which requires participants to have good creativity and strong associative thinking in order to respond unexpectedly and humorously to the given image, text, or both, and is thus suitable for LoT study.

Logical Reasoning

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering

no code implementations 29 Nov 2023 Zeqing Wang, Wentao Wan, Qiqing Lao, Runmeng Chen, Minjie Lang, Keze Wang, Liang Lin

To overcome this limitation, and inspired by the human top-down reasoning process, i.e., systematically exploring relevant issues to derive a comprehensive answer, this work introduces a novel, explainable multi-agent collaboration framework that leverages the expansive knowledge of Large Language Models (LLMs) to enhance the capabilities of VLMs themselves.

Common Sense Reasoning Question Answering +2

SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting

1 code implementation 16 Nov 2023 Hefeng Wu, Yandong Chen, Lingbo Liu, Tianshui Chen, Keze Wang, Liang Lin

In the localization stage, the Scale-aware Multi-head Localization (SAML) module utilizes the query tensor to predict the confidence, location, and size of each potential object.

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

2 code implementations NeurIPS 2023 Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin

We also observe the theoretical benefits of scaling the LSC coefficients of UNet for the stability of hidden features and gradients, as well as for robustness.

ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion

1 code implementation 11 Oct 2023 Jinghui Qin, Lihuang Fang, Ruitao Lu, Liang Lin, Yukai Shi

Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate a high spatial resolution HSI (HR-HSI) by fusing an HSI and a multispectral image (MSI) with deep neural networks (DNNs), has attracted considerable attention.

Data Augmentation Diversity +1

Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation

1 code implementation 23 Sep 2023 Tao Pu, Tianshui Chen, Hefeng Wu, Yongyi Lu, Liang Lin

In this work, we propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations.

Graph Generation Object +2

A Continual Learning Paradigm for Non-differentiable Visual Programming Frameworks on Visual Reasoning Tasks

no code implementations 18 Sep 2023 Wentao Wan, Nan Kang, Zeqing Wang, Zhuojie Yang, Liang Lin, Keze Wang

Specifically, our CLVP distills the capabilities of well-trained task-specific models into the visual sub-modules in a stepwise and anti-forgetting manner.

Continual Learning Visual Reasoning

Towards Real-World Burst Image Super-Resolution: Benchmark and Method

1 code implementation ICCV 2023 Pengxu Wei, Yujing Sun, Xingbei Guo, Chang Liu, Jie Chen, Xiangyang Ji, Liang Lin

Despite substantial advances, single-image super-resolution (SISR) still struggles to reconstruct high-quality images from the limited information in one input image, especially in realistic scenarios.

Burst Image Super-Resolution

Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

2 code implementations 23 Aug 2023 Ziyi Tang, Ruilin Wang, Weixing Chen, Yongsen Zheng, Zechuan Chen, Yang Liu, Keze Wang, Tianshui Chen, Liang Lin

Drawing inspiration from the orchestration of diverse specialized agents collaborating to tackle intricate tasks, we propose a framework named Causal-Consistency Chain-of-Thought (CaCo-CoT) that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models, involving a set of reasoners and evaluators.

counterfactual Science Question Answering

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

no code implementations 23 Aug 2023 Junyi Chen, Longteng Guo, Jia Sun, Shuai Shao, Zehuan Yuan, Liang Lin, Dongyu Zhang

Owing to the combination of the unified architecture and pre-training task, EVE is easy to scale up, enabling better downstream performance with fewer resources and faster training speed.

Image-text matching Image-text Retrieval +6

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

no code implementations ICCV 2023 Xujie Zhang, BinBin Yang, Michael C. Kampffmeyer, Wenqing Zhang, Shiyue Zhang, Guansong Lu, Liang Lin, Hang Xu, Xiaodan Liang

Cross-modal garment synthesis and manipulation will significantly benefit the way fashion designers generate garments and modify their designs via flexible linguistic interfaces. Current approaches follow the general text-to-image paradigm and mine cross-modal relations via simple cross-attention modules, neglecting the structural correspondence between visual and textual representations in the fashion design domain.

Attribute Constituency Parsing +2

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

no code implementations ICCV 2023 Haoyuan Li, Haoye Dong, Hanchao Jia, Dong Huang, Michael C. Kampffmeyer, Liang Lin, Xiaodan Liang

Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond.

Human Detection

Understanding Self-attention Mechanism via Dynamical System Perspective

no code implementations ICCV 2023 Zhongzhan Huang, Mingfu Liang, Jinghui Qin, Shanshan Zhong, Liang Lin

The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence and has successfully boosted the performance of different models.

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

1 code implementation ICCV 2023 Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin

Moreover, these methods ignore how to utilize the fine-grained dependencies among different skeleton joints to pre-train an efficient skeleton sequence learning model that can generalize well across different datasets.

Action Recognition Decoder +3

CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning

2 code implementations 30 Jun 2023 Yang Liu, Weixing Chen, Guanbin Li, Liang Lin

We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, model generalization and robustness, etc.

Causal Inference Medical Report Generation +2

Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning

no code implementations30 Jun 2023 Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai, Xiaolin Wei, Liang Lin, Yizhou Yu

In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.

DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

1 code implementation23 Jun 2023 Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li

Neural fields have achieved impressive advancements in view synthesis and scene reconstruction.

3D scene Editing

DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback

1 code implementation13 Jun 2023 Junfan Lin, Yuying Zhu, Lingbo Liu, Yang Liu, Guanbin Li, Liang Lin

1) The travel time of a vehicle is delayed feedback on the effectiveness of TSC policy at each traffic intersection since it is obtained after the vehicle has left the road network.

Deep Reinforcement Learning Reinforcement Learning (RL) +1

Long-term Wind Power Forecasting with Hierarchical Spatial-Temporal Transformer

no code implementations30 May 2023 Yang Zhang, Lingbo Liu, Xinyu Xiong, Guanbin Li, Guoli Wang, Liang Lin

In this work, we propose a novel end-to-end wind power forecasting model named Hierarchical Spatial-Temporal Transformer Network (HSTTN) to address the long-term WPF problems.

Decoder

Identity-Preserving Talking Face Generation with Landmark and Appearance Priors

1 code implementation CVPR 2023 Weizhi Zhong, Chaowei Fang, Yinqi Cai, Pengxu Wei, Gangming Zhao, Liang Lin, Guanbin Li

Prior landmark characteristics of the speaker's face are employed to make the generated landmarks coincide with the facial outline of the speaker.

Talking Face Generation

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

1 code implementation9 May 2023 Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin

Our approach makes text-to-image diffusion models easier to use and improves the user experience, demonstrating its potential to advance user-friendly text-to-image generation models by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts.

Knowledge Distillation parameter-efficient fine-tuning +2

Visual Causal Scene Refinement for Video Question Answering

2 code implementations7 May 2023 Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin

Our VCSR involves two essential modules: i) the Question-Guided Refiner (QGR) module, which refines consecutive video frames guided by the question semantics to obtain more representative segment features for causal front-door intervention; ii) the Causal Scene Separator (CSS) module, which discovers a collection of visual causal and non-causal scenes based on the visual-linguistic causal relevance and estimates the causal effect of the scene-separating intervention in a contrastive learning manner.

Contrastive Learning Question Answering +2

Multi-object Video Generation from Single Frame Layouts

no code implementations6 May 2023 Yang Wu, Zhibin Liu, Hefeng Wu, Liang Lin

In this paper, we study video synthesis with emphasis on simplifying the generation conditions.

Image Generation Object +2

ASR: Attention-alike Structural Re-parameterization

no code implementations13 Apr 2023 Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin

This technique mitigates the extra costs incurred for performance improvement during training, such as parameter size and inference time, through these transformations during inference, and therefore structural re-parameterization (SRP) has great potential for industrial and practical applications.

Open-World Pose Transfer via Sequential Test-Time Adaption

no code implementations20 Mar 2023 Junyang Chen, Xiaoyu Xian, Zhijing Yang, Tianshui Chen, Yongyi Lu, Yukai Shi, Jinshan Pan, Liang Lin

In open-world conditions, the pose transfer task raises various independent signals: OOD appearance and skeleton, which need to be extracted and distributed in speciality.

Motion Synthesis Person Re-Identification +1

Urban Regional Function Guided Traffic Flow Prediction

no code implementations17 Mar 2023 Kuo Wang, Lingbo Liu, Yang Liu, Guanbin Li, Fan Zhou, Liang Lin

The prediction of traffic flow is a challenging yet crucial problem in spatial-temporal analysis, which has recently gained increasing interest.

Prediction

Cross-Modal Causal Intervention for Medical Report Generation

2 code implementations16 Mar 2023 Weixing Chen, Yang Liu, Ce Wang, Jiarui Zhu, Shen Zhao, Guanbin Li, Cheng-Lin Liu, Liang Lin

Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance, which can relieve the heavy burden of radiologists by automatically generating the corresponding medical reports according to the given radiology image.

Medical Report Generation object-detection +1

Masked Images Are Counterfactual Samples for Robust Fine-tuning

1 code implementation CVPR 2023 Yao Xiao, Ziyi Tang, Pengxu Wei, Cong Liu, Liang Lin

In this paper, based on causal analysis of the aforementioned problems, we propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.

counterfactual

Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation

no code implementations13 Feb 2023 Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position.

Re-Ranking Vision-Language Navigation

AttNS: Attention-Inspired Numerical Solving For Limited Data Scenarios

1 code implementation5 Feb 2023 Zhongzhan Huang, Mingfu Liang, Shanshan Zhong, Liang Lin

We propose the attention-inspired numerical solver (AttNS), a concise method that helps address the generalization and robustness issues faced by the AI-Hybrid numerical solver in solving differential equations due to limited data.

OccluMix: Towards De-Occlusion Virtual Try-on by Semantically-Guided Mixup

2 code implementations3 Jan 2023 Zhijing Yang, Junyang Chen, Yukai Shi, Hao Li, Tianshui Chen, Liang Lin

Image virtual try-on aims to replace the clothing in a person image with a garment image (in-shop clothes), and has attracted increasing attention from the multimedia and computer vision communities.

Semantic Parsing Virtual Try-on

Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification

no code implementations2 Jan 2023 Ziyi Tang, Ruimao Zhang, Zhanglin Peng, Jinrui Chen, Liang Lin

We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract the informative and discriminative feature representations at different stages.

Attribute Representation Learning +1

Enhanced Soft Label for Semi-Supervised Semantic Segmentation

no code implementations ICCV 2023 Jie Ma, Chuan Wang, Yang Liu, Liang Lin, Guanbin Li

As a mainstream framework in the field of semi-supervised learning (SSL), self-training via pseudo labeling and its variants have witnessed impressive progress in semi-supervised semantic segmentation with the recent advance of deep neural networks.

Contrastive Learning Pseudo Label +1

A Retrospect to Multi-prompt Learning across Vision and Language

no code implementations ICCV 2023 Ziliang Chen, Xin Huang, Quanlong Guan, Liang Lin, Weiqi Luo

The vision community is undergoing unprecedented progress with the emergence of Vision-Language Pretraining Models (VLMs).

Prompt Learning

RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels

no code implementations ICCV 2023 Ziyi Zhang, Weikai Chen, Chaowei Fang, Zhen Li, Lechao Chen, Liang Lin, Guanbin Li

Confidence-wise, we propose a novel sample selection strategy based on confidence representation voting instead of the widely-used small-loss criterion.

Learning with noisy labels Representation Learning +1

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

2 code implementations6 Dec 2022 Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang

Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which shows that reasoning ability on both tasks can be improved by a unified formulation.

Geometry Problem Solving Logical Reasoning +1

DreamArtist++: Controllable One-Shot Text-to-Image Generation via Positive-Negative Adapter

no code implementations21 Nov 2022 Ziyi Dong, Pengxu Wei, Liang Lin

To tackle this problem, we propose a simple yet effective framework, namely DreamArtist, which adopts a novel positive-negative prompt-tuning learning strategy on the pre-trained diffusion model and has been shown to handle well the trade-off between accurate controllability and fidelity of image generation with only one reference example.

Novel Concepts Text to Image Generation +1

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

no code implementations12 Nov 2022 Xipeng Chen, Guangrun Wang, Dizhong Zhu, Xiaodan Liang, Philip H. S. Torr, Liang Lin

In this paper, we propose a novel Neural Sewing Machine (NSM), a learning-based framework for structure-preserving 3D garment modeling, which is capable of learning representations for garments with diverse shapes and topologies and is successfully applied to 3D garment reconstruction and controllable manipulation.

Garment Reconstruction Representation Learning

Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

2 code implementations12 Nov 2022 Ziyi Zhang, Weikai Chen, Hui Cheng, Zhen Li, Siyuan Li, Liang Lin, Guanbin Li

We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source-pretrained model is adapted to the target domain without access to the source data.

Contrastive Learning Source-Free Domain Adaptation

Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training

1 code implementation CVPR 2023 Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion.

Language Modelling Motion Generation +2

A Generic Shared Attention Mechanism for Various Backbone Neural Networks

no code implementations27 Oct 2022 Zhongzhan Huang, Senwei Liang, Mingfu Liang, Liang Lin

The self-attention mechanism has emerged as a critical component for improving the performance of various backbone neural networks.

Data Augmentation image-classification +4

Prompt-Matched Semantic Segmentation

no code implementations22 Aug 2022 Lingbo Liu, Jianlong Chang, Bruce X. B. Yu, Liang Lin, Qi Tian, Chang-Wen Chen

Previous methods usually fine-tuned the entire network for each specific dataset, which makes storing the massive parameters of these networks burdensome.

Prompt Learning Representation Learning +3

On Fast Simulation of Dynamical System with Neural Vector Enhanced Numerical Solver

1 code implementation7 Aug 2022 Zhongzhan Huang, Senwei Liang, Hong Zhang, Haizhao Yang, Liang Lin

The large-scale simulation of dynamical systems is critical in numerous scientific and engineering disciplines.

Computational Efficiency

Robust Real-World Image Super-Resolution against Adversarial Attacks

1 code implementation31 Jul 2022 Jiutao Yue, Haofeng Li, Pengxu Wei, Guanbin Li, Liang Lin

Since the frequency masking may not only destroy the adversarial perturbations but also affect the sharp details in a clean image, we further develop an adversarial sample classifier based on the frequency domain of images to determine whether to apply the proposed mask module.

Image Super-Resolution

Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering

2 code implementations26 Jul 2022 Yang Liu, Guanbin Li, Liang Lin

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video.

Causal Inference Question Answering +2

The Lottery Ticket Hypothesis for Self-attention in Convolutional Neural Network

no code implementations16 Jul 2022 Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He, Haizhao Yang, Liang Lin

Many plug-and-play self-attention modules (SAMs) have recently been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs).

Crowd Counting

Adversarially-Aware Robust Object Detector

1 code implementation13 Jul 2022 Ziyi Dong, Pengxu Wei, Liang Lin

In this work, we empirically explore model training for adversarial robustness in object detection, the difficulty of which is largely attributable to the conflict between learning clean images and adversarial images.

Adversarial Robustness Object +2

Discourse-Aware Graph Networks for Textual Logical Reasoning

no code implementations4 Jul 2022 Yinya Huang, Lemao Liu, Kun Xu, Meng Fang, Liang Lin, Xiaodan Liang

In this work, we propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).

graph construction Logical Reasoning +3

Real-World Image Super-Resolution by Exclusionary Dual-Learning

1 code implementation6 Jun 2022 Hao Li, Jinghui Qin, Zhijing Yang, Pengxu Wei, Jinshan Pan, Liang Lin, Yukai Shi

Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input, and it has recently received considerable attention because of its tremendous application potential.

Diversity Image Restoration +1

Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

1 code implementation26 May 2022 Tao Pu, Tianshui Chen, Hefeng Wu, Yukai Shi, Zhijing Yang, Liang Lin

Specifically, an instance-perspective representation blending (IPRB) module is designed to blend the representations of the known labels in an image with the representations of the corresponding unknown labels in another image to complement these unknown labels.

image-classification Multi-Label Image Recognition +1

Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels

1 code implementation23 May 2022 Tianshui Chen, Tao Pu, Lingbo Liu, Yukai Shi, Zhijing Yang, Liang Lin

Multi-label image recognition with partial labels (MLR-PL), in which some labels are known while others are unknown for each image, may greatly reduce the cost of annotation and thus facilitate large-scale MLR.

Multi-Label Image Recognition Multi-label Image Recognition with Partial Labels

LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

2 code implementations17 May 2022 Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Liang Lin, Xiaodan Liang

To address this issue and take a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP, which consists of 11,495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation.

Math Math Word Problem Solving

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

1 code implementation CVPR 2022 Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

To address this issue, we propose an unsupervised domain adaptation mechanism for real-world SR, named Dual ADversarial Adaptation (DADA), which only requires LR images in the target domain with available real paired data from a source camera.

Image Super-Resolution Unsupervised Domain Adaptation

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

1 code implementation CVPR 2022 BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller (GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.

Continual Learning Diversity +3

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

no code implementations26 Apr 2022 Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing.

Benchmarking Out-of-Distribution Generalization +2

Semantic Representation and Dependency Learning for Multi-Label Image Recognition

no code implementations8 Apr 2022 Tao Pu, Mingzhan Sun, Hefeng Wu, Tianshui Chen, Ling Tian, Liang Lin

We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions to regularize the network training.

Multi-Label Image Recognition Object +2

Open Set Domain Adaptation By Novel Class Discovery

no code implementations7 Mar 2022 Jingyu Zhuang, Ziliang Chen, Pengxu Wei, Guanbin Li, Liang Lin

In Open Set Domain Adaptation (OSDA), large amounts of target samples are drawn from the implicit categories that never appear in the source domain.

Domain Adaptation Novel Class Discovery

Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

1 code implementation26 Feb 2022 Pengxiang Yan, Ziyi Wu, Mengmeng Liu, Kun Zeng, Liang Lin, Guanbin Li

To relieve the burden of labor-intensive labeling, deep unsupervised SOD methods have been proposed to exploit noisy labels generated by handcrafted saliency methods.

object-detection Object Detection +2

Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning

1 code implementation CVPR 2022 Guangrun Wang, Yansong Tang, Liang Lin, Philip H.S. Torr

Inspired by perceptual learning that could use cross-view learning to perceive concepts and semantics, we propose a novel AE that could learn semantic-aware representation via cross-view image reconstruction.

Image Reconstruction Representation Learning +1

Structured Semantic Transfer for Multi-Label Recognition with Partial Labels

1 code implementation21 Dec 2021 Tianshui Chen, Tao Pu, Hefeng Wu, Yuan Xie, Liang Lin

To reduce the annotation cost, we propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels, i.e., merely some labels are known while other labels are missing (also called unknown labels) per image.

Multi-Label Image Recognition Multi-label Image Recognition with Partial Labels

TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

2 code implementations7 Dec 2021 Yang Liu, Keze Wang, Lingbo Liu, Haoyuan Lan, Liang Lin

To overcome these limitations, we take advantage of the multi-scale temporal dependencies within videos and propose a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL), which jointly models the inter-snippet and intra-snippet temporal dependencies for temporal representation learning with a hybrid graph contrastive learning strategy.

Action Recognition Contrastive Learning +5

Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis

no code implementations27 Oct 2021 Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin

The integration of human parsing and appearance flow effectively guides the generation of video frames with realistic appearance.

Human Parsing Video Generation

Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation

no code implementations16 Oct 2021 Yang Wu, Shirui Feng, Guanbin Li, Liang Lin

PEMR includes a "looking ahead" process, i.e., a visual feature extractor module that estimates feasible paths for gathering 3D navigational information, mimicking the human sense of direction.

Common Sense Reasoning Embodied Question Answering +1

Road Network Guided Fine-Grained Urban Traffic Flow Inference

1 code implementation29 Sep 2021 Lingbo Liu, Mengmeng Liu, Guanbin Li, Ziyi Wu, Junfan Lin, Liang Lin

Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture.

Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

no code implementations ICCV 2021 Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei, Pengxu Wei, Liang Lin, Guanbin Li

Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.

Binary Classification

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations9 Aug 2021 Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

1 code implementation ACL 2021 Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Decoder Math

Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation

1 code implementation2 Jul 2021 Lingbo Liu, Yuying Zhu, Guanbin Li, Ziyi Wu, Lei Bai, Liang Lin

In this work, we propose a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership.

Time Series Analysis

Prototypical Graph Contrastive Learning

1 code implementation17 Jun 2021 Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation.

Clustering Contrastive Learning +1

Towards Quantifiable Dialogue Coherence Evaluation

1 code implementation ACL 2021 Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards.

Coherence Evaluation Dialogue Evaluation +1

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

1 code implementation Findings (ACL) 2021 Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 4,998 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.

Math Mathematical Reasoning +1

Solving Inefficiency of Self-supervised Representation Learning

1 code implementation ICCV 2021 Guangrun Wang, Keze Wang, Guangcong Wang, Philip H. S. Torr, Liang Lin

In this paper, we reveal two contradictory phenomena in contrastive learning that we call under-clustering and over-clustering problems, which are major obstacles to learning efficiency.

Clustering Contrastive Learning +5

Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition

no code implementations31 Mar 2021 Guangrun Wang, Liang Lin, Rongcong Chen, Guangcong Wang, Jiqi Zhang

In this work, we prove that dynamically adapting network architectures tailored for each domain task along with weight finetuning benefits in both efficiency and effectiveness, compared to the existing image recognition pipeline that only tunes the weights regardless of the architecture.

Age Estimation image-classification +6

Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

2 code implementations26 Jan 2021 Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang

Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e.g., sharing discrepant label granularity) without extensive re-training.

 Ranked #1 on Human Parsing on 4D-DRESS (using extra training data)

Graph Representation Learning Human Parsing +2

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

no code implementations9 Jan 2021 Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.

Retrieval Sentence

Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval

no code implementations4 Jan 2021 Yang Liu, Keze Wang, Haoyuan Lan, Liang Lin

To model multi-scale temporal dependencies, our TCGL integrates the prior knowledge about the frame and snippet orders into graph structures, i.e., the intra-/inter-snippet temporal contrastive graphs.

Action Recognition Contrastive Learning +5

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations1 Jan 2021 Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

Towards a Reliable and Robust Dialogue System for Medical Automatic Diagnosis

no code implementations1 Jan 2021 Junfan Lin, Lin Xu, Ziliang Chen, Liang Lin

To this end, we propose a novel DSMAD agent, INS-DS (Introspective Diagnosis System), comprising two separate yet cooperative modules, i.e., an inquiry module for proposing symptom-inquiries and an introspective module for deciding when to inform a disease.

Decision Making Diagnostic

Adversarial Training using Contrastive Divergence

no code implementations1 Jan 2021 Hongjun Wang, Guanbin Li, Liang Lin

To protect the security of machine learning models against adversarial examples, adversarial training has become the most popular and powerful strategy against various adversarial attacks by injecting adversarial examples into training data.

Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering

no code implementations ICCV 2021 Qingxing Cao, Wentao Wan, Keze Wang, Xiaodan Liang, Liang Lin

The experimental results show that our proposed method can improve current VQA models on OOD split without losing performance on the in-domain test data.

Novel Concepts Question Answering +1

CAT-SAC: Soft Actor-Critic with Curiosity-Aware Entropy Temperature

no code implementations1 Jan 2021 Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin

The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.

MuJoCo Reinforcement Learning (RL)

AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

1 code implementation29 Dec 2020 Tao Pu, Tianshui Chen, Yuan Xie, Hefeng Wu, Liang Lin

In this work, we explore the correlations among the action units and facial expressions, and devise an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition.

Facial Expression Recognition Facial Expression Recognition (FER) +1

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

no code implementations24 Dec 2020 Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin

It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA system's performance.

Question Answering World Knowledge

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

1 code implementation22 Dec 2020 Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.

Diagnostic Dialogue Generation +1

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

1 code implementation14 Dec 2020 Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Specifically, we generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs to disentangle the knowledge from other biases.

Question Answering Visual Question Answering

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

1 code implementation30 Nov 2020 Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it's still challenging to apply it to real-world tasks, due to the poor sample efficiency.

continuous-control Continuous Control +3

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

2 code implementations NeurIPS 2020 Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.

Instance Segmentation Panoptic Segmentation +2

A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning

no code implementations15 Oct 2020 Hongjun Wang, Guanbin Li, Xiaobai Liu, Liang Lin

Although deep convolutional neural networks (CNNs) have demonstrated remarkable performance on multiple computer vision tasks, researches on adversarial learning have shown that deep models are vulnerable to adversarial examples, which are crafted by adding visually imperceptible perturbations to the input images.

Adversarial Attack

Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems

1 code implementation EMNLP 2020 Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, Liang Lin

A practical automatic textual math word problems (MWPs) solver should be able to solve various textual MWPs while most existing works only focused on one-unknown linear MWPs.

Decoder Math +1

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

1 code implementation EMNLP 2020 Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang

Capitalized on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.

Dialogue Evaluation

Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition

no code implementations 20 Sep 2020 Tianshui Chen, Liang Lin, Riquan Chen, Xiaolu Hui, Hefeng Wu

The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency of training samples.

Few-Shot Learning Multi-Label Image Recognition +1

Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos

no code implementations 18 Sep 2020 Jie Wu, Guanbin Li, Xiaoguang Han, Liang Lin

Temporal grounding of natural language in untrimmed videos is a fundamental yet challenging multimedia task facilitating cross-media visual content retrieval.

cross-modal alignment reinforcement-learning +3

Online Alternate Generator against Adversarial Attacks

no code implementations 17 Sep 2020 Haofeng Li, Yirui Zeng, Guanbin Li, Liang Lin, Yizhou Yu

The field of computer vision has witnessed phenomenal progress in recent years partially due to the development of deep convolutional neural networks.

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

1 code implementation 1 Sep 2020 Yang Liu, Keze Wang, Guanbin Li, Liang Lin

In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors.

Action Recognition Image Generation +3

Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View

no code implementations 23 Aug 2020 Junpeng Tan, Yukai Shi, Zhijing Yang, Caizhen Wen, Liang Lin

To ensure that we achieve effective sparse representation and clustering performance on the original data matrix, adaptive graph regularization and unsupervised clustering constraints are also incorporated in the proposed model to preserve the internal structural features of the data.

Clustering

Component Divide-and-Conquer for Real-World Image Super-Resolution

1 code implementation ECCV 2020 Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, WangMeng Zuo, Liang Lin

Learning an SR model with conventional pixel-wise loss usually is easily dominated by flat regions and edges, and fails to infer realistic details of complex textures.

Image Super-Resolution

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition

1 code implementation 3 Aug 2020 Yuan Xie, Tianshui Chen, Tao Pu, Hefeng Wu, Liang Lin

However, most of these works focus on holistic feature adaptation, and they ignore local features that are more transferable across different datasets.

Cross-Domain Facial Expression Recognition Facial Expression Recognition (FER)

Fine-Grained Image Captioning with Global-Local Discriminative Objective

1 code implementation 21 Jul 2020 Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

This is primarily due to (i) the conservative characteristic of traditional training objectives that drives the model to generate correct but hardly discriminative captions for similar images and (ii) the uneven word distribution of the ground-truth captions, which encourages generating highly frequent words/phrases while suppressing the less frequent but more concrete ones.

Descriptive Image Captioning +2

Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning

no code implementations 24 Apr 2020 Zhongzhan Huang, Wenqi Shao, Xinjiang Wang, Liang Lin, Ping Luo

Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), where various pruning criteria have been proposed to remove the redundant filters.

Bidirectional Graph Reasoning Network for Panoptic Segmentation

no code implementations CVPR 2020 Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and intermodular relations within and between foreground things and background stuff classes.

Instance Segmentation Panoptic Segmentation +1

Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking

1 code implementation CVPR 2020 Hongjun Wang, Guangrun Wang, Ya Li, Dongyu Zhang, Liang Lin

To examine the robustness of ReID systems is rather important because the insecurity of ReID systems may cause severe losses, e.g., the criminals may use the adversarial perturbations to cheat the CCTV systems.

Adversarial Attack Person Re-Identification

Efficient Crowd Counting via Structured Knowledge Transfer

2 code implementations 23 Mar 2020 Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin

Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.

Crowd Counting Transfer Learning

Linguistically Driven Graph Capsule Network for Visual Question Reasoning

no code implementations 23 Mar 2020 Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin

Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.

Question Answering Visual Question Answering

Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis

1 code implementation 14 Mar 2020 Junfan Lin, Keze Wang, Ziliang Chen, Xiaodan Liang, Liang Lin

To eliminate this bias and inspired by the propensity score matching technique with causal diagram, we propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records; Bias (ii) inherently comes along with the passively collected data, and is one of the key obstacles for training the agent towards "learning how" rather than "remembering what".

Diagnostic Medical Diagnosis +1

DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution

1 code implementation 25 Feb 2020 Yukai Shi, Haoyu Zhong, Zhijing Yang, Xiaojun Yang, Liang Lin

Previous image SR methods fail to exhibit similar performance on Real-SR as the image data is not aligned inherently.

Image Super-Resolution

Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread

no code implementations 22 Jan 2020 Haofeng Li, Guanbin Li, BinBin Yang, Guanqi Chen, Liang Lin, Yizhou Yu

The proposed algorithm for the first time achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.

Image Classification +5

Physical-Virtual Collaboration Modeling for Intra-and Inter-Station Metro Ridership Prediction

2 code implementations 14 Jan 2020 Lingbo Liu, Jingwen Chen, Hefeng Wu, Jiajie Zhen, Guanbin Li, Liang Lin

To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs.

Representation Learning

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

no code implementations 18 Dec 2019 Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin

In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defensing against pointwise feature space adversarial perturbations.

Position Segmentation +2

Knowledge Graph Transfer Network for Few-Shot Recognition

1 code implementation 21 Nov 2019 Riquan Chen, Tianshui Chen, Xiaolu Hui, Hefeng Wu, Guanbin Li, Liang Lin

In this work, we represent the semantic correlations in the form of structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN).

Few-Shot Image Classification Few-Shot Learning +2

A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models

no code implementations 31 Oct 2019 Yang Wu, Pengxu Wei, Liang Lin

To solve this problem, we derive a second-order Wasserstein gradient flow of the global relative entropy from Fokker-Planck equation.

Layout-Graph Reasoning for Fashion Landmark Detection

no code implementations CVPR 2019 Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin

Each Layout-Graph Reasoning (LGR) layer aims to map feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over structural graph nodes to achieve global layout coherency via a layout-graph reasoning module, and then maps graph nodes back to enhance feature representations via a Node-to-Map module.

Attribute Clustering +1

Meta R-CNN : Towards General Solver for Instance-level Few-shot Learning

no code implementations 28 Sep 2019 Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin

Resembling the rapid learning capability of human, few-shot learning empowers vision systems to understand new concepts by training with few samples.

Few-Shot Learning Few-Shot Object Detection +3
