Search Results for author: Bin Li

Found 339 papers, 115 papers with code

VPAI_Lab at MedVidQA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification

1 code implementation BioNLP (ACL) 2022 Bin Li, Yixuan Weng, Fei Xia, Bin Sun, Shutao Li

Given an input video, the MedVidCL task aims to correctly classify it into one of three following categories: Medical Instructional, Medical Non-instructional, and Non-medical.

Video Classification

The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign

no code implementations LT4HALA (LREC) 2022 Bin Li, Yiguo Yuan, Jingya Lu, Minxuan Feng, Chao Xu, Weiguang Qu, Dongbo Wang

This paper presents the results of the First Ancient Chinese Word Segmentation and POS Tagging Bakeoff (EvaHan), which was held at the Second Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) 2022, in the context of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022).

Chinese Word Segmentation POS +2

Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation

no code implementations LREC 2022 Liming Xiao, Bin Li, Zhixing Xu, Kairui Huo, Minxuan Feng, Junsheng Zhou, Weiguang Qu

Therefore, to fill the gap left by the lack of evaluation methods for Chinese AMR parsing, we build on the AMR evaluation metric smatch and improve its triple-generation algorithm so that it is compatible with concept alignment and relation alignment.

Abstract Meaning Representation AMR Parsing +3
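The snippet above builds on smatch, which scores an AMR parse by the F1 of matching (relation, source, target) triples between the predicted and gold graphs. The following is a minimal Python sketch of that scoring step only, not the Align-smatch triple-generation algorithm. It assumes, hypothetically, that nodes in both graphs are already named by the word tokens they align to (t1, t2, ...), so matching reduces to set intersection; original smatch instead searches over variable mappings with hill climbing. The function name `triple_f1` and the toy graphs are illustrative.

```python
from typing import Set, Tuple

# (relation, source node, target node or concept), e.g. ("ARG0", "t1", "t2").
Triple = Tuple[str, str, str]


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over matched triples (smatch-style scoring).

    Assumption: node names are shared between graphs via token alignment,
    so no search over variable mappings is needed.
    """
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Toy graphs for "the boy wants to go"; nodes named by aligned token position.
gold = {
    ("instance", "t1", "want-01"),
    ("instance", "t2", "boy"),
    ("ARG0", "t1", "t2"),
    ("instance", "t3", "go-02"),
    ("ARG1", "t1", "t3"),
}
pred = {
    ("instance", "t1", "want-01"),
    ("instance", "t2", "boy"),
    ("ARG0", "t1", "t2"),
    ("instance", "t3", "go-01"),  # wrong sense, so this triple does not match
}
p, r, f = triple_f1(pred, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Naming nodes by their aligned tokens is what makes the comparison a plain set operation here; without alignment, the same two graphs would first need a variable-mapping search before triples could be compared.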

DynGL-SDP: Dynamic Graph Learning for Semantic Dependency Parsing

1 code implementation COLING 2022 Bin Li, Miao Gao, Yunlong Fan, Yikemaiti Sataer, Zhiqiang Gao, Yaocheng Gui

A recent success in semantic dependency parsing shows that graph neural networks can make significant accuracy improvements, owing to their powerful ability to learn expressive graph representations.

Dependency Parsing graph construction +3

Continuing Pre-trained Model with Multiple Training Strategies for Emotional Classification

no code implementations WASSA (ACL) 2022 Bin Li, Yixuan Weng, Qiya Song, Bin Sun, Shutao Li

This paper describes the contribution of the LingJing team’s method to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Emotion Classification.

Attribute Classification +5

A Knowledge storage and semantic space alignment Method for Multi-documents dialogue generation

no code implementations dialdoc (ACL) 2022 Minjun Zhu, Bin Li, Yixuan Weng, Fei Xia

Question Answering (QA) is a Natural Language Processing (NLP) task that can measure language and semantic understanding ability; it requires a system not only to retrieve relevant documents from a large number of articles but also to answer the corresponding questions based on those documents.

Dialogue Generation Language Modeling +4

The Formulation of The graded Chinese character list of ancient books Based on Large-scale Corpus

no code implementations CCL 2021 Changwei Xu, Minxuan Feng, Bin Li, Yiguo Yuan

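The Graded Chinese Character List of Ancient Books is a graded character list developed from a large-scale corpus of ancient texts to help learners read classical literature. The list fills a gap in research on character lists for ancient books. It grades ancient Chinese characters by learning priority and currently contains 105 first-level characters, 340 second-level characters, and 555 third-level characters. This paper describes the main principles and basic steps used to build the list and compares it with the traditional literacy primers known as "San Bai Qian" (the Three Character Classic, the Hundred Family Surnames, and the Thousand Character Classic) and with the List of Frequently Used Characters in Modern Chinese, verifying that its character selection is reasonable. The list helps learners master the characters most common in ancient texts first, improving their ability to read ancient books and thereby promoting the inheritance and development of China's fine traditional culture.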

The Construction of Pre-Qin Ancient Chinese WordNet and Cross Language Comparative Study between Ancient Sanskrit WordNet and Pre-Qin Ancient Chinese WordNet

no code implementations CCL 2021 Xuehui Lu, Huidan Xu, Siyu Chen, Bin Li

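Pre-Qin Chinese holds an important place in the study of the history of the Chinese language, yet previous research has never produced a structured lexical resource for Pre-Qin vocabulary, making it difficult to meet the needs of ancient Chinese information processing and cross-language comparison. Internationally, WordNets for dozens of languages have been built on the semantic-class architecture of the English WordNet, and they have become fundamental resources for multilingual natural language processing and cross-language comparison. This paper surveys the construction of WordNets in China and abroad, especially WordNets for ancient languages and for Chinese, and then describes in detail the construction and correction of the Pre-Qin WordNet, which covers 43,591 words, 61,227 senses, and 17,975 semantic classes. Through a cross-language comparison with the Ancient Sanskrit WordNet, the paper also attempts to analyze the lexical commonalities and differences between these two ancient languages, providing a preliminary validation of the Pre-Qin WordNet.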

Research on Recognition of the Separation and Reunion Phenomena of Words in Chinese

no code implementations CCL 2021 Lou Zhou, Weiguang Qu, Tingxin Wei, Junsheng Zhou, Bin Li, Yanhui Gu

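The separation and reunion phenomenon is a special phenomenon in Chinese in which certain words can be either split apart or used as a whole. This paper uses character-level sequence labeling to automatically recognize separation and reunion in two-character verbs, which avoids error propagation from Chinese word segmentation and POS tagging and saves the manual cost of writing matching rules and feature templates. During training, a Chinese BERT pre-trained model is fine-tuned to obtain task-specific character representations, and a masking mechanism hides the separated parts of the word from the model in separated usages. This reduces the influence of the word itself on the recognition result and strengthens learning of the inserted material between the parts; different masks are applied to the first and second morphemes to emphasize their order, enabling the model to recognize complex and infrequent separated usages. To obtain context-aware sentence representations, the original and masked sentence representations are fed into two BiLSTM layers with separate parameters, and a CRF finally captures the dependencies in the sentence's label sequence. The proposed BERT MASK + 2BiLSTMs + CRF model improves F1 by 2.85% over the best existing model for recognizing separable words.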

Research on Semantic Relation Recognition of Chinese Serial-verb Sentences

no code implementations CCL 2021 Chao Sun, Weiguang Qu, Tingxin Wei, Yanhui Gu, Bin Li, Junsheng Zhou

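Serial-verb sentences have the form "NP+VP1+VP2": they contain two or more verbs (or verb structures) whose agent is the same entity. Serial-verb sentences with the same structure can express a variety of semantic relations. Building on earlier classifications of the semantic relations between VP1 and VP2, this paper annotates a dataset of semantic relations in serial-verb sentences and recognizes these relations with neural networks. The method decomposes the task: sentences are encoded with BERT, a BiLSTM-CRF first identifies the serial verbs (VP) and their subject (NP), and then, using an encoding that fuses serial-verb information, a BiLSTM-Attention model classifies the relation between the serial verbs. Experimental results verify the effectiveness of the proposed method.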

Research on Discourse-level Abstract Meaning Representation Annotation framework in Multi-round Dialogue

no code implementations CCL 2020 Tong Huang, Bin Li, Peiyi Yan, Tingting Ji, Weiguang Qu

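Dialogue analysis is a fundamental topic for natural language dialogue applications such as intelligent customer service and chatbots. Dialogue corpora, however, differ considerably from conventional written corpora: they contain many complex phenomena such as forms of address, emotional phrases, ellipsis, inverted word order, and redundancy, which strongly affect syntactic and semantic parsers, so the accuracy of automatic dialogue analysis has long been lower than on written text. The main reason is the lack of a rigorous formal description of multi-round dialogue, which hinders subsequent analysis and computation. Therefore, after reviewing dialogue annotation schemes and corpora in China and abroad, this paper proposes a discourse-level annotation framework for multi-round dialogue based on Abstract Meaning Representation. Specifically, it discusses how to annotate discourse-level semantic structure, gives an alignment scheme between words and concepts/relations, adds semantic relations and concepts for forms of address and emotional phrases, adjusts the argument structure of words expressing subjective emotion, specifies conventions for several special phenomena in dialogue, and designs a manual annotation platform, laying the foundation for large-scale annotation of and computational research on multi-round dialogue corpora.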

Abstract Meaning Representation

Review of Entity Relation Extraction based on deep learning

no code implementations CCL 2020 Zhentao Xia, Weiguang Qu, Yanhui Gu, Junsheng Zhou, Bin Li

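As a core subtask of information extraction, entity relation extraction is important for natural language processing applications such as knowledge graphs, intelligent question answering, and semantic search. Relation extraction aims to automatically identify semantic relations between entities in unstructured text. This review focuses on sentence-level relation extraction, introduces the main datasets used for the task, and describes existing techniques, which fall into three groups: supervised relation extraction, distantly supervised relation extraction, and joint entity and relation extraction. We compare the various models used for the task and analyze their contributions and shortcomings. Finally, we introduce the state of research and the methods for Chinese entity relation extraction.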

Relation Extraction

Recognition of serial-verb sentences based on Neural Network

no code implementations CCL 2020 Chao Sun, Weiguang Qu, Tingxin Wei, Yanhui Gu, Bin Li, Junsheng Zhou

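Serial-verb sentences are sentences with a serial-verb construction, a special syntactic structure in Chinese that is very common and frequently used in modern Chinese. Their grammatical structure and semantic relations are complex, which raises many problems for recognition. This paper studies the recognition of serial-verb sentences and proposes a neural-network-based method with two steps: first, the corpus is preprocessed with simple rules; second, treating recognition as text classification, sentences are encoded with BERT and features are extracted jointly by a multi-layer CNN and a BiLSTM model for classification. Experiments on a manually annotated corpus achieve an accuracy of 92.71% and an F1 score of 87.41%.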

Chinese Interrogative Sentences Annotation and Analysis Based on the Abstract Meaning Representation

no code implementations CCL 2020 Peiyi Yan, Bin Li, Tong Huang, Kairui Huo, Jin Chen, Weiguang Qu

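Syntactic and semantic analysis of interrogative sentences is widely applied in search engines, information extraction, and question answering systems. Computational linguistics usually handles interrogatives by combining question classification with syntactic parsing, but precision and efficiency remain unsatisfactory. Linguistic research on interrogatives is rich, covering, for example, the structural types of questions, question focus, and non-interrogative uses of interrogative pronouns, but it lacks a systematic formal representation. This paper addresses the problem by using Chinese Abstract Meaning Representation (CAMR), a graph-based holistic representation of Chinese sentence semantics, to annotate the semantic structure of interrogative sentences, representing the question focus and the whole-sentence semantics in a unified way. We then selected a total of 2,071 interrogative sentences from a 20,000-sentence corpus drawn from the web-media portion of the Penn Chinese Treebank (CTB 8.0), primary-school Chinese textbooks, and the Chinese translation of The Little Prince, and computed statistics on their main characteristics. The statistics show that every interrogative pronoun can be represented by combining the interrogative concept amr-unknown with semantic relations, fully capturing the key information, question focus, and semantic structure of interrogatives. Finally, based on the semantic relations associated with interrogative pronouns, we computed the probability distribution of question focus: cause, modifier, and patient account for the largest shares, at 26.53%, 16.73%, and 16.44% respectively. AMR-based annotation and analysis of interrogatives can provide a theoretical foundation and resources for research on Chinese interrogative sentences.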

Abstract Meaning Representation

Building a Chinese AMR Bank with Concept and Relation Alignments

no code implementations LILT 2019 Bin Li, Yuan Wen, Li Song, Weiguang Qu, Nianwen Xue

One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation to make it easier for automatic parsers to model the correspondence between a sentence and its meaning representation.

Abstract Meaning Representation Relation +1

Knowledge Transfer with Visual Prompt in multi-modal Dialogue Understanding and Generation

no code implementations TU (COLING) 2022 Minjun Zhu, Yixuan Weng, Bin Li, Shizhu He, Kang Liu, Jun Zhao

In this work, we propose a knowledge transfer method with visual prompt (VPTG) for fusing multi-modal data, a flexible module that enables a text-only seq2seq model to handle visual dialogue tasks.

Dialogue Understanding Knowledge Distillation +2

Event Signal Filtering via Probability Flux Estimation

no code implementations 10 Apr 2025 Jinze Chen, Wei Zhai, Yang Cao, Bin Li, Zheng-Jun Zha

The state and process information within events is modeled as continuous probability flux at threshold boundaries of the underlying irradiance diffusion.

State Space Models Super-Resolution +1

Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion

no code implementations 4 Apr 2025 Junkai Zhang, Bin Li, Shoujun Zhou, Yue Du

This study provides an effective pathway for hierarchical visual question answering systems, advancing medical image understanding.

Diagnostic Medical Visual Question Answering +2

Optimal Insurance in a Monopoly: Dual Utilities with Hidden Risk Attitudes

no code implementations 1 Apr 2025 Mario Ghossoub, Bin Li, Benxuan Shi

Notably, insurance coverage and premia are monotone in the level of risk aversion; the most risk-averse consumer receives full insurance (efficiency at the top); the monopoly absorbs all surplus from the least risk-averse consumer; and consumers with a higher level of risk aversion induce a higher expected profit for the insurer.

Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models

no code implementations 24 Mar 2025 Bin Li, Dehong Gao, Yeyuan Wang, Linbo Jin, Shanqing Yu, Xiaoyan Cai, Libin Yang

Despite the significant success of Large Vision-Language Models (LVLMs), these models still suffer from hallucinations when describing images, generating answers that include non-existent objects.

MME TextVQA

PHT-CAD: Efficient CAD Parametric Primitive Analysis with Progressive Hierarchical Tuning

1 code implementation 23 Mar 2025 Ke Niu, YuWen Chen, Haiyang Yu, Zhuofan Chen, Xianghui Que, Bin Li, xiangyang xue

Additionally, we propose PHT-CAD, a novel 2D PPA framework that harnesses the modality alignment and reasoning capabilities of Vision-Language Models (VLMs) for precise engineering drawing analysis.

ARC

Multi-modal Multi-platform Person Re-Identification: Benchmark and Method

no code implementations 21 Mar 2025 Ruiyang Ha, Songyi Jiang, Bin Li, Bikang Pan, Yihang Zhu, Junjie Zhang, Xiatian Zhu, Shaogang Gong, Jingya Wang

To address these challenges, we introduce the MP-ReID benchmark, a novel dataset designed specifically for multi-modality and multi-platform ReID.

Person Re-Identification

UMIT: Unifying Medical Imaging Tasks via Vision-Language Models

1 code implementation 20 Mar 2025 Haiyang Yu, Siyang Yi, Ke Niu, Minghan Zhuo, Bin Li

In addition, it is applicable to multiple imaging modalities (e.g., X-ray, CT, and PET), covering a wide range of applications from basic diagnostics to complex lesion analysis.

Diagnostic Medical Image Analysis +3

DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling

no code implementations 19 Mar 2025 Jianbo Zhao, Taiyu Ban, Zhihao Liu, Hangning Zhou, Xiyang Wang, Qibin Zhou, Hailong Qin, Mu Yang, Lei Liu, Bin Li

We theoretically analyze DRoPE's correctness and efficiency, demonstrating its capability to simultaneously optimize trajectory generation accuracy, time complexity, and space complexity.

Autonomous Driving Position
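Rotary position embeddings rotate pairs of query and key dimensions so that their dot product depends only on the difference between the rotation angles. The Python sketch below illustrates that property using an agent's heading angle as the rotation angle, which is one plausible reading of a "directional" variant; it is an assumption, not DRoPE's published formulation. For simplicity every dimension pair is rotated by the same angle, whereas standard RoPE gives each pair its own frequency. The names `rotate_pairs` and `heading_score` are hypothetical.

```python
import math
from typing import List, Sequence


def rotate_pairs(x: Sequence[float], angle: float) -> List[float]:
    """Rotate each consecutive (even, odd) pair of x by `angle` radians."""
    if len(x) % 2:
        raise ValueError("vector length must be even")
    c, s = math.cos(angle), math.sin(angle)
    out: List[float] = []
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        out.extend([c * a - s * b, s * a + c * b])
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def heading_score(q: Sequence[float], k: Sequence[float],
                  heading_q: float, heading_k: float) -> float:
    """Attention logit after rotating query and key by their agents' headings."""
    return dot(rotate_pairs(q, heading_q), rotate_pairs(k, heading_k))


q = [0.2, -1.0, 0.7, 0.4]
k = [1.1, 0.3, -0.5, 0.9]
# Shifting both headings by the same amount leaves the logit unchanged.
s1 = heading_score(q, k, 0.3, 1.1)
s2 = heading_score(q, k, 0.3 + 2.0, 1.1 + 2.0)
print(s1, s2)
```

Since a rotation by a followed by the transpose of a rotation by b equals a rotation by b - a, the logit is a function of the relative heading only, so under this sketch the interaction score does not change when the whole scene is rotated.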

Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection

no code implementations 15 Mar 2025 Qixian Chen, Yuxiong Xu, Sara Mandelli, Sheng Li, Bin Li

In audio spoofing detection, most studies rely on clean datasets, making models susceptible to real-world post-processing attacks, such as channel compression and noise.

Mixture-of-Experts

Proxy-Tuning: Tailoring Multimodal Autoregressive Models for Subject-Driven Image Generation

no code implementations 13 Mar 2025 Yi Wu, Lingting Zhu, Lei Liu, Wandi Qiao, Ziqiang Li, Lequan Yu, Bin Li

Multimodal autoregressive (AR) models, based on next-token prediction and transformer architecture, have demonstrated remarkable capabilities in various multimodal tasks including text-to-image (T2I) generation.

Image Generation

VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models

no code implementations 8 Mar 2025 Xinan He, Yue Zhou, Bing Fan, Bin Li, Guopu Zhu, Feng Ding

In this work, we integrate Multimodal Large Language Models (MLLMs) within DM-based face forensics, and propose a fine-grained analysis triad framework called VLForgery, that can 1) predict falsified facial images; 2) locate the falsified face regions subjected to partial synthesis; and 3) attribute the synthesis with specific generators.

Attribute DeepFake Detection +2

Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs

1 code implementation 5 Mar 2025 Haoran Fan, Bin Li, Yixuan Weng, Shoujun Zhou

By redefining the efficiency-accuracy trade-off landscape, this work establishes SLMs as viable alternatives to resource-intensive LLMs for practical time series forecasting.

Computational Efficiency Descriptive +3

DLF: Extreme Image Compression with Dual-generative Latent Fusion

no code implementations 3 Mar 2025 Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu

DLF decomposes the latent into semantic and detail elements, compressing them through two distinct branches.

Image Compression

Towards Practical Real-Time Neural Video Compression

1 code implementation 28 Feb 2025 Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, Yan Lu

In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational operational costs, such as memory I/O and the number of function calls.

Video Compression

ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models

no code implementations 27 Feb 2025 Ke Niu, Haiyang Yu, Mengyang Zhao, Teng Fu, Siyang Yi, Wei Lu, Bin Li, Xuelin Qian, xiangyang xue

Person re-identification (Re-ID) is a critical task in human-centric intelligent systems, enabling consistent identification of individuals across different camera views using multi-modal query information.

Person Re-Identification Person Retrieval +1

Accurate and Scalable Graph Neural Networks via Message Invariance

1 code implementation 27 Feb 2025 Zhihao Shi, Jie Wang, Zhiwei Zhuang, Xize Liang, Bin Li, Feng Wu

Message passing-based graph neural networks (GNNs) have achieved great success in many real-world applications.

Transductive Learning

Measuring trade costs and analyzing the determinants of trade growth between Cambodia and major trading partners: 1993 to 2019

no code implementations 26 Feb 2025 Borin Keo, Bin Li, Waqas Younis

This study aims to measure trade costs and explore the driving forces behind the growth of bilateral trade between Cambodia and its top 30 trading partners from 1993 to 2019.

VesselSAM: Leveraging SAM for Aortic Vessel Segmentation with LoRA and Atrous Attention

1 code implementation 25 Feb 2025 Adnan Iltaf, Rayan Merghani Ahmed, Zhenxi Zhang, Bin Li, Shoujun Zhou

Medical image segmentation is crucial for clinical diagnosis and treatment planning, especially when dealing with complex anatomical structures such as vessels.

Computational Efficiency Image Segmentation +3

AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification

no code implementations 17 Feb 2025 Xiaoyu Tan, Tianchu Yao, Chao Qu, Bin Li, Minghao Yang, Dakuan Lu, Haozhe Wang, Xihe Qiu, Wei Chu, Yinghui Xu, Yuan Qi

In this paper, we present AURORA, a novel automated framework for training universal process reward models (PRMs) using ensemble prompting and reverse verification.

Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition

1 code implementation 17 Feb 2025 Jianyi Peng, Fan Lu, Bin Li, Yuan Huang, Sanqing Qu, Guang Chen

Compared to single-modal VPR, this approach benefits from the widespread availability of RGB cameras and the robustness of point clouds in providing accurate spatial geometry and distance information.

Re-Ranking Triplet +1

ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images

1 code implementation 9 Feb 2025 Hongyu Ge, Longkun Hao, Zihui Xu, Zhenxin Lin, Bin Li, Shoujun Zhou, Hongjin Zhao, Yihang Liu

To address these issues, we introduce the Cross-Modal Clinical Knowledge Distiller (ClinKD), an innovative framework designed to enhance image-text alignment and establish more effective medical knowledge adaptation mechanisms, which enables MLLMs to adapt to medical knowledge.

Clinical Knowledge Medical Visual Question Answering +2

AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference

1 code implementation 6 Feb 2025 Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Chen Chen, Lei Chen, Xianzhi Yu, Wulong Liu, Jianye Hao, Mingxuan Yuan, Bin Li

With the development of large language models (LLMs), efficient inference through Key-Value (KV) cache compression has attracted considerable attention, especially for long-context generation.

LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models

1 code implementation 4 Feb 2025 Jiangong Chen, Xiaoyi Wu, Tian Lan, Bin Li

Unlike prior approaches focusing on coding script generation, LLMER translates natural language inputs into JSON data, significantly reducing the likelihood of application crashes and processing latency.

Script Generation

VQLTI: Long-Term Tropical Cyclone Intensity Forecasting with Physical Constraints

1 code implementation 30 Jan 2025 Xinyu Wang, Lei Liu, Kang Chen, Tao Han, Bin Li, Lei Bai

(2) Incorporating physical knowledge and physical constraints can help mitigate the accumulation of forecasting errors.

Tropical Cyclone Intensity Forecasting

Exploratory Mean-Variance Portfolio Optimization with Regime-Switching Market Dynamics

no code implementations 28 Jan 2025 Yuling Max Chen, Bin Li, David Saunders

In a real market data study, EMVRS with OC learning outperforms its counterparts with the highest mean and reasonably low volatility of the annualized portfolio returns.

Portfolio Optimization Reinforcement Learning (RL)

Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models

no code implementations 24 Jan 2025 Yuxuan Liang, Xu Li, Xiaolei Chen, Haotian Chen, Yi Zheng, Chenghang Lai, Bin Li, xiangyang xue

As the demand for high-resolution image processing in Large Vision-Language Models (LVLMs) grows, sub-image partitioning has become a popular approach for mitigating visual information loss associated with fixed-resolution processing.

Pre-Trained Large Language Model Based Remaining Useful Life Transfer Prediction of Bearing

no code implementations 13 Jan 2025 Laifa Tao, Zhengduo Zhao, Xuesong Wang, Bin Li, Wenchao Zhan, Xuanyuan Su, Shangyu Li, Qixuan Huang, Haifei Liu, Chen Lu, Zhixuan Lian

Accurately predicting the remaining useful life (RUL) of rotating machinery, such as bearings, is essential for ensuring equipment reliability and minimizing unexpected industrial failures.

Language Modeling Language Modelling +1

Lossless Privacy-Preserving Aggregation for Decentralized Federated Learning

no code implementations 8 Jan 2025 Xiaoye Miao, Bin Li, Yangyang Wu, Meng Xi, Xinkui Zhao, Jianwei Yin

In this paper, we propose a novel lossless privacy-preserving aggregation rule named LPPA to enhance gradient protection as much as possible but without loss of DFL model predictive accuracy.

Federated Learning Privacy Preserving
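One classic way to hide individual gradients while keeping the aggregate exact is pairwise cancelling masks: each pair of clients shares a random vector that one adds and the other subtracts, so the masks vanish in the sum. The Python sketch below shows that idea; it is a generic secure-aggregation illustration, not a claim about how LPPA constructs its noise. It assumes all participants in a neighborhood aggregate together and, for brevity, draws the shared masks from one local random generator, whereas a real deployment would derive them from pairwise shared secrets. The functions `mask_updates` and `average` are hypothetical.

```python
import random
from typing import Dict, List


def mask_updates(grads: Dict[str, List[float]], seed: int = 0) -> Dict[str, List[float]]:
    """Add pairwise masks that cancel when all masked updates are summed."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    clients = sorted(grads)
    dim = len(grads[clients[0]])
    masked = {c: list(g) for c, g in grads.items()}
    for i, a in enumerate(clients):
        for b in clients[i + 1:]:
            # Mask shared by the pair (a, b): a adds it, b subtracts it.
            mask = [rng.gauss(0.0, 10.0) for _ in range(dim)]
            masked[a] = [x + m for x, m in zip(masked[a], mask)]
            masked[b] = [x - m for x, m in zip(masked[b], mask)]
    return masked


def average(updates: Dict[str, List[float]]) -> List[float]:
    """Coordinate-wise mean of the (masked) updates."""
    vectors = list(updates.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


grads = {"a": [1.0, 2.0], "b": [3.0, -1.0], "c": [0.5, 0.5]}
masked = mask_updates(grads)
print(average(masked))  # equals average(grads) up to floating-point rounding
```

Each individual masked vector looks like noise, but the average is the same as for the raw gradients, which is the sense in which such masking is "lossless" for the aggregated model.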

Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning

no code implementations 8 Jan 2025 Bin Li, Xiaoye Miao, Yongheng Shang, Xinkui Zhao, Shuiguang Deng, Jianwei Yin

It aims to mitigate the harm in model gradients while retaining the benefit in model weights for enhancing accuracy.

Federated Learning

On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit

no code implementations 1 Jan 2025 Xiaoyi Wu, Bo Ji, Bin Li

By setting $M$ to a constant, the number of comparison steps in the pessimistic-optimistic algorithm can be reduced to a constant, thereby significantly reducing the computational complexity.

Fairness

Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models

no code implementations 26 Dec 2024 Xu Li, Yi Zheng, Haotian Chen, Xiaolei Chen, Yuxuan Liang, Chenghang Lai, Bin Li, xiangyang xue

Our findings reveal that multilayer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance.

When Large Vision-Language Models Meet Person Re-Identification

no code implementations 27 Nov 2024 Qizao Wang, Bin Li, xiangyang xue

Large Vision-Language Models (LVLMs) that incorporate visual models and Large Language Models (LLMs) have achieved impressive results across various cross-modal understanding and reasoning tasks.

Person Re-Identification

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors

no code implementations 26 Nov 2024 Ziang Xu, Bin Li, Yang Hu, Chenyu Zhang, James East, Sharib Ali, Jens Rittscher

Accurate 3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract, requiring reliable depth and pose estimation.

Pose Estimation

A Quality-Centric Framework for Generic Deepfake Detection

no code implementations 8 Nov 2024 Wentang Song, Zhiyuan Yan, Yuzhen Lin, Taiping Yao, Changsheng chen, Shen Chen, Yandan Zhao, Shouhong Ding, Bin Li

To tackle this issue, we propose a novel quality-centric framework for generic deepfake detection, which is composed of a Quality Evaluator, a low-quality data enhancement module, and a learning pacing strategy that explicitly incorporates forgery quality into the training process.

Data Augmentation DeepFake Detection +1

Learning to Unify Audio, Visual and Text for Audio-Enhanced Multilingual Visual Answer Localization

no code implementations 5 Nov 2024 Zhibin Wen, Bin Li

Specifically, we integrate features from three modalities and develop three predictors, each tailored to the unique contributions of the fused modalities: an audio-visual predictor, a visual predictor, and a textual predictor.

Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection

no code implementations 1 Nov 2024 Yinxuan Huang, Chengmin Gao, Bin Li, xiangyang xue

Through experiments on various datasets, we demonstrate the effectiveness of our active viewpoint selection strategy, significantly enhancing segmentation and reconstruction performance compared to random viewpoint selection.

Object

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

1 code implementation 1 Nov 2024 Rui Yang, Jie Wang, Guoping Wu, Bin Li

Based on the aforementioned measure, TRACER can regulate the loss associated with corrupted data to reduce its influence, thereby enhancing robustness and performance in clean environments.

Bayesian Inference Offline RL +1

MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

no code implementations 30 Oct 2024 Haoyang Liu, Jie Wang, Wanbo Zhang, Zijie Geng, Yufei Kuang, Xijun Li, Bin Li, Yongdong Zhang, Feng Wu

However, existing approaches do not take into account specific block structures -- which are closely related to the problem formulations -- in the constraint coefficient matrices (CCMs) of MILPs.

An Efficient Watermarking Method for Latent Diffusion Models via Low-Rank Adaptation

no code implementations 26 Oct 2024 Dongdong Lin, Yue Li, Benedetta Tondi, Bin Li, Mauro Barni

Moreover, we also propose a dynamic loss weight tuning algorithm to balance the generative task with the watermark embedding task, ensuring that the model can be watermarked with a limited impact on the quality of the generated images.

Learning Global Object-Centric Representations via Disentangled Slot Attention

no code implementations 24 Oct 2024 Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Bin Li, xiangyang xue

Existing object-centric learning methods only extract scene-dependent object-centric representations and lack the humanlike ability to identify the same object across scenes.

Object Position +2

Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

no code implementations 19 Oct 2024 Qitan Lv, Jie Wang, Hanzhu Chen, Bin Li, Yongdong Zhang, Feng Wu

Generation of plausible but incorrect factual information, often termed hallucination, has attracted significant research interest.

Hallucination Language Modeling +3

Towards General Deepfake Detection with Dynamic Curriculum

no code implementations 15 Oct 2024 Wentang Song, Yuzhen Lin, Bin Li

Specifically, we present a novel simple yet effective strategy, named Dynamic Facial Forensic Curriculum (DFFC), which makes the model gradually focus on hard samples during the training.

DeepFake Detection Face Swapping

One-shot Generative Domain Adaptation in 3D GANs

1 code implementation 11 Oct 2024 Ziqiang Li, Yi Wu, Chaoyue Wang, Xue Rui, Bin Li

This paper first considers a novel task known as One-shot 3D Generative Domain Adaptation (GDA), aimed at transferring a pre-trained 3D generator from one domain to a new one, relying solely on a single reference image.

Domain Adaptation Image Generation

BeSimulator: A Large Language Model Powered Text-based Behavior Simulator

no code implementations 24 Sep 2024 Jianan Wang, Bin Li, Xueying Wang, Fu Li, Yunlong Wu, Juan Chen, Xiaodong Yi

Traditional robot simulators focus on physical process modeling and realistic rendering, often suffering from high computational costs, inefficiencies, and limited adaptability.

Language Modeling Language Modelling +1

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

no code implementations 22 Sep 2024 Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, Qiushi Li

Previous studies in deepfake detection have shown promising results when testing face forgeries from the same dataset as the training.

DeepFake Detection Face Swapping

DSparsE: Dynamic Sparse Embedding for Knowledge Graph Completion

no code implementations 22 Sep 2024 Chuhong Yang, Bin Li, Nan Wu

An ablation study is performed to examine the effects of the dynamic layer and relation-aware layer, where the combined model achieves the best performance.

Decoder Knowledge Graph Completion +1

Contract Structure and Risk Aversion in Longevity Risk Transfers

no code implementations 13 Sep 2024 David Landriault, Bin Li, Hong Li, Yuanyuan Zhang

This paper introduces an economic framework to assess optimal longevity risk transfers between institutions, focusing on the interactions between a buyer exposed to long-term longevity risk and a seller offering longevity protection.

Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks

no code implementations 21 Aug 2024 Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li

With the burgeoning advancements in the field of natural language processing (NLP), the demand for training data has increased significantly.

Backdoor Attack

SZU-AFS Antispoofing System for the ASVspoof 5 Challenge

no code implementations 19 Aug 2024 Yuxiong Xu, Jiafeng Zhong, Sengui Zheng, Zefeng Liu, Bin Li

This paper presents the SZU-AFS anti-spoofing system, designed for Track 1 of the ASVspoof 5 Challenge under open conditions.

Data Augmentation

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

1 code implementation 15 Aug 2024 Dongshuo Yin, Leiyi Hu, Bin Li, Youqun Zhang, Xue Yang

To fully demonstrate the practicality and generality of Mona, we conduct experiments on multiple representative visual tasks, including instance segmentation on COCO, semantic segmentation on ADE20K, object detection on Pascal VOC, oriented object detection on DOTA/STAR, and image classification on three common datasets.

Image Classification Instance Segmentation +5

Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

no code implementations 14 Aug 2024 Ruofeng Wei, Bin Li, Kai Chen, Yiyao Ma, Yunhui Liu, Qi Dou

The results demonstrate that our method can learn the absolute scale with geometric modeling and accurately estimate scale-aware depth for monocular scenes.

Monocular Depth Estimation

Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism

1 code implementation 31 Jul 2024 Jiafeng Zhong, Bin Li, Jiangyan Yi

The task of partially spoofed audio localization aims to accurately determine audio authenticity at a frame level.

FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models

no code implementations 29 Jul 2024 Mingzhao Yang, Shangchao Su, Bin Li, xiangyang xue

On the server, the descriptions are used as conditions to guide the DM in generating synthetic datasets that comply with the distributions of various clients, enabling the training of the aggregated model.

Federated Learning

EAFormer: Scene Text Segmentation with Edge-Aware Transformers

1 code implementation 24 Jul 2024 Haiyang Yu, Teng Fu, Bin Li, xiangyang xue

In this paper, we propose Edge-Aware Transformers, termed EAFormer, to segment texts more accurately, especially at the edge of texts.

Decoder Segmentation +1

Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence

1 code implementation 24 Jul 2024 Xiaoyu Tan, Bin Li, Xihe Qiu, Jingjing Huang, Yinghui Xu, Wei Chu

To the best of our knowledge, this is the first study to successfully address both event and time label noise in deep Hawkes process models, offering a promising solution for medical applications, specifically in diagnosing OSAHS.

KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

no code implementations 17 Jul 2024 Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

By establishing physical causality from actions (cause) to trajectories (effect) through the kinematic model, KiGRAS eliminates massive redundant trajectories.

Autonomous Driving
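The snippet describes trajectories as the effect of actions passed through a kinematic model. A minimal Python sketch of such an action-to-trajectory rollout follows. The snippet does not state which kinematic model KiGRAS uses, so this assumes a simple unicycle model with (acceleration, yaw-rate) actions integrated by forward Euler; `State`, `rollout`, and the 0.1 s time step are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class State:
    x: float        # position (m)
    y: float        # position (m)
    heading: float  # yaw angle (rad)
    speed: float    # forward speed (m/s), kept non-negative


def rollout(state: State, actions: List[Tuple[float, float]], dt: float = 0.1) -> List[State]:
    """Integrate (acceleration, yaw_rate) actions with a unicycle model (forward Euler)."""
    trajectory: List[State] = []
    for accel, yaw_rate in actions:
        x = state.x + state.speed * math.cos(state.heading) * dt
        y = state.y + state.speed * math.sin(state.heading) * dt
        heading = state.heading + yaw_rate * dt
        speed = max(0.0, state.speed + accel * dt)  # no reversing in this sketch
        state = State(x, y, heading, speed)
        trajectory.append(state)
    return trajectory


# Ten zero-action steps at 10 m/s: the agent drives 10 m straight ahead.
straight = rollout(State(0.0, 0.0, 0.0, 10.0), [(0.0, 0.0)] * 10)
print(straight[-1])
```

Because every trajectory here is the integral of some action sequence, only paths consistent with the motion model can be produced; under these assumptions, this illustrates why predicting actions instead of raw coordinates shrinks the space of candidate trajectories.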

Reconfigurable Intelligent Surface for Sensing, Communication, and Computation: Perspectives, Challenges, and Opportunities

no code implementations 16 Jul 2024 Bin Li, Wancheng Xie, Zesong Fei

To help the ISCC networks better support the comprehensive services of radar detection, data transmission and edge computing, Reconfigurable Intelligent Surface (RIS) can be employed to boost the transmission rate and the wireless coverage by smartly tuning the electromagnetic characteristics of the environment.

Edge-computing

Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

no code implementations 3 Jul 2024 Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

We executed six state-of-the-art AI-based chip placement algorithms on these designs and plugged the results of each single-point algorithm into the physical implementation workflow to obtain the final PPA results.

Benchmarking

An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

no code implementations 1 Jul 2024 Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, YiLing Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of Large Model, we propose a novel concept and three progressive paradigms of Prognosis and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM.

Management Prognosis

GM-DF: Generalized Multi-Scenario Deepfake Detection

1 code implementation 28 Jun 2024 Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

In this paper, we thoroughly investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets.

DeepFake Detection Face Swapping +2

Enhancing Monotonic Modeling with Spatio-Temporal Adaptive Awareness in Diverse Marketing

no code implementations 20 Jun 2024 Bin Li, Jiayan Pei, Feiyang Xiao, Yifan Zhao, Zhixing Zhang, Diwei Liu, Hengxu He, Jia Jia

OFOS platforms offer dynamic allocation incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency.

Attribute Marketing

Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification

no code implementations 20 Jun 2024 Nanxing Meng, Qizao Wang, Bin Li, Xiangyang Xue

With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects.

Video-Based Person Re-Identification

Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics

1 code implementation 14 Jun 2024 Hongyu Liu, Haoyang Liu, Yufei Kuang, Jie Wang, Bin Li

With data-driven approaches, Dso4NS guides the search for mathematical expressions within the high-dimensional discrete symbolic space and then incorporates the highest-performing mathematical expressions into a solver.

Combinatorial Optimization

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

1 code implementation 11 Jun 2024 Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai

Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data.

Contrastive Learning

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

1 code implementation 10 Jun 2024 Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

In this paper, we present a novel paradigm, Diffusion-ReID, to efficiently augment and generate diverse images based on known identities without incurring any data collection or annotation cost.

Attribute Diversity +1

Ghost imaging-based Non-contact Heart Rate Detection

no code implementations 4 Jun 2024 Jianming Yu, Yuchen He, Bin Li, Hui Chen, Huaibin Zheng, Jianbin Liu, Zhuo Xu

Remote heart rate measurement is a research field of growing interest; it usually relies on remote photoplethysmography (rPPG) to extract heart rate information from video data.

Distribution Aligned Semantics Adaption for Lifelong Person Re-Identification

1 code implementation 30 May 2024 Qizao Wang, Xuelin Qian, Bin Li, Xiangyang Xue

Therefore, adapting Re-ID models to new domains while preserving previously acquired knowledge, a problem known as Lifelong person Re-IDentification (LReID), is crucial.

Knowledge Distillation Person Re-Identification

World Models for General Surgical Grasping

no code implementations 28 May 2024 Hongbin Lin, Bin Li, Chun Wai Wong, Juan Rojas, Xiangyu Chu, Kwok Wai Samuel Au

Our learned visuomotor policy handles: i) unseen objects, including 5 types of target grasping objects and a robot gripper, in unstructured real-world surgery environments, and ii) disturbances in perception and control.

Deep Reinforcement Learning Pose Estimation

Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification

1 code implementation 26 May 2024 Qizao Wang, Xuelin Qian, Bin Li, Lifeng Chen, Yanwei Fu, Xiangyang Xue

Specifically, we propose the Content and Salient Semantics Collaboration (CSSC) framework, facilitating cross-parallel semantics interaction and refinement.

Cloth-Changing Person Re-Identification

Image-Text-Image Knowledge Transferring for Lifelong Person Re-Identification with Hybrid Clothing States

no code implementations 26 May 2024 Qizao Wang, Xuelin Qian, Bin Li, Yanwei Fu, Xiangyang Xue

To tackle the challenges of knowledge granularity mismatch and knowledge presentation mismatch that occur in LReID-Hybrid, we take advantage of the consistency and generalization of the text space and propose a novel framework, dubbed Teata, to effectively align, transfer, and accumulate knowledge in an "image-text-image" closed loop.

Lifelong learning Person Re-Identification +1

Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning

no code implementations 26 Apr 2024 Yuanman Li, Yingjie He, Changsheng Chen, Li Dong, Bin Li, Jiantao Zhou, Xia Li

To address these limitations, this study proposes a novel end-to-end CMFD framework that integrates the strengths of conventional and deep learning methods.

Chaos in Motion: Unveiling Robustness in Remote Heart Rate Measurement through Brain-Inspired Skin Tracking

no code implementations 11 Apr 2024 Jie Wang, Jing Lian, Minjie Ma, Junqiang Lei, Chunbiao Li, Bin Li, Jizhao Liu

To address these issues, we treat remote heart rate measurement as the process of analyzing the spatiotemporal characteristics of the optical flow signal in the video.

Optical Flow Estimation

Uncertainty-Aware Deep Video Compression with Ensembles

no code implementations 28 Mar 2024 Wufei Ma, Jiahao Li, Bin Li, Yan Lu

Deep learning-based video compression is a challenging task, and many previous state-of-the-art learning-based video codecs use optical flows to exploit the temporal correlation between successive frames and then compress the residual error.

Diversity Motion Estimation +2

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm

no code implementations 18 Mar 2024 Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang, Bin Li

Drawing on recent advancements in diffusion models for text-to-image generation, identity-preserved personalization has made significant progress in accurately capturing specific identities with just a single reference image.

Text-to-Image Generation

Learning-augmented Online Minimization of Age of Information and Transmission Costs

no code implementations 5 Mar 2024 Zhongdong Liu, Keyuan Zhang, Bin Li, Yin Sun, Y. Thomas Hou, Bo Ji

To address this challenge, we develop a robust online algorithm to minimize the sum of transmission and staleness costs, ensuring a worst-case performance guarantee.

Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

no code implementations 28 Feb 2024 Bin Li, Ye Shi, Qian Yu, Jingya Wang

This paper introduces ProtoOT, a novel Optimal Transport formulation explicitly tailored for UCIR, which integrates intra-domain feature representation learning and cross-domain alignment into a unified framework.

Contrastive Learning Image Retrieval +2

Neural Video Compression with Feature Modulation

1 code implementation CVPR 2024 Jiahao Li, Bin Li, Yan Lu

This results in better learning of the quantization scaler and helps our NVC support a PSNR range of about 11.4 dB.

Blocking Quantization +1

SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models

no code implementations 1 Feb 2024 Tianhan Xu, Zhe Hu, Ling Chen, Bin Li

In the next stage, we train the skill router using task-specific downstream data and use this router to integrate the acquired skills with LLMs during inference.

A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research

no code implementations 26 Jan 2024 Sicong Cao, Xiaobing Sun, Ratnadira Widyasari, David Lo, Xiaoxue Wu, Lili Bo, Jiale Zhang, Bin Li, Wei Liu, Di Wu, Yixin Chen

The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE).

Decision Making Systematic Literature Review +1

Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection

1 code implementation 18 Jan 2024 Fan Shi, Bin Li, Xiangyang Xue

In the odd-one-out task and two held-out configurations, RAISE can leverage acquired latent concepts and atomic rules to find the rule-breaking image in a matrix and handle problems with unseen combinations of rules and attributes.

Answer Generation Attribute +2

Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling

1 code implementation 17 Jan 2024 Hong Wang, Zhongkai Hao, Jie Wang, Zijie Geng, Zhen Wang, Bin Li, Feng Wu

To the best of our knowledge, SKR is the first attempt to address the time-consuming nature of data generation for learning neural operators.

A Study on Training and Developing Large Language Models for Behavior Tree Generation

no code implementations 16 Jan 2024 Fu Li, Xueying Wang, Bin Li, Yunlong Wu, Yanzhen Wang, Xiaodong Yi

The core contribution of this paper lies in the design of a BT generation framework based on LLM, which encompasses the entire process, from data synthesis and model training to application development and data verification.

Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints

no code implementations 3 Jan 2024 Jinyang Yuan, Tonglin Chen, Zhimeng Shen, Bin Li, Xiangyang Xue

This ability is essential for humans to identify the same object while moving and to learn from vision efficiently.

Object

CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images

no code implementations CVPR 2024 Changsheng Chen, Liangwei Lin, Yongqi Chen, Bin Li, Jishen Zeng, Jiwu Huang

Then we extract a chromaticity map from the recaptured image to highlight the presence of color artifacts even under low-quality samples.

Fast Adaptation for Human Pose Estimation via Meta-Optimization

no code implementations CVPR 2024 Shengxiang Hu, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu

Domain shift is a challenge for supervised human pose estimation where the source data and target data come from different distributions.

Auxiliary Learning Image Inpainting +4

Generative Latent Coding for Ultra-Low Bitrate Image Compression

no code implementations CVPR 2024 Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

To address this issue, we introduce a Generative Latent Coding (GLC) architecture, which performs transform coding in the latent space of a generative vector-quantized variational auto-encoder (VQ-VAE) instead of in the pixel space.

Image Compression Image Restoration +1

DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization

no code implementations CVPR 2024 Zeqin Yu, Jiangqun Ni, Yuzhen Lin, Haoyi Deng, Bin Li

Based on this assumption, this paper proposes DiffForensics, a novel two-stage self-supervised framework that leverages the diffusion model for the IFDL task.

Decoder Denoising +1

MoML: Online Meta Adaptation for 3D Human Motion Prediction

no code implementations CVPR 2024 Xiaoning Sun, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu

In academia, research on human motion prediction mainly focuses on exploiting observed information to accurately forecast human movements in the near-future horizon.

Bilevel Optimization Human motion prediction +2

Adapter is All You Need for Tuning Visual Tasks

1 code implementation 25 Nov 2023 Dongshuo Yin, Leiyi Hu, Bin Li, Youqun Zhang

To fully demonstrate the practicality and generality of Mona, we conduct experiments on multiple representative visual tasks, including instance segmentation on COCO, semantic segmentation on ADE20K, object detection on Pascal VOC, and image classification on several common datasets.

All Image Classification +5

Federated Transformed Learning for a Circular, Secure, and Tiny AI

no code implementations 24 Nov 2023 Weisi Guo, Schyler Sun, Bin Li, Sam Blakeman

Deep Learning (DL) is penetrating a diverse range of mass mobility, smart living, and industrial applications, rapidly transforming the way we live and work.

Efficient Trigger Word Insertion

no code implementations 23 Nov 2023 Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li

With the boom in natural language processing (NLP) in recent years, backdoor attacks pose immense threats to deep neural network models.

text-classification Text Classification

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

1 code implementation 19 Nov 2023 Shangchao Su, Bin Li, Xiangyang Xue

The implementation of FedRA is straightforward and can be seamlessly integrated into any transformer-based model without the need for further modification to the original model.

Federated Learning

One-Shot Federated Learning with Classifier-Guided Diffusion Models

no code implementations 15 Nov 2023 Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue

Leveraging the extensive knowledge stored in the pre-trained diffusion model, the synthetic datasets can help surpass the knowledge limitations of the client samples, yielding aggregation models that in some cases even exceed the performance ceiling of centralized training, as demonstrated by extensive quantitative and visualization experiments on three large-scale multi-domain image datasets.

Federated Learning

Adaptive Digital Twin for UAV-Assisted Integrated Sensing, Communication, and Computation Networks

no code implementations 26 Oct 2023 Bin Li, Wenshuai Liu, Wancheng Xie, Ning Zhang, Yan Zhang

In this paper, we study a digital twin (DT)-empowered integrated sensing, communication, and computation network.

Edge-computing

Evading Detection Actively: Toward Anti-Forensics against Forgery Localization

no code implementations 16 Oct 2023 Long Zhuo, Shenghai Luo, Shunquan Tan, Han Chen, Bin Li, Jiwu Huang

In adversarial training, SEAR employs a forgery localization model as a supervisor to explore tampering features and constructs a deep-learning concealer to erase corresponding traces.

Adversarial Attack Self-Supervised Learning

Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks

no code implementations 15 Oct 2023 Ziqiang Li, Pengfei Xia, Hong Sun, Yueqi Zeng, Wei Zhang, Bin Li

In this study, we focus on improving the poisoning efficiency of backdoor attacks from the sample selection perspective.

Audio Classification Image Classification +2

Orientation-Independent Chinese Text Recognition in Scene Images

1 code implementation 3 Sep 2023 Haiyang Yu, Xiaocong Wang, Bin Li, Xiangyang Xue

We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information.

Benchmarking Image Reconstruction +1

Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning

1 code implementation ICCV 2023 Haiyang Yu, Xiaocong Wang, Bin Li, Xiangyang Xue

However, although Chinese characters have characteristics that differ from those of Latin characters, such as complex inner structures and a large number of categories, few methods have been proposed for Chinese Text Recognition (CTR).

Scene Text Recognition

Robust Computation Offloading and Trajectory Optimization for Multi-UAV-Assisted MEC: A Multi-Agent DRL Approach

no code implementations 24 Aug 2023 Bin Li, Rongrong Yang, Lei Liu, Junyi Wang, Ning Zhang, Mianxiong Dong

For multiple Unmanned-Aerial-Vehicles (UAVs) assisted Mobile Edge Computing (MEC) networks, we study the problem of combined computation and communication for user equipment deployed with multi-type tasks.

Deep Reinforcement Learning Edge-computing +1

Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification

1 code implementation 21 Aug 2023 Qizao Wang, Xuelin Qian, Bin Li, Xiangyang Xue, Yanwei Fu

Cloth-changing person Re-IDentification (Re-ID) is a particularly challenging task that suffers from two limitations: inferior discriminative features and limited training samples.

Attribute Cloth-Changing Person Re-Identification +4

Rethinking Person Re-identification from a Projection-on-Prototypes Perspective

no code implementations 21 Aug 2023 Qizao Wang, Xuelin Qian, Bin Li, Yanwei Fu, Xiangyang Xue

In this paper, we rethink the role of the classifier in person Re-ID, and advocate a new perspective to conceive the classifier as a projection from image features to class prototypes.

Person Re-Identification Person Retrieval +3

ForensicsForest Family: A Series of Multi-scale Hierarchical Cascade Forests for Detecting GAN-generated Faces

no code implementations 2 Aug 2023 Jiucui Lu, Jiaran Zhou, Junyu Dong, Bin Li, Siwei Lyu, Yuezun Li

The proposed ForensicsForest family is composed of three variants: ForensicsForest, Hybrid ForensicsForest, and Divide-and-Conquer ForensicsForest.

Abstracting Concept-Changing Rules for Solving Raven's Progressive Matrix Problems

1 code implementation 15 Jul 2023 Fan Shi, Bin Li, Xiangyang Xue

Finally, we conduct experiments to illustrate the interpretability of CRAB in concept learning, answer selection, and global rule abstraction.

Answer Generation Answer Selection +1

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

no code implementations 11 Jul 2023 Dongbo Wang, Chang Liu, Zhixiao Zhao, Si Shen, Liu Liu, Bin Li, Haotian Hu, Mengcheng Wu, Litao Lin, Xue Zhao, Xiyu Wang

In the context of the rapid development of large language models, we have meticulously trained and introduced the GujiBERT and GujiGPT language models, which are foundational models specifically designed for intelligent information processing of ancient texts.

Model Selection Part-Of-Speech Tagging +2

Prototypes as Explanation for Time Series Anomaly Detection

no code implementations 4 Jul 2023 Bin Li, Carsten Jentsch, Emmanuel Müller

Detecting abnormal patterns that deviate from a certain regular repeating pattern in time series is essential in many big data applications.

Anomaly Detection Time Series +1

OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning

no code implementations 16 Jun 2023 Yinxuan Huang, Tonglin Chen, Zhimeng Shen, Jinghao Huang, Bin Li, Xiangyang Xue

The results demonstrate the shortcomings of state-of-the-art methods for learning meaningful representations from real-world data, despite their impressive performance on complex synthetic datasets.

Object Representation Learning

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

no code implementations 16 Jun 2023 Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai

We provide a gradient backpropagation highway for low-rank adapters which eliminates the need for expensive backpropagation through the frozen pre-trained model, resulting in substantial savings of training memory and training time.

Transfer Learning

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

1 code implementation 14 Jun 2023 Ziqiang Li, Hong Sun, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li

However, existing backdoor attack methods make unrealistic assumptions: that all training data comes from a single source and that attackers have full access to the training data.

Backdoor Attack

A Proxy Attack-Free Strategy for Practically Improving the Poisoning Efficiency in Backdoor Attacks

no code implementations 14 Jun 2023 Ziqiang Li, Hong Sun, Pengfei Xia, Beihao Xia, Xue Rui, Wei Zhang, Qinglang Guo, Zhangjie Fu, Bin Li

To address these concerns, we present a Proxy attack-Free Strategy (PFS) designed to identify efficient poisoning samples based on the similarity between clean samples and their corresponding poisoning samples, as well as the diversity of the poisoning set.

Active Learning Backdoor Attack

Enhanced Fine-grained Motion Diffusion for Text-driven Human Motion Synthesis

no code implementations 23 May 2023 Dong Wei, Xiaoning Sun, Huaijiang Sun, Bin Li, Shengxiang Hu, Weiqing Li, Jianfeng Lu

The emergence of text-driven motion synthesis techniques gives animators great potential to create content efficiently.

Motion Synthesis valid

Collaborative Chinese Text Recognition with Personalized Federated Learning

no code implementations 9 May 2023 Shangchao Su, Haiyang Yu, Bin Li, Xiangyang Xue

In Chinese text recognition, to compensate for the insufficient local data and improve the performance of local few-shot character recognition, it is often necessary for one organization to collect a large amount of data from similar organizations.

Personalized Federated Learning Privacy Preserving

Large Language Models Need Holistically Thought in Medical Conversational QA

1 code implementation 9 May 2023 Yixuan Weng, Bin Li, Fei Xia, Minjun Zhu, Bin Sun, Shizhu He, Kang Liu, Jun Zhao

The medical conversational question answering (CQA) system aims at providing a series of professional medical services to improve the efficiency of medical care.

Conversational Question Answering

Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

no code implementations 6 May 2023 Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue

Recently, semi-supervised federated learning (semi-FL) has been proposed to handle the commonly seen real-world scenarios with labeled data on the server and unlabeled data on the clients.

Diversity Federated Learning +1

Meta-Auxiliary Learning for Adaptive Human Pose Prediction

no code implementations 13 Apr 2023 Qiongjie Cui, Huaijiang Sun, Jianfeng Lu, Bin Li, Weiqing Li

Predicting high-fidelity future human poses from a historically observed sequence is crucial for intelligent robots to interact with humans.

Auxiliary Learning Pose Prediction +3

RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments

no code implementations 10 Apr 2023 Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich

Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership.

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

2 code implementations 6 Apr 2023 Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang

We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI).

Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

3 code implementations 4 Apr 2023 Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao

Our work highlights the potential of seamlessly unifying explicit rule learning via CoNNs and implicit pattern learning in LMs, paving the way for true symbolic comprehension capabilities.

Arithmetic Reasoning Language Modelling

Mechanism Design for Ad Auctions with Display Prices

no code implementations 23 Mar 2023 Bin Li, Yahui Lei

In this paper, we study ad auctions with display prices from the perspective of mechanism design, in which advertisers are asked to submit both the costs and prices of their products.

Uncertainty-aware U-Net for Medical Landmark Detection

no code implementations 18 Mar 2023 Ziyang Ye, Haiyang Yu, Bin Li

To estimate the uncertainty, we propose a module named Pyramid Covariance Predictor to predict the covariance matrices of the target Gaussian distributions, which determine the distributions of landmarks and represent the uncertainty of landmark annotation.

Anatomical Landmark Detection

Provably Convergent Subgraph-wise Sampling for Fast GNN Training

no code implementations 17 Mar 2023 Jie Wang, Zhihao Shi, Xize Liang, Defu Lian, Shuiwang Ji, Bin Li, Enhong Chen, Feng Wu

During the message passing (MP) in GNNs, subgraph-wise sampling methods discard messages outside the mini-batches in backward passes to avoid the well-known neighbor explosion problem, i.e., the exponentially increasing dependencies of nodes with the number of MP iterations.

Neural Video Compression with Diverse Contexts

2 code implementations CVPR 2023 Jiahao Li, Bin Li, Yan Lu

Better yet, our codec has surpassed ECM, the next-generation traditional codec still under development, in terms of PSNR in both the RGB and YUV420 colorspaces.

Diversity Optical Flow Estimation +1

Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution

1 code implementation 19 Feb 2023 Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou, Shuiwang Ji, Bin Li, Yongdong Zhang, Feng Wu

The appealing features of RSD-OA include that: (1) RSD-OA is invariant to visual distractions, as it is conditioned on the predefined subsequent action sequence without task-irrelevant information from transition dynamics, and (2) the reward sequence captures long-term task-relevant information in both rewards and transition dynamics.

reinforcement-learning Reinforcement Learning +2

Energy Efficient Computation Offloading in Aerial Edge Networks With Multi-Agent Cooperation

no code implementations 14 Feb 2023 Wenshuai Liu, Bin Li, Wancheng Xie, Yueyue Dai, Zesong Fei

With its high flexibility in supporting resource-intensive and time-sensitive applications, unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) has been proposed as an innovative paradigm to support mobile users (MUs).

Deep Reinforcement Learning Edge-computing +1

EVC: Towards Real-Time Neural Image Compression with Mask Decay

1 code implementation 10 Feb 2023 Guo-Hua Wang, Jiahao Li, Bin Li, Yan Lu

Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder.

Decoder Image Compression +1

Kainate receptor modulation by NETO2

no code implementations 2 Feb 2023 Lingli He, Jiahui Sun, Yiwei Gao, Bin Li, Yuhang Wang, Yanli Dong, Weidong An, Hang Li, Bei Yang, Yuhan Ge, Xuejun Cai Zhang, Yun Stone Shi, Yan Zhao

Glutamate-gated kainate receptors (KARs) are ubiquitous in the central nervous system of vertebrates, mediate synaptic transmission at the postsynapse, and modulate transmitter release at the presynapse.

Learning Trustworthy Model from Noisy Labels based on Rough Set for Surface Defect Detection

no code implementations 25 Jan 2023 Tongzhi Niu, Bin Li, Kai Li, Yufeng Lin, Yuwei Li, Weifeng Li, Zhenrong Wang

In surface defect detection, some suspicious regions cannot be uniquely classified as abnormal or normal.

Defect Detection

Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

1 code implementation 21 Jan 2023 Chengmin Gao, Bin Li

To reconstruct the complete shape of an object accurately, we enhance the disentanglement between the latent representations of objects and views, where the latent representations of time-conditioned views are jointly inferred with a Transformer and then are input to a sequential extension of Slot Attention to learn object-centric representations.

Disentanglement Gaussian Processes +2

Test-time Personalizable Forecasting of 3D Human Poses

no code implementations ICCV 2023 Qiongjie Cui, Huaijiang Sun, Jianfeng Lu, Weiqing Li, Bin Li, Hongwei Yi, Haofan Wang

Current motion forecasting approaches typically train a deep end-to-end model from the source domain data, and then apply it directly to target subjects.

Motion Forecasting

Motion Information Propagation for Neural Video Compression

no code implementations CVPR 2023 Linfeng Qi, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

Meanwhile, besides assisting frame coding at the current time step, the feature from context generation will be propagated as motion condition when coding the subsequent motion latent.

Decoder Video Compression

Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging

no code implementations CVPR 2023 Tianyu Huang, Haoang Li, Kejing He, Congying Sui, Bin Li, Yun-hui Liu

To address the orthographic projection problem, we propose a novel Viewing Direction-aided Positional Encoding (VDPE) strategy.

Large Language Models are Better Reasoners with Self-Verification

1 code implementation 19 Dec 2022 Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, Jun Zhao

By performing a backward verification of the answers that LLM deduced for itself, we can obtain interpretable answer validation scores to select the candidate answer with the highest score.

Arithmetic Reasoning Common Sense Reasoning +3

Adversarial Example Defense via Perturbation Grading Strategy

no code implementations 16 Dec 2022 Shaowei Zhu, Wanli Lyu, Bin Li, Zhaoxia Yin, Bin Luo

In addition, the proposed method does not modify any task model and can be used as a preprocessing module, which significantly reduces the deployment cost in practical applications.

Artificial Text Detection with Multiple Training Strategies

no code implementations 10 Dec 2022 Bin Li, Yixuan Weng, Qiya Song, Hanjun Deng

With the rapid progress of deep learning, artificial texts created by generative models are commonly used in news and social media.

Language Modeling Language Modelling +1

Chinese Character Recognition with Radical-Structured Stroke Trees

no code implementations 24 Nov 2022 Haiyang Yu, Jingye Chen, Bin Li, Xiangyang Xue

In this paper, we represent each Chinese character as a stroke tree, which is organized according to its radical structures, to fully exploit the merits of both radical and stroke levels in a decent way.

Decoder

Compositional Scene Modeling with Global Object-Centric Representations

no code implementations 21 Nov 2022 Tonglin Chen, Bin Li, Zhimeng Shen, Xiangyang Xue

Inspired by such an ability of humans, this paper proposes a compositional scene modeling method to infer global representations of canonical images of objects without any supervision.

Object Patch Matching +1

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

1 code implementation CVPR 2023 Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai

It has been shown that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models.

All Image Classification +4

Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning

1 code implementation 15 Nov 2022 Shangchao Su, Mingzhao Yang, Bin Li, Xiangyang Xue

In this paper, we propose a federated adaptive prompt tuning algorithm, FedAPT, for multi-domain collaborative image classification with powerful foundation models, like CLIP.

Federated Learning Image Classification

Visual Answer Localization with Cross-modal Mutual Knowledge Transfer

1 code implementation 26 Oct 2022 Yixuan Weng, Bin Li

In this paper, we propose a cross-modal mutual knowledge transfer span localization (MutualSL) method to reduce the knowledge deviation.

Transfer Learning

Slippage-robust Gaze Tracking for Near-eye Display

no code implementations 20 Oct 2022 Wei Zhang, Jiaxi Cao, Xiang Wang, Enqi Tian, Bin Li

In recent years, head-mounted near-eye display devices have become the key hardware foundation for virtual reality and augmented reality.

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

no code implementations 12 Oct 2022 Dong Wei, Huaijiang Sun, Bin Li, Jianfeng Lu, Weiqing Li, Xiaoning Sun, Shengxiang Hu

This process offers a natural way to obtain the "whitened" latents without any trainable parameters, and human motion prediction can be regarded as the reverse diffusion process that converts the noise distribution into realistic future motions conditioned on the observed sequence.

Decoder Diversity +3
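The snippet above describes the forward diffusion process as a parameter-free way to turn data into "whitened" (Gaussian) latents. A minimal sketch of the standard DDPM-style closed-form noising step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, is shown below. The linear β schedule, the flat-list input, and the helper names `linear_betas`, `alpha_bar`, and `forward_diffuse` are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import List


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Hypothetical linear noise schedule (a common DDPM default, assumed here)."""
    return [beta_start + (beta_end - beta_start) * i / (num_steps - 1) for i in range(num_steps)]


def alpha_bar(betas: List[float], t: int) -> float:
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def forward_diffuse(x0: List[float], alpha_bar_t: float, rng: random.Random) -> List[float]:
    """Sample x_t ~ q(x_t | x_0) in closed form.

    Only the fixed schedule enters here: there are no trainable parameters.
    """
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_betas(1000)
    pose = [0.3, -1.2, 0.8, 2.0]  # hypothetical flattened joint coordinates
    # alpha_bar at the last step is close to zero, so x_T is almost pure Gaussian noise.
    print(alpha_bar(betas, 999))
    print(forward_diffuse(pose, alpha_bar(betas, 999), rng))
```

Under this schedule ᾱ_T is on the order of 1e-5, so the signal is almost entirely replaced by noise; prediction then corresponds to learning the reverse of this fixed process, conditioned on the observed sequence.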

Learning to Locate Visual Answer in Video Corpus Using Question

1 code implementation 11 Oct 2022 Bin Li, Yixuan Weng, Bin Sun, Shutao Li

We introduce a new task, named video corpus visual answer localization (VCVAL), which aims to locate the visual answer in a large collection of untrimmed instructional videos using a natural language question.

Contrastive Learning Language Modelling +2

Domain Discrepancy Aware Distillation for Model Aggregation in Federated Learning

no code implementations 4 Oct 2022 Shangchao Su, Bin Li, Xiangyang Xue

In this paper, we first analyze the generalization bound of the aggregation model produced from knowledge distillation for the client domains, and then describe two challenges, server-to-client discrepancy and client-to-client discrepancy, brought to the aggregation model by the domain discrepancies.

Federated Learning Knowledge Distillation

Domain-Unified Prompt Representations for Source-Free Domain Generalization

1 code implementation 29 Sep 2022 Hongjing Niu, Hanting Li, Feng Zhao, Bin Li

The proposed scheme generates diverse prompts from a domain bank that contains many more diverse domains than existing DG datasets.

Diversity Source-free Domain Generalization

TODE-Trans: Transparent Object Depth Estimation with Transformer

1 code implementation 18 Sep 2022 Kang Chen, Shaochen Wang, Beihao Xia, Dongxu Li, Zhen Kan, Bin Li

We observe that the global characteristics of the transformer make it easier to extract contextual information to perform depth estimation of transparent areas.

Depth Estimation Object +2

Compositional Law Parsing with Latent Random Functions

1 code implementation 15 Sep 2022 Fan Shi, Bin Li, Xiangyang Xue

The ability to parse these laws automatically reflects a model's understanding of the scene, which gives law parsing a central role in many visual tasks.

Position Visual Reasoning

Rain Removal from Light Field Images with 4D Convolution and Multi-scale Gaussian Process

1 code implementation 16 Aug 2022 Tao Yan, Mingyue Li, Bin Li, Yang Yang, Rynson W. H. Lau

However, making full use of the abundant information available from light field images (LFIs), such as the 2D array of sub-views and the disparity map of each sub-view, for effective rain removal is still a challenging problem.

Depth Estimation Rain Removal

Style Spectroscope: Improve Interpretability and Controllability through Fourier Analysis

no code implementations 12 Aug 2022 Zhiyu Jin, Xuli Shen, Bin Li, Xiangyang Xue

We connect Fourier amplitude and phase with Gram matrices and a content reconstruction loss in style transfer, respectively.

Style Transfer
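The snippet pairs Fourier amplitude with Gram-matrix (style) statistics and phase with content reconstruction. The sketch below only illustrates the amplitude/phase split itself and an amplitude swap on a 1-D signal. It uses a naive standard-library DFT in place of a 2-D FFT over image feature maps, and the helpers `dft`, `idft`, `amplitude_phase`, and `recombine` are hypothetical, not the paper's code.

```python
import cmath
import math
from typing import List, Tuple


def dft(signal: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
        for k in range(n)
    ]


def amplitude_phase(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Split a real signal's spectrum into amplitude |X| and phase arg(X)."""
    spectrum = dft(signal)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude: List[float], phase: List[float]) -> List[float]:
    """Rebuild a signal from amplitude and phase spectra.

    Keeps only the real part; the imaginary residue is ~0 when the combined
    spectrum is Hermitian-symmetric, as it is for spectra of real signals.
    """
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return [c.real for c in idft(spectrum)]


if __name__ == "__main__":
    content = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 0.0, 2.0]  # hypothetical "content" row
    style = [2.0, 0.5, 1.5, 0.5, 2.0, 0.5, 1.5, 0.5]      # hypothetical "style" row
    amp_style, _ = amplitude_phase(style)
    _, phase_content = amplitude_phase(content)
    # Keep the content's phase but borrow the style's amplitude.
    print(recombine(amp_style, phase_content))
```

Recombining a signal's own amplitude and phase reproduces it up to floating-point error. Swapping in another signal's amplitude while keeping the phase is the kind of manipulation the amplitude/style and phase/content correspondence makes interpretable.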

Clear Memory-Augmented Auto-Encoder for Surface Defect Detection

no code implementations 8 Aug 2022 Wei Luo, Tongzhi Niu, Lixin Tang, Wenyong Yu, Bin Li

First, we propose a novel clear memory-augmented module (CMAM), which combines encoding and memory encoding through forgetting and inputting, thereby repairing abnormal foregrounds and preserving clear backgrounds.

Anomaly Detection Defect Detection

Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction

no code implementations 2 Aug 2022 Xiaoning Sun, Qiongjie Cui, Huaijiang Sun, Bin Li, Weiqing Li, Jianfeng Lu

Previous works on human motion prediction follow the pattern of building a mapping between the observed sequence and the one to be predicted.

Human motion prediction motion prediction +4

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

1 code implementation 18 Jul 2022 Ziqiang Li, Chaoyue Wang, Heliang Zheng, Jing Zhang, Bin Li

Since data augmentation strategies have largely alleviated training instability, further improving the generative performance of data-efficient GANs (DE-GANs) has become a research hotspot.

Contrastive Learning Data Augmentation +1

Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression

1 code implementation 13 Jul 2022 Jiahao Li, Bin Li, Yan Lu

Besides estimating the probability distribution, our entropy model also generates a spatial-channel-wise quantization step.

Quantization Video Compression
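A spatial-channel-wise quantization step gives every latent element its own step size rather than one global step. A minimal sketch of uniform scalar quantization under that assumption follows. The `quantize` helper, the [channel][height][width] nested-list layout, and the example step values are illustrative. In the paper's codec the steps are produced by the entropy model, which is not reproduced here.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # laid out as [channel][height][width]


def quantize(latent: Tensor3, step: Tensor3) -> Tuple[List[List[List[int]]], Tensor3]:
    """Quantize each latent element with its own step size.

    Returns the integer symbols (what an entropy coder would encode) and the
    dequantized reconstruction. Python's round() uses round-half-to-even.
    """
    symbols, recon = [], []
    for ch_y, ch_s in zip(latent, step):
        sym_ch, rec_ch = [], []
        for row_y, row_s in zip(ch_y, ch_s):
            q_row = [round(y / s) for y, s in zip(row_y, row_s)]
            sym_ch.append(q_row)
            rec_ch.append([q * s for q, s in zip(q_row, row_s)])
        symbols.append(sym_ch)
        recon.append(rec_ch)
    return symbols, recon


if __name__ == "__main__":
    latent = [[[0.9, 2.6], [-1.2, 0.1]]]  # one channel, 2x2 spatial grid
    step = [[[0.5, 1.0], [0.25, 2.0]]]    # hypothetical per-element steps
    print(quantize(latent, step))
```

Each reconstructed value lies within half a step of the original, so a larger step at a given position trades more distortion for fewer distinct symbols, and typically fewer bits, there.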

Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation

no code implementations 5 Jul 2022 Bin Li, Yixuan Weng, Ziyu Ma, Bin Sun, Shutao Li

To fully leverage the visual information for both scene understanding and dialogue generation, we propose the scene-aware prompt for the multi-modal dialogue understanding and generation (MDUG) task.

Dialogue Generation Dialogue Understanding +2

Cross-domain Federated Object Detection

no code implementations 30 Jun 2022 Shangchao Su, Bin Li, Chengzhi Zhang, Mingzhao Yang, Xiangyang Xue

Federated learning can enable multi-party collaborative learning without leaking client data.

Autonomous Driving Federated Learning +3

Adversarial Reconfigurable Intelligent Surface Against Physical Layer Key Generation

no code implementations 22 Jun 2022 Zhuangkun Wei, Bin Li, Weisi Guo

The development of reconfigurable intelligent surfaces (RIS) has recently advanced the research of physical layer security (PLS).

compressed sensing

STD-NET: Search of Image Steganalytic Deep-learning Architecture via Hierarchical Tensor Decomposition

1 code implementation 12 Jun 2022 Shunquan Tan, Qiushi Li, Laiyuan Li, Bin Li, Jiwu Huang

We propose a normalized distortion threshold that evaluates the sensitivity of each involved convolutional layer of the base model. This threshold guides STD-NET to compress the target network efficiently and without supervision, yielding two network structures of different shapes that have low computational cost and performance similar to the original.

Model Compression Steganalysis +1

Siamese Image Modeling for Self-Supervised Vision Representation Learning

2 code implementations CVPR 2023 Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie Zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai

Driven by this analysis, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view based on another masked view from the same image but with different augmentations.

Representation Learning Self-Supervised Learning +1

Dog nose print matching with dual global descriptor based on Contrastive Learning

1 code implementation 1 Jun 2022 Bin Li, Zhongan Wang, Nan Wu, Shuai Shi, Qijun Ma

These methods generally extract global features as a descriptor to represent the original image.

Contrastive Learning

Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

1 code implementation 20 May 2022 Rui Yang, Jie Wang, Zijie Geng, Mingxuan Ye, Shuiwang Ji, Bin Li, Feng Wu

Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning (RL) in real scenarios.

Reinforcement Learning (RL)
