Search Results for author: Xuelong Li

Found 176 papers, 53 papers with code

From Missing Pieces to Masterpieces: Image Completion with Context-Adaptive Diffusion

no code implementations 19 Apr 2025 Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Michael Felsberg, DaCheng Tao, Xuelong Li

Additionally, diffusion models typically rely on global learned distributions rather than localized features, leading to inconsistencies between the generated and existing image parts.

Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum

no code implementations 17 Apr 2025 Yuan Zhou, Xinli Shi, Xuelong Li, Jiachen Zhong, Guanghui Wen, Jinde Cao

Employing DFL methods to solve such general optimization problems leads to the formulation of Decentralized Nonconvex Composite Federated Learning (DNCFL), a topic that remains largely underexplored.

Federated Learning

SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding

no code implementations 17 Apr 2025 Qianqian Sun, Jixiang Luo, Dell Zhang, Xuelong Li

The key innovations of SmartFreeEdit include: (1) the introduction of region-aware tokens and a mask embedding paradigm that enhance the spatial understanding of complex scenes; (2) a reasoning segmentation pipeline designed to optimize the generation of editing masks based on natural language instructions; and (3) a hypergraph-augmented inpainting module that ensures the preservation of both structural integrity and semantic coherence during complex edits, overcoming the limitations of local-based image generation.

Image Generation Large Language Model +4

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding

no code implementations 15 Apr 2025 Dianbing Xi, Jiepeng Wang, Yuanzhi Liang, Xi Qiu, Yuchi Huo, Rui Wang, Chi Zhang, Xuelong Li

In this paper, we propose a novel framework for controllable video diffusion, OmniVDiff, aiming to synthesize and comprehend multiple video visual content in a single diffusion model.

Semantic Segmentation Video Generation +1

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

no code implementations 10 Apr 2025 Junli Liu, Qizhi Chen, Zhigang Wang, Yiwen Tang, Yiting Zhang, Chi Yan, Dong Wang, Xuelong Li, Bin Zhao

Furthermore, we propose an innovative model especially for the AerialVG task, where a Hierarchical Cross-Attention is devised to focus on target regions, and a Relation-Aware Grounding module is designed to infer positional relations.

Spatial Reasoning Visual Grounding

Bridging the Gap between Continuous and Informative Discrete Representations by Random Product Quantization

no code implementations 7 Apr 2025 Xueqing Li, Zehan Li, Boyu Zhu, Ruihao Jing, Jian Kang, Jie Li, Xiao-Lei Zhang, Xuelong Li

Its quantization error is lower-bounded by the product of rho and epsilon-kms, where epsilon-kms denotes the quantization error of a single K-means quantizer.

Quantization Self-Supervised Learning

Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation

no code implementations 1 Apr 2025 Yuanqi Yao, Siao Liu, Haoming Song, Delin Qu, Qizhi Chen, Yan Ding, Bin Zhao, Zhigang Wang, Xuelong Li, Dong Wang

Building a lifelong robot that can effectively leverage prior knowledge for continuous skill acquisition remains significantly challenging.

Prompt Learning Robot Manipulation +1

Riemannian Optimization on Relaxed Indicator Matrix Manifold

1 code implementation 26 Mar 2025 Jinghui Yuan, Fangyuan Xie, Feiping Nie, Xuelong Li

The indicator matrix plays an important role in machine learning, but optimizing it is an NP-hard problem.

global-optimization Graph Clustering +2

Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification

no code implementations 19 Mar 2025 Zhong Ji, Ci Liu, Jingren Liu, Chen Tang, Yanwei Pang, Xuelong Li

Central to this approach is the Optimal Transport Adapter (OTA), which employs a cross-modal attention mechanism to enrich textual representations and facilitate better subsequent information interaction.

Scene Classification

Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference

no code implementations 17 Mar 2025 Cheng Yuan, Zhening Liu, Jiashu Lv, Jiawei Shao, Yufei Jiang, Jun Zhang, Xuelong Li

To address this challenge, we propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework, where visual features are merged by clustering and encoded by a learnable and selective entropy model before feature projection.

Feature Compression Question Answering +1

MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation

no code implementations 14 Mar 2025 Pingrui Zhang, Xianqiang Gao, Yuhan Wu, Kehui Liu, Dong Wang, Zhigang Wang, Bin Zhao, Yan Ding, Xuelong Li

Our approach enables models to learn affordance-based final positioning that accommodates different arm types and platform heights, thereby paving the way for more robust and generalizable integration of navigation and manipulation in embodied AI.

Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models

no code implementations 12 Mar 2025 Xiaozhen Qiao, Peng Huang, Jiakang Yuan, Xianda Guo, Bowen Ye, Zhe Sun, Xuelong Li

BPRE first employs a Multi-Dimensional Quality-Aware Reward Module to evaluate feature quality and guide prototype refinement precisely.

Test-time Adaptation

Large model enhanced computational ghost imaging

1 code implementation 10 Mar 2025 Yifan Chen, Hongjun An, Zhe Sun, Tong Tian, Mingliang Chen, Christian Spielmann, Xuelong Li

Ghost imaging (GI) achieves 2D image reconstruction through high-order correlation of 1D bucket signals and 2D light field information, particularly demonstrating enhanced detection sensitivity and high-quality image reconstruction via efficient photon collection in scattering media.

Image Reconstruction model +1

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models

no code implementations 8 Mar 2025 Muzhi Dai, Jiashuo Sun, Zhiyuan Zhao, Shixuan Liu, Rui Li, Junyu Gao, Xuelong Li

Aligning large vision-language models (LVLMs) with human preferences is challenging due to the scarcity of fine-grained, high-quality, and multimodal preference data without human annotations.

Image Captioning Language Modeling +2

DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model

no code implementations 26 Feb 2025 Lei Zhao, Sizhou Chen, Linfeng Feng, Xiao-Lei Zhang, Xuelong Li

Particularly, to improve the synthesis quality and azimuth accuracy of the spatial sound events simultaneously, we propose to use two kinds of acoustic features.

Audio Generation Large Language Model

Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning

no code implementations 24 Feb 2025 Weiji Xie, Chenjia Bai, Jiyuan Shi, Junkai Yang, Yunfei Ge, Weinan Zhang, Xuelong Li

Humans possess delicate dynamic balance mechanisms that enable them to maintain stability across diverse terrains and under extreme conditions.

Reinforcement Learning (RL)

Improve LLM-as-a-Judge Ability as a General Ability

no code implementations 17 Feb 2025 Jiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li

Furthermore, our training method enhances the general capabilities of the model by constructing complicated judge tasks, and in our tests, where a Judge Model is used to optimize a policy model, the judge signals provided by our model significantly improved the downstream DPO training performance of our internal models.

Leader and Follower: Interactive Motion Generation under Trajectory Constraints

no code implementations 17 Feb 2025 Runqi Wang, Caoyuan Ma, Jian Zhao, Hanrui Xu, Dongfang Sun, Haoyang Chen, Lin Xiong, Zheng Wang, Xuelong Li

To generate interactive motion following specified trajectories, this paper decouples complex motion into a Leader-Follower dynamic, inspired by role allocation in partner dancing.

Motion Generation

AudioSpa: Spatializing Sound Events with Text

no code implementations 16 Feb 2025 Linfeng Feng, Lei Zhao, Boyu Zhu, Xiao-Lei Zhang, Xuelong Li

Additionally, we propose a binaural source localization model to assess the quality of the generated audio.

Audio Generation Data Augmentation

Exploring the Potential of Encoder-free Architectures in 3D LMMs

1 code implementation 13 Feb 2025 Yiwen Tang, Zoey Guo, Zhuhao Wang, Ray Zhang, Qizhi Chen, Junli Liu, Delin Qu, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao

In this paper, we present the first comprehensive investigation into the potential of encoder-free architectures to overcome the challenges of encoder-based 3D Large Multimodal Models (LMMs).

Inductive Bias Visual Question Answering (VQA)

SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

no code implementations 27 Jan 2025 Delin Qu, Haoming Song, Qizhi Chen, Yuanqi Yao, Xinyi Ye, Yan Ding, Zhigang Wang, Jiayuan Gu, Bin Zhao, Dong Wang, Xuelong Li

Specifically, we introduce Ego3D Position Encoding to inject 3D information into the input observations of the visual-language-action model, and propose Adaptive Action Grids to represent spatial robot movement actions with adaptive discretized action grids, facilitating learning generalizable and transferrable spatial action knowledge for cross-robot control.

Ranked #2 on Robot Manipulation on SimplerEnv-Widow X (using extra training data)

Robot Manipulation

Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR

no code implementations 24 Jan 2025 Hao Ma, Rujin Chen, Ruihao Jing, Xiao-Lei Zhang, Ju Liu, Xuelong Li

Nevertheless, these methods often overlook speech intelligibility, leading to alterations or loss of semantic content in the re-synthesized speech.

Speech Extraction

Online Preference Alignment for Language Models via Count-based Exploration

1 code implementation 22 Jan 2025 Chenjia Bai, Yang Zhang, Shuang Qiu, Qiaosheng Zhang, Kang Xu, Xuelong Li

Then, we reformulate our objective to direct preference optimization with an exploration term, where the UCB-term can be converted to a count-based exploration bonus.

Instruction Following

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation

1 code implementation 1 Jan 2025 Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

The core of FGAseg is a Pixel-Level Alignment module that employs a cross-modal attention mechanism and a text-pixel alignment loss to refine the coarse-grained alignment from CLIP, achieving finer-grained pixel-text semantic alignment.

Open Vocabulary Semantic Segmentation Open-Vocabulary Semantic Segmentation +1

A Greedy Strategy for Graph Cut

no code implementations 28 Dec 2024 Feiping Nie, Shenfei Pei, Zengwei Zheng, Rong Wang, Xuelong Li

To reduce the computational complexity of GGC, only mergers between clusters and their neighbors are considered.

VAST 1.0: A Unified Framework for Controllable and Consistent Video Generation

no code implementations 21 Dec 2024 Chi Zhang, Yuanzhi Liang, Xi Qiu, Fangqiu Yi, Xuelong Li

Generating high-quality videos from textual descriptions poses challenges in maintaining temporal coherence and control over subject motion.

Video Generation

Enhance Vision-Language Alignment with Noise

1 code implementation 14 Dec 2024 Sida Huang, Hongyuan Zhang, Xuelong Li

It therefore implies a new scheme to learn beneficial noise distribution that can be employed to fine-tune VL models.

Variational Inference

Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

1 code implementation 11 Dec 2024 Yanchen Xu, Siqi Huang, Hongyuan Zhang, Xuelong Li

Inspired by the theoretical conclusions and the idea of positive-incentive noise, we propose a novel GCL algorithm, Error-PAssing-based Graph Contrastive Learning (EPAGCL), which uses both edge adding and edge dropping as its augmentations.

Contrastive Learning Graph Representation Learning +1

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

1 code implementation 27 Nov 2024 Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, Ping Luo

Our results demonstrate the effectiveness of G3Flow in enhancing real-time dynamic semantic feature understanding for robotic manipulation policies.

Imitation Learning Object +1

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

no code implementations 25 Nov 2024 Guangzhao Dai, Jian Zhao, Yuantao Chen, Yusen Qin, Hao Zhao, GuoSen Xie, Yazhou Yao, Xiangbo Shu, Xuelong Li

Vision-and-Language Navigation (VLN), where an agent follows instructions to reach a target destination, has recently seen significant advancements.

3DGS Navigate +2

Night-to-Day Translation via Illumination Degradation Disentanglement

no code implementations 21 Nov 2024 Guanzhou Lan, YuQi Yang, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

Specifically, our method comprises a degradation disentanglement module and a degradation-aware contrastive learning module.

Contrastive Learning Disentanglement +1

AI Flow at the Network Edge

no code implementations 19 Nov 2024 Jiawei Shao, Xuelong Li

This article presents AI Flow, a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers, making intelligence flow across networks.

Image Captioning

Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning

no code implementations 5 Nov 2024 Yang Zhao, Zidong Nie, Kangsheng Dong, Qinghua Huang, Xuelong Li

This paper proposes a deep reinforcement learning-based model for decision-making in multi-role UAV cooperative pursuit-evasion games, to address the challenge of enabling UAVs to autonomously make decisions in complex game environments.

Decision Making Deep Reinforcement Learning +1

Fast Semi-supervised Learning on Large Graphs: An Improved Green-function Method

no code implementations 4 Nov 2024 Feiping Nie, Yitao Song, Wei Chang, Rong Wang, Xuelong Li

In graph-based semi-supervised learning, the Green-function method is a classical method that works by computing the Green's function in the graph space.

Clustering Based on Density Propagation and Subcluster Merging

no code implementations 4 Nov 2024 Feiping Nie, Yitao Song, Jingjing Xue, Rong Wang, Xuelong Li

We propose the DPSM method, a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.

Clustering Node Clustering

Physics in Next-token Prediction

no code implementations 1 Nov 2024 Hongjun An, Yiliang Song, Xuelong Li

We discovered the underlying physics in Next-token Prediction (NTP).

Prediction

FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

no code implementations 29 Oct 2024 Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li

Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints.

Optical Flow Estimation

Efficient Diffusion as Low Light Enhancer

no code implementations 16 Oct 2024 Guanzhou Lan, Qianli Ma, YuQi Yang, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao

In this paper, we identify two primary factors contributing to performance degradation: fitting errors and the inference gap.

Low-Light Image Enhancement

Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning

no code implementations 11 Oct 2024 Yunpeng Gao, Zhigang Wang, Linglin Jing, Dong Wang, Xuelong Li, Bin Zhao

Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues.

Language Modeling Language Modelling +4

SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy

1 code implementation 9 Oct 2024 Yuhan Kang, Qingpeng Li, Leyuan Fang, Jian Zhao, Xuelong Li

In this paper, considering that surrounding environment information can be used to identify concealed objects, we propose a novel deep Surrounding-Aware Network, namely SurANet, for COD tasks, which introduces surrounding information into feature extraction and the loss function to improve discrimination.

Contrastive Learning object-detection +1

Personalized Quantum Federated Learning for Privacy Image Classification

no code implementations 3 Oct 2024 Jinjing Shi, Tian Chen, Shichao Zhang, Xuelong Li

A personalized quantum federated learning algorithm for privacy image classification is proposed to enhance the personality of the client model in the case of an imbalanced distribution of images.

Classification Image Classification +1

Causal Deciphering and Inpainting in Spatio-Temporal Dynamics via Diffusion Model

no code implementations 29 Sep 2024 Yifan Duan, Jian Zhao, Pengcheng, Junyuan Mao, Hao Wu, Jingyu Xu, Shilong Wang, Caoyuan Ma, Kai Wang, Kun Wang, Xuelong Li

To this end, we establish a causal framework for ST predictions, termed CaPaint, which aims to identify causal regions in data and endow the model with causal reasoning ability in a two-stage process.

Causal Discovery Image Inpainting

Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement

1 code implementation 25 Sep 2024 Guanlin Li, Ke Zhang, Ting Wang, Ming Li, Bin Zhao, Xuelong Li

Despite the impressive advancements made in recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further advancements.

Contrastive Learning Low-Light Image Enhancement +1

COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models

1 code implementation 23 Sep 2024 Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Xuelong Li, Bin Zhao

Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions for individual robots, where a centralized task assigner makes a task planning proposal to decompose the complex task into subtasks, and then assigns subtasks to robot executors.

Task Planning

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

no code implementations 18 Sep 2024 Zhaxizhuoma Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li

To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings.

Task Planning

PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting

1 code implementation 20 Aug 2024 Yongbo Yu, Weizhong Yu, Feiping Nie, Xuelong Li

The self-attention mechanism in Transformer architecture, invariant to sequence order, necessitates positional embeddings to encode temporal order in time series prediction.

Multivariate Time Series Forecasting Temporal Sequences +2

Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise

no code implementations 19 Aug 2024 Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li

Inspired by the theoretical study, a framework that develops a $\pi$-noise generator to learn the beneficial noise (instead of estimation) as data augmentations for contrast is proposed.

Contrastive Learning Data Augmentation

ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model

no code implementations 8 Aug 2024 Yifan Chen, Xiaozhen Qiao, Zhe Sun, Xuelong Li

In this paper, we propose a novel approach, ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model, which aims to comprehensively distill the knowledge from a large teacher CLIP model into a smaller student model, ensuring comparable performance with significantly reduced parameters.

Contrastive Learning Knowledge Distillation

KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

no code implementations 6 Aug 2024 Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li

Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the mechanisms of "how to do".

Efficient Exploration Imitation Learning +1

Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation

no code implementations 2 Aug 2024 Ruoxuan Feng, Di Hu, Wenke Ma, Xuelong Li

Humans possess a remarkable talent for flexibly alternating to different senses when interacting with the environment.

Imitation Learning

Cross-Scan Mamba with Masked Training for Robust Spectral Imaging

no code implementations 1 Aug 2024 Wenzhe Tian, Haijin Zeng, Yin-Ping Zhao, Yongyong Chen, Zhen Wang, Xuelong Li

Current CNN-based methods are limited in modeling long-range dependencies, while Transformer-based models face high computational complexity.

Computational Efficiency Mamba

SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context

1 code implementation 1 Aug 2024 Hongjun An, Yifan Chen, Zhe Sun, Xuelong Li

Current large language models (LLMs) primarily use the next-token prediction method for inference, which significantly impedes their processing speed.

Decoder Sentence

Diffusion-driven lensless fiber endomicroscopic quantitative phase imaging towards digital pathology

no code implementations 26 Jul 2024 Zhaoqing Chen, Jiawei Sun, Xibin Yang, Xinyi Ye, Bin Zhao, Xuelong Li, Juergen Czarske

The lensless fiber endomicroscope is an emerging tool for in-vivo microscopic imaging, where quantitative phase imaging (QPI) can be utilized as a label-free method to enhance image contrast.

Attribute Cell Segmentation +1

Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

no code implementations 24 Jul 2024 Jingren Liu, Zhong Ji, Yunlong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, contributing to the development of more efficient continual learning systems.

Continual Learning parameter-efficient fine-tuning

Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models

1 code implementation 22 Jun 2024 Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li

We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide accurate and consistent long-term imaginations.

Reinforcement Learning (RL) SMAC+ +1

Learning Manipulation by Predicting Interaction

1 code implementation 1 Jun 2024 Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively.

Representation Learning

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

no code implementations 30 May 2024 Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li

In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning.

Instruction Following parameter-efficient fine-tuning +2

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

1 code implementation 24 May 2024 Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.

Segmentation Semantic Segmentation

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

no code implementations 23 May 2024 Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li

In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.

regression

Adaptive Fuzzy C-Means with Graph Embedding

no code implementations 22 May 2024 Qiang Chen, Weizhong Yu, Feiping Nie, Xuelong Li

Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.

Clustering Graph Embedding

Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning

1 code implementation 10 May 2024 Xiaoyu Wen, Chenjia Bai, Kang Xu, Xudong Yu, Yang Zhang, Xuelong Li, Zhen Wang

In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains.

reinforcement-learning

Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning

no code implementations 30 Apr 2024 Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li

Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.

Multi-agent Reinforcement Learning

Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning

1 code implementation 30 Apr 2024 Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li

We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing.

Offline RL Reinforcement Learning (RL) +1

Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

no code implementations 26 Apr 2024 Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li

Emotion recognition aims to discern the emotional state of subjects within an image, relying on subject-centric and contextual visual cues.

Emotion Classification Emotion Recognition

Robust Capped lp-Norm Support Vector Ordinal Regression

no code implementations 25 Apr 2024 Haorui Xiang, Zhichang Wu, Guoxu Li, Rong Wang, Feiping Nie, Xuelong Li

Adhering to this concept, we introduce a new model, Capped $\ell_{p}$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.

regression

CNN2GNN: How to Bridge CNN with GNN

no code implementations 23 Apr 2024 Ziheng Jiao, Hongyuan Zhang, Xuelong Li

Notably, because it simultaneously extracts the intra-sample representation of a single instance and the topological relationship among the datasets, the distilled "boosted" two-layer GNN performs much better on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.

Graph Learning Inductive Learning

Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation

1 code implementation 22 Apr 2024 Junyu Gao, Da Zhang, Xuelong Li

Then, based on the theory, we design a DPD algorithm composed of a training paradigm and a proxy domain generator to enhance the domain generalization of the confidence-threshold learner.

Binary Classification Domain Generalization

CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

1 code implementation 15 Apr 2024 Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li

Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories.

Attribute Transfer Learning +2

VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection

no code implementations 15 Apr 2024 Bonan Ding, Jin Xie, Jing Nie, Jiale Cao, Xuelong Li, Yanwei Pang

Therefore, an effective solution involves transforming monocular images into LiDAR-like representations and employing a LiDAR-based 3D object detector to predict the 3D coordinates of objects.

Autonomous Driving Monocular 3D Object Detection +2

StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging

1 code implementation 14 Apr 2024 Xuelong Li, Hongjun An, Guangying Li, Xing Wang, Guanghua Cheng, Zhe Sun

In this paper, we introduce StreakNet-Arch, a novel signal processing architecture designed for Underwater Carrier LiDAR-Radar (UCLR) imaging systems, to address the limitations in scatter suppression and real-time imaging.

Binary Classification

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

7 code implementations 11 Apr 2024 Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.

3D geometry parameter-efficient fine-tuning

Regularized Conditional Diffusion Model for Multi-Task Preference Alignment

no code implementations 7 Apr 2024 Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li

Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks.

D4RL Decision Making +1

HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

no code implementations CVPR 2024 Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.

Image Reconstruction Segmentation +2

Deep Contrastive Graph Learning with Clustering-Oriented Guidance

no code implementations 25 Feb 2024 Mulin Chen, Bocheng Wang, Xuelong Li

Graph Convolutional Network (GCN) has exhibited remarkable potential in improving graph-based clustering.

Clustering Contrastive Learning +1

Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training

1 code implementation 22 Feb 2024 Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li

In the pre-training stage, we employ a discrete diffusion model with a mask-and-replace diffusion strategy to predict future video tokens in the latent space.

QuanTest: Entanglement-Guided Testing of Quantum Neural Network Systems

1 code implementation 20 Feb 2024 Jinjing Shi, Zimeng Xiao, Heyuan Shi, Yu Jiang, Xuelong Li

Subsequently, QuanTest formulates the problem of generating test inputs that maximize the quantum entanglement adequacy and capture incorrect behaviors of the QNN system as a joint optimization problem and solves it in a gradient-based manner to generate quantum adversarial examples.

software testing

Motion-Aware Video Frame Interpolation

no code implementations5 Feb 2024 Pengfei Han, Fuhua Zhang, Bin Zhao, Xuelong Li

Subsequently, a cross-scale motion structure is presented to estimate and refine intermediate flow maps by the extracted features.

Optical Flow Estimation Video Frame Interpolation

GQHAN: A Grover-inspired Quantum Hard Attention Network

no code implementations25 Jan 2024 Ren-xin Zhao, Jinjing Shi, Xuelong Li

In response to the dilemma of HAM and QML, a Grover-inspired Quantum Hard Attention Mechanism (GQHAM) consisting of a Flexible Oracle (FO) and an Adaptive Diffusion Operator (ADO) is proposed.

Binary Classification Hard Attention +1

NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images

1 code implementation19 Jan 2024 Junyu Gao, Liangliang Zhao, Xuelong Li

Considering the absence of a dataset for this task, a large-scale dataset (NWPU-MOC) is collected, consisting of 3,416 scenes with a resolution of 1024 $\times$ 1024 pixels, well annotated with 14 fine-grained object categories.

Object Object Counting

Community Detection in the Multi-View Stochastic Block Model

no code implementations17 Jan 2024 Yexin Zhang, Zhongtian Ma, Qiaosheng Zhang, Zhen Wang, Xuelong Li

This paper considers the problem of community detection on multiple potentially correlated graphs from an information-theoretical perspective.

Community Detection Stochastic Block Model

Frequency Domain Nuances Mining for Visible-Infrared Person Re-identification

no code implementations4 Jan 2024 Yukang Zhang, Yang Lu, Yan Yan, Hanzi Wang, Xuelong Li

Specifically, we propose a novel Frequency Domain Nuances Mining (FDNM) method to explore the cross-modality frequency domain information, which mainly includes an amplitude guided phase (AGP) module and an amplitude nuances mining (ANM) module.

Face Recognition Person Re-Identification

SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation

1 code implementation19 Dec 2023 Feixiang Zhou, Zheheng Jiang, Huiyu Zhou, Xuelong Li

However, learning the representation of each frame by unsupervised contrastive learning for action segmentation remains an open and challenging problem.

Action Segmentation Contrastive Learning +3

Calibration-free quantitative phase imaging in multi-core fiber endoscopes using end-to-end deep learning

no code implementations12 Dec 2023 Jiawei Sun, Bin Zhao, Dong Wang, Zhigang Wang, Jie Zhang, Nektarios Koukourakis, Juergen W. Czarske, Xuelong Li

Quantitative phase imaging (QPI) through multi-core fibers (MCFs) has been an emerging in vivo label-free endoscopic imaging modality with minimal invasiveness.

Retrieval

DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement

no code implementations12 Dec 2023 Jingchun Zhou, Zongxin He, Qiuping Jiang, Kui Jiang, Xianping Fu, Xuelong Li

To solve this issue, previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features, limiting the generalization and adaptability of the model.

SSIM UIE

A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization

no code implementations26 Nov 2023 Feiping Nie, Jitao Lu, Danyang Wu, Rong Wang, Xuelong Li

To address the problems, we propose a novel N-Cut solver designed based on the famous coordinate descent method.

Clustering

Eliminating Quantization Errors in Classification-Based Sound Source Localization

1 code implementation21 Nov 2023 Linfeng Feng, Xiao-Lei Zhang, Xuelong Li

To address this, we propose an Unbiased Label Distribution (ULD) to eliminate quantization error in training targets.

Classification Quantization +2

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

no code implementations CVPR 2024 Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li

This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesizing a static object as in existing methods.

Pose Tracking Simultaneous Localization and Mapping

Implicit Event-RGBD Neural SLAM

no code implementations CVPR 2024 Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li

To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.

Diffusion-Based Adversarial Purification for Speaker Verification

no code implementations22 Oct 2023 Yibo Bai, Xiao-Lei Zhang, Xuelong Li

Automatic speaker verification (ASV) based on deep learning is easily contaminated by adversarial attacks, a new type of attack that injects imperceptible perturbations into audio signals so as to make ASV produce wrong decisions.

Adversarial Purification Denoising +1

Discretize Relaxed Solution of Spectral Clustering via a Non-Heuristic Algorithm

1 code implementation19 Oct 2023 Hongyuan Zhang, Xuelong Li

Unfortunately, the goal of the existing methods is not to find a discrete solution that minimizes the original objective.

Clustering

Distance Weighted Trans Network for Image Completion

no code implementations11 Oct 2023 Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Xuelong Li, Yue Lu

The challenge of image generation has been effectively modeled as a problem of structure priors or transformation.

Image Generation

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

7 code implementations4 Oct 2023 Yiwen Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters.

parameter-efficient fine-tuning

QKSAN: A Quantum Kernel Self-Attention Network

no code implementations25 Aug 2023 Ren-xin Zhao, Jinjing Shi, Xuelong Li

Self-Attention Mechanism (SAM) excels at distilling important information from the interior of data to improve the computational efficiency of models.

Binary Classification Computational Efficiency +1

Disentangled Contrastive Image Translation for Nighttime Surveillance

no code implementations11 Jul 2023 Guanzhou Lan, Bin Zhao, Xuelong Li

Targeting the surveillance scenes, we develop a disentangled representation, which is an auxiliary pretext task that separates surveillance scenes into the foreground and background with contrastive learning.

Contrastive Learning Translation

Sequential Attention Source Identification Based on Feature Representation

no code implementations28 Jun 2023 Dongpeng Hou, Zhen Wang, Chao GAO, Xuelong Li

Snapshot observation based source localization has been widely studied due to its accessibility and low cost.

Decoder Graph Attention +1

Hierarchical Matching and Reasoning for Multi-Query Image Retrieval

1 code implementation26 Jun 2023 Zhong Ji, Zhihao LI, Yan Zhang, Haoran Wang, Yanwei Pang, Xuelong Li

Afterwards, the VR module is developed to excavate the potential semantic correlations among multiple region-query pairs, which further explores the high-level reasoning similarity.

Image Retrieval Retrieval

Variational Positive-incentive Noise: How Noise Benefits Models

no code implementations13 Jun 2023 Hongyuan Zhang, Sida Huang, Xuelong Li

From the experiments, it is shown that the proposed VPN generator can improve the base models.

Variational Inference

Learning Geometric Transformation for Point Cloud Completion

2 code implementations International Journal of Computer Vision 2023 Shengping Zhang, Xianzhu Liu, Haozhe Xie, Liqiang Nie, Huiyu Zhou, DaCheng Tao, Xuelong Li

It exploits the repetitive geometric structures in common 3D objects to recover the complete shapes, which contains three sub-networks: geometric patch network, structure transformation network, and detail refinement network.

Decoder global-optimization +1

Image Reconstruction for Accelerated MR Scan with Faster Fourier Convolutional Neural Networks

no code implementations5 Jun 2023 Xiaohan Liu, Yanwei Pang, Xuebin Sun, Yiming Liu, Yonghong Hou, ZhenChang Wang, Xuelong Li

To address this problem, we propose the following: (1) a novel convolutional operator called Faster Fourier Convolution (FasterFC) to replace the two consecutive convolution operations typically used in convolutional neural networks (e.g., U-Net, ResNet).

3D Reconstruction Image Reconstruction

Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

1 code implementation NeurIPS 2023 Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li

Specifically, we propose Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.

Prompt Learning Reinforcement Learning (RL)

On the Value of Myopic Behavior in Policy Reuse

no code implementations28 May 2023 Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.

Imbalanced Aircraft Data Anomaly Detection

no code implementations17 May 2023 Hao Yang, Junyu Gao, Yuan Yuan, Xuelong Li

Anomaly detection in temporal data from sensors under aviation scenarios is a practical but challenging task: 1) it is difficult to extract contextual information with temporal correlation from long temporal data; 2) anomalous data are rare in time series, causing a normal/abnormal imbalance in anomaly detection that makes the detector's classification degenerate or even fail.

Anomaly Detection Time Series

Behavior Contrastive Learning for Unsupervised Skill Discovery

1 code implementation8 May 2023 Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li

Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill, which serves as an upper bound of the previous MI objective.

continuous-control Continuous Control +1

Transformer-based stereo-aware 3D object detection from binocular images

no code implementations24 Apr 2023 Hanqing Sun, Yanwei Pang, Jiale Cao, Jin Xie, Xuelong Li

In this paper, we explore the model design of Transformers in binocular 3D object detection, focusing particularly on extracting and encoding task-specific image correspondence information.

3D Object Detection Object +1

Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One

1 code implementation20 Apr 2023 Hongyuan Zhang, Yanan Zhu, Xuelong Li

This severely limits the application of stochastic optimization algorithms, so the training of GNNs is usually time-consuming.

Stochastic Optimization

VTAE: Variational Transformer Autoencoder with Manifolds Learning

1 code implementation3 Apr 2023 Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, DaCheng Tao, Xuelong Li

This weak projection, however, can be addressed by a Riemannian metric, and we show that geodesics computation and accurate interpolations between data samples on the Riemannian manifold can substantially improve the performance of deep generative models.

Representation Learning

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance

7 code implementations29 Mar 2023 Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.

3D visual grounding

Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking

no code implementations CVPR 2023 Yihao Wang, Zhigang Wang, Bin Zhao, Dong Wang, Mulin Chen, Xuelong Li

In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security.

Fully Self-Supervised Depth Estimation from Defocus Clue

1 code implementation CVPR 2023 Haozhe Si, Bin Zhao, Dong Wang, Yunpeng Gao, Mulin Chen, Zhigang Wang, Xuelong Li

We show that our framework circumvents the need for depth and AIF image ground truth and produces superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world.

Depth Estimation

USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval

1 code implementation17 Jan 2023 Yan Zhang, Zhong Ji, Di Wang, Yanwei Pang, Xuelong Li

(2) It limits the scale of negative sample pairs by employing the mini-batch based end-to-end training mechanism.

Contrastive Learning Image-text Retrieval +3

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding

no code implementations ICCV 2023 Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.

3D visual grounding

Positive-incentive Noise

no code implementations19 Dec 2022 Xuelong Li

After introducing the task entropy, the noise can be classified into two kinds, Positive-incentive noise (Pi-noise or $\pi$-noise) and pure noise, according to whether the noise can reduce the complexity of the task.

Multi-Task Learning

On the Global Solution of Soft k-Means

no code implementations7 Dec 2022 Feiping Nie, Hong Chen, Rong Wang, Xuelong Li

This paper presents an algorithm to solve the Soft k-Means problem globally.

Clustering

Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects

no code implementations2 Dec 2022 Qi Wang, Juncheng Wang, Junyu Gao, Yuan Yuan, Xuelong Li

The mainstream crowd counting methods regress density map and integrate it to obtain counting results.

Crowd Counting

Search to Pass Messages for Temporal Knowledge Graph Completion

1 code implementation30 Oct 2022 Zhen Wang, Haotong Du, Quanming Yao, Xuelong Li

In particular, we develop a generalized framework to explore topological and temporal information in TKGs.

Graph Neural Network Link Prediction +3

Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

no code implementations19 Oct 2022 Shupei Liu, Linfeng Feng, Yijun Gong, Chengdong Liang, Chen Zhang, Xiao-Lei Zhang, Xuelong Li

To further boost the estimation accuracy, we introduce a node selection algorithm that strategically filters the most reliable nodes.

Meta-Causal Feature Learning for Out-of-Distribution Generalization

no code implementations22 Aug 2022 Yuqing Wang, Xiangxian Li, Zhuang Qi, Jingyu Li, Xuelong Li, Xiangxu Meng, Lei Meng

Causal inference has become a powerful tool to handle the out-of-distribution (OOD) generalization problem, which aims to extract the invariant features.

Causal Inference Out-of-Distribution Generalization +1

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

no code implementations20 Aug 2022 Yake Wei, Di Hu, Yapeng Tian, Xuelong Li

A comprehensive survey that can systematically organize and analyze studies of the audio-visual field is expected.

audio-visual learning Scene Understanding +1

Memorizing Complementation Network for Few-Shot Class-Incremental Learning

no code implementations11 Aug 2022 Zhong Ji, Zhishen Hou, Xiyao Liu, Yanwei Pang, Xuelong Li

Few-shot Class-Incremental Learning (FSCIL) aims at learning new concepts continually with only a few samples, which is prone to catastrophic forgetting and overfitting.

class-incremental learning Few-Shot Class-Incremental Learning +3

Low-Light Hyperspectral Image Enhancement

1 code implementation5 Aug 2022 Xuelong Li, Guanlin Li, Bin Zhao

The illumination enhancement branch is adopted to enlighten the low-frequency component with reduced resolution.

Image Enhancement

Deep Manifold Learning with Graph Mining

no code implementations18 Jul 2022 Xuelong Li, Ziheng Jiao, Hongyuan Zhang, Rui Zhang

Admittedly, Graph Convolution Network (GCN) has achieved excellent results on graph datasets such as social networks, citation networks, etc.

Graph Mining

QSAN: A Near-term Achievable Quantum Self-Attention Network

no code implementations14 Jul 2022 Jinjing Shi, Ren-xin Zhao, Wenxuan Wang, Shichao Zhang, Xuelong Li

Self-Attention Mechanism (SAM) is good at capturing the internal connections of features and greatly improves the performance of machine learning models, especially those requiring efficient characterization and feature extraction of high-dimensional data.

Binary Classification Image Classification +3

Spatio-temporal Gait Feature with Global Distance Alignment

no code implementations7 Mar 2022 Yifan Chen, Yang Zhao, Xuelong Li

In this paper, we try to enhance the discrimination of spatio-temporal gait features from two aspects: effective extraction of spatio-temporal gait features and reasonable refinement of extracted features.

Gait Recognition

Matrix Completion via Non-Convex Relaxation and Adaptive Correlation Learning

no code implementations4 Mar 2022 Xuelong Li, Hongyuan Zhang, Rui Zhang

We theoretically validate that it is equivalent to the existing matrix completion models.

Matrix Completion

Relation Regularized Scene Graph Generation

no code implementations22 Feb 2022 Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li

Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.

Graph Classification Graph Generation +6

New Tight Relaxations of Rank Minimization for Multi-Task Learning

no code implementations9 Dec 2021 Wei Chang, Feiping Nie, Rong Wang, Xuelong Li

Multi-task learning, which has been studied by many researchers, supposes that different tasks can share a common yet latent low-rank subspace.

Multi-Task Learning

Adaptive Shrink-Mask for Text Detection

no code implementations18 Nov 2021 Chuang Yang, Mulin Chen, Yuan Yuan, Qi Wang, Xuelong Li

It weakens the coupling of texts to shrink-masks, which improves the robustness of detection results.

Text Detection

AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph Convolution

no code implementations12 Nov 2021 Hongyuan Zhang, Jiankun Shi, Rui Zhang, Xuelong Li

The core problems mainly come from two aspects: (1) the graph is unavailable in most clustering scenes, so constructing high-quality graphs on non-graph data is usually the most important part; (2) given $n$ samples, graph-based clustering methods usually consume at least $\mathcal O(n^2)$ time to build graphs, and the graph convolution requires nearly $\mathcal O(n^2)$ for a dense graph and $\mathcal O(|\mathcal{E}|)$ for a sparse one with $|\mathcal{E}|$ edges.

Clustering

LDC-Net: A Unified Framework for Localization, Detection and Counting in Dense Crowds

no code implementations10 Oct 2021 Qi Wang, Tao Han, Junyu Gao, Yuan Yuan, Xuelong Li

The rapid development in visual crowd analysis shows a trend to count people by positioning or even detecting, rather than simply summing a density map.

Visual Crowd Analysis

Hierarchical Multimodal Transformer to Summarize Videos

no code implementations22 Sep 2021 Bin Zhao, Maoguo Gong, Xuelong Li

To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.

Machine Translation Supervised Video Summarization +2

Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification

no code implementations12 Sep 2021 Qi Wang, Sikai Bai, Junyu Gao, Yuan Yuan, Xuelong Li

In addition, due to domain gaps between different datasets, the performance is dramatically decreased when re-ID models pre-trained on label-rich datasets (source domain) are directly applied to other unlabeled datasets (target domain).

Person Re-Identification Unsupervised Domain Adaptation

Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer

1 code implementation2 Aug 2021 Junyu Gao, Maoguo Gong, Xuelong Li

To this end, we propose a Dilated Convolutional Swin Transformer (DCST) for congested crowd scenes.

Crowd Counting Representation Learning

Deep Contrastive Graph Representation via Adaptive Homotopy Learning

no code implementations17 Jun 2021 Rui Zhang, Chengjun Lu, Ziheng Jiao, Xuelong Li

In particular, in this paper, we apply AH to contrastive learning (AHCL) such that it can be effectively transferred from weak-supervised learning (given label priori) to unsupervised learning, where soft labels of contrastive learning are directly and adaptively learned.

Contrastive Learning

Non-Gradient Manifold Neural Network

no code implementations15 Jun 2021 Rui Zhang, Ziheng Jiao, Hongyuan Zhang, Xuelong Li

Moreover, by unifying the flexible Stiefel manifold and adaptive support vector machine, we devise the novel decision layer which efficiently fits the manifold structure of the data and label information.

Form

EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation

no code implementations17 May 2021 Bin Zhao, Xuelong Li

Specifically, in the flow estimation stage, three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps, so that the edge-maps are taken as the auxiliary information to provide more guidance to boost the flow accuracy.

Video Frame Interpolation

AudioVisual Video Summarization

no code implementations17 May 2021 Bin Zhao, Maoguo Gong, Xuelong Li

Motivated by this, we propose to jointly exploit the audio and visual information for the video summarization task, and develop an AudioVisual Recurrent Network (AVRN) to achieve this.

Video Summarization

Reconstructive Sequence-Graph Network for Video Summarization

no code implementations10 May 2021 Bin Zhao, Haopeng Li, Xiaoqiang Lu, Xuelong Li

Then, the videos are summarized by exploiting both the local and global dependencies among shots.

Video Summarization

Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image

no code implementations24 Apr 2021 Qi Wang, Yanling Miao, Mulin Chen, Xuelong Li

In order to better handle the high dimensionality problem and preserve the spatial structures, this paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.

Clustering

Auto-weighted Multi-view Feature Selection with Graph Optimization

no code implementations11 Apr 2021 Qi Wang, Xu Jiang, Mulin Chen, Xuelong Li

In this paper, we focus on the unsupervised multi-view feature selection which tries to handle high dimensional data in the field of multi-view learning.

feature selection Graph Learning +1

Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning

no code implementations25 Mar 2021 Yanling Miao, Qi Wang, Mulin Chen, Xuelong Li

Graph-based semi-supervised learning methods, which deal well with the situation of limited labeled data, have shown dominant performance in practical applications.

Descriptive Diversity +3

Entropy Minimizing Matrix Factorization

no code implementations24 Mar 2021 Mulin Chen, Xuelong Li

Considering that the outliers are usually much less than the normal samples, a new entropy loss function is established for matrix factorization, which minimizes the entropy of the residue distribution and allows a few samples to have large approximation errors.

Clustering

Feature Weighted Non-negative Matrix Factorization

no code implementations24 Mar 2021 Mulin Chen, Maoguo Gong, Xuelong Li

Non-negative Matrix Factorization (NMF) is one of the most popular techniques for data representation and clustering, and has been widely used in machine learning and data analysis.

Clustering Diversity

Enhanced Principal Component Analysis under A Collaborative-Robust Framework

no code implementations22 Mar 2021 Rui Zhang, Hongyuan Zhang, Xuelong Li

Principal component analysis (PCA) frequently suffers from the disturbance of outliers and thus a spectrum of robust extensions and variations of PCA have been developed.

Clustering

Multi-channel Deep Supervision for Crowd Counting

no code implementations17 Mar 2021 Bo Wei, Mulin Chen, Qi Wang, Xuelong Li

To obtain the accurate supervision information of different channels, the MDSNet employs an auxiliary network called SupervisionNet (SN) to generate abundant supervision maps based on existing groundtruth.

Crowd Counting Decoder

Weather GAN: Multi-Domain Weather Translation Using Generative Adversarial Networks

no code implementations9 Mar 2021 Xuelong Li, Kai Kou, Bin Zhao

To this end, the generator of Weather GAN is composed of an initial translation module, an attention module and a weather-cue segmentation module.

Style Transfer Translation

Ensemble and Random Collaborative Representation-Based Anomaly Detector for Hyperspectral Imagery

no code implementations6 Jan 2021 Rong Wang, Yihang Lu, Qianrong Zhang, Feiping Nie, Zhen Wang, Xuelong Li

To alleviate this problem, we proposed a novel ensemble and random collaborative representation-based detector (ERCRD) for HAD, which comprises two closely related stages.

Anomaly Detection Ensemble Learning

Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection

no code implementations29 Dec 2020 Zhengxin Li, Feiping Nie, Jintang Bian, Xuelong Li

However, real-world data contain a large number of noisy samples and features, so the similarity matrix constructed from the original data cannot be completely reliable.

feature selection

Learning Independent Instance Maps for Crowd Localization

1 code implementation8 Dec 2020 Junyu Gao, Tao Han, Qi Wang, Yuan Yuan, Xuelong Li

Furthermore, to improve the segmentation quality for different density regions, we present a differentiable Binarization Module (BM) to output structured instance maps.

Binarization Segmentation

Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut

1 code implementation NeurIPS 2020 Shenfei Pei, Feiping Nie, Rong Wang, Xuelong Li

In particular, over 15x and 7x speed-up can be obtained with respect to $k$-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively.

Clustering

Learning Feature Sparse Principal Subspace

1 code implementation NeurIPS 2020 Lai Tian, Feiping Nie, Rong Wang, Xuelong Li

This paper presents new algorithms to solve the feature-sparsity constrained PCA problem (FSPCA), which performs feature selection and PCA simultaneously.

Dimensionality Reduction feature selection

Muti-view Mouse Social Behaviour Recognition with Deep Graphical Model

1 code implementation4 Nov 2020 Zheheng Jiang, Feixiang Zhou, Aite Zhao, Xin Li, Ling Li, DaCheng Tao, Xuelong Li, Huiyu Zhou

To address this problem, we here propose a novel multiview latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures, where the former captures unique dynamics of each view whilst the latter encodes the interaction between the views.

Question Answering

Embedding Graph Auto-Encoder for Graph Clustering

no code implementations20 Feb 2020 Hongyuan Zhang, Rui Zhang, Xuelong Li

Driven by theoretical analysis about relaxed k-means, we design a specific GAE-based model for graph clustering to be consistent with the theory, namely Embedding Graph Auto-Encoder (EGAE).

Clustering Decoder +1

Low Rank Regularization: A Review

no code implementations14 Aug 2018 Zhanxuan Hu, Feiping Nie, Rong Wang, Xuelong Li

Low rank regularization, in essence, involves introducing a low rank or approximately low rank assumption for the matrix we aim to learn, which has achieved great success in many fields including machine learning, data mining and computer vision.

BIG-bench Machine Learning Image Denoising
