Search Results for author: Xiao Wang

Found 319 papers, 174 papers with code

Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach

1 code implementation • 19 May 2025 • Shiao Wang, Xiao Wang, Liye Jin, Bo Jiang, Lin Zhu, Lan Chen, Yonghong Tian, Bin Luo

Existing tracking algorithms typically rely on low-frame-rate RGB cameras coupled with computationally intensive deep neural network architectures to achieve effective tracking.

Knowledge Distillation Visual Object Tracking

Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection

1 code implementation • 19 May 2025 • Xiao Wang, Yu Jin, Lan Chen, Bo Jiang, Lin Zhu, Yonghong Tian, Jin Tang, Bin Luo

To address these issues, this paper proposes a novel dynamic graph induced contour-aware heat conduction network for event stream based object detection, termed CvHeat-DET.

Event-based vision Object +2

SLOT: Sample-specific Language Model Optimization at Test-time

1 code implementation • 18 May 2025 • Yang Hu, Xingyu Zhang, Xueji Fang, Zhiyang Chen, Xiao Wang, Huatian Zhang, GuoJun Qi

We propose SLOT (Sample-specific Language Model Optimization at Test-time), a novel and parameter-efficient test-time inference approach that enhances a language model's ability to more accurately respond to individual prompts.

GSM8K Language Modeling +2

Lightweight Spatio-Temporal Attention Network with Graph Embedding and Rotational Position Encoding for Traffic Forecasting

no code implementations • 17 May 2025 • Xiao Wang, Shun-Ren Yang

Additionally, the optimal frequency for rotational position encoding is determined through a grid search approach in both the spatial and temporal attention mechanisms.

Feature Engineering Graph Embedding +1

Describe Anything in Medical Images

no code implementations • 9 May 2025 • Xi Xiao, Yunbei Zhang, Thanh-Huy Nguyen, Ba-Thinh Lam, Janet Wang, Jihun Hamm, Tianyang Wang, Xingjian Li, Xiao Wang, Hao Xu, Tianming Liu, Min Xu

Localized image captioning has made significant progress with models like the Describe Anything Model (DAM), which can generate detailed region-specific descriptions without explicit region-text supervision.

Attribute Diagnostic +1

4D Multimodal Co-attention Fusion Network with Latent Contrastive Alignment for Alzheimer's Diagnosis

no code implementations • 23 Apr 2025 • Yuxiang Wei, Yanteng Zhang, Xi Xiao, Tianyang Wang, Xiao Wang, Vince D. Calhoun

Multimodal neuroimaging provides complementary structural and functional insights into both human brain organization and disease-related dynamics.

Diagnostic

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

no code implementations • 22 Apr 2025 • Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu

Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., deployment phase or fine-tuning phase, lacking a comprehensive understanding of the entire "lifechain" of LLMs.

Model Editing

Adversarial Attack for RGB-Event based Visual Object Tracking

1 code implementation • 19 Apr 2025 • Qiang Chen, Xiao Wang, Haowen Wang, Bo Jiang, Lin Zhu, Dawei Zhang, Yonghong Tian, Jin Tang

To bridge this gap, in this paper, we propose a cross-modal adversarial attack algorithm for RGB-Event visual tracking.

Adversarial Attack Visual Object Tracking +1

CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework

1 code implementation • 17 Apr 2025 • Wentao Wu, Xiao Wang, Chenglong Li, Bo Jiang, Jin Tang, Bin Luo, Qi Liu

Event cameras have attracted increasing attention in recent years due to their advantages in high dynamic range, high temporal resolution, low power consumption, and low latency.

Contrastive Learning

Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials

no code implementations • 14 Apr 2025 • Jingyun Yang, Ruoyan Avery Yin, Chi Jiang, Yuepeng Hu, Xiaokai Zhu, Xingjian Hu, Sutharsika Kumar, Xiao Wang, Xiaohua Zhai, Keran Rong, Yunyue Zhu, Tianyi Zhang, Zongyou Yin, Jing Kong, Neil Zhenqiang Gong, Zhichu Ren, Haozhe Wang

This work represents the implementation of foundation models to achieve autonomous analysis, establishing a scalable and data-efficient characterization paradigm that fundamentally transforms the approach to nanoscale materials research.

Image Segmentation Prompt Engineering +1

RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework

1 code implementation • 14 Apr 2025 • Xiao Wang, Haiyang Wang, Shiao Wang, Qiang Chen, Jiandong Jin, Haoyu Song, Bo Jiang, Chenglong Li

In this paper, we revisit these issues and propose a novel multi-modal RGB-Event attribute recognition task, drawing inspiration from the advantages of event cameras in low-light and high-speed scenarios and from their low power consumption.

Attribute Pedestrian Attribute Recognition

Bregman Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Zeroth-order Optimization

no code implementations • 13 Apr 2025 • Qiankun Shi, Xiao Wang, Hao Wang

In particular, starting from a near-feasible initial point and using Rademacher smoothing, the oracle complexity is of order \(O(p d^{2/p} \epsilon^{-3})\) for \(p \in [2, 2 \ln d]\), and \(O(\ln d \cdot \epsilon^{-3})\) for \(p > 2 \ln d\), where \(d\) denotes the problem dimension.

Adversarial Attack

ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation

no code implementations • 2 Apr 2025 • Xiao Wang, Daniil Larionov, Siwei Wu, Yiqi Liu, Steffen Eger, Nafise Sadat Moosavi, Chenghua Lin

In this work, we introduce ContrastScore, a contrastive evaluation metric designed to enable higher-quality, less biased, and more efficient assessment of generated text.

Machine Translation Text Generation

Visual Variational Autoencoder Prompt Tuning

no code implementations • 22 Mar 2025 • Xi Xiao, Yunbei Zhang, Yanshuh Li, Xingjian Li, Tianyang Wang, Jihun Hamm, Xiao Wang, Min Xu

Parameter-efficient fine-tuning (PEFT) has emerged as a crucial approach for adapting large vision transformers to downstream tasks without the prohibitive computational costs of full fine-tuning.

Diversity parameter-efficient fine-tuning +1

Variance-Aware Noisy Training: Hardening DNNs against Unstable Analog Computations

no code implementations • 20 Mar 2025 • Xiao Wang, Hendrik Borras, Bernhard Klein, Holger Fröning

One of the most effective techniques for enhancing robustness, Noisy Training, introduces noise during the training phase to reinforce the model against disturbances encountered during inference.

Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity

no code implementations • 20 Mar 2025 • Qiankun Shi, Jie Peng, Kun Yuan, Xiao Wang, Qing Ling

We establish the lower bounds on the Byzantine error and on the minimum number of queries to a stochastic gradient oracle required to achieve an arbitrarily small optimization error.

Stochastic Optimization

Pseudo-Relevance Feedback Can Improve Zero-Shot LLM-Based Dense Retrieval

no code implementations • 19 Mar 2025 • Hang Li, Xiao Wang, Bevan Koopman, Guido Zuccon

Pseudo-relevance feedback (PRF) refines queries by leveraging initially retrieved documents to improve retrieval effectiveness.

Passage Retrieval Retrieval

TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection

no code implementations • 18 Mar 2025 • Qiang Qi, Xiao Wang

Video object detection has made significant progress in recent years thanks to convolutional neural networks (CNNs) and vision transformers (ViTs).

object-detection Video Object Detection

DehazeMamba: SAR-guided Optical Remote Sensing Image Dehazing with Adaptive State Space Model

no code implementations • 17 Mar 2025 • Zhicheng Zhao, Jinquan Yan, Chenglong Li, Xiao Wang, Jin Tang

Optical remote sensing image dehazing presents significant challenges due to its extensive spatial scale and highly non-uniform haze distribution, which traditional single-image dehazing methods struggle to address effectively.

Image Dehazing Semantic Segmentation +1

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding

1 code implementation • 16 Mar 2025 • Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie

Multimodal Large Language Models (MLLMs) have revolutionized video understanding, yet are still limited by context length when processing long videos.

Video Understanding

CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement

no code implementations • 11 Mar 2025 • Chenrui Ma, Rongchang Zhao, Xi Xiao, Hongyang Xie, Tianyang Wang, Xiao Wang, Hao Zhang, Yanning Shen

While deep generative models have significantly advanced representation learning, they may inherit or amplify biases and fairness issues by encoding sensitive attributes alongside predictive features.

Disentanglement Fairness

Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection

no code implementations • 10 Mar 2025 • Wentao Wu, Chenglong Li, Xiao Wang, Bin Luo, Qi Liu

To address this problem, we propose a Large Language Model (LLM) guided Progressive feature Alignment Network called LPANet, which leverages the semantic features extracted from a large language model to guide the progressive semantic and spatial alignment between modalities for multimodal UAV object detection.

Language Modeling Language Modelling +4

Split Adaptation for Pre-trained Vision Transformers

no code implementations • 1 Mar 2025 • Lixu Wang, Bingqi Shang, Yi Li, Payal Mohapatra, Wei Dong, Xiao Wang, Qi Zhu

SA, inspired by split learning (SL), segments the pre-trained ViT into a frontend and a backend, with only the frontend shared with the client for data representation extraction.

Langevin Multiplicative Weights Update with Applications in Polynomial Portfolio Management

no code implementations • 26 Feb 2025 • Yi Feng, Xiao Wang, Tian Xie

We consider nonconvex optimization problems over the simplex and, more generally, over a product of simplices.

global-optimization Management

Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

1 code implementation • 24 Feb 2025 • Yuming Yang, Yang Nan, Junjie Ye, Shihan Dou, Xiao Wang, Shuo Li, Huijie Lv, Tao Gui, Qi Zhang, Xuanjing Huang

To address this, we systematically analyze 11 existing diversity measurement methods by assessing their correlation with model performance through extensive fine-tuning experiments.

Diversity

RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision

no code implementations • 19 Feb 2025 • Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

Retrieval-augmented generation (RAG) has shown great potential for knowledge-intensive tasks, but its traditional architectures rely on static retrieval, limiting their effectiveness for complex questions that require sequential information-seeking.

Prompt Engineering RAG +2

Censor Dependent Variational Inference

1 code implementation • 13 Feb 2025 • Chuanhui Liu, Xiao Wang

This paper provides a comprehensive analysis of variational inference in latent variable models for survival analysis, emphasizing the distinctive challenges associated with applying variational methods to survival data.

Survival Analysis Variational Inference

EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition

1 code implementation • 13 Feb 2025 • Xiao Wang, Jingtao Jiang, Dong Li, Futian Wang, Lin Zhu, YaoWei Wang, Yongyong Tian, Jin Tang

Mainstream Scene Text Recognition (STR) algorithms are developed based on RGB cameras which are sensitive to challenging factors such as low illumination, motion blur, and cluttered backgrounds.

Large Language Model Scene Text Recognition

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

no code implementations • 11 Feb 2025 • Xiao Wang, Ibrahim Alabdulmohsin, Daniel Salz, Zhe Li, Keran Rong, Xiaohua Zhai

We provide an empirical investigation of the potential of pre-training vision-language models on an unprecedented scale: 100 billion examples.

Diversity

Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark

1 code implementation • 8 Feb 2025 • Shiao Wang, Xiao Wang, Chao Wang, Liye Jin, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang

We then introduce a novel hierarchical knowledge distillation strategy that incorporates the similarity matrix, feature representation, and response map-based distillation to guide the learning of the student Transformer network.

Knowledge Distillation Visual Object Tracking

Sparse Measurement Medical CT Reconstruction using Multi-Fused Block Matching Denoising Priors

no code implementations • 3 Feb 2025 • Maliha Hossain, Yuankai Huo, Xinqiang Yan, Xiao Wang

Instead of directly using a 3D prior, this work proposes a BM3D Multi Slice Fusion (BM3D-MSF) prior that fuses multiple 2D image denoisers to act as a fully 3D prior model in a Plug-and-Play reconstruction approach.

CT Reconstruction Denoising

On Hardening DNNs against Noisy Computations

no code implementations • 24 Jan 2025 • Xiao Wang, Hendrik Borras, Bernhard Klein, Holger Fröning

This work investigates the effectiveness of training neural networks with quantization to increase the robustness against noise.

Quantization

Fast-RF-Shimming: Accelerate RF Shimming in 7T MRI using Deep Learning

no code implementations • 21 Jan 2025 • Zhengyi Lu, Hao Liang, Ming Lu, Xiao Wang, Xinqiang Yan, Yuankai Huo

This approach offers a faster and more efficient solution to RF shimming challenges in UHF MRI.

Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking

no code implementations • 17 Jan 2025 • Futian Wang, Fengxiang Liu, Xiao Wang

In the realm of multi-object tracking, the challenge of accurately capturing the spatial and temporal relationships between objects in video sequences remains a significant hurdle.

Graph Learning Multi-Object Tracking

Scale-up Unlearnable Examples Learning with High-Performance Computing

1 code implementation • 10 Jan 2025 • Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I. Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo

A prominent method within this area, called Unlearnable Clustering (UC), has shown improved UE performance with larger batch sizes but was previously limited by computational resources.

Diagnostic

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

1 code implementation • 7 Jan 2025 • Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, YaoWei Wang, Yonghong Tian, Jin Tang

X-ray image based medical report generation has achieved significant progress in recent years with the help of large language models. However, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases.

Language Modeling Language Modelling +2

Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective

1 code implementation • 6 Jan 2025 • Zhongjian Zhang, Mengmei Zhang, Xiao Wang, Lingjuan Lyu, Bo Yan, Junping Du, Chuan Shi

Unlike FL, FR has a unique sparse aggregation mechanism, where the embedding of each item is updated by only a subset of clients, rather than by all clients as in the dense aggregation of general FL.

Federated Learning Recommendation Systems

Unsupervised dense retrieval with conterfactual contrastive learning

no code implementations • 30 Dec 2024 • Haitian Chen, Qingyao Ai, Xiao Wang, Yiqun Liu, Fen Lin, Qin Liu

In response to these challenges, we propose to improve the robustness of dense retrieval models by enhancing their sensitivity to fine-grained relevance signals.

Contrastive Learning counterfactual +2

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

1 code implementation • 29 Dec 2024 • Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie

Video Large Language Models (VideoLLMs) have made significant strides in video understanding but struggle with long videos due to the limitations of their backbone LLMs.

Video Compression Video Understanding

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition

1 code implementation • 28 Dec 2024 • Lan Chen, Haoxiang Yang, Pengpeng Shao, Haoyu Song, Xiao Wang, Zhicheng Zhao, YaoWei Wang, Yonghong Tian

Given the successful application of large models, introducing them can also be considered as a way to further enhance the performance of multi-modal tasks.

parameter-efficient fine-tuning

Bi-directional Mapping of Morphology Metrics and 3D City Blocks for Enhanced Characterization and Generation of Urban Form

no code implementations • 20 Dec 2024 • Chenyi Cai, Biao Li, Qiyan Zhang, Xiao Wang, Filip Biljecki, Pieter Herthogs

This paper highlights the importance of establishing a bi-directional mapping between morphology metrics and complex urban form to enable the integration of urban form generation with performance evaluation.

Form Information Retrieval

DocFusion: A Unified Framework for Document Parsing Tasks

1 code implementation • 17 Dec 2024 • Mingxu Chai, Ziyu Shen, Chong Zhang, Yue Zhang, Xiao Wang, Shihan Dou, Jihua Kang, Jiazheng Zhang, Qi Zhang

Document parsing is essential for analyzing complex document structures and extracting fine-grained information, supporting numerous downstream applications.

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

1 code implementation • 9 Dec 2024 • Xiao Wang, Yu Jin, Wentao Wu, Wei Zhang, Lin Zhu, Bo Jiang, Yonghong Tian

Object detection in event streams has emerged as a cutting-edge research area, demonstrating superior performance in low-light conditions, scenarios with motion blur, and rapid movements.

Computational Efficiency Mixture-of-Experts +3

Blockchain Data Analysis in the Era of Large-Language Models

no code implementations • 9 Dec 2024 • Kentaroh Toyoda, Xiao Wang, Mingzhe Li, Bo Gao, YuAn Wang, Qingsong Wei

Blockchain data analysis is essential for deriving insights, tracking transactions, identifying patterns, and ensuring the integrity and security of decentralized networks.

Fraud Detection

Nimbus: Secure and Efficient Two-Party Inference for Transformers

1 code implementation • 24 Nov 2024 • Zhengyi Li, Kang Yang, Jin Tan, Wen-jie Lu, Haoqi Wu, Xiao Wang, Yu Yu, Derun Zhao, Yancheng Zheng, Minyi Guo, Jingwen Leng

For the linear layer, we propose a new 2PC paradigm along with an encoding approach to securely compute matrix multiplications based on an outer-product insight, which achieves $2.9\times \sim 12.5\times$ performance improvements compared to the state-of-the-art (SOTA) protocol.

Anderson Acceleration in Nonsmooth Problems: Local Convergence via Active Manifold Identification

no code implementations • 12 Oct 2024 • Kexin Li, Luwei Bai, Xiao Wang, Hao Wang

Anderson acceleration is an effective technique for enhancing the efficiency of fixed-point iterations; however, analyzing its convergence in nonsmooth settings presents significant challenges.

Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion

no code implementations • 11 Oct 2024 • Shiao Wang, Yifeng Wang, Qingchuan Ma, Xiao Wang, Ning Yan, Qingquan Yang, Guosheng Xu, Jin Tang

Q-distribution prediction is a crucial research direction in controlled nuclear fusion, with deep learning emerging as a key approach to solving prediction challenges.

Deep Learning Prediction

Exploiting Memory-aware Q-distribution Prediction for Nuclear Fusion via Modern Hopfield Network

no code implementations • 11 Oct 2024 • Qingchuan Ma, Shiao Wang, Tong Zheng, Xiaodong Dai, Yifeng Wang, Qingquan Yang, Xiao Wang

This study addresses the critical challenge of predicting the Q-distribution in long-term stable nuclear fusion tasks, a key component for advancing clean energy solutions.

Prediction

HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter

no code implementations • 10 Oct 2024 • Yumiao Zhao, Bo Jiang, Xiao Wang, Qin Xu, Jin Tang

To address these issues, in this paper, we propose a novel Heterogeneous Graph Adapter to tune VLMs for downstream tasks.

Graph Neural Network

SNN-PAR: Energy Efficient Pedestrian Attribute Recognition via Spiking Neural Networks

1 code implementation • 10 Oct 2024 • Haiyang Wang, Qian Zhu, Mowen She, Yabo Li, Haoyu Song, Minghe Xu, Xiao Wang

To address this issue, in this paper, we propose a Spiking Neural Network (SNN) based framework for energy-efficient attribute recognition.

Attribute Knowledge Distillation +1

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

1 code implementation • 1 Oct 2024 • Xiao Wang, Fuling Wang, Yuehang Li, Qingchuan Ma, Shiao Wang, Bo Jiang, Chuanfu Li, Jin Tang

Thus, we conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs), on the CheXpert Plus dataset.

Benchmarking Contrastive Learning +3

Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding

no code implementations • 29 Sep 2024 • Xiao Wang, Jianlong Wu, Zijia Lin, Fuzheng Zhang, Di Zhang, Liqiang Nie

For iterative refinement, we first leverage a video-language model to generate synthetic annotations, resulting in a refined dataset.

Diversity Question Answering +2

Adaptive Learning of the Latent Space of Wasserstein Generative Adversarial Networks

1 code implementation • 27 Sep 2024 • Yixuan Qiu, Qingyi Gao, Xiao Wang

Generative models based on latent variables, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have attracted considerable interest due to their impressive performance in many fields.

Ig3D: Integrating 3D Face Representations in Facial Expression Inference

no code implementations • 29 Aug 2024 • Lu Dong, Xiao Wang, Srirangaraj Setlur, Venu Govindaraju, Ifeoma Nwogu

Our experimental results demonstrate that our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks.

Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection

1 code implementation • 27 Aug 2024 • Siyuan Yao, Hao Sun, Tian-Zhu Xiang, Xiao Wang, Xiaochun Cao

In this paper, we propose a hierarchical graph interaction network termed HGINet for camouflaged object detection, which is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.

Decoder object-detection +1

VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models

1 code implementation • 23 Aug 2024 • Wentao Wu, Fanghua Hong, Xiao Wang, Chenglong Li, Jin Tang

In this work, we propose a new vehicle detection paradigm based on a pre-trained foundation vehicle model (VehicleMAE) and a large language model (T5), termed VFM-Det.

Contrastive Learning Language Modelling +2

MambaEVT: Event Stream based Visual Object Tracking using State Space Model

1 code implementation • 20 Aug 2024 • Xiao Wang, Chao Wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang

More importantly, we consider introducing a dynamic template update strategy into the tracking framework using the Memory Mamba network.

Mamba Object Localization +2

Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

1 code implementation • 20 Aug 2024 • Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, YaoWei Wang

Based on this dataset and several other large-scale datasets, we propose a novel baseline method that fully leverages the Mamba model's ability to integrate temporal information of CNN features, resulting in improved sign language translation outcomes.

Mamba Sign Language Translation +1

Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework

2 code implementations • 19 Aug 2024 • Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li

To address this issue, this paper proposes a new large-scale, cross-domain pedestrian attribute recognition dataset to fill the data gap, termed MSP60K.

Attribute Ensemble Learning +4

Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms

1 code implementation • 19 Aug 2024 • Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian

In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR.

Action Recognition Mamba +1

R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

1 code implementation • 19 Aug 2024 • Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang

They usually adopt a Transformer to extract the visual features of a given X-ray image, and then, feed them into the LLM for text generation.

Language Modeling Language Modelling +4

Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?

1 code implementation • 16 Aug 2024 • Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi

Our empirical results show that although LLMs can improve the robustness of GNNs, there is still an average decrease of 23.1% in accuracy, implying that the GNNs remain extremely vulnerable to topology attacks.

Adversarial Robustness

Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

1 code implementation • 15 Aug 2024 • Xixi Wang, Zitian Wang, Jingtao Jiang, Lan Chen, Xiao Wang, Bo Jiang

We also introduce a motion augmented strategy that leverages motion cues as an additional output to aggregate with the spatial features for improved results.

Change Detection

SMILES-Mamba: Chemical Mamba Foundation Models for Drug ADMET Prediction

no code implementations • 11 Aug 2024 • Bohao Xu, Yingzhou Lu, Chenhao Li, Ling Yue, Xiao Wang, Nan Hao, Tianfan Fu, Jim Chen

In drug discovery, predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of small-molecule drugs is critical for ensuring safety and efficacy.

Drug Discovery Mamba +3

AI Foundation Models in Remote Sensing: A Survey

no code implementations • 6 Aug 2024 • Siqi Lu, Junlin Guo, James R Zimmer-Dauphinee, Jordan M Nieusma, Xiao Wang, Parker VanValkenburgh, Steven A Wernke, Yuankai Huo

Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis.

Contrastive Learning object-detection +4

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

1 code implementation • 1 Aug 2024 • Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions.

MedQA MMLU +5

EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching

no code implementations • 31 Jul 2024 • Pengjie Zhang, Lin Zhu, Xiao Wang, Lizhi Wang, Wanxuan Lu, Hua Huang

Specifically, our method utilizes a Temporal Recurrent Network to aggregate event features across temporal or spatial domains, and a Spatial Contextual Attention to enhance knowledge transfer across event flows via temporal or spatial interactions.

Depth Estimation Disparity Estimation +4

Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

no code implementations • 15 Jul 2024 • Lin Zhu, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang

However, current methods often prioritize the extraction of temporal information from continuous event flow, leading to an overemphasis on low-frequency texture features in the scene, resulting in over-smoothing and blurry artifacts.

Denoising Event-Based Video Reconstruction +1

An Empirical Study of Mamba-based Pedestrian Attribute Recognition

1 code implementation • 15 Jul 2024 • Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang

To further tap into the potential of the novel Mamba architecture for PAR tasks, this paper designs and adapts Mamba into two typical PAR frameworks, i.e., the text-image fusion approach and the pure vision Mamba multi-label recognition framework.

Attribute Mamba +1

On Large Language Model Continual Unlearning

1 code implementation • 14 Jul 2024 • Chongyang Gao, Lixu Wang, Kaize Ding, Chenkai Weng, Xiao Wang, Qi Zhu

The results indicate that OOO consistently achieves the best unlearning effectiveness and utility preservation, especially when facing continuous unlearning requests.

Disentanglement Language Modeling +4

Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition

1 code implementation • 27 Jun 2024 • Lan Chen, Dong Li, Xiao Wang, Pengpeng Shao, Wei Zhang, YaoWei Wang, Yonghong Tian, Jin Tang

In this paper, we propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.

Graph Neural Network

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

1 code implementation • 26 Jun 2024 • Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, WenYu Zhan, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research.

Safety Alignment

TemPrompt: Multi-Task Prompt Learning for Temporal Relation Extraction in RAG-based Crowdsourcing Systems

no code implementations • 21 Jun 2024 • Jing Yang, Yu Zhao, Linyao Yang, Xiao Wang, Long Chen, Fei-Yue Wang

Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus shape the workflow of associated tasks, so it holds promise in helping understand task requests initiated by requesters in crowdsourcing systems.

Contrastive Learning Language Modeling +6

InternLM-Law: An Open Source Chinese Legal Large Language Model

1 code implementation • 21 Jun 2024 • Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge

While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field.

Diversity Language Modeling +3

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

1 code implementation17 Jun 2024 Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, DaCheng Tao

In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals.

Position

Last-iterate Convergence Separation between Extra-gradient and Optimism in Constrained Periodic Games

no code implementations15 Jun 2024 Yi Feng, Ping Li, Ioannis Panageas, Xiao Wang

Last-iterate behaviors of learning algorithms in repeated two-player zero-sum games have been extensively studied due to their wide applications in machine learning and related tasks.

Inductive Global and Local Manifold Approximation and Projection

no code implementations12 Jun 2024 Jungeum Kim, Xiao Wang

Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis.

Data Visualization

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

1 code implementation10 Jun 2024 Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, ZongYuan Ge, Gang Li, James Zou, Huaxiu Yao

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare.

Fairness

Explaining the Contributing Factors for Vulnerability Detection in Machine Learning

no code implementations5 Jun 2024 Esma Mouine, Yan Liu, Lu Xiao, Rick Kazman, Xiao Wang

A fundamental but unresolved research question is: how do different factors in the mining and learning process impact the accuracy of identifying vulnerabilities in software projects of varying characteristics?

Vulnerability Detection

RAG-based Crowdsourcing Task Decomposition via Masked Contrastive Learning with Prompts

no code implementations4 Jun 2024 Jing Yang, Xiao Wang, Yu Zhao, Yuhang Liu, Fei-Yue Wang

Therefore, we present a Prompt-Based Contrastive learning framework for TD (PBCT), which incorporates a prompt-based trigger detector to overcome dependence.

Common Sense Reasoning Contrastive Learning +4

AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph Reasoning

no code implementations16 May 2024 Jing Yang, Xiao Wang, Yutong Wang, Jiawei Wang, Fei-Yue Wang

To achieve more accurate TKG reasoning, we propose an attention masking-based contrastive event network (AMCEN) with local-global temporal patterns for the two-stage prediction of future events.

Contrastive Learning Knowledge Graphs +1

SignAvatar: Sign Language 3D Motion Reconstruction and Generation

no code implementations13 May 2024 Lu Dong, Lipisha Chaudhary, Fei Xu, Xiao Wang, Mason Lary, Ifeoma Nwogu

Achieving expressive 3D motion reconstruction and automatic generation for isolated sign words can be challenging, due to the lack of real-world 3D sign-word data, the complex nuances of signing motions, and the cross-modal understanding of sign language semantics.

Less is More: on the Over-Globalizing Problem in Graph Transformers

1 code implementation2 May 2024 Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi

Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes.

Mamba-FETrack: Frame-Event Tracking via State Space Model

2 code implementations28 Apr 2024 Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang, Bo Jiang

Specifically, our Mamba-based tracker achieves 43.5/55.6 on the SR/PR metric, while the ViT-S based tracker (OSTrack) obtains 40.0/50.9.

Mamba Object Localization

Pre-training on High Definition X-ray Images: An Experimental Study

1 code implementation27 Apr 2024 Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang

Existing X-ray based pre-trained vision models are usually trained on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224).

Decoder Miscellaneous

Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

3 code implementations27 Apr 2024 Xiao Wang, Qian Zhu, Jiandong Jin, Jun Zhu, Futian Wang, Bo Jiang, YaoWei Wang, Yonghong Tian

Specifically, we formulate the video-based PAR as a vision-language fusion problem and adopt a pre-trained foundation model CLIP to extract the visual features.

Attribute Pedestrian Attribute Recognition +1

Implicit Generative Prior for Bayesian Neural Networks

1 code implementation27 Apr 2024 Yijia Liu, Xiao Wang

The results of our experiments highlight the superiority of our proposed framework over existing methods, such as sparse variational Bayesian and generative models, in terms of prediction accuracy and uncertainty quantification.

Classification Consistency Computational Efficiency +3

S4TP: Social-Suitable and Safety-Sensitive Trajectory Planning for Autonomous Vehicles

no code implementations18 Apr 2024 Xiao Wang, Ke Tang, Xingyuan Dai, Jintao Xu, Quancheng Du, Rui Ai, Yuxiao Wang, Weihao Gu

To effectively assess the risks prevailing in the vicinity of AVs in social interactive traffic scenarios and achieve safe autonomous driving, this article proposes a social-suitable and safety-sensitive trajectory planning (S4TP) framework.

Autonomous Driving Motion Planning +2

Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning

no code implementations16 Apr 2024 Xiao Wang, Tianze Chen, Xianjun Yang, Qi Zhang, Xun Zhao, Dahua Lin

The open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.

In-Context Learning Instruction Following

State Space Model for New-Generation Network Alternative to Transformers: A Survey

1 code implementation15 Apr 2024 Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, YaoWei Wang, Yonghong Tian, Jin Tang

In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM.

Adaptive Patching for High-resolution Image Segmentation with Transformers

no code implementations15 Apr 2024 Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Mohamed Wahib, Masaharu Munetomo

For high-resolution images, e.g., microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model, if we are to use smaller patch sizes that are favorable in segmentation.

Friction Image Segmentation +2

Exploring Task Unification in Graph Representation Learning via Generative Approach

no code implementations21 Mar 2024 Yulan Hu, Sheng Ouyang, Zhirui Yang, Ge Chen, Junchen Wan, Xiao Wang, Yong Liu

Specifically, GA$^2$E proposes to use the subgraph as the meta-structure, which remains consistent across all graph tasks (ranging from node-, edge-, and graph-level to transfer learning) and all stages (both during training and inference).

Graph Representation Learning Transfer Learning

Finding Visual Saliency in Continuous Spike Stream

1 code implementation10 Mar 2024 Lin Zhu, Xianzhang Chen, Xiao Wang, Hua Huang

Our framework exhibits a substantial margin of improvement in capturing and highlighting visual saliency in the spike stream, which not only provides a new perspective for spike-based saliency segmentation but also shows a new paradigm for full SNN-based transformer models.

Saliency Detection

Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

4 code implementations9 Mar 2024 Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo

Current event-/frame-event based trackers are evaluated on short-term tracking datasets; however, real-world scenarios involve long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear.

Object Tracking RGB-T Tracking

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

no code implementations7 Mar 2024 Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai

Interestingly, data and architectural improvements seem to mitigate the negative impact of data balancing on performance; e.g., applying M4 to SigLIP-B/16 with data quality filters improves COCO image-to-text retrieval @5 from 86% (without data balancing) to 87% and ImageNet 0-shot classification from 77% to 77.5%!

Image to text Image-to-Text Retrieval +1

Can Small Language Models be Good Reasoners for Sequential Recommendation?

no code implementations7 Mar 2024 Yuling Wang, Changxin Tian, Binbin Hu, Yanhua Yu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Liang Pang, Xiao Wang

We encode the generated rationales from the student model into a dense vector, which empowers recommendation in both ID-based and ID-agnostic scenarios.

Knowledge Distillation Sequential Recommendation

Intent-aware Recommendation via Disentangled Graph Contrastive Learning

no code implementations6 Mar 2024 Yuling Wang, Xiao Wang, Xiangzhou Huang, Yanhua Yu, Haoyang Li, Mengdi Zhang, Zirui Guo, Wei Wu

The other is that different behaviors have different intent distributions, so the question is how to establish their relations for a more explainable recommender system.

Contrastive Learning Graph Neural Network

Learning Invariant Representations of Graph Neural Networks via Cluster Generalization

1 code implementation NeurIPS 2023 Donglin Xia, Xiao Wang, Nian Liu, Chuan Shi

To address this challenge, we propose the Cluster Information Transfer (CIT) mechanism (code available at https://github.com/BUPT-GAMMA/CITGNN), which can learn invariant representations for GNNs, thereby improving their generalization ability to various and unknown test graphs with structure shift.

Multi-Scale Subgraph Contrastive Learning

no code implementations5 Mar 2024 Yanbei Liu, Yu Zhao, Xiao Wang, Lei Geng, Zhitao Xiao

Through an experimental analysis, we discover that the semantic information of an augmented graph structure may not be consistent with that of the original graph structure, and that whether two augmented graphs are positive or negative pairs is highly related to the multi-scale structures.

Contrastive Learning Graph Classification

Minimum Topology Attacks for Graph Neural Networks

no code implementations5 Mar 2024 Mengmei Zhang, Xiao Wang, Chuan Shi, Lingjuan Lyu, Tianchi Yang, Junping Du

To break this dilemma, we propose a new type of topology attack, named minimum-budget topology attack, aiming to adaptively find the minimum perturbation sufficient for a successful attack on each node.

Small, Versatile and Mighty: A Range-View Perception Framework

no code implementations1 Mar 2024 Qiang Meng, Xiao Wang, Jiabao Wang, Liujiang Yan, Ke Wang

Our proposed Small, Versatile, and Mighty (SVM) network utilizes a pure convolutional architecture to fully unleash the efficiency and multi-tasking potentials of the range view representation.

Panoptic Segmentation

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

1 code implementation26 Feb 2024 Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs).

Code Completion Response Generation

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions

no code implementations26 Feb 2024 Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, drawing inspiration from recent works showing that LLMs are sensitive to the design of the instructions, we utilize instructions in code style, which are more structural and less ambiguous, to replace typical natural language instructions.

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

1 code implementation8 Feb 2024 Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.

GSM8K reinforcement-learning +1

A Lightweight Inception Boosted U-Net Neural Network for Routability Prediction

1 code implementation7 Feb 2024 Hailiang Li, Yan Huo, Yan Wang, Xu Yang, Miaohui Hao, Xiao Wang

As the modern CPU, GPU, and NPU chip design complexity and transistor counts keep increasing, and with the relentless shrinking of semiconductor technology nodes to nearly 1 nanometer, the placement and routing have gradually become the two most pivotal processes in modern very-large-scale-integrated (VLSI) circuit back-end design.

Avg SSIM

Federated Learning with New Knowledge: Fundamentals, Advances, and Futures

1 code implementation3 Feb 2024 Lixu Wang, Yang Zhao, Jiahua Dong, Ating Yin, Qinbin Li, Xiao Wang, Dusit Niyato, Qi Zhu

Federated Learning (FL) is a privacy-preserving distributed learning approach that is rapidly developing in an era where privacy protection is increasingly valued.

Federated Learning Privacy Preserving

Graph Fairness Learning under Distribution Shifts

no code implementations30 Jan 2024 Yibo Li, Xiao Wang, Yujie Xing, Shaohua Fan, Ruijia Wang, Yaoqi Liu, Chuan Shi

Recently, there has been increasing interest in ensuring fairness in GNNs, but existing methods all assume that the training and testing data follow the same distribution, i.e., that the training data and testing data come from the same graph.

Fairness

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

1 code implementation21 Jan 2024 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences.

Form

Uncertainty-aware Bridge based Mobile-Former Network for Event-based Pattern Recognition

1 code implementation20 Jan 2024 Haoxiang Yang, Chengguo Yuan, Yabin Zhu, Lan Chen, Xiao Wang, Futian Wang

The mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which are easily influenced by low-quality images (e.g., low illumination, motion blur).

Human Activity Recognition

CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras

1 code implementation5 Jan 2024 Yabin Zhu, Xiao Wang, Chenglong Li, Bo Jiang, Lin Zhu, Zhixiang Huang, Yonghong Tian, Jin Tang

In this work, we formally propose the task of object tracking using unaligned neuromorphic and visible cameras.

Object Tracking

GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization

1 code implementation21 Dec 2023 Yingzhou Lu, Minjie Shen, Ling Yue, Chenhao Li, Lulu Chen, Fan Meng, Xiao Wang, David Herrington, Yue Wang, Yue Zhao, Tianfan Fu, Capucine van Rechem

With GenoCraft, researchers and data scientists have access to an array of cutting-edge bioinformatics tools under a user-friendly interface, making it a valuable resource for managing and analyzing large-scale omics data.

Federated Continual Novel Class Learning

no code implementations21 Dec 2023 Lixu Wang, Chenxi Liu, Junfeng Guo, Jiahua Dong, Xiao Wang, Heng Huang, Qi Zhu

In a privacy-focused era, Federated Learning (FL) has emerged as a promising machine learning technique.

Federated Learning Novel Class Discovery +1

Optimizing Distributed Training on Frontier for Large Language Models

no code implementations20 Dec 2023 Sajal Dash, Isaac Lyngaas, Junqi Yin, Xiao Wang, Romain Egele, Guojing Cong, Feiyi Wang, Prasanna Balaprakash

For the training of the 175 Billion parameter model and the 1 Trillion parameter model, we achieved $100\%$ weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively.

Computational Efficiency

Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition

1 code implementation18 Dec 2023 Xiao Wang, Yao Rong, Shiao Wang, Yuan Chen, Zhe Wu, Bo Jiang, Yonghong Tian, Jin Tang

It is intuitive to combine them for high-performance RGB-Event based video recognition; however, existing works fail to achieve a good balance between accuracy and model parameters.

Video Recognition

Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion

2 code implementations17 Dec 2023 Xiao Wang, Jiandong Jin, Chenglong Li, Jin Tang, Cheng Zhang, Wei Wang

In this paper, we formulate PAR as a vision-language fusion problem and fully exploit the relations between pedestrian images and attribute labels.

Attribute Contrastive Learning +2

Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception

1 code implementation15 Dec 2023 Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, Jin Tang

To address this issue, we propose a novel vehicle-centric pre-training framework called VehicleMAE, which incorporates the structural information including the spatial structure from vehicle profile information and the semantic structure from informative high-level natural language descriptions for effective masked vehicle appearance reconstruction.

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

1 code implementation15 Dec 2023 Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, ShiLiang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks.

Language Modelling Mixture-of-Experts +2

A Generalized Neural Diffusion Framework on Graphs

no code implementations14 Dec 2023 Yibo Li, Xiao Wang, Hongrui Liu, Chuan Shi

In this paper, we propose a general diffusion equation framework with the fidelity term, which formally establishes the relationship between the diffusion process and more GNNs.

SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm

2 code implementations4 Dec 2023 Jiandong Jin, Xiao Wang, Chenglong Li, Lili Huang, Jin Tang

Then, a Transformer decoder is proposed to generate the human attributes by incorporating the visual features and attribute query tokens.

Attribute Decoder +2

RTQ: Rethinking Video-language Understanding Based on Image-text Model

2 code implementations1 Dec 2023 Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie

Recent advancements in video-language understanding have been established on the foundation of image-text models, resulting in promising outcomes due to the shared knowledge between images and videos.

Ranked #9 on Video Captioning on MSR-VTT (using extra training data)

Video Captioning Video Question Answering +1

Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models

1 code implementation30 Nov 2023 Dong Li, Jiandong Jin, Yuhao Zhang, Yanlin Zhong, Yaoyang Wu, Lan Chen, Xiao Wang, Bin Luo

Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition.

Language Modelling Prompt Engineering

Ultra-Long Sequence Distributed Transformer

no code implementations4 Nov 2023 Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gounley

This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers with long sequences.

Orthogonal Subspace Learning for Language Model Continual Learning

1 code implementation22 Oct 2023 Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, Xuanjing Huang

In this paper, we propose orthogonal low-rank adaptation (O-LoRA), a simple and efficient approach for continual learning in language models, effectively mitigating catastrophic forgetting while learning new tasks.

Continual Learning Language Modeling +2

VcT: Visual change Transformer for Remote Sensing Image Change Detection

1 code implementation17 Oct 2023 Bo Jiang, Zitian Wang, Xixi Wang, Ziyan Zhang, Lan Chen, Xiao Wang, Bin Luo

Then, each pixel of feature map is regarded as a graph node and the graph neural network is proposed to model the structured information for coarse change map prediction.

Change Detection Graph Neural Network +1

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

no code implementations6 Oct 2023 Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences.

scientific discovery

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

no code implementations4 Oct 2023 Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin

This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.

Safety Alignment

Provable Training for Graph Contrastive Learning

1 code implementation NeurIPS 2023 Yue Yu, Xiao Wang, Mengmei Zhang, Nian Liu, Chuan Shi

To this end, we propose the PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follow the GCL principle better.

Contrastive Learning

A stochastic block model for community detection in attributed networks

no code implementations31 Aug 2023 Xiao Wang, Fang Dai, Wenyan Guo, Junfeng Wang

Therefore, a stochastic block model that integrates betweenness centrality and clustering coefficient of nodes for community detection in attributed networks, named BCSBM, is proposed in this paper.

Clustering Community Detection +1

Temporal Sentence Grounding in Streaming Videos

1 code implementation14 Aug 2023 Tian Gan, Xiao Wang, Yan Sun, Jianlong Wu, Qingpei Guo, Liqiang Nie

The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.

Sentence Temporal Sentence Grounding

High-performance Data Management for Whole Slide Image Analysis in Digital Pathology

1 code implementation10 Aug 2023 Haoju Leng, Ruining Deng, Shunxing Bao, Dazheng Fang, Bryan A. Millis, Yucheng Tang, Haichun Yang, Xiao Wang, Yifan Peng, Lipeng Wan, Yuankai Huo

The performance evaluation encompasses two key scenarios: (1) a pure CPU-based image analysis scenario ("CPU scenario"), and (2) a GPU-based deep learning framework scenario ("GPU scenario").

Management whole slide images

SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition

1 code implementation8 Aug 2023 Xiao Wang, Yao Rong, Zongzhen Wu, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian

Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition.

Generative Query Reformulation for Effective Adhoc Search

no code implementations1 Aug 2023 Xiao Wang, Sean MacAvaney, Craig Macdonald, Iadh Ounis

GenQR directly reformulates the user's input query, while GenPRF provides additional context for the query by making use of pseudo-relevance feedback information.

Information Retrieval Retrieval

DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

1 code implementation27 Jun 2023 Songyang Gao, Shihan Dou, Yan Liu, Xiao Wang, Qi Zhang, Zhongyu Wei, Jin Ma, Ying Shan

Adversarial training is one of the best-performing methods in improving the robustness of deep language models.

Point-Voxel Absorbing Graph Representation Learning for Event Stream based Recognition

1 code implementation8 Jun 2023 Bo Jiang, Chengguo Yuan, Xiao Wang, Zhimin Bao, Lin Zhu, Yonghong Tian, Jin Tang

To address these issues, we propose a novel dual point-voxel absorbing graph representation learning for event stream data representation.

Event data classification Graph Representation Learning

AMatFormer: Efficient Feature Matching via Anchor Matching Transformer

no code implementations30 May 2023 Bo Jiang, Shuxian Luo, Xiao Wang, Chuanfu Li, Jin Tang

Second, AMatFormer adopts a shared FFN module to further embed the features of two images into the common domain and thus learn the consensus feature representations for the matching problem.

A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition

1 code implementation21 May 2023 Limao Xiong, Jie Zhou, Qunxi Zhu, Xiao Wang, Yuanbin Wu, Qi Zhang, Tao Gui, Xuanjing Huang, Jin Ma, Ying Shan

Particularly, we propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.

named-entity-recognition Named Entity Recognition +2

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

no code implementations8 May 2023 Yupei Lin, Sen Zhang, Xiaojun Yang, Xiao Wang, Yukai Shi

To ensure consistent preservation of the shape during image editing, we propose cross-attention guidance based on regeneration learning.

Hierarchical Contrastive Learning Enhanced Heterogeneous Graph Neural Network

no code implementations24 Apr 2023 Nian Liu, Xiao Wang, Hui Han, Chuan Shi

Specifically, two views of a HIN (network schema and meta-path views) are proposed to learn node embeddings, so as to capture both local and high-order structures simultaneously.

Contrastive Learning Graph Neural Network

Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestrian Attribute Recognition

1 code implementation20 Apr 2023 Jun Zhu, Jiandong Jin, Zihan Yang, Xiaohao Wu, Xiao Wang

The averaged visual tokens and text tokens are concatenated and fed into a fusion Transformer for multi-modal interactive learning.

Attribute Pedestrian Attribute Recognition +1

Curricular Object Manipulation in LiDAR-based Object Detection

1 code implementation CVPR 2023 Ziyue Zhu, Qiang Meng, Xiao Wang, Ke Wang, Liujiang Yan, Jian Yang

For the loss design, we propose the COMLoss to dynamically predict object-level difficulties and emphasize objects of different difficulties based on training stages.

3D Object Detection Object +1

Efficient Multimodal Sampling via Tempered Distribution Flow

1 code implementation8 Apr 2023 Yixuan Qiu, Xiao Wang

Sampling from high-dimensional distributions is a fundamental problem in statistical research and practice.

Image Generation

RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning

no code implementations26 Mar 2023 Yabin Zhu, Chenglong Li, Xiao Wang, Jin Tang, Zhixiang Huang

In addition, existing learning methods of RGBT trackers either fuse multimodal features into one for final classification, or exploit the relationship between unimodal branches and fused branch through a competitive learning strategy.

Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation

1 code implementation15 Mar 2023 Xiao Wang, Tian Gan, Yinwei Wei, Jianlong Wu, Dai Meng, Liqiang Nie

Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation.

Link Prediction Relation +3

AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+

1 code implementation14 Mar 2023 Xiao Wang, Ying Wang, Ziwei Xuan, Guo-Jun Qi

A criterion in unsupervised pretraining is that the pretext task needs to be sufficiently hard to prevent the transformer encoder from learning trivial low-level features that do not generalize well to downstream tasks.

Transfer Learning

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

1 code implementation20 Feb 2023 Xiao Wang, Guangyao Chen, Guangwu Qian, Pengcheng Gao, Xiao-Yong Wei, YaoWei Wang, Yonghong Tian, Wen Gao

We also give visualization and analysis of the model parameters and results on representative downstream tasks.

Survey

A Survey on Spectral Graph Neural Networks

no code implementations11 Feb 2023 Deyu Bo, Xiao Wang, Yang Liu, Yuan Fang, Yawen Li, Chuan Shi

Graph neural networks (GNNs) have attracted considerable attention from the research community.

Graph Representation Learning Survey

Machine Learning for Synthetic Data Generation: A Review

no code implementations8 Feb 2023 Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei

In light of these challenges, the concept of synthetic data generation emerges as a promising alternative that allows for data sharing and utilization in ways that real-world data cannot facilitate.

Fairness Synthetic Data Generation

DEJA VU: Continual Model Generalization For Unseen Domains

2 code implementations25 Jan 2023 Chenxi Liu, Lixu Wang, Lingjuan Lyu, Chen Sun, Xiao Wang, Qi Zhu

To overcome these limitations of DA and DG in handling the Unfamiliar Period during continual domain shift, we propose RaTP, a framework that focuses on improving models' target domain generalization (TDG) capability, while also achieving effective target domain adaptation (TDA) capability right after training on certain domains and forgetting alleviation (FA) capability on past domains.

Data Augmentation Domain Generalization +1

Directed Acyclic Graph Structure Learning from Dynamic Graphs

1 code implementation 30 Nov 2022 Shaohua Fan, Shuyang Zhang, Xiao Wang, Chuan Shi

In a dynamic graph, we propose to simultaneously estimate contemporaneous relationships and time-lagged interaction relationships between the node features.

Graph structure learning

Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric

2 code implementations 20 Nov 2022 Chuanming Tang, Xiao Wang, Ju Huang, Bo Jiang, Lin Zhu, Jianlin Zhang, YaoWei Wang, Yonghong Tian

In this paper, we propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.

Object Localization Object Tracking

Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach

no code implementations 19 Nov 2022 Xixi Wang, Bo Jiang, Xiao Wang, Bin Luo

(1) It employs a flexible graph model, termed Batch Graph, to jointly encode the visual and semantic relationships of samples within each mini-batch.

Metric Learning

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

3 code implementations 17 Nov 2022 Xiao Wang, Zongzhen Wu, Bo Jiang, Zhimin Bao, Lin Zhu, Guoqi Li, YaoWei Wang, Yonghong Tian

Mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which suffer from illumination changes, fast motion, privacy concerns, and large energy consumption.

Activity Prediction Human Activity Recognition +1

Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes

no code implementations 19 Oct 2022 Niklas Kochdumper, Hanna Krasowski, Xiao Wang, Stanley Bak, Matthias Althoff

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems.

reinforcement-learning Reinforcement Learning +2

Uncovering the Structural Fairness in Graph Contrastive Learning

1 code implementation 6 Oct 2022 Ruijia Wang, Xiao Wang, Chuan Shi, Le Song

Recent studies show that graph convolutional networks (GCNs) often perform worse for low-degree nodes, exhibiting the so-called structural unfairness for graphs with long-tailed degree distributions prevalent in the real world.

Contrastive Learning Fairness

Revisiting Graph Contrastive Learning from the Perspective of Graph Spectrum

1 code implementation 5 Oct 2022 Nian Liu, Xiao Wang, Deyu Bo, Chuan Shi, Jian Pei

Then we theoretically prove, via the contrastive invariance theorem, that GCL is able to learn the invariance information. Together with our GAME rule, we uncover for the first time that the representations learned by GCL essentially encode the low-frequency information, which explains why GCL works.

Contrastive Learning

Debiasing Graph Neural Networks via Learning Disentangled Causal Substructure

1 code implementation 28 Sep 2022 Shaohua Fan, Xiao Wang, Yanhu Mo, Chuan Shi, Jian Tang

However, by presenting a graph classification investigation on training graphs with severe bias, we surprisingly discover that GNNs always tend to exploit the spurious correlations to make decisions, even if the causal correlation always exists.

counterfactual Graph Classification

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

1 code implementation 18 Aug 2022 Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, Xiao Wang

To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM).

Person Retrieval Retrieval +4
