Search Results for author: Shuo Wang

Found 276 papers, 124 papers with code

Dual Adversarial Network for Deep Active Learning

no code implementations · ECCV 2020 · Shuo Wang, Yuexiang Li, Kai Ma, Ruhui Ma, Haibing Guan, Yefeng Zheng

In this paper, we investigate the overlapping problem of recent uncertainty-based approaches and propose to alleviate the issue by taking representativeness into consideration.

Active Learning

Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery

no code implementations · ECCV 2020 · Shuo Wang, Jun Yue, Jianzhuang Liu, Qi Tian, Meng Wang

It is a challenging problem since (1) the identifying process is susceptible to over-fitting with limited samples of an object, and (2) the sample imbalance between a base (known knowledge) category and a novel category can easily bias the recognition results.

Few-Shot Learning

Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition

no code implementations · ECCV 2020 · Xiaobo Wang, Tianyu Fu, Shengcai Liao, Shuo Wang, Zhen Lei, Tao Mei

Knowledge distillation is an effective tool to compress large pre-trained Convolutional Neural Networks (CNNs) or their ensembles into models applicable to mobile and embedded devices.

Diversity Face Recognition
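The excerpt above names knowledge distillation only in general terms. As a hedged illustration, here is a minimal sketch of the classic soft-target distillation loss. It compares temperature-softened teacher and student distributions with KL divergence and scales the result by T². It is not the exclusivity-consistency regularizer proposed in this paper. The function names, the temperature of 4.0, and the logits are assumptions chosen for the example.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class problem.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(student, teacher))
```

In practice this term is usually mixed with the ordinary cross-entropy on hard labels. A higher temperature exposes more of the teacher's "dark knowledge" about non-target classes.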

SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

no code implementations · 17 Jun 2025 · Hexian Ni, Tao Lu, Haoyuan Hu, Yinghao Cai, Shuo Wang

In this paper, we present a novel efficient query selection and preference-guided exploration method, called SENIOR, which can select meaningful and easy-to-compare behavior segment pairs to improve human feedback efficiency and accelerate policy learning with the designed preference-guided intrinsic rewards.

Density Estimation Robot Manipulation

Restoring Gaussian Blurred Face Images for Deanonymization Attacks

no code implementations · 14 Jun 2025 · Haoyu Zhai, Shuo Wang, Pirouz Naghavi, Qingying Hao, Gang Wang

The key intuition is to leverage a generative model's memorization effect and approximate the inverse function of Gaussian blur for face restoration.

Deblurring Face Anonymization

ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization

1 code implementation · 12 Jun 2025 · Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, Ge Yu

Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of Large Language Models (LLMs).

Math

MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment

no code implementations · 12 Jun 2025 · Shuo Wang, Jihao Zhang

Segment importance, location, and center-ness are predicted, followed by key shot selection using Non-Maximum Suppression (NMS) and the Kernel Temporal Segmentation (KTS) algorithm.

Video Summarization
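The key shot selection step mentions Non-Maximum Suppression. Below is a minimal sketch of greedy temporal (1D) NMS. It assumes each predicted segment is a hypothetical (start, end, importance) tuple and uses an IoU threshold of 0.5. It is not the authors' code, and the KTS segmentation step is not shown.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_time, end_time, importance_score)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], iou_threshold: float = 0.5) -> List[Segment]:
    """Greedily keep the highest-scoring segment, then drop every remaining
    segment whose overlap with it exceeds iou_threshold; repeat."""
    remaining = sorted(segments, key=lambda s: s[2], reverse=True)
    kept: List[Segment] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [s for s in remaining if temporal_iou(best, s) <= iou_threshold]
    return kept


# Hypothetical predicted segments (start s, end s, score).
candidates = [(0.0, 4.0, 0.9), (0.5, 4.5, 0.8), (10.0, 14.0, 0.7), (10.5, 13.5, 0.4)]
print(temporal_nms(candidates))
```

Each overlapping pair collapses to its higher-scoring member, which leaves one shot per region of the video.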

Trajectory Optimization for UAV-Based Medical Delivery with Temporal Logic Constraints and Convex Feasible Set Collision Avoidance

no code implementations · 6 Jun 2025 · Kaiyuan Chen, Yuhan Suo, Shaowei Cui, Yuanqing Xia, Wannian Liang, Shuo Wang

This paper addresses the problem of trajectory optimization for unmanned aerial vehicles (UAVs) performing time-sensitive medical deliveries in urban environments.

Collision Avoidance

A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

1 code implementation · 30 May 2025 · Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He

Specifically, A*-Thought can improve the performance of QwQ-32B by 2.39× in the low-budget setting and reduce the length of the output tokens by nearly 50% in the high-budget setting.

Math

AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

1 code implementation · 27 May 2025 · Xuanle Zhao, Zilin Sang, YuXuan Li, Qi Shi, Weilun Zhao, Shuo Wang, Duzhen Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

Building on this idea, we propose AutoReproduce, a multi-agent framework capable of automatically reproducing experiments described in research papers in an end-to-end manner.

PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints

2 code implementations · 26 May 2025 · Shuo Wang, Yun Cheng, Qingye Meng, Olga Saukh, Jiang Zhang, Jingfang Fan, YuanTing Zhang, Xingyuan Yuan, Lothar Thiele

Air quality forecasting (AQF) is critical for public health and environmental management, yet remains challenging due to the complex interplay of emissions, meteorology, and chemical transformations.

Deep Learning

Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning

no code implementations · 26 May 2025 · Xiaorong Wang, Ting Yang, Zhu Zhang, Shuo Wang, Zihan Zhou, Liner Yang, Zhiyuan Liu, Maosong Sun

Moreover, we introduce a hybrid in-context learning approach that leverages human annotations to enhance the performance of both local and global evaluations.

Active Learning In-Context Learning

Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs

no code implementations · 22 May 2025 · Zeyu Wei, Shuo Wang, Xiaohui Rong, Xuemin Liu, He Li

Hallucinations -- plausible yet erroneous outputs -- remain a critical barrier to reliable deployment of large language models (LLMs).

Hallucination TruthfulQA

From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora

no code implementations · 20 May 2025 · Yingli Shen, Wen Lai, Shuo Wang, Kangyang Luo, Alexander Fraser, Maosong Sun

Continued pretraining and instruction tuning on large-scale multilingual data have proven to be effective in scaling large language models (LLMs) to low-resource languages.

Patient-Specific Dynamic Digital-Physical Twin for Coronary Intervention Training: An Integrated Mixed Reality Approach

no code implementations · 16 May 2025 · Shuo Wang, Tong Ren, Nan Cheng, Rong Wang, Li Zhang

We developed cardiac output analysis and virtual angiography systems, implemented guidewire 3D reconstruction using binocular stereo vision, and evaluated the system through angiography validation and CABG training applications.

3D Reconstruction Mixed Reality

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

no code implementations · 14 May 2025 · Yili He, Yan Zhu, Peiyao Fu, Ruijie Yang, Tianyi Chen, Zhihua Wang, QuanLin Li, Pinghong Zhou, Xian Yang, Shuo Wang

Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions.

Contrastive Learning

Avoid Recommending Out-of-Domain Items: Constrained Generative Recommendation with LLMs

1 code implementation · 6 May 2025 · Hao Liao, Wensheng Lu, Jianxun Lian, Mingqi Wu, Shuo Wang, Yong Zhang, Yitian Huang, Mingyang Zhou, Xing Xie

Large Language Models (LLMs) have shown promise for generative recommender systems due to their transformative capabilities in user interaction.

Recommendation Systems

DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition

no code implementations · 25 Apr 2025 · Hanrui Wang, Shuo Wang, Chun-Shien Lu, Isao Echizen

It surpasses state-of-the-art attacks by 15.5% and 9.82% in success rate on standard and privacy-preserving face recognition systems, respectively.

Face Generation Face Recognition

RT-DATR: Real-time Unsupervised Domain Adaptive Detection Transformer with Adversarial Feature Learning

1 code implementation · 12 Apr 2025 · Feng Lv, Chunlong Xia, Shuo Wang, Huo Cao

Building on RT-DETR as our base detector, we first introduce a local object-level feature alignment module to significantly enhance the feature representation of domain invariance during object transfer.

Domain Generalization Real-Time Object Detection

Improving Harmful Text Detection with Joint Retrieval and External Knowledge

no code implementations · 3 Apr 2025 · Zidong Yu, Shuo Wang, Nan Jiang, Weiqiang Huang, Xu Han, Junliang Du

Harmful text detection has become a crucial task in the development and deployment of large language models, especially as AI-generated content continues to expand across digital platforms.

Computational Efficiency Knowledge Graphs

Benchmarking Federated Machine Unlearning methods for Tabular Data

no code implementations · 1 Apr 2025 · Chenguang Xiao, Abhirup Ghosh, Han Wu, Shuo Wang, Diederick van Thiel

Machine unlearning, which enables a model to forget specific data upon request, is increasingly relevant in the era of privacy-centric machine learning, particularly within federated learning (FL) environments.

Benchmarking Computational Efficiency

Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors

1 code implementation · 28 Mar 2025 · Zhiyu Yang, Shuo Wang, Yukun Yan, Yang Deng

To address this gap, we introduce DSDBench: the Data Science Debugging Benchmark, the first benchmark for systematic evaluation of LLMs on multi-hop error tracing and multi-bug detection in data science code debugging.

Benchmarking Code Generation

Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation

1 code implementation · CVPR 2025 · Zhiwei Yang, Yucong Meng, Kexue Fu, Feilong Tang, Shuo Wang, Zhijian Song

To mine fine-grained knowledge from visual features, our VC module first proposes Static Visual Calibration (SVC) to propagate fine-grained knowledge in a non-parametric manner.

Attribute Weakly supervised Semantic Segmentation

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

1 code implementation · 13 Mar 2025 · Fengxiang Wang, Hongzhen Wang, Yulin Wang, Di Wang, Mingshuo Chen, Haiyan Zhao, Yangang Sun, Shuo Wang, Long Lan, Wenjing Yang, Jing Zhang

Recent advances in self-supervised learning for Vision Transformers (ViTs) have fueled breakthroughs in remote sensing (RS) foundation models.

Computational Efficiency Mamba

From Pixels to Trajectory: Universal Adversarial Example Detection via Temporal Imprints

no code implementations · 6 Mar 2025 · Yansong Gao, Huaibing Peng, Hua Ma, Zhiyang Dai, Shuo Wang, Hongsheng Hu, Anmin Fu, Minhui Xue

Recognizing the distinct nature of loss between adversarial and clean examples, we exploit this temporal imprint for AE detection by proposing TRAIT (TRaceable Adversarial temporal trajectory ImprinTs).

One-Class Classification

Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge

1 code implementation · 5 Mar 2025 · Fanwen Wang, Zi Wang, Yan Li, Jun Lyu, Chen Qin, Shuo Wang, Kunyuan Guo, Mengting Sun, Mingkai Huang, Haoyu Zhang, Michael Tänzer, Qirong Li, Xinran Chen, Jiahao Huang, Yinzhe Wu, Kian Anvari Hamedani, Yuntong Lyu, Longyu Sun, Qing Li, Ziqiang Xu, Bingyu Xin, Dimitris N. Metaxas, Narges Razizadeh, Shahabedin Nabavi, George Yiasemis, Jonas Teuwen, Zhenxi Zhang, Sha Wang, Chi Zhang, Daniel B. Ennis, Zhihao Xue, Chenxi Hu, Ruru Xu, Ilkay Oksuz, Donghang Lyu, Yanxin Huang, Xinrui Guo, Ruqian Hao, Jaykumar H. Patel, Guanke Cai, Binghua Chen, Yajing Zhang, Sha Hua, Zhensen Chen, Qi Dou, Xiahai Zhuang, Qian Tao, Wenjia Bai, Jing Qin, He Wang, Claudia Prieto, Michael Markl, Alistair Young, Hao Li, Xihong Hu, Lianmin Wu, Xiaobo Qu, Guang Yang, Chengyan Wang

In addition, through a detailed analysis of the results submitted to the challenge, we have also made several findings, including: 1) adaptive prompt-learning embedding is an effective means for achieving strong generalization in reconstruction models; 2) enhanced data consistency based on physics-informed networks is also an effective pathway toward a universal model; 3) traditional evaluation metrics have limitations when assessing ground-truth references with moderate or lower image quality, highlighting the need for subjective evaluation methods.

Benchmarking Image Reconstruction

Time-Varying Coronary Artery Deformation: A Dynamic Skinning Framework for Surgical Training

1 code implementation · 4 Mar 2025 · Shuo Wang, Tong Ren, Nan Cheng, Rong Wang, Li Zhang

Purpose: This study proposes a novel anatomically-driven dynamic modeling framework for coronary arteries using skeletal skinning weights computation, aiming to achieve precise control over vessel deformation while maintaining real-time performance for surgical simulation applications.

Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models

no code implementations · 25 Feb 2025 · Jia Yu, Yan Zhu, Peiyao Fu, Tianyi Chen, Junbo Huang, QuanLin Li, Pinghong Zhou, Zhihua Wang, Fei Wu, Shuo Wang, Xian Yang

Diffusion models have emerged as a promising solution for generating synthetic polyp images, but the image generation process in current models mainly relies on segmentation masks as the condition, limiting their ability to capture the full clinical context.

Data Augmentation Image Generation

Exploring the Impact of Personality Traits on LLM Bias and Toxicity

no code implementations · 18 Feb 2025 · Shuo Wang, Renhao Li, Xi Chen, Yulin Yuan, Derek F. Wong, Min Yang

The findings demonstrate the sensitivity of all three models to HEXACO personality traits and, more importantly, a consistent variation in the biases, negative sentiment and toxicity of their output.

Text Generation

CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection

1 code implementation · 17 Feb 2025 · Zhe Huang, Shuo Wang, Yongcai Wang, Lei Wang

Experiments on both simulated and real-world datasets demonstrate that the proposed framework CoDiff consistently outperforms existing relevant methods in collaborative object detection performance and is highly robust when the agents' pose and delay information contains high levels of noise.

3D Object Detection Autonomous Driving

DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection

1 code implementation · 17 Feb 2025 · Yingli Shen, Wen Lai, Shuo Wang, Xueren Zhang, Kangyang Luo, Alexander Fraser, Maosong Sun

The rapid development of multilingual large language models (LLMs) highlights the need for high-quality, diverse, and clean multilingual datasets.

Anomaly Detection

Vertical Federated Continual Learning via Evolving Prototype Knowledge

no code implementations · 13 Feb 2025 · Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Qi Wu

Vertical Federated Learning (VFL) has garnered significant attention as a privacy-preserving machine learning framework for sample-aligned feature federation.

Continual Learning Model Optimization

DAMO: Data- and Model-aware Alignment of Multi-modal LLMs

1 code implementation · 4 Feb 2025 · Jinda Lu, Junkang Wu, Jinghan Li, Xiaojun Jia, Shuo Wang, Yifan Zhang, Junfeng Fang, Xiang Wang, Xiangnan He

Direct Preference Optimization (DPO) has shown effectiveness in aligning multi-modal large language models (MLLM) with human preferences.

Hallucination
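Because this excerpt builds on Direct Preference Optimization, a minimal sketch of the standard DPO loss for a single preference pair may help. The loss rewards the policy for raising the log-probability of the chosen response, relative to a frozen reference model, more than that of the rejected response. This is plain DPO, not DAMO's data- and model-aware variant. The argument names, β = 0.1, and the log-probability values are assumptions.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]) for one pair,
    where each argument is a summed sequence log-probability."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return neg_log_sigmoid(beta * (chosen_margin - rejected_margin))


# Hypothetical summed log-probabilities of a chosen and a rejected response.
loss = dpo_loss(policy_chosen_logp=-9.0, policy_rejected_logp=-13.0,
                ref_chosen_logp=-10.0, ref_rejected_logp=-12.0)
print(loss)
```

When the policy equals the reference model, the loss is exactly log 2. It falls below that value once the policy prefers the chosen response more strongly than the reference does.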

Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment

no code implementations · 4 Feb 2025 · Shuo Wang, Bokui Wang, Zhixiang Shen, Boyan Deng, Zhao Kang

To address these issues, we propose the Multi-Domain Graph Foundation Model (MDGFM), a unified framework that aligns and leverages cross-domain topological information to facilitate robust knowledge transfer.

Domain Generalization Transfer Learning

Process Reinforcement through Implicit Rewards

5 code implementations · 3 Feb 2025 · Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, BoWen Zhou, Ning Ding

While dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs since their fine-grained rewards have the potential to address some inherent issues of outcome rewards, such as training efficiency and credit assignment, this potential remains largely unrealized.

Math Reinforcement Learning (RL)

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization

no code implementations · 1 Feb 2025 · Jiangyong Yu, Sifan Zhou, Dawei Yang, Shuo Wang, Shuoyu Li, Xing Hu, Chen Xu, Zukang Xu, Changyong Shu, Zhihang Yuan

In this paper, we propose MQuant, a post-training quantization (PTQ) framework designed to tackle the unique challenges of multimodal large language models (MLLMs).

Quantization
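Post-training quantization with static scales fixes the quantization parameters offline from calibration data. The alternative computes them per input at inference time. The sketch below shows generic symmetric per-tensor 8-bit static quantization under that assumption. It does not reproduce MQuant's techniques for MLLMs, and all function names and values are hypothetical.

```python
from typing import List


def calibrate_scale(calibration_values: List[float], num_bits: int = 8) -> float:
    """Compute one static scale offline: map the largest observed magnitude
    onto the largest signed integer (127 for 8 bits)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, num_bits: int = 8) -> List[int]:
    """Round each value to the nearest integer step and clip to the signed range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [q * scale for q in q_values]


# Hypothetical calibration activations collected before deployment.
scale = calibrate_scale([-1.27, 0.5, 1.27])
q = quantize([1.27, -1.27, 0.0, 5.0], scale)  # 5.0 is outside the calibrated range and gets clipped
print(q, dequantize(q, scale))
```

Static scales avoid runtime min/max reductions. The trade-off is that inputs outside the calibrated range, like 5.0 here, are clipped.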

Accelerating Diffusion Transformer via Error-Optimized Cache

no code implementations · 31 Jan 2025 · Junxiang Qiu, Shuo Wang, Jinda Lu, Lin Liu, Houcheng Jiang, Yanbin Hao

Existing caching methods accelerate generation by reusing DiT features from the previous time step and skipping calculations in the next, but they tend to locate and cache low-error modules without focusing on reducing caching-induced errors, resulting in a sharp decline in generated content quality when increasing caching intensity.

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

1 code implementation · 11 Jan 2025 · Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun

(1) Low executability and poor restoration of chart details in the generated code, and (2) a lack of large-scale, diverse training data.

Chart Understanding Code Generation

CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness

no code implementations · 9 Jan 2025 · Shoucheng Song, Youfang Lin, Sheng Han, Chang Yao, Hao Wu, Shuo Wang, Kai Lv

This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration.

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing

no code implementations · CVPR 2025 · Shuo Wang, Wanting Li, Yongcai Wang, Zhaoxin Fan, Zhe Huang, Xudong Cai, Jian Zhao, Deying Li

To address this challenge, this paper proposes MambaVO, which conducts robust initialization, Mamba-based sequential matching refinement, and smoothed training to enhance the matching quality and improve the pose estimation in deep visual odometry.

GPU Mamba

Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models

no code implementations · 27 Dec 2024 · Shuo Wang, Chihang Wang, Jia Gao, Zhen Qi, Hongye Zheng, Xiaoxuan Liao

This study proposes a knowledge distillation algorithm based on large language models and feature alignment, aiming to effectively transfer the knowledge of large pre-trained models into lightweight student models, thereby reducing computational costs while maintaining high model performance.

Knowledge Distillation Model Compression

Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images

no code implementations · 27 Dec 2024 · Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu

To refine the 3D model at novel viewpoints, we propose a Confidence Aware Depth Alignment (CADA) module to refine the coarse depth maps by aligning their confident parts with estimated depths by a Mono-depth model.

3DGS Novel View Synthesis

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

no code implementations · 22 Dec 2024 · Xu Wang, Shengeng Tang, Peipei Song, Shuo Wang, Dan Guo, Richang Hong

Sign Language Production (SLP) aims to generate sign videos corresponding to spoken language sentences, where the conversion of sign Glosses to Poses (G2P) is the key step.

Sign Language Production

FedGA: Federated Learning with Gradient Alignment for Error Asymmetry Mitigation

no code implementations · 21 Dec 2024 · Chenguang Xiao, Zheming Zuo, Shuo Wang

Federated learning (FL) gives rise to both intra-client and inter-client class imbalance, and the latter, compared with the former, leads to biased client updates that degrade the distributed models.

Federated Learning

Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory

no code implementations · 16 Dec 2024 · Shuo Wang, Issei Sato

In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks without fine-tuning by leveraging contextual information provided within a prompt.

In-Context Learning

MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation

1 code implementation · 15 Dec 2024 · Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song

To this end, we first view the attention as a novel directed graph and propose the Graph Category Representation module to implicitly regularize the interaction among class-patch entities.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Medical Manifestation-Aware De-Identification

1 code implementation · 14 Dec 2024 · Yuan Tian, Shuo Wang, Guangtao Zhai

Face de-identification (DeID) has been widely studied for common scenes, but remains under-researched for medical scenes, mostly due to the lack of large-scale patient face datasets.

De-identification

Parameter Estimation based Automatic Modulation Recognition for Radio Frequency Signal

no code implementations · 11 Dec 2024 · Shuo Wang, Kuojun Yang, Zelin Ji, Qinchuan Zhang, Huiqing Pan

However, current automatic identification methods require the input of key parameters such as the carrier frequency, which is necessary to convert the radio frequency (RF) to a base-band signal before it can be used for identification.

Automatic Modulation Recognition parameter estimation

Optimizing Multi-Task Learning for Enhanced Performance in Large Language Models

no code implementations · 9 Dec 2024 · Zhen Qi, Jiajing Chen, Shuo Wang, Bingying Liu, Hongye Zheng, Chihang Wang

This study explores methods for improving the performance of GPT-4-based large language models within a multi-task learning framework and conducts experiments on two tasks: text classification and automatic summary generation.

Multi-Task Learning text-classification

World-Consistent Data Generation for Vision-and-Language Navigation

no code implementations · 9 Dec 2024 · Yu Zhong, Rui Zhang, Zihao Zhang, Shuo Wang, Chuan Fang, Xishan Zhang, Jiaming Guo, Shaohui Peng, Di Huang, Yanyang Yan, Xing Hu, Ping Tan, Qi Guo

Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions.

Data Augmentation Navigate

Rethinking the initialization of Momentum in Federated Learning with Heterogeneous Data

no code implementations · 29 Nov 2024 · Chenguang Xiao, Shuo Wang

However, we identify a problem with the traditional accumulation of momentum, which is suboptimal in federated learning systems.

Federated Learning

KBAlign: Efficient Self Adaptation on Specific Knowledge Bases

1 code implementation · 22 Nov 2024 · Zheni Zeng, Yuxuan Chen, Shi Yu, Ruobing Wang, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun

Although retrieval-augmented generation (RAG) remains essential for knowledge-based question answering (KBQA), current paradigms face critical challenges under specific domains.

Question Answering RAG

A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation

no code implementations · 19 Nov 2024 · Jiajing Chen, Shuo Wang, Zhen Qi, Zhenhong Zhang, Chihang Wang, Hongye Zheng

This research introduces a novel text generation model that combines BERT's semantic interpretation strengths with GPT-4's generative capabilities, establishing a high standard in generating coherent, contextually accurate language.

Text Generation

Transfer Learning Guided Noise Reduction for Automatic Modulation Classification

no code implementations · 13 Nov 2024 · Zelin Ji, Shuo Wang, Kuojun Yang, Qinchuan Zhang, Peng Ye

The numerical results show that the proposed noise reduction network achieves an accuracy improvement of over 20% in low-SNR scenarios, and the TNR-AMC framework can improve the classification accuracy under unstable SNRs.

Classification Transfer Learning

Advanced RAG Models with Graph Structures: Optimizing Complex Knowledge Reasoning and Text Generation

no code implementations · 6 Nov 2024 · Yuxin Dong, Shuo Wang, Hongye Zheng, Jiajing Chen, Zhenhong Zhang, Chihang Wang

This study proposes a scheme that processes graph-structured data with a graph neural network (GNN), enabling the model to capture complex relationships between entities and thereby improving the knowledge consistency and reasoning ability of the generated text.

Graph Neural Network Knowledge Graphs

MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning

no code implementations · 30 Oct 2024 · Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu

Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner.

Computational Efficiency Mixture-of-Experts
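For readers unfamiliar with LoRA, here is a minimal sketch of a LoRA-adapted linear layer. The pretrained weight W stays frozen, and a low-rank update B·A, scaled by alpha/rank, is added to its output. This is plain LoRA, not the asymmetric mixture proposed in MALoRA. The list-based matrices and example values are assumptions chosen for readability.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """y = W x + (alpha / rank) * B (A x).

    W is d_out x d_in and frozen; only A (rank x d_in) and
    B (d_out x rank) would receive gradient updates during fine-tuning."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical 2x2 frozen layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B_zero = [[0.0], [0.0]]  # B starts at zero, so the adapter is initially a no-op
print(lora_forward(W, A, B_zero, [3.0, 4.0], alpha=8.0, rank=1))
```

With B initialized to zeros (the usual LoRA initialization), the adapted layer initially reproduces the frozen layer exactly. Training then updates only rank·(d_in + d_out) parameters instead of d_in·d_out.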

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting

no code implementations · 25 Oct 2024 · Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

Vision-language models, such as CLIP, have shown impressive generalization capacities when using appropriate text descriptions.

Building A Coding Assistant via the Retrieval-Augmented Language Model

1 code implementation · 21 Oct 2024 · Xinze Li, Hanbin Wang, Zhenghao Liu, Shi Yu, Shuo Wang, Yukun Yan, Yukai Fu, Yu Gu, Ge Yu

Specifically, it consists of a code structure aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G).

Code Completion Code Generation

Automated Genre-Aware Article Scoring and Feedback Using Large Language Models

no code implementations · 18 Oct 2024 · Chihang Wang, Yuxin Dong, Zhenhong Zhang, Ruotong Wang, Shuo Wang, Jiajing Chen

This paper focuses on the development of an advanced intelligent article scoring system that not only assesses the overall quality of written work but also offers detailed feature-based scoring tailored to various article genres.

Language Modeling Language Modelling

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs

1 code implementation · 18 Oct 2024 · Runchu Tian, Yanghao Li, Yuepeng Fu, Siyang Deng, Qinyu Luo, Cheng Qian, Shuo Wang, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Huadong Wang, Xiaojiang Liu

These experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

1 code implementation · 17 Oct 2024 · Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen, Ge Yu, Zhiyuan Liu, Maosong Sun, Chenyan Xiong

Our experiments on various knowledge-intensive tasks demonstrate that DDR significantly outperforms the SFT method, particularly for LLMs with smaller-scale parameters that depend more on the retrieved knowledge.

RAG Retrieval

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

1 code implementation · 14 Oct 2024 · Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun

In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.

RAG Retrieval

LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models

1 code implementation · 12 Oct 2024 · Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Rongqiao An, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, Zhiyuan Liu, Maosong Sun

The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.

document understanding
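The chunk-then-aggregate pipeline described for LLM×MapReduce can be sketched as a generic map-reduce loop. In this hedged sketch, the two callables are hypothetical stand-ins for LLM calls. They are replaced here by a toy keyword counter so the code runs. Chunking is by word count. The sketch does not model how the actual framework handles information that spans several chunks.

```python
from typing import Any, Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split on whitespace into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def map_reduce_answer(document: str, question: str,
                      read_chunk: Callable[[str, str], Any],
                      aggregate: Callable[[List[Any], str], Any],
                      chunk_size: int = 512) -> Any:
    chunks = split_into_chunks(document, chunk_size)
    # Map: each chunk is read independently (in practice, one LLM call per chunk).
    intermediate = [read_chunk(chunk, question) for chunk in chunks]
    # Reduce: the intermediate answers are combined into the final output.
    return aggregate(intermediate, question)


# Toy stand-ins for the hypothetical LLM calls: count a keyword per chunk, then sum.
def count_in_chunk(chunk: str, keyword: str) -> int:
    return chunk.split().count(keyword)


def sum_counts(answers: List[int], keyword: str) -> int:
    return sum(answers)


doc = "apple banana apple cherry apple banana"
print(map_reduce_answer(doc, "apple", count_in_chunk, sum_counts, chunk_size=2))
```

Because the map step treats chunks independently, it can run in parallel. Anything that needs cross-chunk context has to be handled in the reduce step.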

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

1 code implementation · 11 Oct 2024 · Ruobing Wang, Daren Zha, Shi Yu, Qingfei Zhao, Yuxuan Chen, YiXuan Wang, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs generated by Large Language Models (LLMs) in open-domain question-answering tasks (OpenQA) by introducing external knowledge.

Open-Domain Question Answering RAG

FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification

1 code implementation · 29 Sep 2024 · Kexue Fu, Xiaoyuan Luo, Linhao Qu, Shuo Wang, Ying Xiong, Ilias Maglogiannis, Longxiang Gao, Manning Wang

Unlike few-shot learning methods in natural images that can leverage the labels of each image, existing few-shot WSI classification methods only utilize a small number of fine-grained labels or weakly supervised slide labels for training in order to avoid expensive fine-grained annotation.

Classification Few-Shot Learning

PGN: The RNN's New Successor is Effective for Long-Range Time Series Forecasting

1 code implementation · 26 Sep 2024 · Yuxin Jia, Youfang Lin, Jing Yu, Shuo Wang, Tianhao Liu, Huaiyu Wan

The other branch employs patches to capture short-term information and aggregate the global representation of the series.

Time Series Time Series Forecasting

Beyond Redundancy: Information-aware Unsupervised Multiplex Graph Structure Learning

1 code implementation · 25 Sep 2024 · Zhixiang Shen, Shuo Wang, Zhao Kang

Moreover, existing methods primarily rely on contrastive learning to maximize mutual information across different graphs, limiting them to multiplex graph redundant scenarios and failing to capture view-unique task-relevant information.

Contrastive Learning Graph Learning

A Personalised 3D+t Mesh Generative Model for Unveiling Normal Heart Dynamics

1 code implementation · 20 Sep 2024 · Mengyun Qiao, Kathryn A McGurk, Shuo Wang, Paul M. Matthews, Declan P O'Regan, Wenjia Bai

To this end, we developed a novel conditional generative model, MeshHeart, to learn the distribution of cardiac shape and motion patterns.

Differentiable Collision-Supervised Tooth Arrangement Network with a Decoupling Perspective

no code implementations · 18 Sep 2024 · Zhihui He, Chengyuan Wang, Shidong Yang, Li Chen, Yanheng Zhou, Shuo Wang

Therefore, we propose DTAN, a differentiable collision-supervised tooth arrangement network that decouples the prediction tasks from feature modeling.

RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision

1 code implementation · 13 Sep 2024 · Shuo Wang, Chunlong Xia, Feng Lv, Yifeng Shi

However, compared with dense supervision detectors such as the YOLO series, Hungarian matching provides much sparser supervision, which leads to insufficient model training and makes optimal results difficult to achieve.

Decoder object-detection

GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices

no code implementations · 12 Sep 2024 · Hanqiu Wang, Zihao Zhan, Haoqi Shan, Siqi Dai, Max Panoff, Shuo Wang

The advent and growing popularity of Virtual Reality (VR) and Mixed Reality (MR) solutions have revolutionized the way we interact with digital platforms.

Gaze Estimation Inference Attack

Low carbon optimal scheduling of integrated energy system considering waste heat utilization under the coordinated operation of incineration power plant and P2G

no code implementations · 11 Sep 2024 · Limeng Wang, Shuo Wang, Na Wang, Yuze Ma, Yang Li

To improve energy utilization and reduce carbon emissions, this paper presents an economic operation strategy for an integrated energy system that coordinates an incineration power plant with power-to-gas (P2G) and waste heat recovery.

FLUE Scheduling

Configurable Foundation Models: Building LLMs from a Modular Perspective

no code implementations4 Sep 2024 Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, GuanYu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

We first formalize modules into emergent bricks, i.e., functional neuron partitions that emerge during the pre-training phase, and customized bricks, i.e., bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs.

Computational Efficiency Mixture-of-Experts

Video-based Analysis Reveals Atypical Social Gaze in People with Autism Spectrum Disorder

no code implementations1 Sep 2024 Xiangxu Yu, Mindi Ruan, Chuanbo Hu, Wenqi Li, Lynn K. Paul, Xin Li, Shuo Wang

In this study, we present a quantitative and comprehensive analysis of social gaze in people with autism spectrum disorder (ASD).

Diagnostic

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation

no code implementations CVPR 2025 Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhucun Xue, Yong liu

For temporal modeling, the single-person-based methods concatenate two people into a single one directly, while the separate modeling-based methods skip the modeling of interaction sequences.

Motion Generation

GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

no code implementations17 Aug 2024 Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li

GSLAMOT utilizes camera and LiDAR multimodal information as inputs and divides the representation of the dynamic scene into a semantic map for representing the static environment, a trajectory of the ego-agent, and an online maintained Tracklet Graph (TG) for tracking and predicting the 3D poses of the detected mobile objects.

Multiple Object Tracking Object +2

A Multi-task Adversarial Attack Against Face Authentication

1 code implementation15 Aug 2024 Hanrui Wang, Shuo Wang, Cunjian Chen, Massimo Tistarelli, Zhe Jin

In this paper, we propose a multi-task adversarial attack algorithm called MTADV that is adaptable to multiple users or systems.

Adversarial Attack Management

Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space

no code implementations14 Aug 2024 Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han

To further improve the training of extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss to predict other groups' agent policies with the trajectory information.

SMAC+

COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis

1 code implementation9 Aug 2024 Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process.

Code Generation Code Repair

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

1 code implementation2 Aug 2024 Kunlun Zhu, Yifan Luo, Dingling Xu, Yukun Yan, Zhenghao Liu, Shi Yu, Ruobing Wang, Shuo Wang, Yishan Li, Nan Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

However, evaluating the effectiveness of RAG systems in specialized scenarios remains challenging due to the high costs of data construction and the lack of suitable evaluation metrics.

Benchmarking Dataset Generation +6

RoCo: Robust Collaborative Perception By Iterative Object Matching and Pose Adjustment

1 code implementation1 Aug 2024 Zhe Huang, Shuo Wang, Yongcai Wang, Wanting Li, Deying Li, Lei Wang

However, in collaborative perception, the quality of object detection based on a modality is highly sensitive to the relative pose errors among the agents.

Autonomous Driving Object +2

Selective Vision-Language Subspace Projection for Few-shot CLIP

1 code implementation24 Jul 2024 Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

Vision-language models such as CLIP can map data from different modalities into a unified feature space, enabling zero/few-shot inference by measuring the similarity between given images and texts.

Few-Shot Learning

Towards Automated Data Sciences with Natural Language and SageCopilot: Practices and Lessons Learned

no code implementations21 Jul 2024 Yuan Liao, Jiang Bian, Yuhui Yun, Shuo Wang, Yubo Zhang, Jiaming Chu, Tao Wang, Kewei Li, Yuchen Li, Xuhong LI, Shilei Ji, Haoyi Xiong

While the field of NL2SQL has made significant advancements in translating natural language instructions into executable SQL scripts for data querying and processing, achieving full automation within the broader data science pipeline - encompassing data querying, analysis, visualization, and reporting - remains a complex challenge.

In-Context Learning

Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation

1 code implementation19 Jul 2024 Jinda Lu, Shuo Wang, Yanbin Hao, Haifeng Liu, Xiang Wang, Meng Wang

However, these adaptation methods usually operate on the global view of an input image and are thus biased in perceiving partial local details of the image.

Transfer Learning

Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

1 code implementation16 Jul 2024 Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng

To alleviate these issues, we leverage the remarkable synthesis capabilities of diffusion models and propose Diffusion-based Model Inversion (Diff-MI) attacks.

Image Reconstruction

EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis

no code implementations16 Jul 2024 Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, QuanLin Li, Pinghong Zhou, Xian Yang, Shuo Wang

To overcome this limitation, we introduce EndoFinder, a content-based image retrieval framework to find the 'digital twin' polyp in the reference database given a newly detected polyp.

Content-Based Image Retrieval Contrastive Learning +2

Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning

no code implementations6 Jul 2024 Binhao Ma, Tianhang Zheng, Hongsheng Hu, Di Wang, Shuo Wang, Zhongjie Ba, Zhan Qin, Kui Ren

Our evaluation demonstrates that unlearning this benign data, comprising no more than 1% of the total training data, can reduce model accuracy by up to 50%.

Data Poisoning Machine Unlearning

An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

no code implementations18 Jun 2024 Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

We test state-of-the-art foundation models for medical image segmentation, including the original SAM, medical SAM and SAT models, to evaluate segmentation efficacy across different demographic groups and identify disparities.

Fairness Image Segmentation +3

On the Combination of AI and Wireless Technologies: 3GPP Standardization Progress

no code implementations17 Jun 2024 Chen Sun, Tao Cui, Wenqi Zhang, Yingshuang Bai, Shuo Wang, Haojin Li

Combining Artificial Intelligence (AI) and wireless communication technologies has become one of the major technology trends towards 2030.

Management

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

1 code implementation13 Jun 2024 Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision.

Math Quantization
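The abstract above allocates precision according to the long-tail distribution of singular values in the delta weights. The following is a minimal Python sketch of that general idea, in which directions with larger singular values receive more bits. It assumes a plain symmetric uniform quantizer and a hand-picked bit schedule. `quantize`, `bits_by_singular_value` and the `(bits, count)` schedule format are hypothetical names for illustration, not Delta-CoMe's code.

```python
def quantize(values, bits):
    """Symmetric uniform quantization of a vector to `bits` bits (bits >= 2)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1                           # largest integer code, e.g. 7 for 4-bit
    scale = max(abs(v) for v in values) / levels or 1.0    # fall back to 1.0 for an all-zero input
    return [round(v / scale) * scale for v in values]

def bits_by_singular_value(singular_values, schedule):
    """Assign bit widths to singular directions, largest singular values first.

    schedule: hypothetical list of (bits, count) pairs that together cover every
    direction, e.g. [(8, 2), (3, 6)] gives the top-2 directions 8 bits and the rest 3.
    """
    order = sorted(range(len(singular_values)), key=lambda i: -singular_values[i])
    bits = [0] * len(singular_values)
    pos = 0
    for b, count in schedule:
        for i in order[pos:pos + count]:
            bits[i] = b
        pos += count
    return bits

print(bits_by_singular_value([0.1, 5.0, 1.0], [(8, 1), (2, 2)]))  # [2, 8, 2]
print(quantize([1.0, -0.5, 0.25], bits=2))                         # [1.0, 0.0, 0.0]
```

In a full pipeline, each singular vector (or group of vectors) would then be passed to `quantize` with its assigned width. The sketch leaves that step out.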

MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

1 code implementation13 Jun 2024 Hanqing Wang, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen

It is observed that the minor matrix corresponds to the noisy or long-tail information, while the principal matrix contains important knowledge.

Math visual instruction following

Hierarchical Space-Time Attention for Micro-Expression Recognition

1 code implementation6 May 2024 Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

Specifically, we first process ME video frames and special frames or data in parallel with our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas.

Micro Expression Recognition Micro-Expression Recognition +1

Exploiting ChatGPT for Diagnosing Autism-Associated Language Disorders and Identifying Distinct Features

1 code implementation3 May 2024 Chuanbo Hu, Wenqi Li, Mindi Ruan, Xiangxu Yu, Shalaka Deshpande, Lynn K. Paul, Shuo Wang, Xin Li

Diagnosing language disorders associated with autism is a complex challenge, often hampered by the subjective nature and variability of traditional assessment methods.

Diagnostic Language Modelling +3

Exploring Speech Pattern Disorders in Autism using Machine Learning

no code implementations3 May 2024 Chuanbo Hu, Jacob Thrasher, Wenqi Li, Mindi Ruan, Xiangxu Yu, Lynn K Paul, Shuo Wang, Xin Li

Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns from examiner-patient dialogues presents significant challenges due to the subtle and diverse manifestations of speech-related symptoms in affected individuals.

Diagnostic regression +1

The 8th AI City Challenge

no code implementations15 Apr 2024 Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities.

Dense Video Captioning

Boosting Few-Shot Learning via Attentive Feature Regularization

no code implementations23 Mar 2024 Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor.

Few-Shot Learning
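The manifold-regularization baseline described above mixes two samples from different categories with a blending factor. Below is a minimal sketch of that mixing step. It assumes the standard mixup formulation, in which features and one-hot labels are combined linearly with a factor `lam`. It illustrates the baseline only, not the paper's attentive feature regularization.

```python
def mix_samples(x_a, x_b, y_a, y_b, lam):
    """Blend two feature vectors and their one-hot labels with factor lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix

x, y = mix_samples([1.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 1.0], lam=0.25)
# x == [0.25, 1.5], y == [0.25, 0.75]
```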

Technical Report: Masked Skeleton Sequence Modeling for Learning Larval Zebrafish Behavior Latent Embeddings

no code implementations23 Mar 2024 Lanxin Xu, Shuo Wang

In this report, we introduce a novel self-supervised learning method for extracting latent embeddings from behaviors of larval zebrafish.

Self-Supervised Learning Sentence

SoK: Can Trajectory Generation Combine Privacy and Utility?

1 code implementation12 Mar 2024 Erik Buchholz, Alsharif Abuadbba, Shuo Wang, Surya Nepal, Salil S. Kanhere

This work focuses on the systematisation of the state-of-the-art generative models for trajectories in the context of the proposed framework.

Privacy Preserving

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

1 code implementation25 Feb 2024 Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu

It finetunes the compression plugin module and uses the representations of gist tokens to emulate the raw prompts in the vanilla language model.

Decoder Language Modeling +2

OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models

1 code implementation21 Feb 2024 Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Yang Liu, Maosong Sun, Erhong Yang

We evaluate several representative multilingual LLMs on the proposed OMGEval, which we believe will provide a valuable reference for the community to further understand and improve the multilingual capability of LLMs.

General Knowledge Logical Reasoning

∞Bench: Extending Long Context Evaluation Beyond 100K Tokens

4 code implementations21 Feb 2024 Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, JunHao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction.

ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents

1 code implementation21 Feb 2024 Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Chaojun Xiao, Zhiyuan Liu, Ge Yu, Chenyan Xiong

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge, enhancing their performance on knowledge-intensive tasks.

Active Learning Position +4

Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages

1 code implementation19 Feb 2024 Yuanchi Zhang, Yile Wang, Zijun Liu, Shuo Wang, Xiaolong Wang, Peng Li, Maosong Sun, Yang Liu

While large language models (LLMs) have been pre-trained on multilingual corpora, their performance still lags behind in most languages compared to a few resource-rich languages.

Transfer Learning

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

1 code implementation18 Feb 2024 Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.

Code Generation Data Visualization

LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks

no code implementations18 Feb 2024 Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun

Most prior works on LoRA combination primarily rely on task-level weights for each involved LoRA, making different examples and tokens share the same LoRA weights.

Math
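With task-level weights, every token gets the same mixture of LoRA modules. The sketch below shows token-level fusion instead, where weights are computed from each token's hidden state. It assumes a hypothetical linear gate followed by a softmax. `gate_weights` and `fuse_lora_outputs` are illustrative names, not LoRA-Flow's actual gate design.

```python
import math

def softmax(logits):
    m = max(logits)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_lora_outputs(hidden, lora_outputs, gate_weights):
    """Combine k LoRA outputs for one token using token-dependent weights.

    hidden: the token's hidden state (list of floats).
    lora_outputs: k vectors, each the output of one LoRA module for this token.
    gate_weights: k rows (hypothetical gate); row k scores LoRA k from the hidden state.
    Returns (fused_output, fusion_weights).
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate_weights]
    mix = softmax(logits)                            # per-token fusion weights, sum to 1
    dim = len(lora_outputs[0])
    fused = [sum(m * out[d] for m, out in zip(mix, lora_outputs)) for d in range(dim)]
    return fused, mix
```

With a zero gate, the weights fall back to a uniform average, which is equivalent to equal task-level weights. A trained gate can shift weight between modules from one token to the next.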

OneBit: Towards Extremely Low-bit Large Language Models

1 code implementation17 Feb 2024 Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che

Model quantization uses low bit-width values to represent the weight matrices of existing models, which is a promising approach to reduce both the storage and computational overheads of deploying highly anticipated LLMs.

Quantization
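Here is a minimal sketch of generic 1-bit weight quantization, in which each row is stored as its signs plus one floating-point scale. This is the classic sign-and-scale scheme, assumed for illustration. It is not OneBit's own decomposition. For fixed signs, the mean absolute value is the scale that minimizes the squared reconstruction error of the row.

```python
def binarize_rows(weight):
    """Approximate each row w by alpha * sign(w), with alpha = mean(|w|)."""
    signs, scales = [], []
    for row in weight:
        alpha = sum(abs(v) for v in row) / len(row)        # per-row scale (L2-optimal for fixed signs)
        signs.append([1 if v >= 0 else -1 for v in row])   # 1 bit per weight
        scales.append(alpha)
    return signs, scales

def dequantize(signs, scales):
    """Rebuild the approximate weight matrix from signs and per-row scales."""
    return [[s * a for s in row] for row, a in zip(signs, scales)]

signs, scales = binarize_rows([[0.5, -1.5], [2.0, 0.0]])
# signs == [[1, -1], [1, 1]], scales == [1.0, 1.0]
```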

UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

1 code implementation7 Feb 2024 Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun

Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs.

Cross-Lingual Transfer Data Augmentation

Learning with Mixture of Prototypes for Out-of-Distribution Detection

1 code implementation5 Feb 2024 Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, Kristen Moore

To tackle these issues, we propose PrototypicAl Learning with a Mixture of prototypes (PALM) which models each class with multiple prototypes to capture the sample diversities, and learns more faithful and compact sample embeddings to enhance OOD detection.

Out-of-Distribution Detection Out of Distribution (OOD) Detection +1
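Below is a minimal sketch of scoring a sample against multiple prototypes per class. It assumes the OOD score is the highest cosine similarity between the sample and any class prototype. The prototype values and this scoring rule are illustrative assumptions, not PALM's exact training or scoring procedure. Low scores suggest an out-of-distribution input.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def max_prototype_similarity(embedding, prototypes):
    """prototypes: dict mapping class -> list of prototype vectors.

    Returns (best_class, score), where score is the highest cosine similarity
    between the embedding and any prototype of any class.
    """
    z = normalize(embedding)
    best_class, best_score = None, -math.inf
    for cls, protos in prototypes.items():
        for p in protos:
            score = sum(a * b for a, b in zip(z, normalize(p)))
            if score > best_score:
                best_class, best_score = cls, score
    return best_class, best_score

protos = {"cat": [[1.0, 0.0], [0.8, 0.6]], "dog": [[0.0, 1.0]]}
print(max_prototype_similarity([3.0, 4.0], protos))  # ('cat', ~0.96)
```

In practice, a threshold on the score, tuned on in-distribution validation data, would separate in-distribution from OOD samples. No particular threshold is implied here.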

LegalDuet: Learning Fine-grained Representations for Legal Judgment Prediction via a Dual-View Contrastive Learning

1 code implementation27 Jan 2024 Buqiang Xu, Xin Dai, Zhenghao Liu, Huiyuan Xie, Xiaoyuan Yi, Shuo Wang, Yukun Yan, Liner Yang, Yu Gu, Ge Yu

In this paper, we propose LegalDuet, which continuously pretrains language models to learn a more tailored embedding space for representing legal cases.

Contrastive Learning

Stream Query Denoising for Vectorized HD Map Construction

no code implementations17 Jan 2024 Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.

Autonomous Driving Denoising

Graph Elimination Networks

no code implementations2 Jan 2024 Shuo Wang, Ge Cheng, Yun Zhang

Graph Neural Networks (GNNs) are widely applied across various domains, yet they perform poorly in deep layers.

GraphGuard: Detecting and Counteracting Training Data Misuse in Graph Neural Networks

1 code implementation13 Dec 2023 Bang Wu, He Zhang, Xiangwen Yang, Shuo Wang, Minhui Xue, Shirui Pan, Xingliang Yuan

These limitations call for an effective and comprehensive solution that detects and mitigates data misuse without requiring exact training data while respecting the proprietary nature of such data.

Transferring Modality-Aware Pedestrian Attentive Learning for Visible-Infrared Person Re-identification

no code implementations12 Dec 2023 Yuwei Guo, WenHao Zhang, Licheng Jiao, Shuang Wang, Shuo Wang, Fang Liu

Visible-infrared person re-identification (VI-ReID) aims to search for the same pedestrian of interest across visible and infrared modalities.

Data Augmentation Person Re-Identification

Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective

1 code implementation8 Dec 2023 Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang

Leveraging the remarkable capabilities of foundation models (i.e., Llama2 and SAM), we propose to augment recipes and food images by extracting alignable information related to their counterparts.

Cross-Modal Retrieval Data Augmentation +2

Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

no code implementations6 Dec 2023 Xiaobo Hu, Youfang Lin, Hehe Fan, Shuo Wang, Zhihao Wu, Kai Lv

To this end, an agent needs to 1) learn a piece of certain knowledge about the relations of object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment.

Object Visual Navigation

A Reliable Representation with Bidirectional Transition Model for Visual Reinforcement Learning Generalization

no code implementations4 Dec 2023 Xiaobo Hu, Youfang Lin, Yue Liu, Jinwen Wang, Shuo Wang, Hehe Fan, Kai Lv

Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations.

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

1 code implementation16 Nov 2023 Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu

INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.

Code Repair Code Translation

TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

1 code implementation6 Nov 2023 Shuo Wang, Jing Li, Zibo Zhao, Dongze Lian, Binbin Huang, Xiaomei Wang, Zhengxin Li, Shenghua Gao

Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, etc.

Boundary Detection Depth Estimation +5

Understanding Parameter Saliency via Extreme Value Theory

no code implementations27 Oct 2023 Shuo Wang, Issei Sato

Furthermore, we show that the existing parameter saliency method exhibits a bias against the depth of layers in deep neural networks.

Anomaly Detection Saliency Ranking

Breaking of brightness consistency in optical flow with a lightweight CNN network

1 code implementation24 Oct 2023 Yicheng Lin, Shuo Wang, Yunlong Jiang, Bin Han

Replacing the typical brightness consistency assumption of the optical flow method with convolutional feature consistency yields the light-robust hybrid optical flow method.

CPU Optical Flow Estimation
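For reference, the block below gives the classical brightness consistency assumption and its first-order linearization. It also shows a feature-consistency counterpart, in which image intensity I is replaced by a convolutional feature map F. The feature version is an assumed illustration of the substitution the abstract describes, and the paper's exact formulation may differ.

```latex
\begin{aligned}
I(x + u,\, y + v,\, t + 1) &= I(x, y, t)
  &&\Rightarrow\quad I_x u + I_y v + I_t \approx 0 \\
F(x + u,\, y + v,\, t + 1) &= F(x, y, t)
  &&\text{(feature consistency, applied per channel)}
\end{aligned}
```

Here (u, v) is the flow at pixel (x, y), and I_x, I_y, I_t are partial derivatives of intensity. Feature maps are less sensitive to global illumination changes than raw intensity, which motivates the substitution.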

VFedMH: Vertical Federated Learning for Training Multiple Heterogeneous Models

no code implementations20 Oct 2023 Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo, Bin Xiao

Then the passive party, who owns only features of the sample, injects the blinding factor into the local embedding and sends it to the active party.

Vertical Federated Learning

Action Recognition Utilizing YGAR Dataset

no code implementations2 Oct 2023 Shuo Wang, Amiya Ranjan, Lawrence Jiang

The scarcity of high-quality action video data is a bottleneck in the research and application of action recognition.

Action Recognition

Natural Language Models for Data Visualization Utilizing nvBench Dataset

no code implementations2 Oct 2023 Shuo Wang, Carlos Crespo-Quinones

Translation of natural language into syntactically correct commands for data visualization is an important application of natural language models and could be leveraged for many different tasks.

Data Visualization Natural Language Queries +1

RR-CP: Reliable-Region-Based Conformal Prediction for Trustworthy Medical Image Classification

no code implementations9 Sep 2023 Yizhe Zhang, Shuo Wang, Yejia Zhang, Danny Z. Chen

Conformal prediction (CP) generates a set of predictions for a given test sample such that the prediction set almost always contains the true label (e.g., 99.5% of the time).

Conformal Prediction Decision Making +4
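A coverage guarantee like the one described above is usually obtained with split conformal prediction. Below is a minimal sketch of that standard procedure. It assumes the nonconformity score is one minus the softmax probability of the true label. RR-CP's reliable-region construction is not reproduced here.

```python
import math

def conformal_threshold(cal_true_probs, alpha):
    """Split-conformal threshold from calibration-set probabilities of the true labels."""
    scores = sorted(1.0 - p for p in cal_true_probs)   # nonconformity scores
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))             # rank of the conformal quantile
    if k > n:
        return float("inf")                            # too few calibration points: keep every label
    return scores[k - 1]

def prediction_set(test_probs, qhat):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(test_probs) if 1.0 - p <= qhat]

cal = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.99, 0.75, 0.88, 0.92]
qhat = conformal_threshold(cal, alpha=0.2)              # ~0.3
print(prediction_set([0.72, 0.2, 0.08], qhat))          # [0]
```

Smaller `alpha` gives a larger threshold and therefore larger, more conservative prediction sets.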

Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf

1 code implementation9 Sep 2023 Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu

Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence.

Retrieval

SamDSK: Combining Segment Anything Model with Domain-Specific Knowledge for Semi-Supervised Learning in Medical Image Segmentation

1 code implementation26 Aug 2023 Yizhe Zhang, Tao Zhou, Shuo Wang, Ye Wu, Pengfei Gu, Danny Z. Chen

Our new method is iterative and consists of two main stages: (1) segmentation model training; (2) expanding the labeled set by using the trained segmentation model, an unlabeled set, SAM, and domain-specific knowledge.

Image Segmentation Lesion Segmentation +3

A Unified Query-based Paradigm for Camouflaged Instance Segmentation

1 code implementation14 Aug 2023 Bo Dong, Jialun Pei, Rongrong Gao, Tian-Zhu Xiang, Shuo Wang, Huan Xiong

Due to the high similarity between camouflaged instances and the background, the recently proposed camouflaged instance segmentation (CIS) faces challenges in accurate localization and instance segmentation.

Boundary Detection Decoder +4

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

1 code implementation12 Jul 2023 Yuzhuang Xu, Shuo Wang, Peng Li, Xuebo Liu, Xiaolong Wang, Weidong Liu, Yang Liu

Although neural machine translation (NMT) models perform well in the general domain, it remains rather challenging to control their generation behavior to satisfy the requirements of different users.

Machine Translation NMT +1

Synthetic Demographic Data Generation for Card Fraud Detection Using GANs

1 code implementation29 Jun 2023 Shuo Wang, Terrence Tricco, Xianta Jiang, Charles Robertson, John Hawkin

This study can help improve the understanding of synthetic data and further explore the application of synthetic data generation in card fraud detection.

Fraud Detection Generative Adversarial Network +1

Segmentation and Tracking of Vegetable Plants by Exploiting Vegetable Shape Feature for Precision Spray of Agricultural Robots

1 code implementation23 Jun 2023 Nan Hu, Daobilige Su, Shuo Wang, Xuechang Wang, Huiyu Zhong, Zimeng Wang, Yongliang Qiao, Yu Tan

Regarding the robust tracking of vegetable plants, to solve the challenging problem of associating vegetables with similar color and texture in consecutive images, in this paper, a novel method of Multiple Object Tracking and Segmentation (MOTS) is proposed for instance segmentation and tracking of multiple vegetable plants.

Instance Segmentation Multiple Object Tracking +3

MCTS: A Multi-Reference Chinese Text Simplification Dataset

1 code implementation5 Jun 2023 Ruining Chong, Luming Lu, Liner Yang, Jinran Nie, Zhenghao Liu, Shuo Wang, Shuhan Zhou, Yaoxin Li, Erhong Yang

We hope to build a basic understanding of Chinese text simplification through the foundational work and provide references for future research.

Machine Translation Text Simplification

AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports

no code implementations23 Apr 2023 Hongyu Sun, Yongcai Wang, Xudong Cai, Peng Wang, Zhe Huang, Deying Li, Yu Shao, Shuo Wang

To advance the research and practical solutions for bird strike prevention, in this paper, we present AirBirds, a large-scale challenging dataset that consists of 118,312 time-series images, in which a total of 409,967 bounding boxes of flying birds are manually and carefully annotated.

Time Series

IDLS: Inverse Depth Line based Visual-Inertial SLAM

no code implementations23 Apr 2023 Wanting Li, Shuo Wang, Yongcai Wang, Yu Shao, Xuewei Bai, Deying Li

Using this compact line representation, Inverse Depth Line SLAM (IDLS) is proposed to track line features in SLAM accurately and efficiently.

Descriptive Visual Localization

The 7th AI City Challenge

no code implementations15 Apr 2023 Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential.

Retrieval

CamDiff: Camouflage Image Augmentation via Diffusion Model

1 code implementation11 Apr 2023 Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc van Gool

Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt.

Dataset Generation Image Augmentation +6

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

1 code implementation CVPR 2023 Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match.

Zero-Shot Learning

Toward a Geometric Theory of Manifold Untangling

no code implementations7 Mar 2023 Xin Li, Shuo Wang

It has been hypothesized that the ventral stream processing for object recognition is based on a mechanism called cortically local subspace untangling.

Object Object Recognition

Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

no code implementations CVPR 2023 Shuo Wang, Xinhai Zhao, Hai-Ming Xu, Zehui Chen, Dameng Yu, Jiahao Chang, Zhen Yang, Feng Zhao

Based on the covariate shift assumption, we find that the gap is mainly attributable to the feature distribution of BEV, which is determined by the quality of both depth estimation and the 2D images' feature representation.

3D Object Detection Depth Estimation +3

Memory-aided Contrastive Consensus Learning for Co-salient Object Detection

2 code implementations28 Feb 2023 Peng Zheng, Jie Qin, Shuo Wang, Tian-Zhu Xiang, Huan Xiong

To learn better group consensus, we propose the Group Consensus Aggregation Module (GCAM) to abstract the common features of each image group; meanwhile, to make the consensus representation more discriminative, we introduce the Memory-based Contrastive Module (MCM), which saves and updates the consensus of images from different groups in a queue of memories.

Co-Salient Object Detection object-detection +1
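The sketch below illustrates the queue-of-memories idea. It assumes a fixed-size first-in-first-out buffer from which consensus vectors of other groups are drawn as contrastive negatives. `ConsensusMemory` is a hypothetical name, and the actual MCM update rule may differ.

```python
from collections import deque

class ConsensusMemory:
    """Fixed-size FIFO memory of per-group consensus vectors (illustrative sketch)."""

    def __init__(self, size):
        self.queue = deque(maxlen=size)   # the oldest entries drop out automatically

    def update(self, group_id, consensus):
        """Store the latest consensus vector for an image group."""
        self.queue.append((group_id, consensus))

    def negatives(self, group_id):
        """Consensus vectors from other groups, usable as contrastive negatives."""
        return [c for g, c in self.queue if g != group_id]

mem = ConsensusMemory(size=2)
mem.update("a", [1.0])
mem.update("b", [2.0])
mem.update("c", [3.0])     # evicts group "a"
print(mem.negatives("c"))  # [[2.0]]
```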

CHeart: A Conditional Spatio-Temporal Generative Model for Cardiac Anatomy

1 code implementation30 Jan 2023 Mengyun Qiao, Shuo Wang, Huaqi Qiu, Antonio de Marvao, Declan P. O'Regan, Daniel Rueckert, Wenjia Bai

Two key questions in cardiac image analysis are to assess the anatomy and motion of the heart from images; and to understand how they are associated with non-imaging clinical factors such as gender, age and diseases.

Anatomy Image Segmentation +1

Boosting Whole Slide Image Classification from the Perspectives of Distribution, Correlation and Magnification

no code implementations ICCV 2023 Linhao Qu, Zhiwei Yang, Minghong Duan, Yingfan Ma, Shuo Wang, Manning Wang, Zhijian Song

However, there are still three important issues that have not been fully addressed: (1) positive bags with a low positive instance ratio are prone to the influence of a large number of negative instances; (2) the correlation between local and global features of pathology images has not been fully modeled; and (3) there is a lack of effective information interaction between different magnifications.

image-classification Image Classification +1

EndoBoost: a plug-and-play module for false positive suppression during computer-aided polyp detection in real-world colonoscopy (with dataset)

no code implementations23 Dec 2022 Haoran Wang, Yan Zhu, Wenzheng Qin, Yizhe Zhang, Pinghong Zhou, QuanLin Li, Shuo Wang, Zhijian Song

In addition, the released dataset can be used to perform 'stress' tests on established detection systems and encourages further research toward robust and reliable computer-aided endoscopic image analysis.

Anomaly Detection Density Estimation

DeepTaster: Adversarial Perturbation-Based Fingerprinting to Identify Proprietary Dataset Use in Deep Neural Networks

no code implementations24 Nov 2022 Seonhye Park, Alsharif Abuadbba, Shuo Wang, Kristen Moore, Yansong Gao, Hyoungshick Kim, Surya Nepal

In this study, we introduce DeepTaster, a novel DNN fingerprinting technique, to address scenarios where a victim's data is unlawfully used to build a suspect model.

Data Augmentation Transfer Learning

Multitask Learning for Improved Late Mechanical Activation Detection of Heart from Cine DENSE MRI

no code implementations11 Nov 2022 Jiarui Xing, Shuo Wang, Kenneth C. Bilchick, Frederick H. Epstein, Amit R. Patel, Miaomiao Zhang

With a newly introduced auxiliary LMA region classification sub-network, our proposed model shows more robustness to the complex patterns caused by myocardial scar, significantly eliminates their negative effects on LMA detection, and in turn improves the performance of scar classification.

Joint Deep Learning for Improved Myocardial Scar Detection from Cardiac MRI

no code implementations11 Nov 2022 Jiarui Xing, Shuo Wang, Kenneth C. Bilchick, Amit R. Patel, Miaomiao Zhang

Automated identification of myocardial scar from late gadolinium enhancement cardiac magnetic resonance images (LGE-CMR) is limited by image noise and artifacts such as those related to motion and partial volume effect.

Myocardium Segmentation Segmentation

Generative Modelling of the Ageing Heart with Cross-Sectional Imaging and Clinical Data

1 code implementation 28 Aug 2022 Mengyun Qiao, Berke Doga Basaran, Huaqi Qiu, Shuo Wang, Yi Guo, Yuanyuan Wang, Paul M. Matthews, Daniel Rueckert, Wenjia Bai

Understanding the morphological and functional changes of the heart during ageing is a key scientific question, the answer to which will help us define important risk factors of cardiovascular disease and monitor disease progression.

Anatomy

Improved post-hoc probability calibration for out-of-domain MRI segmentation

1 code implementation 4 Aug 2022 Cheng Ouyang, Shuo Wang, Chen Chen, Zeju Li, Wenjia Bai, Bernhard Kainz, Daniel Rueckert

In image segmentation, well-calibrated probabilities allow radiologists to identify regions where model-predicted segmentations are unreliable.

Image Segmentation MRI segmentation +2

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation 15 Jul 2022 Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

Vision MLPs use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.

TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection

1 code implementation 4 Jul 2022 Chang Liu, Gang Yang, Shuo Wang, Hangxu Wang, Yunhua Zhang, Yutao Wang

We use the strong feature extraction capability of a Transformer (PVTv2) to extract global semantic information from RGB data, and design a lightweight CNN backbone (LWDepthNet) to extract spatial structure information from depth data without pre-training.

object-detection RGB-D Salient Object Detection +1

Trichomonas Vaginalis Segmentation in Microscope Images

no code implementations 3 Jul 2022 Lin Li, Jingyi Liu, Shuo Wang, Xunkun Wang, Tian-Zhu Xiang

Trichomoniasis is a common, high-incidence infectious disease caused by the parasite Trichomonas vaginalis; if left untreated, it increases the risk of HIV infection in humans.

Object object-detection +2

Boundary-Guided Camouflaged Object Detection

1 code implementation 2 Jul 2022 Yujia Sun, Shuo Wang, Chenglizhao Chen, Tian-Zhu Xiang

Camouflaged object detection (COD), segmenting objects that are elegantly blended into their surroundings, is a valuable yet challenging task.

Object object-detection +2

Generative Myocardial Motion Tracking via Latent Space Exploration with Biomechanics-informed Prior

1 code implementation 8 Jun 2022 Chen Qin, Shuo Wang, Chen Chen, Wenjia Bai, Daniel Rueckert

In contrast to most existing approaches, which impose explicit generic regularization such as smoothness, in this work we propose a novel method that can implicitly learn an application-specific biomechanics-informed prior and embed it into a neural network-parameterized transformation model.

Image Registration

Suggestive Annotation of Brain MR Images with Gradient-guided Sampling

no code implementations 2 Jun 2022 Chengliang Dai, Shuo Wang, Yuanhan Mo, Elsa Angelini, Yike Guo, Wenjia Bai

We evaluate the framework on two different brain image analysis tasks, namely brain tumour segmentation and whole brain segmentation.

Brain Segmentation Image Segmentation +3

A Template-based Method for Constrained Neural Machine Translation

1 code implementation 23 May 2022 Shuo Wang, Peng Li, Zhixing Tan, Zhaopeng Tu, Maosong Sun, Yang Liu

In this work, we propose a template-based method that yields results with high translation quality and match accuracy, with inference speed comparable to that of unconstrained NMT models.

Machine Translation NMT +1

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

no code implementations 20 May 2022 Wenxuan Wang, Wenxiang Jiao, Shuo Wang, Zhaopeng Tu, Michael R. Lyu

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system.

Machine Translation Translation

DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive Regularizer Learning

no code implementations 11 May 2022 Xiaoqin Zhang, Ziwei Huang, Jingjing Zheng, Shuo Wang, Xianta Jiang

The task of grasp pattern recognition is to derive the applicable grasp types of an object from visual information.

Object

InvNorm: Domain Generalization for Object Detection in Gastrointestinal Endoscopy

no code implementations 5 May 2022 Weichen Fan, Yuanbo Yang, Kunpeng Qiu, Shuo Wang, Yongxin Guo

To address the generalization problem in GI (gastrointestinal) endoscopy, we propose a multi-domain GI dataset and a light, plug-in block called InvNorm (Invertible Normalization), which can improve generalization performance in any architecture.

Domain Generalization Ethics +3

Long-term Spatio-temporal Forecasting via Dynamic Multiple-Graph Attention

1 code implementation 23 Apr 2022 Wei Shao, Zhiling Jin, Shuo Wang, Yufan Kang, Xiao Xiao, Hamid Menouar, Zhaofeng Zhang, Junshan Zhang, Flora Salim

To address these issues, we construct new graph models to represent the contextual information of each node and the long-term spatio-temporal data dependency structure.

Graph Attention Graph Neural Network +1

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

1 code implementation 20 Apr 2022 Yanbin Hao, Shuo Wang, Pei Cao, Xinjian Gao, Tong Xu, Jinmeng Wu, Xiangnan He

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts.

Video Classification

Synthetic Distracted Driving (SynDD2) dataset for analyzing distracted behaviors and various gaze zones of a driver

1 code implementation 17 Apr 2022 Mohammed Shaiqur Rahman, Jiyang Wang, Senem Velipasalar Gursoy, David Anastasiu, Shuo Wang, Anuj Sharma

This article presents a synthetic distracted driving dataset (SynDD2, a continuation of SynDD1) for machine learning models that detect and analyze drivers' distracted behaviors and gaze zones.

BIG-bench Machine Learning

Towards Web Phishing Detection Limitations and Mitigation

no code implementations 3 Apr 2022 Alsharif Abuadbba, Shuo Wang, Mahathir Almashor, Muhammed Ejaz Ahmed, Raj Gaire, Seyit Camtepe, Surya Nepal

However, with an average of 10K phishing links reported per hour to platforms such as PhishTank and VirusTotal (VT), the deficiencies of such ML-based solutions are laid bare.

Attribute

High-resolution Iterative Feedback Network for Camouflaged Object Detection

1 code implementation 22 Mar 2022 Xiaobin Hu, Shuo Wang, Xuebin Qin, Hang Dai, Wenqi Ren, Ying Tai, Chengjie Wang, Ling Shao

Spotting camouflaged objects that are visually assimilated into the background is difficult both for object detection algorithms and for humans, who are often confused or deceived by the near-perfect intrinsic similarity between foreground objects and their surroundings.

Object object-detection +2
