Search Results for author: Xin Zhang

Found 294 papers, 85 papers with code

Sociolectal Analysis of Pretrained Language Models

no code implementations EMNLP 2021 Sheng Zhang, Xin Zhang, Weiming Zhang, Anders Søgaard

Using data from English cloze tests, in which subjects also self-reported their gender, age, education, and race, we examine performance differences of pretrained language models across demographic groups, defined by these (protected) attributes.

Data Efficacy for Language Model Training

no code implementations 26 Jun 2025 Yalun Dai, Yangyu Huang, Xin Zhang, Wenshan Wu, Chong Li, Wenhui Lu, Shijie Cao, Li Dong, Scarlett Li

This work introduces a general paradigm, DELT, for considering data efficacy in LM training, which highlights the significance of training data organization.

Language Modeling Language Modelling +1

A Novel Large Vision Foundation Model (LVFM)-based Approach for Generating High-Resolution Canopy Height Maps in Plantations for Precision Forestry Management

no code implementations 25 Jun 2025 Shen Tan, Xin Zhang, Liangxiu Han, Huaguo Huang, Han Wang

Accurate, cost-effective monitoring of plantation aboveground biomass (AGB) is crucial for supporting local livelihoods and carbon sequestration initiatives like the China Certified Emission Reduction (CCER) program.

Gradients of unitary optical neural networks using parameter-shift rule

no code implementations 13 Jun 2025 JinZhe Jiang, YaQian Zhao, Xin Zhang, Chen Li, Yunlong Yu, Hailing Liu

This paper explores the application of the parameter-shift rule (PSR) for computing gradients in unitary optical neural networks (UONNs).

LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation

no code implementations 11 Jun 2025 Chen-Chia Chang, Wan-Hsuan Lin, Yikang Shen, Yiran Chen, Xin Zhang

Automation of analog topology design is crucial due to customized requirements of modern applications with heavily manual engineering efforts.

Language Modeling Language Modelling

OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment

1 code implementation 11 Jun 2025 Chao-Hong Tan, Qian Chen, Wen Wang, Chong Deng, Qinglin Zhang, Luyao Cheng, Hai Yu, Xin Zhang, Xiang Lv, Tianyu Zhao, Chong Zhang, Yukun Ma, Yafeng Chen, Hui Wang, Jiaqing Liu, Jieping Ye

This paper presents OmniDRCA, a parallel speech-text foundation model based on joint autoregressive modeling, featuring dual-resolution speech representations and contrastive cross-modal alignment.

cross-modal alignment Question Answering +2

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

1 code implementation 5 Jun 2025 Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou

In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models.

Reranking Retrieval +1

TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis

1 code implementation 30 May 2025 Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xuanhong Li, Chong Teng, Donghong Ji, Zhuang Li

Large Language Models (LLMs) excel in various natural language processing tasks but remain vulnerable to generating harmful content or being exploited for malicious purposes.

Diversity Language Modeling +4

EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions

no code implementations 29 May 2025 Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xiaolu Zhang, Jun Zhou, Yuxiang Peng, Li Zheng, Chong Teng, Donghong Ji, Zhuang Li

Large language models (LLMs) frequently refuse to respond to pseudo-malicious instructions: semantically harmless input queries triggering unnecessary LLM refusals due to conservative safety alignment, significantly impairing user experience.

Safety Alignment

DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing

1 code implementation 25 May 2025 Shengdong Han, Shangdong Yang, Xin Zhang, YuXuan Li, Xiang Li, Jian Yang, Ming-Ming Cheng, Yimian Dai

Resolving closely-spaced small targets in dense clusters presents a significant challenge in infrared imaging, as the overlapping signals hinder precise determination of their quantity, sub-pixel positions, and radiation intensities.

Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary

no code implementations 23 May 2025 Licheng Pan, Yongqi Tong, Xin Zhang, Xiaolu Zhang, Jun Zhou, Zhixuan Chu

This approach not only provides a more precise and interpretable view of model safety decisions but also seamlessly extends to multilingual scenarios. We explore the safety decision boundaries of various LLMs and construct the MORBench evaluation set to facilitate robust assessment of model safety and helpfulness across multiple languages.

Safety Alignment

$\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning

1 code implementation 22 May 2025 Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, Liqiang Nie

To address these issues, we propose $\text{R}^2\text{ec}$, a unified large recommender model with intrinsic reasoning capabilities.

Satellites Reveal Mobility: A Commuting Origin-destination Flow Generator for Global Cities

1 code implementation 21 May 2025 Can Rong, Xin Zhang, Yanxin Xi, Hongjie Sui, Jingtao Ding, Yong Li

Surprisingly, we find that satellite imagery, publicly available across the globe, contains rich urban semantic signals to support high-quality OD flow generation, with over 98% expressiveness of traditional multisource hard-to-collect urban sociodemographic, economics, land use, and point of interest data.

EAVIT: Efficient and Accurate Human Value Identification from Text data via LLMs

no code implementations 19 May 2025 Wenhao Zhu, Yuhang Xie, Guojie Song, Xin Zhang

The rapid evolution of large language models (LLMs) has revolutionized various fields, including the identification and discovery of human values within text data.

Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models

no code implementations 18 May 2025 Junhao Liu, Haonan Yu, Xin Zhang

With large language models (LLMs) becoming increasingly prevalent in various applications, the need for interpreting their predictions has become a critical challenge.

Explanation Generation

Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation

no code implementations 16 May 2025 Xin Zhang, Ziruo Zhang, Jiawei Du, Zuozhu Liu, Joey Tianyi Zhou

Multimodal Dataset Distillation (MDD) seeks to condense large-scale image-text datasets into compact surrogates while retaining their effectiveness for cross-modal learning.

cross-modal alignment Dataset Distillation

UrbanMind: Urban Dynamics Prediction with Multifaceted Spatial-Temporal Large Language Models

no code implementations 16 May 2025 Yuhang Liu, Yingxue Zhang, Xin Zhang, Ling Tian, Yanhua Li, Jun Luo

Understanding and predicting urban dynamics is crucial for managing transportation systems, optimizing urban planning, and enhancing public services.

Test-time Adaptation

A Survey on 3D Reconstruction Techniques in Plant Phenotyping: From Classical Methods to Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and Beyond

1 code implementation 30 Apr 2025 Jiajia Li, Xinda Qi, Seyed Hamidreza Nabaei, Meiqi Liu, Dong Chen, Xin Zhang, Xunyuan Yin, Zhaojian Li

Through this review, we aim to provide insights into how these diverse 3D reconstruction techniques can be effectively leveraged for automated and high-throughput plant phenotyping, contributing to the next generation of agricultural technology.

3DGS 3D Reconstruction +2

An Accelerated Camera 3DMA Framework for Efficient Urban GNSS Multipath Estimation

no code implementations 23 Apr 2025 Shiyao Lv, Xin Zhang, Xingqun Zhan

Robust GNSS positioning in urban environments is still plagued by multipath effects, particularly due to the complex signal propagation induced by ubiquitous surfaces with varied radio frequency reflectivities.

Computational Efficiency

Gaussian Shading++: Rethinking the Realistic Deployment Challenge of Performance-Lossless Image Watermark for Diffusion Models

no code implementations 21 Apr 2025 Zijin Yang, Xin Zhang, Kejiang Chen, Kai Zeng, Qiyi Yao, Han Fang, Weiming Zhang, Nenghai Yu

We propose a double-channel design that leverages pseudorandom error-correcting codes to encode the random seed required for watermark pseudorandomization, achieving performance-lossless watermarking under a fixed watermark key and overcoming key management challenges.

Management

CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models

no code implementations 18 Apr 2025 Feiyang Li, Peng Fang, Zhan Shi, Arijit Khan, Fang Wang, Dan Feng, WeiHao Wang, Xin Zhang, Yongjian Cui

Chain-of-thought (CoT) reasoning boosts large language models' (LLMs) performance on complex tasks but faces two key limitations: a lack of reliability when solely relying on LLM-generated reasoning chains and interference from natural language reasoning steps with the models' inference process, also known as the inference logic of LLMs.

Knowledge Graphs RAG +1

MAIN: Mutual Alignment Is Necessary for instruction tuning

no code implementations 17 Apr 2025 Fanyi Yang, Jianfeng Liu, Xin Zhang, Haoyu Liu, Xixin Cao, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Qi Zhang

Instruction tuning has enabled large language models (LLMs) to achieve remarkable performance, but its success heavily depends on the availability of large-scale, high-quality instruction-response pairs.

REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective

no code implementations 15 Apr 2025 Zhihao Xu, Yongqi Tong, Xin Zhang, Jun Zhou, Xiting Wang

Multi-objective preference alignment in language models often encounters a challenging trade-off: optimizing for one human preference (e.g., helpfulness) frequently compromises others (e.g., harmlessness) due to the inherent conflicts between competing objectives.

MIEB: Massive Image Embedding Benchmark

1 code implementation 14 Apr 2025 Chenghao Xiao, Isaac Chung, Imene Kerboua, Jamie Stirling, Xin Zhang, Márton Kardos, Roman Solomatin, Noura Al Moubayed, Kenneth Enevoldsen, Niklas Muennighoff

We introduce the Massive Image Embedding Benchmark (MIEB) to evaluate the performance of image and image-text embedding models across the broadest spectrum to date.

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

3 code implementations 14 Apr 2025 Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, Haibo Lei, Qifang Gao, Yaqing Li, Weihua Luo, Tsing Li, Qing Wang, Yi Liu, Yang Wang, Hongyu An, Liou Zhang, Shijie Zhao, Lianhong Song, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Jing Wei, Mengyang Wang, Ruilong Guo, Qian Wang, Qingliang Liu, Yang Cheng, Davinci, Enxuan Gu, Pinxin Liu, Yongsheng Yu, Hang Hua, Yunlong Tang, Shihao Wang, ZhiYu Zhang, Yukun Yang, Jiyu Wu, Jiancheng Huang, Yifan Liu, Yi Huang, Shifeng Chen, Rui Chen, Yi Feng, Mingxi Li, Cailu Wan, XiangJi Wu, Zibin Liu, Jinyang Zhong, Kihwan Yoon, Ganzorig Gankhuyag, Shengyun Zhong, Mingyang Wu, Renjie Li, Yushen Zuo, Zhengzhong Tu, Zongang Gao, Guannan Chen, Yuan Tian, Wenhui Chen, Weijun Yuan, Zhan Li, Yihang Chen, Yifan Deng, Ruting Deng, Yilin Zhang, Huan Zheng, Yanyan Wei, Wenxuan Zhao, Suiyi Zhao, Fei Wang, Kun Li, Yinggan Tang, Mengjie Su, Jae-Hyeon Lee, Dong-Hyeop Son, Ui-Jin Choi, Tiancheng Shao, Yuqing Zhang, Mengcheng Ma, Donggeun Ko, Youngsang Kwak, Jiun Lee, Jaehwa Kwak, YuXuan Jiang, Qiang Zhu, Siyue Teng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David Bull, Jing Hu, Hui Deng, Xuan Zhang, Lin Zhu, Qinrui Fan, Weijian Deng, Junnan Wu, Wenqin Deng, Yuquan Liu, Zhaohong Xu, Jameer Babu Pinjari, Kuldeep Purohit, Zeyu Xiao, Zhuoyuan Li, Surya Vashisth, Akshay Dudhane, Praful Hambarde, Sachin Chaudhary, Satya Naryan Tazi, Prashant Patil, Santosh Kumar Vipparthi, Subrahmanyam Murala, Wei-Chen Shen, I-Hsiang Chen, Yunzhe Xu, Chen Zhao, Zhizhou Chen, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Alejandro Merino, Bruno Longarela, Javier Abad, Marcos V. Conde, Simone Bianco, Luca Cogo, Gianmarco Corti

This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR).

Super-Resolution valid

Outage Probability Analysis for OTFS with Finite Blocklength

no code implementations 13 Apr 2025 Xin Zhang, Wensheng Lin, Lixin Li, Zhu Han, Tad Matsumoto

Orthogonal time frequency space (OTFS) modulation is widely acknowledged as a prospective waveform for future wireless communication networks. To provide insights for the practical system design, this paper analyzes the outage probability of OTFS modulation with finite blocklength. To begin with, we present the system model and formulate the analysis of outage probability for OTFS with finite blocklength as an equivalent problem of calculating the outage probability with finite blocklength over parallel additive white Gaussian noise (AWGN) channels. Subsequently, we apply the equivalent noise approach to derive a lower bound on the outage probability of OTFS with finite blocklength under both average power allocation and water-filling power allocation strategies, respectively. Finally, the lower bounds of the outage probability are determined using the Monte-Carlo method for the two power allocation strategies. The impact of the number of resolvable paths and coding rates on the outage probability is analyzed, and the simulation results are compared with the theoretical lower bounds.

Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving

1 code implementation 10 Apr 2025 Shihong Gao, Xin Zhang, Yanyan Shen, Lei Chen

In this paper, we introduce Apt-Serve, a scalable framework designed to enhance effective throughput in LLM inference serving.

Large Language Model Scheduling

Emergency Communication: OTFS-Based Semantic Transmission with Diffusion Noise Suppression

no code implementations 10 Apr 2025 Kexin Zhang, Xin Zhang, Lixin Li, Wensheng Lin, Wenchi Cheng, Qinghe Du

However, the complex channel conditions in high-speed mobile scenarios significantly impact the reliability and efficiency of traditional communication systems.

Denoising Key Information Extraction +1

CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection

no code implementations 1 Apr 2025 Xin Zhang, Keren Fu, Qijun Zhao

The Segment Anything Model 2 (SAM2), a prompt-guided video foundation model, has performed remarkably in video object segmentation, drawing significant attention in the community.

Camouflaged Object Segmentation object-detection +3

MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach

no code implementations 31 Mar 2025 Xin Zhang, Siting Huang, Xiangyang Luo, Yifan Xie, Weijiang Yu, Heng Chang, Fei Ma, Fei Yu

The Text-to-Mask diffusion model provides diversity and flexibility to the framework, while the semantic-aware face editing model ensures controllability of the framework.

Diversity

Object Isolated Attention for Consistent Story Visualization

no code implementations 30 Mar 2025 Xiangyang Luo, Junhao Cheng, Yifan Xie, Xin Zhang, Tao Feng, Zhou Liu, Fei Ma, Fei Yu

Open-ended story visualization is a challenging task that involves generating coherent image sequences from a given storyline.

Object Story Visualization

CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration

no code implementations 18 Mar 2025 Chunyu Yang, Shengben Bi, Yihui Xu, Xin Zhang

With the increasing demand for efficient and flexible robotic exploration solutions, Reinforcement Learning (RL) is becoming a promising approach in the field of autonomous robotic exploration.

reinforcement-learning Reinforcement Learning +1

Enhanced Multi-Tuple Extraction for Alloys: Integrating Pointer Networks and Augmented Attention

no code implementations 10 Mar 2025 Mengzhe Hei, Zhouran Zhang, Qingbao Liu, Yan Pan, Xiang Zhao, Yongqian Peng, Yicong Ye, Xin Zhang, Shuxin Bai

Extracting high-quality structured information from scientific literature is crucial for advancing material design through data-driven methods.

SynGraph: A Dynamic Graph-LLM Synthesis Framework for Sparse Streaming User Sentiment Modeling

no code implementations 6 Mar 2025 Xin Zhang, Qiyu Wei, Yingjie Zhu, Linhai Zhang, Deyu Zhou, Sophia Ananiadou

In this paper, we introduce SynGraph, a novel framework designed to address data sparsity in sentiment analysis on streaming reviews.

Sentiment Analysis

Process-based Self-Rewarding Language Models

1 code implementation 5 Mar 2025 Shimao Zhang, Xiao Liu, Xin Zhang, Junxiao Liu, Zheheng Luo, ShuJian Huang, Yeyun Gong

Human-annotated preference data is used for training to further improve LLMs' performance, which is constrained by the upper limit of human performance.

Mathematical Reasoning

IterPref: Focal Preference Learning for Code Generation via Iterative Debugging

no code implementations 4 Mar 2025 Jie Wu, Haoling Li, Xin Zhang, Jianwen Luo, Yangyu Huang, Ruihang Chu, Yujiu Yang, Scarlett Li

Preference learning enhances Code LLMs beyond supervised fine-tuning by leveraging relative quality comparisons.

Code Generation

C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models

no code implementations 25 Feb 2025 Xin Zhang, Liang Bai, Xian Yang, Jiye Liang

Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that has been extensively applied in areas such as natural language processing and computer vision.

Continual Learning

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

no code implementations 20 Feb 2025 Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li, Shali Jiang, Jiyan Yang, Xiaozhen Xia, Fan Yang, Yasmine Badr, Ellie Wen, Shuyu Xu, Hansey Chen, Zhengyu Zhang, Jade Nie, Chunzhi Yang, Zhichen Zeng, Weilin Zhang, Xingliang Huang, Qianru Li, Shiquan Wang, Evelyn Lyu, Wenjing Lu, Rui Zhang, Wenjun Wang, Jason Rudy, Mengyue Hang, Kai Wang, Yinbin Ma, Shuaiwen Wang, Sihan Zeng, Tongyi Tang, Xiaohan Wei, Longhao Jin, Jamey Zhang, Marcus Chen, Jiayi Xu, Angie Huang, Xihuan Zeng, Chi Zhang, Zhengli Zhao, Jared Yang, Qiang Jin, Xian Chen, Amit Anand Amlesahwaram, Lexi Song, Liang Luo, Yuchen Hao, Nan Xiao, Yavuz Yetim, Luoshang Pan, Gaoxiang Liu, Yuxi Hu, Yuzhen Huang, Jackie Xu, Rich Zhu, Xin Zhang, Yiqun Liu, Hang Yin, Yuxin Chen, Buyun Zhang, Xiaoyi Liu, Xingyuan Wang, Wenguang Mao, Zhijing Li, Zhehui Zhou, Feifan Gu, Qin Huang, Chonglin Sun, Nancy Yu, Shuo Gu, Shupin Mao, Benjamin Au, Jingzheng Qin, Peggy Yao, Jae-Woo Choi, Bin Gao, Ernest Wang, Lei Zhang, Wen-Yen Chen, Ted Lee, Jay Zha, Yi Meng, Alex Gong, Edison Gao, Alireza Vahdatpour, Yiping Han, Yantao Yao, Toshinari Kureha, Shuo Chang, Musharaf Sultan, John Bocharov, Sagar Chordia, Xiaorui Gan, Peng Sun, Rocky Liu, Bo Long, Wenlin Chen, Santanu Kolay, Huayu Li

Second, large-volume data arrive in a streaming mode with data distributions dynamically shifting, as new users/ads join and existing users/ads leave the system.

Data Augmentation

Towards Text-Image Interleaved Retrieval

1 code implementation 18 Feb 2025 Xin Zhang, Ziqi Dai, Yongqi Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Jun Yu, Wenjie Li, Min Zhang

In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interleaved context for effective retrieval.

Information Retrieval Language Modeling +5

CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models

no code implementations 17 Feb 2025 Guanghao Zhou, Panjia Qiu, Mingyuan Fan, Cen Chen, Mingyuan Chu, Xin Zhang, Jun Zhou

We define jailbreak attacks as an optimization problem within the embedding space of masked language models.

Combinatorial Optimization

Accelerating Anchors via Specialization and Feature Transformation

no code implementations 16 Feb 2025 Haonan Yu, Junhao Liu, Xin Zhang

Anchors is a popular local model-agnostic explanation technique whose applicability is limited by its computational inefficiency.

Explanation Generation

Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications

1 code implementation 5 Feb 2025 Bo Wen, Xin Zhang

This paper presents SOLOMON, a novel Neuro-inspired Large Language Model (LLM) Reasoning Network architecture that enhances the adaptability of foundation models for domain-specific applications.

In-Context Learning Language Modeling +5

Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models

no code implementations 4 Feb 2025 Haoran Ye, Tianze Zhang, Yuhang Xie, Liyuan Zhang, Yuanyi Ren, Xin Zhang, Guojie Song

Despite growing efforts in evaluating, understanding, and aligning LLM values, a psychologically grounded LLM value system remains underexplored.

Benchmarking Decision Making

Personalized Interpolation: An Efficient Method to Tame Flexible Optimization Window Estimation

no code implementations 23 Jan 2025 Xin Zhang, Weiliang Li, Rui Li, Zihang Fu, Tongyi Tang, Zhengyu Zhang, Wen-Yen Chen, Nima Noorshams, Nirav Jasapara, Xiaowen Ding, Ellie Wen, Xue Feng

In the realm of online advertising, optimizing conversions is crucial for delivering relevant products to users and enhancing business outcomes.

Study on a Fast Solver for Combined Field Integral Equations of 3D Conducting Bodies Based on Graph Neural Networks

1 code implementation 17 Jan 2025 Tao Shan, Xin Zhang, Di wu

In this paper, we present a graph neural networks (GNNs)-based fast solver (GraphSolver) for solving combined field integral equations (CFIEs) of 3D conducting bodies.

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

3 code implementations CVPR 2025 Xin Zhang, Xue Yang, YuXuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li

Our approach can effectively improve the performance of existing state-of-the-art weakly supervised methods and even surpasses fully supervised models on existing optical benchmarks (i.e., the DOTA-v1.0 dataset).

object-detection Object Detection +1

EpiCoder: Encompassing Diversity and Complexity in Code Generation

no code implementations 8 Jan 2025 Yaoxiang Wang, Haoling Li, Xin Zhang, Jie Wu, Xiao Liu, Wenxiang Hu, Zhongxin Guo, Yangyu Huang, Ying Xin, Yujiu Yang, Jinsong Su, Qi Chen, Scarlett Li

Effective instruction tuning is indispensable for optimizing code LLMs, aligning model behavior with user expectations and enhancing model performance in real-world applications.

Code Generation Diversity

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems

no code implementations CVPR 2025 Yifan Wang, Jian Zhao, Zhaoxin Fan, Xin Zhang, Xuecheng Wu, Yudian Zhang, Lei Jin, Xinyue Li, Gang Wang, Mengxi Jia, Ping Hu, Zheng Zhu, Xuelong Li

To benchmark this task, we introduce the TDUAV dataset, the largest dataset for joint UAV tracking and intent understanding, featuring 1,328 challenging video sequences, over 163K annotated thermal frames, and 3K VQA pairs.

Question Answering Visual Question Answering

GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

no code implementations 22 Dec 2024 Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang

Last, we provide in-depth analyses of model scaling and training strategies, and perform ablation studies on both the model and synthetic data.

Retrieval

MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark

1 code implementation 19 Dec 2024 QiHao Zhao, Yangyu Huang, Tengchao Lv, Lei Cui, Qinzheng Sun, Shaoguang Mao, Xin Zhang, Ying Xin, Qiufeng Yin, Scarlett Li, Furu Wei

This benchmark reassesses LLMs' understanding of world knowledge by averting both unintentional and malicious data leakage.

MMLU Multiple-choice +2

PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis

no code implementations 11 Dec 2024 Yifan Xie, Tao Feng, Xin Zhang, Xiangyang Luo, Zixuan Guo, Weijiang Yu, Heng Chang, Fei Ma, Fei Richard Yu

Furthermore, we integrate the audio-point enhancement module, which not only ensures the synchronization of the audio signal with the corresponding lip point cloud within the feature space, but also facilitates a deeper understanding of the interrelations among cross-modal conditional features.

Moderating the Generalization of Score-based Generative Model

no code implementations 10 Dec 2024 Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong

To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs.

Image Inpainting Machine Unlearning +1

InfinityDrive: Breaking Time Limits in Driving World Models

no code implementations 2 Dec 2024 Xi Guo, Chenjing Ding, Haoxuan Dou, Xin Zhang, Weixuan Tang, Wei Wu

Comprehensive experiments in multiple datasets validate InfinityDrive's ability to generate complex and varied scenarios, highlighting its potential as a next-generation driving world model built for the evolving demands of autonomous driving.

Autonomous Driving Diversity +1

Neural-Network-Enhanced Metalens Camera for High-Definition, Dynamic Imaging in the Long-Wave Infrared Spectrum

no code implementations 26 Nov 2024 Jing-Yang Wei, Hao Huang, Xin Zhang, De-Mao Ye, Yi Li, Le Wang, Yao-Guang Ma, Yang-Hui Li

To provide a lightweight and cost-effective solution for the long-wave infrared imaging using a singlet, we develop a camera by integrating a High-Frequency-Enhancing Cycle-GAN neural network into a metalens imaging system.

Generative Adversarial Network

Multitask Learning for SAR Ship Detection with Gaussian-Mask Joint Segmentation

no code implementations 21 Nov 2024 Ming Zhao, Xin Zhang, André Kaup

Detecting ships in synthetic aperture radar (SAR) images is challenging due to strong speckle noise, complex surroundings, and varying scales.

Denoising object-detection +2

Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training

no code implementations 21 Nov 2024 Zheheng Luo, Xin Zhang, Xiao Liu, Haoling Li, Yeyun Gong, Chen Qi, Peng Cheng

To evaluate the effectiveness of Velocitune, we conduct experiments in a reasoning-focused dataset with CodeLlama, as well as in a corpus specialised for system command generation with Llama3 and Mistral.

Math

LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits

1 code implementation 19 Nov 2024 Dimple Vijay Kochar, Hanrui Wang, Anantha Chandrakasan, Xin Zhang

Existing automation efforts using methods like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal and costly to generalize across different topologies and technology nodes.

Bayesian Optimization Reinforcement Learning (RL)

BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment

no code implementations 16 Nov 2024 Sizhe Wang, Yongqi Tong, Hengyuan Zhang, Dawei Li, Xin Zhang, Tianlong Chen

Building on this, we further propose Balanced Preference Optimization (BPO), designed to dynamically augment the knowledge depth of each sample.

Informativeness

Legal Evalutions and Challenges of Large Language Models

no code implementations 15 Nov 2024 Jiaqi Wang, Huan Zhao, Zhenyuan Yang, Peng Shu, JunHao Chen, Haobo Sun, Ruixi Liang, Shixin Li, Pengcheng Shi, Longjun Ma, Zongjia Liu, Zhengliang Liu, Tianyang Zhong, Yutong Zhang, Chong Ma, Xin Zhang, Tuo Zhang, Tianli Ding, Yudan Ren, Tianming Liu, Xi Jiang, Shu Zhang

In this paper, we review legal testing methods based on Large Language Models (LLMs), using the OPENAI o1 model as a case study to evaluate the performance of large models in applying legal provisions.

Legal Reasoning

The Backpropagation of the Wave Network

no code implementations 11 Nov 2024 Xin Zhang, Victor S. Sheng

This paper provides an in-depth analysis of Wave Network, a novel token representation method derived from the Wave Network, designed to capture both global and local semantics of input text through wave-inspired complex vectors.

Language Modeling Language Modelling

Neuro-Symbolic AI: Explainability, Challenges, and Future Trends

no code implementations 7 Nov 2024 Xin Zhang, Victor S. Sheng

Explainability is an essential reason limiting the application of neural networks in many vital fields.

Prediction

Bridging the Gap: Representation Spaces in Neuro-Symbolic AI

no code implementations 7 Nov 2024 Xin Zhang, Victor S. Sheng

Neuro-symbolic AI is an effective method for improving the overall performance of AI models by combining the advantages of neural networks and symbolic learning.

Wave Network: An Ultra-Small Language Model

no code implementations 4 Nov 2024 Xin Zhang, Victor S. Sheng

We propose an innovative token representation and update method in a new ultra-small language model: the Wave network.

Language Modeling Language Modelling +4

Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

1 code implementation 31 Oct 2024 Minghui Chen, Meirui Jiang, Xin Zhang, Qi Dou, Zehua Wang, Xiaoxiao Li

To address these communication cost issues and increase the performance of pre-trained model adaptation in FL, we propose an innovative model interpolation-based local training technique called "Local Superior Soups."

Federated Learning

Robust Graph Neural Networks for Stability Analysis in Dynamic Networks

no code implementations 29 Oct 2024 Xin Zhang, Zhen Xu, Yue Liu, Mengfang Sun, Tong Zhou, Wenying Sun

In the current context of accelerated globalization and digitalization, the complexity and uncertainty of financial markets are increasing, and the identification and prevention of economic risks have become a key link in maintaining the stability of the financial system.

Graph Neural Network Representation Learning

Self-Supervised Graph Neural Networks for Enhanced Feature Extraction in Heterogeneous Information Networks

no code implementations 23 Oct 2024 Jianjun Wei, Yue Liu, Xin Huang, Xin Zhang, Wenyi Liu, Xu Yan

This paper explores the applications and challenges of graph neural networks (GNNs) in processing complex graph data brought about by the rapid development of the Internet.

Attribute Diversity +2

ConLUX: Concept-Based Local Unified Explanations

no code implementations 16 Oct 2024 Junhao Liu, Haonan Yu, Xin Zhang

With the rapid advancements of various machine learning models, there is a significant demand for model-agnostic explanation techniques, which can explain these models across different architectures.

DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection

1 code implementation 11 Oct 2024 Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Xinkai Song, Xiaqing Li, Yongwei Zhao, Ling Li, Yunji Chen

Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain.

General Knowledge object-detection +1

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

no code implementations9 Oct 2024 Xin Zhang, Xiang Lyu, Zhihao Du, Qian Chen, Dong Zhang, Hangrui Hu, Chaohong Tan, Tianyu Zhao, Yuxuan Wang, Bin Zhang, Heng Lu, Yaqian Zhou, Xipeng Qiu

Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead and increases latency in multi-turn interactions.

Response Generation

SAFE: Semantic Adaptive Feature Extraction with Rate Control for 6G Wireless Communications

no code implementations2 Oct 2024 Yuna Yan, Lixin Li, Xin Zhang, Wensheng Lin, Wenchi Cheng, Zhu Han

Most current Deep Learning-based Semantic Communication (DeepSC) systems are designed and trained exclusively for particular single-channel conditions, which restricts their adaptability and overall bandwidth utilization.

Semantic Communication

Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment

1 code implementation26 Sep 2024 Jiawei Du, Xin Zhang, Juncheng Hu, Wenxin Huang, Joey Tianyi Zhou

Specifically, we introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process, thereby maximizing the representativeness and diversity of each synthetic instance.

Dataset Distillation Diversity

Cascade Prompt Learning for Vision-Language Model Adaptation

2 code implementations26 Sep 2024 Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li

Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks.

General Knowledge image-classification +5

Multimodality Adaptive Transformer and Mutual Learning for Unsupervised Domain Adaptation Vehicle Re-Identification

no code implementations IEEE Transactions on Intelligent Transportation Systems 2024 Xin Zhang, Yunan Ling, Kaige Li, Weimin Shi, Zhong Zho

Unsupervised Domain Adaptation Vehicle Re-Identification (UDA vehicle re-ID) aims to enable the model trained in the source domain dataset to adapt to the target domain data and obtain accurate re-identification results, which has received widespread attention due to its practicality in the field of intelligent transportation systems.

Attribute Pseudo Label +2

SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation

no code implementations2 Sep 2024 Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li

We found that attribute confusion occurs when a certain region of the latent features attends to multiple or incorrect prompt tokens.

Attribute Text to Image Generation +1

A novel k-generation propagation model for cyber risk and its application to cyber insurance

no code implementations26 Aug 2024 Na Ren, Xin Zhang

The frequent occurrence of cyber risks and their serious economic consequences have created a growth market for cyber insurance.

ICFRNet: Image Complexity Prior Guided Feature Refinement for Real-time Semantic Segmentation

no code implementations25 Aug 2024 Xin Zhang, Teodor Boyadzhiev, Jinglei Shi, Jufeng Yang

In this paper, we leverage image complexity as a prior for refining segmentation features to achieve accurate real-time semantic segmentation.

Philosophy Real-Time Semantic Segmentation +1

Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

no code implementations13 Aug 2024 Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou

This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis.

Dataset Distillation

FSSC: Federated Learning of Transformer Neural Networks for Semantic Image Communication

no code implementations31 Jul 2024 Yuna Yan, Xin Zhang, Lixin Li, Wensheng Lin, Rui Li, Wenchi Cheng, Zhu Han

In this paper, we address the problem of image semantic communication in a multi-user deployment scenario and propose a federated learning (FL) strategy for a Swin Transformer-based semantic communication system (FSSC).

Federated Learning Semantic Communication

mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

no code implementations29 Jul 2024 Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang

We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than the 512 tokens of previous multilingual encoders).

Contrastive Learning Reranking +2

Revising the Problem of Partial Labels from the Perspective of CNNs' Robustness

no code implementations24 Jul 2024 Xin Zhang, Yuqi Song, Wyatt McCurdy, XiaoFeng Wang, Fei Zuo

These remarkable achievements are greatly attributed to the support of extensive datasets with precise labels.

LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits

no code implementations19 Jul 2024 Chen-Chia Chang, Yikang Shen, Shaoze Fan, Jing Li, Shun Zhang, Ningyuan Cao, Yiran Chen, Xin Zhang

To this end, we introduce LaMAGIC, a pioneering language model-based topology generation model that leverages supervised finetuning for automated analog circuit design.

Electrical Engineering Graph Generation +2

Positive and Unlabeled Data: Model, Estimation, Inference, and Classification

no code implementations13 Jul 2024 Siyan Liu, Chi-Kuang Yeh, Xin Zhang, Qinglong Tian, Pengfei Li

This study introduces a new approach to addressing positive and unlabeled (PU) data through the double exponential tilting model (DETM).

parameter estimation

Detection-Triggered Recursive Impact Mitigation against Secondary False Data Injection Attacks in Microgrids

no code implementations9 Jul 2024 Mengxiang Liu, Xin Zhang, Rui Zhang, Zhuoran Zhou, Zhenyong Zhang, Ruilong Deng

The proposed mitigation method can work even in the worst case where all communication links are under SFDIAs, and requires only extra current sensors.

Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

no code implementations8 Jul 2024 Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization.

Lesion Detection Lesion Segmentation

Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing

no code implementations20 Jun 2024 Xinbo Zhao, Yingxue Zhang, Xin Zhang, Yu Yang, Yiqun Xie, Yanhua Li, Jun Luo

MODA addresses the challenges of data scarcity and heterogeneity in a multi-task urban setting through Contrastive Data Sharing among tasks.

Autonomous Driving Data Augmentation +5

CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks

1 code implementation20 Jun 2024 Jie Feng, Jun Zhang, Tianhui Liu, Xin Zhang, Tianjian Ouyang, Junbo Yan, Yuwei Du, Siqi Guo, Yong Li

The challenge in constructing a systematic evaluation benchmark for urban research lies in the diversity of urban data, the complexity of application scenarios and the highly dynamic nature of the urban environment.

General Knowledge Human Dynamics +2

GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks

1 code implementation19 Jun 2024 Fan Zhang, Xin Zhang

A massive number of applications involve data with underlying relationships embedded in non-Euclidean space.

Kolmogorov-Arnold Networks

Video Frame Interpolation for Polarization via Swin-Transformer

no code implementations17 Jun 2024 Feng Huang, Xin Zhang, YiXuan Xu, Xuesong Wang, Xianyu Wu

Video Frame Interpolation (VFI) has been extensively explored and demonstrated, yet its application to polarization remains largely unexplored.

Video Frame Interpolation

AutoSurvey: Large Language Models Can Automatically Write Surveys

1 code implementation10 Jun 2024 Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang

This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence.

Retrieval Survey

Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification

no code implementations9 Jun 2024 Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou

In the medical field, managing high-dimensional massive medical imaging data and performing reliable medical analysis from it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices.

image-classification Image Classification +2

ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

1 code implementation6 Jun 2024 Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientations and value understanding in LLMs.

CIRCUITSYNTH: Leveraging Large Language Models for Circuit Topology Synthesis

no code implementations6 Jun 2024 Prashanth Vijayaraghavan, Luyao Shi, Ehsan Degan, Xin Zhang

Circuit topology generation plays a crucial role in the design of electronic circuits, influencing the fundamental functionality of the circuit.

valid

Magnetic Resonance Image Processing Transformer for General Accelerated Image Reconstruction

no code implementations23 May 2024 Guoyao Shen, Mengyu Li, Stephan Anderson, Chad W. Farris, Xin Zhang

Recent advancements in deep learning have enabled the development of generalizable models that achieve state-of-the-art performance across various imaging tasks.

Anatomy Deep Learning +5

E2GNN: Efficient Graph Neural Network Ensembles for Semi-Supervised Classification

1 code implementation6 May 2024 Xin Zhang, Daochen Zha, Qiaoyu Tan

Next, instead of directly combining their outputs for label inference, we train a simple multi-layer perceptron (MLP) model to mimic their predictions on both labeled and unlabeled nodes.

Ensemble Learning Graph Neural Network

A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation

no code implementations26 Apr 2024 Xin Zhang, Liangxiu Han, Tam Sobeih, Lianghao Han, Darren Dancey

This necessitates the development of innovative, spike-aware algorithms tailored for event cameras, a task compounded by the irregularity, continuity, noise, and spatial and temporal characteristics inherent in spiking data. Harnessing the strong generalization capabilities of transformer neural networks for spatiotemporal data, we propose a purely spike-driven spike transformer network for depth estimation from spiking camera data.

Depth Estimation Knowledge Distillation +1

SpeechAlign: Aligning Speech Generation to Human Preferences

1 code implementation8 Apr 2024 Dong Zhang, Zhaowei Li, ShiMin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

However, the integration of human feedback to align speech outputs to human preferences is often neglected.

Language Modeling Language Modelling

DSGNN: A Dual-View Supergrid-Aware Graph Neural Network for Regional Air Quality Estimation

no code implementations2 Apr 2024 Xin Zhang, Ling Chen, Xing Tang, Hongyu Shi

To this end, we propose a Dual-view Supergrid-aware Graph Neural Network (DSGNN) for regional air quality estimation, which can model the spatial dependencies of distant grid regions from dual views (i.e., satellite-derived aerosol optical depth (AOD) and meteorology).

Graph Neural Network

GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition

1 code implementation22 Mar 2024 Lei Jiang, Weixin Yang, Xin Zhang, Hao Ni

Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision.

Action Recognition Dimensionality Reduction +1

Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids

no code implementations15 Mar 2024 Yanfei Li, Juejing Liu, Xiaodong Zhao, Wenjun Liu, Tong Geng, Ang Li, Xin Zhang

Traditional analysis of highly distorted micro-X-ray diffraction (μ-XRD) patterns from hydrothermal fluid environments is a time-consuming process, often requiring substantial data preprocessing and labeled experimental data.

Binary Classification Deep Learning +1

Two-sided Acoustic Metascreen for Broadband and Individual Reflection and Transmission Control

no code implementations12 Mar 2024 Ao Chen, Xin Zhang

Acoustic wave modulation plays a pivotal role in various applications, including sound-field reconstruction, wireless communication, and particle manipulation, among others.

ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Disscusion via Argumentation Schemes

no code implementations10 Mar 2024 Shengxin Hong, Liang Xiao, Xin Zhang, Jianxia Chen

We construct a formal model of ArgMed-Agents and present conjectures for theoretical guarantees.

Fine-grainedly Synthesize Streaming Data Based On Large Language Models With Graph Structure Understanding For Data Sparsity

no code implementations10 Mar 2024 Xin Zhang, Linhai Zhang, Deyu Zhou, Guoqiang Xu

Due to the sparsity of user data, sentiment analysis on user reviews in e-commerce platforms often suffers from poor performance, especially when faced with extremely sparse user data or long-tail labels.

Attribute Sentiment Analysis

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

1 code implementation CVPR 2024 Zheng Li, Xiang Li, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang

To the best of our knowledge, we are the first to (1) perform unsupervised domain-specific prompt-driven knowledge distillation for CLIP, and (2) establish a practical pre-storing mechanism of text features as shared class vectors between teacher and student.

Knowledge Distillation Prompt Engineering +2

A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence

no code implementations20 Feb 2024 Penghai Zhao, Xin Zhang, Jiayue Cao, Ming-Ming Cheng, Jian Yang, Xiang Li

This paper presents a thorough analysis of these literature reviews within the PAMI field, and tries to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews?

Articles Language Modelling +2

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

1 code implementation19 Feb 2024 Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music.

Language Modeling Language Modelling +1

Knowledge Graph Assisted Automatic Sports News Writing

no code implementations17 Feb 2024 Yang Cao, Xinyi Chen, Xin Zhang, Siying Li

In this paper, we present a novel method for automatically generating sports news, which employs a unique algorithm that extracts pivotal moments from live text broadcasts and uses them to create an initial draft of the news.

Knowledge Graph Completion

Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

no code implementations16 Feb 2024 Xin Zhang, Keren Fu, Qijun Zhao

To facilitate the seamless integration of global classification features with the finely detailed local features selected by DPSM, we introduce a novel feature blending module (FBM).

Contrastive Learning Occluded Person Re-Identification

ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

1 code implementation14 Feb 2024 Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang

Large Language Models (LLMs) rely on Human Preference Alignment (HPA) to ensure the generation of safe content.

In-Context Learning

Partially Recentralization Softmax Loss for Vision-Language Models Robustness

no code implementations6 Feb 2024 Hao Wang, JinZhe Jiang, Xin Zhang, Chen Li

However, it has been shown that multimodal NLP models are vulnerable to adversarial attacks, where the outputs of a model can be dramatically changed by a perturbation to the input.

Adversarial Robustness Diversity

ReTaSA: A Nonparametric Functional Estimation Approach for Addressing Continuous Target Shift

no code implementations29 Jan 2024 Hwanwoo Kim, Xin Zhang, Jiwei Zhao, Qinglong Tian

This work focuses on the target shift problem in a regression setting (Zhang et al., 2013; Nguyen et al., 2016).

regression

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

1 code implementation24 Jan 2024 Dong Zhang, Xin Zhang, Jun Zhan, ShiMin Li, Yaqian Zhou, Xipeng Qiu

It comprises an autoregressive model based on LLM for semantic information modeling and a non-autoregressive model employing flow matching for perceptual information modeling.

text-to-speech Text to Speech +1

CLIP Model for Images to Textual Prompts Based on Top-k Neighbors

no code implementations18 Jan 2024 Xin Zhang, YeMing Cai, Tianzhi Jia

Text-to-image synthesis, a subfield of multimodal generation, has gained significant attention in recent years.

Image Generation multimodal generation

UV-SAM: Adapting Segment Anything Model for Urban Village Identification

1 code implementation16 Jan 2024 Xin Zhang, Yu Liu, Yuming Lin, Qingmin Liao, Yong Li

Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities.

image-classification Image Classification +1

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

1 code implementation8 Jan 2024 Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng Qiu

In this paper, we propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication.

Language Modelling Large Language Model +1

Efficient Sparse Least Absolute Deviation Regression with Differential Privacy

no code implementations2 Jan 2024 Weidong Liu, Xiaojun Mao, Xiaofei Zhang, Xin Zhang

To quickly solve the non-smooth loss under a given privacy budget, we develop a Fast Robust And Privacy-Preserving Estimation (FRAPPE) algorithm for least absolute deviation regression.

Privacy Preserving regression

HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

no code implementations29 Dec 2023 Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan

Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull similar regions closer across images.

Object Object Discovery +2

WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning

1 code implementation20 Dec 2023 Zhaojian Yu, Xin Zhang, Ning Shang, Yangyu Huang, Can Xu, Yishujie Zhao, Wenxiang Hu, Qiufeng Yin

Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks.

Code Generation

TMID: A Comprehensive Real-world Dataset for Trademark Infringement Detection in E-Commerce

1 code implementation8 Dec 2023 Tongxin Hu, Zhuang Li, Xin Jin, Lizhen Qu, Xin Zhang

Annually, e-commerce platforms incur substantial financial losses due to trademark infringements, making it crucial to identify and mitigate potential legal risks tied to merchant information registered to the platforms.

Legal Reasoning

Physics Inspired Criterion for Pruning-Quantization Joint Learning

1 code implementation1 Dec 2023 Weiying Xie, Xiaoyi Fan, Xin Zhang, Yunsong Li, Jie Lei, Leyuan Fang

Pruning-quantization joint learning always facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices.

image-classification Image Classification +2

Automated interpretation of congenital heart disease from multi-view echocardiograms

no code implementations30 Nov 2023 Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao, Hanwen Zhang, Xin Zhang, Wanqing Xie, Binbin Wang

Our video-based model can diagnose with an accuracy of 93.9% (binary classification) and 92.1% (3-class classification) in a collected 2D video testing set, and does not need key-frame selection or view annotation in testing.

Binary Classification

CLIPC8: Face liveness detection algorithm based on image-text pairs and contrastive learning

1 code implementation29 Nov 2023 Xu Liu, Shu Zhou, Yurong Song, Wenzhe Luo, Xin Zhang

To tackle this issue, we propose a face liveness detection method based on image-text pairs and contrastive learning, dividing liveness attack problems in the financial field into eight categories and using text information to describe the images of these eight types of attacks.

Contrastive Learning Face Recognition

LDConv: Linear deformable convolution for improving convolutional neural networks

2 code implementations20 Nov 2023 Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie zhou, Liming Zhang

In response to the above questions, the Linear Deformable Convolution (LDConv) is explored in this work, which gives the convolution kernel an arbitrary number of parameters and arbitrary sampled shapes to provide richer options for the trade-off between network overhead and performance.

object-detection Object Detection

Learning to Reconstruct Accelerated MRI Through K-space Cold Diffusion without Noise

1 code implementation16 Nov 2023 Guoyao Shen, Mengyu Li, Chad W. Farris, Stephan Anderson, Xin Zhang

In this paper, we propose a k-space cold diffusion model that performs image degradation and restoration in k-space without the need for Gaussian noise.

Deep Learning Image Generation +2

Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale

no code implementations14 Nov 2023 Wei Wen, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Hang Yin, Weiwei Chu, Kaveh Hassani, Mengying Sun, Jiang Liu, Xu Wang, Lin Jiang, Yuxin Chen, Buyun Zhang, Xi Liu, Dehua Cheng, Zhengxing Chen, Guang Zhao, Fangqiu Han, Jiyan Yang, Yuchen Hao, Liang Xiong, Wen-Yen Chen

In industry systems, such as the ranking system at Meta, it is unclear whether NAS algorithms from the literature can outperform production baselines because of: (1) scale - Meta ranking systems serve billions of users, (2) strong baselines - the baselines are production models optimized by hundreds to thousands of world-class engineers for years since the rise of deep learning, (3) dynamic baselines - engineers may have established new and stronger baselines during NAS search, and (4) efficiency - the search pipeline must yield results quickly in alignment with the productionization life cycle.

Neural Architecture Search

Reconfigurable Intelligent Surface & Edge -- An Introduction of an EM manipulation structure on obstacles' edge

no code implementations3 Nov 2023 Tianqi Xiang, Zhiwei Jiang, Weijun Hong, Xin Zhang, Yuehong Gao

In this paper, Reconfigurable Intelligent Surface & Edge (RISE) is proposed to extend RIS' abilities of reflection and refraction over surfaces to diffraction around obstacles' edge for better adaptation to specific coverage scenarios.

Image Recognition of Oil Leakage Area Based on Logical Semantic Discrimination

no code implementations3 Nov 2023 Weiying Lin, Che Liu, Xin Zhang, Zhen Wei, Sizhe Li, Xun Ma

The process begins with histogram equalization to enhance the original image, followed by the use of Mask RCNN to identify the preliminary positions and outlines of oil tanks, the ground, and areas of potential oil contamination.

Map-assisted TDOA Localization Enhancement Based On CNN

no code implementations2 Nov 2023 YiWen Chen, Tianqi Xiang, Xi Chen, Xin Zhang

For signal processing related to localization technologies, non line of sight (NLOS) multipaths have a significant impact on the localization error level.

Prediction

TBDLNet: a network for classifying multidrug-resistant and drug-sensitive tuberculosis

no code implementations27 Oct 2023 Ziquan Zhu, Jing Tao, Shuihua Wang, Xin Zhang, Yudong Zhang

Five indexes are selected in this paper, which are accuracy, sensitivity, precision, F1-score, and specificity.

Specificity

Self-supervised Fetal MRI 3D Reconstruction Based on Radiation Diffusion Generation Model

no code implementations16 Oct 2023 Junpeng Tan, Xin Zhang, Yao Lv, Xiangmin Xu, Gang Li

Finally, the experimental results on real-world fetal brain MRI stacks demonstrate the state-of-the-art performance of our method.

3D Reconstruction NeRF +1

Language Models are Universal Embedders

1 code implementation12 Oct 2023 Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario.

Code Search Language Modeling +3

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

3 code implementations31 Aug 2023 Xin Zhang, Dong Zhang, ShiMin Li, Yaqian Zhou, Xipeng Qiu

Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models.

Decoder Language Modeling +4

Particle swarm optimization with state-based adaptive velocity limit strategy

no code implementations2 Aug 2023 Xinze Li, Kezhi Mao, Fanfan Lin, Xin Zhang

Several adaptive VL strategies have been introduced with which the performance of PSO can be improved.

State Estimation

Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress

no code implementations1 Aug 2023 Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao

However, to minimize the current stress when the DAB converter is under TPS modulation, two difficulties exist, in the analysis process and the realization process, respectively.

Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels

no code implementations31 Jul 2023 Xin Zhang, Yuqi Song, Fei Zuo, XiaoFeng Wang

In this work, we address the issue of label imbalance and investigate how to train classifiers using partial labels in large labeling spaces.

Multi-Label Classification MUlTI-LABEL-ClASSIFICATION

Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter

no code implementations30 Jul 2023 Xinze Li, Josep Pou, Jiaxin Dong, Fanfan Lin, Changyun Wen, Suvajit Mukherjee, Xin Zhang

The D2EA approach is instantiated for the efficiency optimization of a hybrid modulation for neutral-point-clamped dual-active-bridge (NPC-DAB) converter.

Collaborative Graph Neural Networks for Attributed Network Embedding

1 code implementation22 Jul 2023 Qiaoyu Tan, Xin Zhang, Xiao Huang, Hao Chen, Jundong Li, Xia Hu

Graph neural networks (GNNs) have shown prominent performance on attributed network embedding.

Attribute Network Embedding

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

no code implementations20 Jul 2023 Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang

Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios.

Key Information Extraction

Joint Service Caching, Communication and Computing Resource Allocation in Collaborative MEC Systems: A DRL-based Two-timescale Approach

no code implementations19 Jul 2023 Qianqian Liu, Haixia Zhang, Xin Zhang, Dongfeng Yuan

Meeting the strict Quality of Service (QoS) requirements of terminals has imposed a significant challenge on Multi-access Edge Computing (MEC) systems, due to the limited multidimensional resources.

Deep Reinforcement Learning Edge-computing

Early Autism Diagnosis based on Path Signature and Siamese Unsupervised Feature Compressor

no code implementations12 Jul 2023 Zhuowen Yin, Xinyao Ding, Xin Zhang, Zhengwang Wu, Li Wang, Xiangmin Xu, Gang Li

Specifically, we propose a Siamese verification framework to extend the scarce data, and an unsupervised compressor to alleviate data imbalance by extracting key features.

A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery

no code implementations27 Jun 2023 Xin Zhang, Liangxiu Han

The success of SSL is heavily dependent on a pre-designed pretext task, which introduces an inductive bias into the model from a large amount of unlabelled data.

Earth Observation Inductive Bias +4

Attention Hybrid Variational Net for Accelerated MRI Reconstruction

no code implementations21 Jun 2023 Guoyao Shen, Boran Hao, Mengyu Li, Chad W. Farris, Ioannis Ch. Paschalidis, Stephan W. Anderson, Xin Zhang

However, the drawback of these structures is that they are not fully utilizing the information from both domains (k-space and image).

compressed sensing MRI Reconstruction

Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model

no code implementations13 Jun 2023 Xin Zhang, Jiaxian Guo, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa

To guarantee the visual coherence of the generated or edited image, we introduce an inpainting and harmonizing module to guide the pre-trained diffusion model to seamlessly blend the inserted subject into the scene naturally.

Denoising Image Generation +1

GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks

no code implementations26 May 2023 Xuming Hu, Aiwei Liu, Zeqi Tan, Xin Zhang, Chenwei Zhang, Irwin King, Philip S. Yu

These techniques neither preserve the semantic consistency of the original sentences when rule-based augmentations are adopted, nor preserve the syntax structure of sentences when expressing relations using seq2seq models, resulting in less diverse augmentations.

Data Augmentation Relation +1

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

1 code implementation18 May 2023 Dong Zhang, ShiMin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

Multi-modal large language models are regarded as a crucial step towards Artificial General Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT.

Language Modeling Language Modelling +3

RFAConv: Innovating Spatial Attention and Standard Convolutional Operation

1 code implementation6 Apr 2023 Xin Zhang, Chen Liu, Degang Yang, Tingting Song, Yichen Ye, Ke Li, Yingze Song

In this paper, we propose a new perspective on the effectiveness of spatial attention, which is that the spatial attention mechanism essentially solves the problem of convolutional kernel parameter sharing.

Classification Object Detection +1

D-Score: A White-Box Diagnosis Score for CNNs Based on Mutation Operators

no code implementations3 Apr 2023 Xin Zhang, Yuqi Song, XiaoFeng Wang, Fei Zuo

However, concerns have been raised with respect to the trustworthiness of these models: The standard testing method evaluates the performance of a model on a test set, while low-quality and insufficient test sets can lead to unreliable evaluation results, which can have unforeseeable consequences.

Autonomous Driving Data Augmentation +2

LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability

3 code implementations 29 Mar 2023 Zhengqing Miao, Xin Zhang, Meirong Zhao, Dong Ming

By incorporating two novel attention modules designed specifically for EEG signals, the channel attention module and the depth attention module, LMDA-Net can effectively integrate features from multiple dimensions, resulting in improved classification performance across various BCI tasks.

EEG Motor Imagery

A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

no code implementations 26 Mar 2023 Zhuoying Zhao, Ziling Tan, Pinghui Mo, Xiaonan Wang, Dan Zhao, Xin Zhang, Ming Tao, Jie Liu

This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations.

Atomic Forces

Distribution-restrained Softmax Loss for the Model Robustness

no code implementations 22 Mar 2023 Hao Wang, Chen Li, JinZhe Jiang, Xin Zhang, YaQian Zhao, Weifeng Gong

Recently, the robustness of deep learning models has received widespread attention, and various methods for improving model robustness have been proposed, including adversarial training, model architecture modification, design of loss functions, certified defenses, and so on.

Diversity model

Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation

no code implementations 20 Mar 2023 Xiaodong Zhao, YiXuan Luo, Juejing Liu, Wenjun Liu, Kevin M. Rosso, Xiaofeng Guo, Tong Geng, Ang Li, Xin Zhang

This study highlighted the importance of labeled experimental patterns on the training of DNN models to solve u-XRD mapping data from in-situ experiments involving liquid phase.

Co-Occurrence Matters: Learning Action Relation for Temporal Action Localization

no code implementations 15 Mar 2023 Congqi Cao, Yizhe WANG, Yue Lu, Xin Zhang, Yanning Zhang

Existing works in this field mainly suffer from two weaknesses: (1) They often neglect the multi-label case and only focus on temporal modeling.

Relation Temporal Action Localization

PRECISION: Decentralized Constrained Min-Max Learning with Low Communication and Sample Complexities

no code implementations 5 Mar 2023 Zhuqing Liu, Xin Zhang, Songtao Lu, Jia Liu

Decentralized min-max optimization problems with domain constraints underpin many important ML applications, including multi-agent ML fairness assurance and policy evaluations in multi-agent reinforcement learning.

Fairness Multi-agent Reinforcement Learning

Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction

1 code implementation 25 Feb 2023 Yu Liu, Xin Zhang, Jingtao Ding, Yanxin Xi, Yong Li

To address such issues, in this paper, we propose a Knowledge-infused Contrastive Learning (KnowCL) model for urban imagery-based socioeconomic prediction.

Contrastive Learning Prediction +1

sMRI-PatchNet: A novel explainable patch-based deep learning network for Alzheimer's disease diagnosis and discriminative atrophy localisation with Structural MRI

no code implementations 17 Feb 2023 Xin Zhang, Liangxiu Han, Lianghao Han, Haoming Chen, Darren Dancey, Daoqiang Zhang

Specifically, it consists of two primary components: 1) A fast and efficient explainable patch selection mechanism for determining the most discriminative patches based on computing the SHapley Additive exPlanations (SHAP) contribution to a transfer learning model for AD diagnosis on massive medical data; and 2) A novel patch-based network for extracting deep features and AD classification from the selected patches with position embeddings to retain position information, capable of capturing the global and local information of inter- and intra-patches.

Position Transfer Learning

Ordinal Label Distribution Learning

no code implementations ICCV 2023 Changsong Wen, Xin Zhang, Xingxu Yao, Jufeng Yang

Therefore, we propose a new paradigm, termed ordinal label distribution learning (OLDL).

Age Estimation

Bring Your Own View: Graph Neural Networks for Link Prediction with Personalized Subgraph Selection

1 code implementation 23 Dec 2022 Qiaoyu Tan, Xin Zhang, Ninghao Liu, Daochen Zha, Li Li, Rui Chen, Soo-Hyun Choi, Xia Hu

To bridge the gap, we introduce a Personalized Subgraph Selector (PS2) as a plug-and-play framework to automatically, personally, and inductively identify optimal subgraphs for different edges when performing GNNLP.

Link Prediction

Weakly Supervised Video Anomaly Detection Based on Cross-Batch Clustering Guidance

no code implementations 16 Dec 2022 Congqi Cao, Xin Zhang, Shizhou Zhang, Peng Wang, Yanning Zhang

To enhance the discriminative power of features, we propose a batch clustering based loss to encourage a clustering branch to generate distinct normal and abnormal clusters based on a batch of data.

Anomaly Detection Clustering +1

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations 4 Nov 2022 Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Depth Monocular Estimation with Attention-based Encoder-Decoder Network from Single Image

no code implementations 24 Oct 2022 Xin Zhang, Rabab Abdelfattah, Yuqi Song, Samuel A. Dauchert, XiaoFeng Wang

Depth information is the foundation of perception, essential for autonomous driving, robotics, and other source-constrained applications.

Autonomous Driving Decoder +1

An Effective Approach for Multi-label Classification with Missing Labels

no code implementations 24 Oct 2022 Xin Zhang, Rabab Abdelfattah, Yuqi Song, XiaoFeng Wang

Through comprehensive experiments on three large-scale multi-label image datasets, i.e., MS-COCO, NUS-WIDE, and Pascal VOC12, we show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches in most cases, and in some cases even approaches with fully labeled datasets.

Missing Labels Multi-class Classification +2
