Search Results for author: Min Zhang

Found 574 papers, 283 papers with code

HW-TSC’s Submissions to the WMT21 Biomedical Translation Task

no code implementations WMT (EMNLP) 2021 Hao Yang, Zhanglin Wu, Zhengzhe Yu, Xiaoyu Chen, Daimeng Wei, Zongyao Li, Hengchao Shang, Minghan Wang, Jiaxin Guo, Lizhi Lei, Chuanfei Xu, Min Zhang, Ying Qin

This paper describes the submission of Huawei Translation Service Center (HW-TSC) to WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (Our registered team name is HuaweiTSC).

Translation

HI-CMLM: Improve CMLM with Hybrid Decoder Input

no code implementations INLG (ACL) 2021 Minghan Wang, Guo Jiaxin, Yuxia Wang, Yimeng Chen, Su Chang, Daimeng Wei, Min Zhang, Shimin Tao, Hao Yang

Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved stunning performance among non-autoregressive NMT models, but we find that the mechanism of predicting all of the target words based only on the hidden state of [MASK] is neither effective nor efficient in the initial iterations of refinement, resulting in ungrammatical repetitions and slow convergence.

Decoder NMT +1

Make the Blind Translator See The World: A Novel Transfer Learning Solution for Multimodal Machine Translation

no code implementations MTSummit 2021 Minghan Wang, Jiaxin Guo, Yimeng Chen, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Multimodal translation (MMT) models built on large-scale pretrained networks are liable to overfit easily on the limited labelled training data available, which is a critical issue in MMT.

Multimodal Machine Translation NMT +2

Towards Robust Neural Machine Translation with Iterative Scheduled Data-Switch Training

1 code implementation COLING 2022 Zhongjian Miao, Xiang Li, Liyan Kang, Wen Zhang, Chulun Zhou, Yidong Chen, Bin Wang, Min Zhang, Jinsong Su

Most existing methods on robust neural machine translation (NMT) construct adversarial examples by injecting noise into authentic examples and indiscriminately exploit two types of examples.

Machine Translation NMT +2

Prediction Difference Regularization against Perturbation for Neural Machine Translation

no code implementations ACL 2022 Dengji Guo, Zhengrui Ma, Min Zhang, Yang Feng

Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years.

Machine Translation NMT +2

RST Discourse Parsing with Second-Stage EDU-Level Pre-training

1 code implementation ACL 2022 Nan Yu, Meishan Zhang, Guohong Fu, Min Zhang

Pre-trained language models (PLMs) have shown great potential in natural language processing (NLP), including rhetorical structure theory (RST) discourse parsing. Current PLMs are obtained by sentence-level pre-training, which differs from the basic processing unit, i.e., the elementary discourse unit (EDU). To this end, we propose a second-stage EDU-level pre-training approach in this work, which presents two novel tasks to learn effective EDU representations continually based on well pre-trained language models. Concretely, the two tasks are (1) next EDU prediction (NEP) and (2) discourse marker prediction (DMP). We take a state-of-the-art transition-based neural parser as the baseline and adapt it with a light bi-gram EDU modification to effectively exploit the EDU-level pre-trained representations. Experimental results on a benchmark dataset show that our method is highly effective, leading to a 2.1-point improvement in F1-score. All code and pre-trained models will be released publicly to facilitate future studies.
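
To make the two pre-training tasks concrete, here is a minimal Python sketch (not the released code; all names are illustrative) of how NEP and DMP training examples could be constructed from a sequence of EDUs:

```python
# Hypothetical construction of NEP and DMP training examples; illustrative only.
import random

def make_nep_pairs(edus):
    """Next-EDU prediction: classify whether a candidate EDU follows a given EDU."""
    pairs = []
    for i in range(len(edus) - 1):
        pairs.append((edus[i], edus[i + 1], 1))          # true successor
        pairs.append((edus[i], random.choice(edus), 0))  # random distractor
    return pairs

def make_dmp_examples(marked_pairs):
    """Discourse-marker prediction: mask the marker joining two EDUs and predict it."""
    return [(f"{left} [MASK] {right}", marker) for left, marker, right in marked_pairs]
```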

Discourse Marker Prediction Discourse Parsing +1

Semi-supervised Domain Adaptation for Dependency Parsing with Dynamic Matching Network

no code implementations ACL 2022 Ying Li, Shuaike Li, Min Zhang

To address this issue, we for the first time apply a dynamic matching network on the shared-private model for semi-supervised cross-domain dependency parsing.

Dependency Parsing Domain Adaptation +1

Synchronous Refinement for Neural Machine Translation

no code implementations Findings (ACL) 2022 Kehai Chen, Masao Utiyama, Eiichiro Sumita, Rui Wang, Min Zhang

Machine translation typically adopts an encoder-to-decoder framework, in which the decoder generates the target sentence word-by-word in an auto-regressive manner.

Decoder Machine Translation +2

Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection

1 code implementation EMNLP 2021 Xincheng Ju, Dong Zhang, Rong Xiao, Junhui Li, Shoushan Li, Min Zhang, Guodong Zhou

Therefore, in this paper, we are the first to jointly perform multi-modal ATE (MATE) and multi-modal ASC (MASC), and we propose a multi-modal joint learning approach with auxiliary cross-modal relation detection for multi-modal aspect-level sentiment analysis (MALSA).

Relation Sentiment Analysis +1

Encouraging Lexical Translation Consistency for Document-Level Neural Machine Translation

no code implementations EMNLP 2021 Xinglin Lyu, Junhui Li, ZhengXian Gong, Min Zhang

In this paper we apply “one translation per discourse” in NMT, and aim to encourage lexical translation consistency for document-level NMT.

Machine Translation NMT +1

Comparison Study on Data Annotation Approaches: Dependency Tree Annotation as Case Study

no code implementations CCL 2021 Mingyue Zhou, Chen Gong, Zhenghua Li, Min Zhang

The most important considerations in data annotation are data quality and annotation cost. Our survey finds that data annotation in natural language processing usually adopts a machine-annotation-plus-human-correction approach to reduce cost; meanwhile, few works rigorously compare different annotation approaches to examine their impact on annotation quality and cost. With the help of an experienced annotation team, this paper takes dependency tree annotation as a case study and experimentally compares machine annotation with human correction, independent annotation by two annotators, and a newly proposed human-machine independent annotation approach that combines the former two, arriving at some preliminary conclusions.

A Coarse-to-Fine Labeling Framework for Joint Word Segmentation, POS Tagging, and Constituent Parsing

1 code implementation CoNLL (EMNLP) 2021 Yang Hou, Houquan Zhou, Zhenghua Li, Yu Zhang, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword).

Part-Of-Speech Tagging POS +2

HW-TSC at SemEval-2022 Task 3: A Unified Approach Fine-tuned on Multilingual Pretrained Model for PreTENS

no code implementations SemEval (NAACL) 2022 Yinglu Li, Min Zhang, Xiaosong Qiao, Minghan Wang

In order to verify whether our model could also perform better in subtask 2 (the regression subtask), the ranking score is transformed into classification labels by an up-sampling strategy.
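
As a rough illustration of that transformation (the threshold and balancing scheme here are assumptions, not the submission's exact recipe), a regression-style score can be binarized and the minority class up-sampled:

```python
# Illustrative score-to-label conversion with up-sampling; parameters assumed.
import numpy as np

def scores_to_balanced_labels(scores, threshold=0.5, seed=0):
    """Binarize scores at an assumed threshold, then up-sample the minority
    class with replacement until both classes are equally frequent."""
    rng = np.random.default_rng(seed)
    labels = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    minority_label = 1 if labels.sum() * 2 < len(labels) else 0
    minority = np.flatnonzero(labels == minority_label)
    n_extra = len(labels) - 2 * len(minority)  # majority count minus minority count
    extra = (rng.choice(minority, size=n_extra, replace=True)
             if n_extra > 0 and len(minority) else np.empty(0, dtype=int))
    keep = np.concatenate([np.arange(len(labels)), extra])
    return labels[keep], keep

labels, idx = scores_to_balanced_labels([0.1, 0.9, 0.8, 0.7, 0.2])
```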

Binary Classification TAG

Stacked AMR Parsing with Silver Data

1 code implementation Findings (EMNLP) 2021 Qingrong Xia, Zhenghua Li, Rui Wang, Min Zhang

In particular, one recent seq-to-seq work directly fine-tunes AMR graph sequences on the encoder-decoder pre-trained language model and achieves new state-of-the-art results, outperforming previous works by a large margin.

Abstract Meaning Representation AMR Parsing +3

APGN: Adversarial and Parameter Generation Networks for Multi-Source Cross-Domain Dependency Parsing

no code implementations Findings (EMNLP) 2021 Ying Li, Meishan Zhang, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

Thanks to the strong representation learning capability of deep learning, especially pre-training techniques with language model loss, dependency parsing has achieved great performance boost in the in-domain scenario with abundant labeled training data for target domains.

Dependency Parsing Language Modeling +2

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

1 code implementation 12 Jun 2025 Haoyuan Shi, Yunxin Li, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang

Despite rapid advancements in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging.

Video Generation

Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

no code implementations 11 Jun 2025 Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang

However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance.

Mathematical Reasoning

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

1 code implementation 11 Jun 2025 Zhenran Xu, Yiyu Wang, Xue Yang, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang

Starting with our curated dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning data, including node selection, workflow planning, and code-level workflow representation.

4k

ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development

1 code implementation 5 Jun 2025 Zhenran Xu, Xue Yang, Yiyu Wang, Qingli Hu, Zijiao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang

We introduce ComfyUI-Copilot, a large language model-powered plugin designed to enhance the usability and efficiency of ComfyUI, an open-source platform for AI-driven art creation.

Large Language Model

Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach

no code implementations 2 Jun 2025 Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang

Stickers, though small, are a highly condensed form of visual expression, ubiquitous across messaging platforms and embraced by diverse cultures, genders, and age groups.

Retrieval

SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting

no code implementations 28 May 2025 Haomiao Qiu, Miao Zhang, Ziyue Qiao, Weili Guan, Min Zhang, Liqiang Nie

Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks.

Continual Learning

Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing

no code implementations 28 May 2025 Yifan Lu, Jing Li, Yigeng Zhou, Yihui Zhang, Wenya Wang, Xiucheng Li, Meishan Zhang, Fangming Liu, Jun Yu, Min Zhang

Experimental results on multiple LLMs demonstrate that our ToxEdit outperforms previous state-of-the-art methods in both detoxification performance and safeguarding general capabilities of LLMs.

Instruction Following knowledge editing

AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems

no code implementations 26 May 2025 Yu Shang, Peijie Liu, Yuwei Yan, Zijing Wu, Leheng Sheng, Yuanqing Yu, Chumeng Jiang, An Zhang, Fengli Xu, Yu Wang, Min Zhang, Yong Li

The emergence of agentic recommender systems powered by Large Language Models (LLMs) represents a paradigm shift in personalized recommendations, leveraging LLMs' advanced reasoning and role-playing capabilities to enable autonomous, adaptive decision-making.

Benchmarking Recommendation Systems

REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models

1 code implementation 26 May 2025 Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jun Rao, Min Zhang

Meanwhile, online reinforcement learning mainly adopts a length reward to encourage short reasoning responses, but tends to lose the reflection ability and harm the performance.

VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

1 code implementation 25 May 2025 Yunxin Li, Xinyu Chen, Zitao Li, Zhenyu Liu, Longyue Wang, Wenhan Luo, Baotian Hu, Min Zhang

Applying Reinforcement Learning (RL) to Video Large Language Models (Video-LLMs) shows significant promise for complex video reasoning.

Reinforcement Learning (RL)

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer

no code implementations 24 May 2025 Guodong Du, Zitao Fang, Jing Li, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Honghai Liu, Min Zhang

Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models.

Transfer Learning

Knowledge Grafting of Large Language Models

1 code implementation 24 May 2025 Guodong Du, Xuanning Zhou, Junlin Li, Zhuo Li, Zesheng Shi, WanYu Lin, Ho-Kin Tang, Xiucheng Li, Fangming Liu, Wenya Wang, Min Zhang, Jing Li

The resulting SkillPack serves as a compact and transferable knowledge carrier, ideal for heterogeneous model fusion and continual learning.

Continual Learning Knowledge Distillation +3

MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming

1 code implementation 22 May 2025 Weiyang Guo, Jing Li, Wenya Wang, Yu Li, Daojing He, Jun Yu, Min Zhang

In the adversarial iterative optimization stage, the red-team model and the target model continuously improve their respective capabilities in interaction.

Red Teaming Safety Alignment

Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning

no code implementations 22 May 2025 Jun Rao, Xuebo Liu, Hexuan Deng, Zepeng Lin, Zixiong Yu, Jiansheng Wei, Xiaojun Meng, Min Zhang

In the realm of data selection for reasoning tasks, existing approaches predominantly rely on externally predefined static metrics such as difficulty and diversity, which are often designed for supervised fine-tuning (SFT) and lack adaptability to continuous training processes.

Mathematical Reasoning Reinforcement Learning (RL)

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors

no code implementations 21 May 2025 Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, YaoWei Wang, Min Zhang

By subtracting the machine-like patterns from the human-like distribution during the decoding process, CoPA is able to produce sentences that are less discernible by text detectors.
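
A minimal sketch of that contrastive decoding step, assuming access to per-token logits from a "human-like" and a "machine-like" distribution (the mixing weight `alpha` is an assumed knob, not the paper's value):

```python
# Illustrative contrastive adjustment of next-token probabilities.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def contrastive_next_token_probs(human_logits, machine_logits, alpha=0.5):
    """Penalize tokens the machine-like model favors; alpha is an assumed knob."""
    return softmax(np.asarray(human_logits) - alpha * np.asarray(machine_logits))
```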

Language Modeling Language Modelling

Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space

1 code implementation 19 May 2025 Zhengrui Ma, Yang Feng, Chenze Shao, Fandong Meng, Jie Zhou, Min Zhang

We introduce SLED, an alternative approach to speech language modeling by encoding speech waveforms into sequences of continuous latent representations and modeling them autoregressively using an energy distance objective.
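
For reference, a generic sample-based estimator of the (squared) energy distance between two sets of latent vectors; this is the standard statistic, not SLED's training code:

```python
# Standard estimator of the squared energy distance between two samples.
import numpy as np

def energy_distance(x, y):
    """x: (n, d), y: (m, d). Returns 2*E||X-Y|| - E||X-X'|| - E||Y-Y'||
    estimated from the two samples."""
    def mean_pairwise(a, b):
        diffs = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diffs, axis=-1).mean()
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)
```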

Language Modeling Language Modelling +2

Towards DS-NER: Unveiling and Addressing Latent Noise in Distant Annotations

1 code implementation 18 May 2025 Yuyang Ding, Dan Qiao, Juntao Li, Jiajie Xu, Pingfu Chao, Xiaofang Zhou, Min Zhang

Distantly supervised named entity recognition (DS-NER) has emerged as a cheap and convenient alternative to traditional human annotation methods, enabling the automatic generation of training data by aligning text with external resources.

Language Modeling Language Modelling +4

Accurate KV Cache Quantization with Outlier Tokens Tracing

1 code implementation 16 May 2025 Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

KV Cache quantization presents a promising solution, striking a good balance between memory usage and accuracy.
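
A toy numpy sketch of the general idea of keeping traced outlier tokens in full precision while quantizing the rest of the KV cache to int8 (the ratio and the outlier criterion here are assumptions, not the paper's method):

```python
# Illustrative per-token KV quantization with outlier tokens kept in float.
import numpy as np

def quantize_kv(kv, outlier_ratio=0.01):
    """kv: (num_tokens, dim). Trace tokens with the largest magnitudes as
    outliers and keep them in float; quantize the remaining rows to int8."""
    scores = np.abs(kv).max(axis=1)
    k = max(1, int(outlier_ratio * len(kv)))
    outliers = np.argsort(scores)[-k:]                # traced outlier tokens
    scale = np.abs(kv).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    quant = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return quant, scale, outliers, kv[outliers]       # outlier rows stay full precision
```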

Quantization

Learning from Peers in Reasoning Models

no code implementations 12 May 2025 Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang

Notably, our fine-tuned LeaP-T-7B matches the performance of DeepSeek-R1-Distill-Qwen-14B on AIME 2024.

Math

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

1 code implementation 8 May 2025 Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang

Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integrating modalities such as text, images, audio, and video to support complex reasoning capabilities and aiming to achieve comprehensive perception, precise understanding, and deep reasoning.

Multimodal Reasoning

CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion

no code implementations 7 May 2025 Yanyu Li, Pencheng Wan, Liang Han, YaoWei Wang, Liqiang Nie, Min Zhang

Stable Diffusion has advanced text-to-image synthesis, but training models to generate images with accurate object quantity is still difficult due to the high computational cost and the challenge of teaching models the abstract concept of quantity.

Denoising Image Generation +1

A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law

no code implementations 5 May 2025 Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He

This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking" - a reasoning process inspired by human cognition, as described in Kahneman's Thinking, Fast and Slow.

Math Medical Diagnosis +3

Efficient Reasoning for LLMs through Speculative Chain-of-Thought

1 code implementation 27 Apr 2025 Jikai Wang, Juntao Li, Lijun Wu, Min Zhang

The proposed thinking behavior alignment improves the efficiency of drafting, and the draft selection strategy maintains the prediction accuracy for complex problems.

GSM8K Math

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

1 code implementation 23 Apr 2025 Xinyu Chen, Yunxin Li, Haoyuan Shi, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang

Assessing the video comprehension capabilities of multimodal AI systems can effectively measure their understanding and reasoning abilities.

A Unified Agentic Framework for Evaluating Conditional Image Generation

1 code implementation 9 Apr 2025 Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, YaoWei Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang

This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks.

Conditional Image Generation

DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation

1 code implementation 7 Apr 2025 Xinglin Lyu, Wei Tang, Yuang Li, Xiaofeng Zhao, Ming Zhu, Junhui Li, Yunfei Lu, Daimeng Wei, Hao Yang, Min Zhang

Despite the increased discourse challenges introduced by noise from automatic speech recognition (ASR), the integration of document-level context in speech translation (ST) remains insufficiently explored.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8

Short Video Segment-level User Dynamic Interests Modeling in Personalized Recommendation

1 code implementation 5 Apr 2025 Zhiyu He, Zhixin Ling, Jiayu Li, Zhiqiang Guo, Weizhi Ma, Xinchen Luo, Min Zhang, Guorui Zhou

In contrast, our research focuses on segment-level user interest modeling, which is crucial for understanding how users' preferences evolve during video browsing.

Recommendation Systems

Collaborative LLM Numerical Reasoning with Local Data Protection

no code implementations 1 Apr 2025 Min Zhang, Yuzhe Lu, Yun Zhou, Panpan Xu, Lin Lee Cheong, Chang-Tien Lu, Haozhu Wang

Furthermore, our method improves accuracy by 16.2%-43.6% while reducing data leakage by 2.3%-44.6% compared to existing data protection approaches.

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

no code implementations 31 Mar 2025 Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu

Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are accessible for verification.

Mathematical Reasoning reinforcement-learning +1

Graph-Structured Driven Dual Adaptation for Mitigating Popularity Bias

no code implementations 30 Mar 2025 Miaomiao Cai, Lei Chen, Yifan Wang, Zhiyong Cheng, Min Zhang, Meng Wang

Existing supervised alignment and reweighting methods mitigate this bias but have key limitations: (1) ignoring inherent variability across Graph Convolutional Networks (GCNs) layers, causing negative effects in deeper layers; (2) reliance on fixed hyperparameters to balance item popularity, restricting adaptability and increasing complexity.

Recommendation Systems

InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

1 code implementation CVPR 2025 Yunhong Lu, Qichao Wang, Hengyuan Cao, Xierui Wang, Xiaoyin Xu, Min Zhang

To address these limitations, we introduce DDIM-InPO, an efficient method for direct preference alignment of diffusion models.

AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration

1 code implementation 24 Mar 2025 Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, Min Zhang

Multi-agent systems (MAS) based on large language models (LLMs) have demonstrated significant potential in collaborative problem-solving.

MoK-RAG: Mixture of Knowledge Paths Enhanced Retrieval-Augmented Generation for Embodied AI Environments

no code implementations 18 Mar 2025 Zhengsheng Guo, Linwei Zheng, Xinyang Chen, Xuefeng Bai, Kehai Chen, Min Zhang

While human cognition inherently retrieves information from diverse and specialized knowledge sources during decision-making processes, current Retrieval-Augmented Generation (RAG) systems typically operate through single-source knowledge retrieval, leading to a cognitive-algorithmic discrepancy.

Decision Making Language Modeling +4

Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation

no code implementations 13 Mar 2025 Henglyu Liu, Andong Chen, Kehai Chen, Xuefeng Bai, Meizhi Zhong, Yuan Qiu, Min Zhang

Recent advancement of large language models (LLMs) has led to significant breakthroughs across various tasks, laying the foundation for the development of LLM-based speech translation systems.

Cross-Modal Retrieval Translation

Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model

no code implementations 13 Mar 2025 Qiyuan Deng, Xuefeng Bai, Kehai Chen, YaoWei Wang, Liqiang Nie, Min Zhang

Reinforcement Learning (RL) algorithms for safety alignment of Large Language Models (LLMs), such as Direct Preference Optimization (DPO), encounter the challenge of distribution shift.

Language Modeling Language Modelling +4

Take Off the Training Wheels! Progressive In-Context Learning for Effective Alignment

1 code implementation 13 Mar 2025 Zhenyu Liu, Dongfang Li, Xinshuo Hu, Xinping Zhao, Yibin Chen, Baotian Hu, Min Zhang

We find that the transformer embeds the task function learned from demonstrations into the separator token representation, which plays an important role in the generation of prior response tokens.

In-Context Learning

IDInit: A Universal and Stable Initialization Method for Neural Network Training

no code implementations 6 Mar 2025 Yu Pan, Chaozheng Wang, Zekai Wu, Qifan Wang, Min Zhang, Zenglin Xu

Addressing this concern, we introduce fully identical initialization (IDInit), a novel method that preserves identity in both the main and sub-stem layers of residual networks.

Inductive Bias

Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent

1 code implementation 4 Mar 2025 Xingzuo Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yong Xu, Min Zhang

Large language model (LLM) agents typically adopt a step-by-step reasoning framework, in which they interleave the processes of thinking and acting to accomplish the given task.

Decision Making Language Modeling +2

The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents

no code implementations 28 Feb 2025 Yihong Tang, Kehai Chen, Xuefeng Bai, ZhengYu Niu, Bo Wang, Jie Liu, Min Zhang

Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations.

Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents

1 code implementation 27 Feb 2025 Zhenyu Liu, Yunxin Li, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang

Specifically, our approach consists of 1) an image information quantification method via visual agents collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.

Image Quality Assessment

Evaluating Intelligence via Trial and Error

1 code implementation 26 Feb 2025 Jingtao Zhan, Jiahao Zhao, Jiayu Li, Yiqun Liu, Bo Zhang, Qingyao Ai, Jiaxin Mao, Hongning Wang, Min Zhang, Shaoping Ma

When the expectation and variance of failure counts are both finite, it signals the ability to consistently find solutions to new challenges, which we define as the Autonomous Level of intelligence.
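
The quantities involved are just the sample mean and variance of per-task failure counts; a toy estimator with made-up counts:

```python
# Toy estimate of the expectation and variance of failure counts before
# first success on each new task; the data below is illustrative.
import numpy as np

def failure_count_stats(failure_counts):
    counts = np.asarray(failure_counts, dtype=float)
    return counts.mean(), counts.var(ddof=1)

mean_f, var_f = failure_count_stats([0, 2, 1, 5, 0, 3])
print(f"mean failures {mean_f:.2f}, variance {var_f:.2f}")
```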

A 106K Multi-Topic Multilingual Conversational User Dataset with Emoticons

no code implementations 26 Feb 2025 Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Qinglang Guo, Min Zhang

Our in-depth experiments, both quantitative and qualitative, demonstrate the dataset's potential in modeling user behavior and personalized recommendation systems, opening up new possibilities for research in personalized retrieval and conversational AI.

Recommendation Systems Retrieval

LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers

1 code implementation 25 Feb 2025 Zhuocheng Zhang, Yang Feng, Min Zhang

In LevelRAG, the high-level searcher orchestrates the retrieval logic, while the low-level searchers (sparse, web, and dense) refine the queries for optimal retrieval.
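
Schematically (all searcher functions below are stand-ins, not the LevelRAG API), the high-level searcher decomposes the question and fans sub-queries out to the low-level searchers:

```python
# Two-level retrieval sketch; searcher implementations are placeholders.
def sparse_search(q): return [f"sparse hit for {q!r}"]
def web_search(q):    return [f"web hit for {q!r}"]
def dense_search(q):  return [f"dense hit for {q!r}"]

def high_level_search(question, decompose,
                      low_level=(sparse_search, web_search, dense_search)):
    """decompose: splits a multi-hop question into sub-queries; each low-level
    searcher may internally rewrite the sub-query for its retriever."""
    evidence = []
    for sub_query in decompose(question):
        for searcher in low_level:
            evidence.extend(searcher(sub_query))
    return evidence

docs = high_level_search("Who advised the author of paper X?", lambda q: [q])
```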

Multi-hop Question Answering Question Answering +3

A Survey: Spatiotemporal Consistency in Video Generation

no code implementations 25 Feb 2025 Zhiyu Yin, Kehai Chen, Xuefeng Bai, Ruili Jiang, Juntao Li, Hongdong Li, Jin Liu, Yang Xiang, Jun Yu, Min Zhang

Video generation, by leveraging a dynamic visual generation method, pushes the boundaries of Artificial Intelligence Generated Content (AIGC).

Image Generation Video Generation

A Training-free LLM-based Approach to General Chinese Character Error Correction

1 code implementation 21 Feb 2025 Houquan Zhou, Bo Zhang, Zhenghua Li, Ming Yan, Min Zhang

To address this issue, we introduce the task of General Chinese Character Error Correction (C2EC), which focuses on all three types of character errors.

Language Modeling Language Modelling +2

Improving Value-based Process Verifier via Structural Prior Injection

no code implementations 21 Feb 2025 Zetian Sun, Dongfang Li, Baotian Hu, Jun Yu, Min Zhang

In the Large Language Model (LLM) reasoning scenario, people often estimate state value via Monte Carlo sampling.
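
A minimal sketch of that Monte Carlo estimate: the value of a reasoning state is approximated by the success rate of completions sampled from it (`sample_rollout` and `is_correct` are placeholders):

```python
# Monte Carlo state-value estimate for a partial reasoning state.
import random

def mc_state_value(state, sample_rollout, is_correct, n=16):
    """Estimate V(state) as the fraction of n sampled completions judged correct."""
    wins = sum(is_correct(sample_rollout(state)) for _ in range(n))
    return wins / n

# Toy usage: a 'rollout' that succeeds with probability 0.7.
v = mc_state_value("2+2=", lambda s: random.random() < 0.7, lambda ok: ok)
```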

Inductive Bias Large Language Model

Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders

no code implementations 21 Feb 2025 Weiqiao Shan, Yuang Li, Yuhao Zhang, Yingfeng Luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu

Connecting audio encoders with large language models (LLMs) allows the LLM to perform various audio understanding tasks, such as automatic speech recognition (ASR) and audio captioning (AC).

Audio captioning Automatic Speech Recognition +2

Revealing and Mitigating Over-Attention in Knowledge Editing

1 code implementation 20 Feb 2025 Pinzheng Wang, Zecheng Tang, Keyan Zhou, Juntao Li, Qiaoming Zhu, Min Zhang

Large Language Models have demonstrated superior performance across a wide range of tasks, but they still exhibit undesirable errors due to incorrect knowledge learned from the training data.

knowledge editing Specificity

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

1 code implementation 18 Feb 2025 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang

To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.

Binarization Quantization

Towards Text-Image Interleaved Retrieval

1 code implementation 18 Feb 2025 Xin Zhang, Ziqi Dai, Yongqi Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Jun Yu, Wenjie Li, Min Zhang

In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interleaved context for effective retrieval.

Information Retrieval Language Modeling +5

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis

1 code implementation 18 Feb 2025 Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, YaoWei Wang, Min Zhang, Liqiang Nie

Then, we conduct extensive experiments with the baselines within each class, covering models of various sizes (7B-70B), bitwidths, training levels (LLaMA1/2/3/3.1), architectures (Mixtral, DeepSeekMoE and Mamba) and modalities (LLaVA1.5 and VILA1.5) on a wide range of evaluation metrics. Through comparative analysis of the results, we summarize the strengths of each PTQ strategy and the model-size-bitwidth trade-offs with respect to performance.

Benchmarking Mamba +1

Exploring Translation Mechanism of Large Language Models

no code implementations 17 Feb 2025 Hongbin Zhang, Kehai Chen, Xuefeng Bai, Xiucheng Li, Min Zhang

Large language models (LLMs) have succeeded remarkably in multilingual translation tasks.

Translation

Evaluating o1-Like LLMs: Unlocking Reasoning for Translation through Comprehensive Analysis

no code implementations 17 Feb 2025 Andong Chen, Yuchen Song, Wenxin Zhu, Kehai Chen, Muyun Yang, Tiejun Zhao, Min Zhang

The o1-Like LLMs are transforming AI by simulating human cognitive processes, but their performance in multilingual machine translation (MMT) remains underexplored.

Machine Translation Translation

SSVEP-BiMA: Bifocal Masking Attention Leveraging Native and Symmetric-Antisymmetric Components for Robust SSVEP Decoding

no code implementations 16 Feb 2025 Yuxin Liu, Zhenxi Song, Guoyang Xu, ZiRui Wang, Feng Wan, Yong Hu, Min Zhang, Zhiguo Zhang

Brain-computer interface (BCI) based on steady-state visual evoked potentials (SSVEP) is a popular paradigm for its simplicity and high information transfer rate (ITR).

SSVEP

Semantic Role Labeling: A Systematical Survey

1 code implementation 9 Feb 2025 Huiyao Chen, Meishan Zhang, Jing Li, Min Zhang, Lilja Øvrelid, Jan Hajič, Hao Fei

Semantic role labeling (SRL) is a central natural language processing (NLP) task aiming to understand the semantic roles within texts, facilitating a wide range of downstream applications.

Semantic Role Labeling Survey

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

1 code implementation 22 Jan 2025 Zhenran Xu, Longyue Wang, Jifang Wang, Zhouyi Li, Senbao Shi, Xue Yang, Yiyu Wang, Baotian Hu, Jun Yu, Min Zhang

Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions.

Decision Making

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

1 code implementation 14 Jan 2025 Weiqiao Shan, Yuhao Zhang, Yuchen Han, Bei Li, Xiaofeng Zhao, Yuang Li, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu

Recent advancements have highlighted the efficacy of self-supervised learning (SSL) features in various speech-related tasks, providing lightweight and versatile multi-view speech representations.

Self-Supervised Learning

EEG-ReMinD: Enhancing Neurodegenerative EEG Decoding through Self-Supervised State Reconstruction-Primed Riemannian Dynamics

no code implementations 14 Jan 2025 ZiRui Wang, Zhenxi Song, Yi Guo, Yuxin Liu, Guoyang Xu, Min Zhang, Zhiguo Zhang

The development of EEG decoding algorithms confronts challenges such as data sparsity, subject variability, and the need for precise annotations, all of which are vital for advancing brain-computer interfaces and enhancing the diagnosis of diseases.

EEG Eeg Decoding

Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

2 code implementations 9 Jan 2025 Xiaojie Li, Yibo Yang, Jianlong Wu, David A. Clifton, Yue Yu, Bernard Ghanem, Min Zhang

To this end, we propose Continuous Knowledge-Preserving Decomposition for FSCIL (CKPD-FSCIL), a framework that decomposes a model's weights into two parts: one that compacts existing knowledge (knowledge-sensitive components) and another that carries redundant capacity to accommodate new abilities (redundant-capacity components).
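
One simple way to realize such a split, shown here purely as an illustration (the paper's actual decomposition may differ), is to keep the top singular directions of a weight matrix as the knowledge-sensitive part and treat the residual as redundant capacity:

```python
# Illustrative SVD-based weight split; not CKPD-FSCIL's actual decomposition.
import numpy as np

def decompose_weights(w, rank):
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    knowledge = (u[:, :rank] * s[:rank]) @ vt[:rank]  # compacted existing knowledge
    redundant = w - knowledge                         # capacity freed for new classes
    return knowledge, redundant

w = np.random.randn(256, 256)
k, r = decompose_weights(w, rank=32)
assert np.allclose(k + r, w)
```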

class-incremental learning Few-Shot Class-Incremental Learning +1

Investigating Numerical Translation with Large Language Models

no code implementations 9 Jan 2025 Wei Tang, Jiawei Yu, Yuang Li, Yanqing Zhao, Weidong Zhang, Wei Feng, Min Zhang, Hao Yang

The inaccurate translation of numbers can lead to significant security issues, ranging from financial setbacks to medical inaccuracies.

Machine Translation Translation

Improving GenIR Systems Based on User Feedback

no code implementations 6 Jan 2025 Qingyao Ai, Zhicheng Dou, Min Zhang

In this chapter, we discuss how to improve the GenIR systems based on user feedback.

Continual Learning Prompt Learning

Test-time Computing: from System-1 Thinking to System-2 Thinking

1 code implementation 5 Jan 2025 Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang

In System-2 models, it enhances the model's reasoning ability to solve complex problems through repeated sampling, self-correction, and tree search.

RaSeRec: Retrieval-Augmented Sequential Recommendation

1 code implementation 24 Dec 2024 Xinping Zhao, Baotian Hu, Yan Zhong, Shouzheng Huang, Zihao Zheng, Meng Wang, Haofen Wang, Min Zhang

Although prevailing supervised and self-supervised learning (SSL)-augmented sequential recommendation (SeRec) models have achieved improved performance with powerful neural network architectures, we argue that they still suffer from two limitations: (1) Preference Drift, where models trained on past data can hardly accommodate evolving user preference; and (2) Implicit Memory, where head patterns dominate parametric learning, making it harder to recall long tails.

Retrieval +2

GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

no code implementations 22 Dec 2024 Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang

Last, we provide in-depth analyses of model scaling and training strategies, and perform ablation studies on both the model and synthetic data.

Retrieval

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs

no code implementations 19 Dec 2024 Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding

Efficient KV cache management in LLMs is crucial for long-context tasks like RAG and summarization.

RAG

Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering

1 code implementation 18 Dec 2024 Yifan Lu, Yigeng Zhou, Jing Li, Yequan Wang, Xuebo Liu, Daojing He, Fangming Liu, Min Zhang

Multi-hop question answering (MHQA) poses a significant challenge for large language models (LLMs) due to the extensive knowledge demands involved.

graph construction knowledge editing +4

Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

1 code implementation 18 Dec 2024 Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Min Zhang

To study the reason behind these limitations, we propose VGCure, a comprehensive benchmark covering 22 tasks for examining the fundamental graph understanding and reasoning capacities of LVLMs.

Benchmarking Graph Learning +1

Efficient Speech Command Recognition Leveraging Spiking Neural Network and Curriculum Learning-based Knowledge Distillation

no code implementations 17 Dec 2024 Jiaqi Wang, Liutao Yu, Liwei Huang, Chenlin Zhou, Han Zhang, Zhenxi Song, Min Zhang, Zhengyu Ma, Zhiguo Zhang

Furthermore, by executing KDCL, we reduce the number of time steps by 60% and decrease energy consumption by 54.8% while maintaining comparable performance to recent SOTA results.

Edge-computing Knowledge Distillation +1

DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check

no code implementations 17 Dec 2024 Ziheng Qiao, Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang

One key characteristic of the Chinese spelling check (CSC) task is that incorrect characters are usually similar to the correct ones in either phonetics or glyph.

LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Tasks

no code implementations 17 Dec 2024 Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

Large language models (LLMs) have demonstrated impressive multilingual understanding and reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data.

Math

Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation

no code implementations 17 Dec 2024 Andong Chen, Yuchen Song, Kehai Chen, Muyun Yang, Tiejun Zhao, Min Zhang

Visual information has been introduced for enhancing machine translation (MT), and its effectiveness heavily relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations.

Language Modeling Language Modelling +4

LLM-based Discriminative Reasoning for Knowledge Graph Question Answering

no code implementations 17 Dec 2024 Mufan Xu, Kehai Chen, Xuefeng Bai, Muyun Yang, Tiejun Zhao, Min Zhang

Large language models (LLMs) based on generative pre-trained Transformer have achieved remarkable performance on knowledge graph question-answering (KGQA) tasks.

Graph Question Answering Question Answering

ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty

no code implementations 12 Dec 2024 Meizhi Zhong, Xikai Liu, Chen Zhang, Yikun Lei, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

To accelerate the inference of LLMs, storing computed caches in memory has become the standard technique.

CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models

no code implementations 10 Dec 2024 Dongfang Li, Zetian Sun, Xinshuo Hu, Baotian Hu, Min Zhang

Large Language Models (LLMs) need to adapt to the continuous changes in data, tasks, and user preferences.

Continual Learning

Multi-Level Correlation Network For Few-Shot Image Classification

1 code implementation 4 Dec 2024 Yunkai Dang, Min Zhang, Zhengyu Chen, Xinliang Zhang, Zheng Wang, Meijun Sun, Donglin Wang

In this paper, we argue that measure at such a level may not be effective enough to generalize from base to novel classes when using only a few images.

Few-Shot Image Classification image-classification +2

Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark

1 code implementation 3 Dec 2024 Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Erik Cambria, Min Zhang, Hao Fei

T3DEM is the most crucial step in determining the quality of Emo3D generation and encompasses three key challenges: Expression Diversity, Emotion-Content Consistency, and Expression Fluidity.

Code Generation Diversity +1

Learning Monotonic Attention in Transducer for Streaming Generation

1 code implementation 26 Nov 2024 Zhengrui Ma, Yang Feng, Min Zhang

Streaming generation models are increasingly utilized across various fields, with the Transducer architecture being particularly popular in industrial applications.

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization

1 code implementation 21 Nov 2024 Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu

To address this, we propose DRPruning, which incorporates distributionally robust optimization to restore balanced performance across domains, along with further improvements to enhance robustness.

Language Modeling Language Modelling +1

Interpret the Internal States of Recommendation Model with Sparse Autoencoder

1 code implementation 9 Nov 2024 Jiayin Wang, XiaoYu Zhang, Weizhi Ma, Min Zhang

Firstly, we train an autoencoder with sparsity constraints to reconstruct internal activations of recommendation models, making the RecSAE latents more interpretable and monosemantic than the original neuron activations.
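
A minimal sparse autoencoder of this kind in PyTorch; the sizes and L1 weight are illustrative, not the RecSAE configuration:

```python
# Sparse autoencoder over model activations; dimensions and penalty assumed.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=256, d_latent=2048):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse, more interpretable latents
        return self.decoder(z), z

model = SparseAutoencoder()
acts = torch.randn(32, 256)              # stand-in for internal activations
recon, z = model(acts)
loss = nn.functional.mse_loss(recon, acts) + 1e-3 * z.abs().mean()
```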

Explainable Recommendation Fairness +1

Beyond Utility: Evaluating LLM as Recommender

1 code implementation 1 Nov 2024 Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang

We intend our evaluation framework and observations to benefit future research on the use of LLMs as recommenders.

Position Re-Ranking

PerSRV: Personalized Sticker Retrieval with Vision-Language Model

1 code implementation 29 Oct 2024 Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang

The online retrieval part follows the paradigm of relevant recall and personalized ranking, supported by the offline pre-calculation parts, which are sticker semantic understanding, utility evaluation and personalization modules.

Language Modeling Language Modelling +2

NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates

1 code implementation 28 Oct 2024 Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu

Despite their remarkable abilities in various tasks, large language models (LLMs) still struggle with real-time information (e.g., new facts and terms) due to the knowledge cutoff in their development process.

Benchmarking

LOGO -- Long cOntext aliGnment via efficient preference Optimization

1 code implementation 24 Oct 2024 Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang

To overcome the GPU memory-bound issue caused by the long sequence, LOGO employs a reference-free preference optimization strategy and adopts a position synthesis method to construct the training data.

Language Modeling Language Modelling +1

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

1 code implementation 24 Oct 2024 Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Qiaoming Zhu, Min Zhang

The availability of high-quality data is one of the most important factors in improving the reasoning capability of LLMs.

Math Mathematical Reasoning

Beware of Calibration Data for Pruning Large Language Models

no code implementations 23 Oct 2024 Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency.

Model Compression

Revealing and Mitigating the Local Pattern Shortcuts of Mamba

1 code implementation 21 Oct 2024 Wangjie You, Zecheng Tang, Juntao Li, Lili Yao, Min Zhang

Large language models (LLMs) have advanced significantly due to the attention mechanism, but their quadratic complexity and linear memory demands limit their performance on long-context tasks.

Mamba State Space Models

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

no code implementations 20 Oct 2024 Yu Zhao, Hao Fei, Xiangtai Li, Libo Qin, Jiayi Ji, Hongyuan Zhu, Meishan Zhang, Min Zhang, Jianguo Wei

In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form.

Image to text

BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation

no code implementations 19 Oct 2024 Jilong Li, Zhenxi Song, Jiaqi Wang, Meishan Zhang, Honghai Liu, Min Zhang, Zhiguo Zhang

Current EEG/MEG-to-text decoding systems suffer from three key limitations: (1) reliance on teacher-forcing methods, which compromises robustness during inference, (2) sensitivity to session-specific noise, hindering generalization across subjects, and (3) misalignment between brain signals and linguistic representations due to pre-trained language model over-dominance.

EEG Representation Learning +2

MoDification: Mixture of Depths Made Easy

no code implementations 18 Oct 2024 Chen Zhang, Meizhi Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, Chengqiang Lu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song

Long-context efficiency has recently become a trending topic in serving large language models (LLMs).

LLM-based Translation Inference with Iterative Bilingual Understanding

no code implementations 16 Oct 2024 Andong Chen, Kehai Chen, Yang Xiang, Xuefeng Bai, Muyun Yang, Yang Feng, Tiejun Zhao, Min Zhang

The remarkable understanding and generation capabilities of large language models (LLMs) have greatly improved translation performance.

Sentence Translation

An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

no code implementations 16 Oct 2024 Junjie Chen, Weihang Su, Zhumin Chu, Haitao Li, Qinyao Ai, Yiqun Liu, Min Zhang, Shaoping Ma

Moreover, our study highlights the impact of prompt strategies and evaluation formats on evaluation performance, offering guidance for method optimization in the future.

Dialogue Generation Question Answering

SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation

no code implementations 15 Oct 2024 Xinping Zhao, Dongfang Li, Yan Zhong, Boren Hu, Yibin Chen, Baotian Hu, Min Zhang

Recent studies in Retrieval-Augmented Generation (RAG) have investigated extracting evidence from retrieved passages to reduce computational costs and enhance the final RAG performance, yet it remains challenging.

Chunking RAG +3

FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG

no code implementations 14 Oct 2024 Xinping Zhao, Yan Zhong, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Dongfang Li, Baotian Hu, Min Zhang

In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, so as to balance effectiveness and efficiency.

RAG Retrieval +1

StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs

1 code implementation 10 Oct 2024 Yuanqing Yu, Zhefan Wang, Weizhi Ma, Zhicheng Guo, Jingtao Zhan, Shuai Wang, Chuhan Wu, Zhiqiang Guo, Min Zhang

Despite having powerful reasoning and inference capabilities, Large Language Models (LLMs) still need external tools to acquire real-time information retrieval or domain-specific expertise to solve complex tasks, which is referred to as tool learning.

Information Retrieval Policy Gradient Methods

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

no code implementations 8 Oct 2024 Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang

The scaling of large language models (LLMs) is a critical research area for the efficiency and effectiveness of model training and deployment.

Mixture-of-Experts

A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models

1 code implementation 5 Oct 2024 Houquan Zhou, Zhenghua Li, Bo Zhang, Chen Li, Shaopeng Lai, Ji Zhang, Fei Huang, Min Zhang

This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task, which is totally different from all previous CSC approaches.

Language Modeling Language Modelling +2

Self-Powered LLM Modality Expansion for Large Speech-Text Models

1 code implementation 4 Oct 2024 Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, DaCheng Tao, Min Zhang

This study aims to refine the use of speech datasets for LSM training by addressing the limitations of vanilla instruction tuning.

Automatic Speech Recognition Instruction Following +2

Parameter Competition Balancing for Model Merging

1 code implementation 3 Oct 2024 Guodong Du, Junlin Lee, Jing Li, Runhua Jiang, Yifei Guo, Shuyang Yu, Hanting Liu, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Min Zhang

Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model.

Domain Generalization model

L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?

2 code implementations 3 Oct 2024 Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang

Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization.

8k Document Summarization +2

Question-guided Knowledge Graph Re-scoring and Injection for Knowledge Graph Question Answering

1 code implementation 2 Oct 2024 Yu Zhang, Kehai Chen, Xuefeng Bai, Zhao Kang, Quanjiang Guo, Min Zhang

Knowledge graph question answering (KGQA) involves answering natural language questions by leveraging structured information stored in a knowledge graph.

Graph Question Answering Question Answering

Dynamic Planning for LLM-based Graphical User Interface Automation

1 code implementation 1 Oct 2024 Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Xinbei Ma, Muyun Yang, Tiejun Zhao, Min Zhang

However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, though planning has been widely recognized as effective for decomposing complex tasks into a series of steps.

Grammar Induction from Visual, Speech and Text

no code implementations 1 Oct 2024 Yu Zhao, Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua

Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics.

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification

1 code implementation 30 Sep 2024 Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang

To address the occlusion issues in person Re-Identification (ReID) tasks, many methods have been proposed to extract part features by introducing external spatial information.

Decoder Occluded Person Re-Identification

Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models

no code implementations 19 Sep 2024 Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, DaCheng Tao, Min Zhang

Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.

Knowledge Distillation

LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation

no code implementations 13 Sep 2024 Shaojun Li, Hengchao Shang, Daimeng Wei, Jiaxin Guo, Zongyao Li, Xianghui He, Min Zhang, Hao Yang

Recent advancements in integrating speech information into large language models (LLMs) have significantly improved automatic speech recognition (ASR) accuracy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

1 code implementation 11 Sep 2024 Yang Liu, Pengxiang Ding, Siteng Huang, Min Zhang, Han Zhao, Donglin Wang

Fueled by the Large Language Models (LLMs) wave, Large Visual-Language Models (LVLMs) have emerged as a pivotal advancement, bridging the gap between image and text.

Language Modeling Language Modelling

MemLong: Memory-Augmented Retrieval for Long Text Modeling

1 code implementation 30 Aug 2024 Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang

This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context language modeling by utilizing an external retriever for historical information retrieval.
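
Schematically, the external retriever amounts to a chunk store queried by embedding similarity; a toy version with a placeholder embedder (a real system would use a trained encoder):

```python
# Toy chunk memory queried by cosine similarity; embedder is a stand-in.
import numpy as np

class ChunkMemory:
    """Store text chunks with embeddings; retrieve nearest chunks for a query."""
    def __init__(self, embed):
        self.embed, self.keys, self.chunks = embed, [], []

    def add(self, chunk):
        self.keys.append(self.embed(chunk))
        self.chunks.append(chunk)

    def retrieve(self, query, k=2):
        q = self.embed(query)
        keys = np.stack(self.keys)
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        return [self.chunks[i] for i in np.argsort(sims)[-k:][::-1]]

mem = ChunkMemory(lambda text: np.random.default_rng(len(text)).random(64))
```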

4k Decoder +5

Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning

no code implementations 30 Aug 2024 Fengyuan Dai, Siteng Huang, Min Zhang, Biao Gong, Donglin Wang

To transfer knowledge from seen attribute-object compositions to recognize unseen ones, recent compositional zero-shot learning (CZSL) methods mainly discuss the optimal classification branches to identify the elements, leading to the popularity of employing a three-branch architecture.

Attribute Compositional Zero-Shot Learning +1

TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models

no code implementations 26 Aug 2024 Zelin Li, Kehai Chen, Lemao Liu, Xuefeng Bai, Mingming Yang, Yang Xiang, Min Zhang

In this paper, we analyze the core mechanisms of previous predominant adversarial attack methods, revealing that 1) the distributions of importance scores differ markedly among victim models, restricting transferability; 2) the sequential attack process induces substantial time overheads.

Adversarial Attack

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

1 code implementation 22 Aug 2024 Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng

Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge.

Misinformation

Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving

no code implementations 19 Aug 2024 Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

Different from the traditional translation tasks, classical Chinese poetry translation requires both adequacy and fluency in translating culturally and historically significant content and linguistic poetic elegance.

Benchmarking Machine Translation +1

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

1 code implementation 19 Aug 2024 Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang

These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions and images of the appearing character and setting.

Image Generation Video Generation

An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation

1 code implementation 16 Aug 2024 Peiming Guo, Sinuo Liu, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

We propose the first end-to-end model for photo-sharing multi-modal dialogue generation, which integrates an image perceptron and an image generator with a large language model.

Image Generation Language Modeling +3

Model Hijacking Attack in Federated Learning

no code implementations 4 Aug 2024 Zheng Li, Siyuan Wu, Ruichuan Chen, Paarijaat Aditya, Istemi Ekin Akkus, Manohar Vanga, Min Zhang, Hao Li, Yang Zhang

Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition.

Autonomous Driving Data Poisoning +3

mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

no code implementations 29 Jul 2024 Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang

We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than the 512-token contexts of previous multilingual encoders).
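
For reference, a generic numpy implementation of RoPE, the rotary scheme the encoder adopts (this is the standard formulation, not mGTE's code):

```python
# Standard rotary position embedding (RoPE) over a (seq_len, dim) array.
import numpy as np

def rope(x, base=10000.0):
    """x: (seq_len, dim) with even dim; rotates feature pairs by position."""
    seq_len, dim = x.shape
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)         # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```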

Contrastive Learning Reranking +2

SeqMIA: Sequential-Metric Based Membership Inference Attack

1 code implementation 21 Jul 2024 Hao Li, Zheng Li, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang

Building upon this signal, we introduce a novel attack method called Sequential-metric based Membership Inference Attack (SeqMIA).

Inference Attack Knowledge Distillation +1

LeKUBE: A Legal Knowledge Update BEnchmark

1 code implementation 19 Jul 2024 Changyue Wang, Weihang Su, Hu Yiran, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma

Existing benchmarks for evaluating knowledge update methods are mostly designed for the open domain and cannot address the specific challenges of the legal domain, such as the nuanced application of new legal knowledge, the complexity and lengthiness of legal regulations, and the intricate nature of legal reasoning.

Legal Reasoning

Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning

1 code implementation8 Jul 2024 Xiaojie Li, Yibo Yang, Jianlong Wu, Bernard Ghanem, Liqiang Nie, Min Zhang

The dual design enables the model to maintain the robust features of base classes, while adaptively learning distinctive feature shifts for novel classes.

class-incremental learning Few-Shot Class-Incremental Learning +3

MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning

1 code implementation6 Jul 2024 Min Zhang, Xian Fu, Jianye Hao, Peilong Han, Hao Zhang, Lei Shi, Hongyao Tang, Yan Zheng

To this end, based on the characteristics of embodied task planning, we first develop a systematic evaluation framework, which encapsulates four crucial capabilities of MFMs: object understanding, spatio-temporal perception, task understanding, and embodied reasoning.

Embodied Question Answering Question Answering +1

Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

1 code implementation3 Jul 2024 Zhibin Lan, LiQiang Niu, Fandong Meng, Jie zhou, Min Zhang, Jinsong Su

Among them, the target text decoder is used to alleviate the language alignment burden, and the image tokenizer converts long sequences of pixels into shorter sequences of visual tokens, preventing the model from focusing on low-level visual features.

Decoder Machine Translation
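
The image tokenizer's role described above, compressing long pixel sequences into short visual-token sequences, can be illustrated with a toy VQ-style tokenizer. The patch size, random codebook, and nearest-code lookup below are generic assumptions for illustration, not the authors' design.

```python
import numpy as np

def tokenize_image(pixels, codebook, patch=16):
    """Split an H x W x C image into non-overlapping patches and map each
    patch to its nearest codebook entry: one discrete token per patch."""
    H, W, C = pixels.shape
    tokens = []
    for i in range(0, H - H % patch, patch):
        for j in range(0, W - W % patch, patch):
            v = pixels[i:i + patch, j:j + patch].reshape(-1)
            dists = ((codebook - v) ** 2).sum(axis=1)  # distance to each code
            tokens.append(int(np.argmin(dists)))
    return tokens

# A 256x256 RGB image becomes 256 tokens instead of 196,608 pixel values
img = np.random.rand(256, 256, 3)
codes = np.random.rand(1024, 16 * 16 * 3)              # hypothetical codebook
print(len(tokenize_image(img, codes)))                 # -> 256
```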

AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction

no code implementations3 Jul 2024 Shengkun Wang, Taoran Ji, Jianfeng He, Mariam Almutairi, Dan Wang, Linhan Wang, Min Zhang, Chang-Tien Lu

This confirms the value of adversarial training in reducing stochasticity and bias for stock volatility prediction tasks.

Fairness

Feature-Specific Coefficients of Determination in Tree Ensembles

no code implementations3 Jul 2024 Zhongli Jiang, Dabao Zhang, Min Zhang

Tree ensemble methods provide promising predictions, but the resulting models are difficult to interpret.

Computational Efficiency

Concise and Precise Context Compression for Tool-Using Language Models

no code implementations2 Jul 2024 Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

However, when compressing tool documentation, existing methods suffer from the weaknesses of key information loss (specifically, tool/parameter name errors) and difficulty in adjusting the length of compressed sequences based on documentation lengths.

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

no code implementations27 Jun 2024 Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

Then, an SG-based framework is built, where the textual SG (TSG) is encoded with a graph Transformer, while the video dynamic SG (DSG) and the HSG are modeled with a novel recurrent graph Transformer for spatial and temporal feature propagation.

LLM-Driven Multimodal Opinion Expression Identification

no code implementations26 Jun 2024 Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang

We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios.

text-to-speech Text to Speech

OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

1 code implementation25 Jun 2024 Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang

It searches for the optimal tree structure that maximizes the mathematical expectation of the acceptance length at each decoding step.
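
A hedged sketch of that search: greedily grow the draft tree by repeatedly adding the unexpanded node with the highest path probability, using the draft model's probability as a proxy for acceptance probability, so the sum of selected path probabilities approximates the expected acceptance length. The `expand` interface and the toy draft model are hypothetical, and the paper's exact optimization may differ.

```python
import heapq

def build_opt_tree(expand, m):
    """Select m draft-tree nodes maximizing total path probability.
    expand(path) -> list of (token, prob) continuations of a path."""
    frontier = [(-p, (tok,), p) for tok, p in expand(())]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(tree) < m:
        _, path, p = heapq.heappop(frontier)   # best remaining node
        tree.append((path, p))
        for tok, q in expand(path):            # its children join the pool
            heapq.heappush(frontier, (-p * q, path + (tok,), p * q))
    return tree

def toy_expand(path):
    # hypothetical draft model: the same two continuations everywhere
    return [("a", 0.7), ("b", 0.3)]

print(build_opt_tree(toy_expand, m=5))
```

Greediness is safe here because a child's path probability never exceeds its parent's, so the selected nodes always form a connected tree rooted at the current context.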

Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

1 code implementation25 Jun 2024 Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang

Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently.

Contrastive Learning few-shot-htc +8

Timo: Towards Better Temporal Reasoning for Language Models

1 code implementation20 Jun 2024 Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng

Therefore, we propose a crucial question: Can we build a universal framework to handle a variety of temporal reasoning tasks?

Question Answering

A Survey on Human Preference Learning for Large Language Models

no code implementations17 Jun 2024 Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

1 code implementation17 Jun 2024 Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang

Through a comprehensive and quantitative evaluation of cutting-edge models, we reveal that: 1) Video-LMMs face difficulties in fine-grained video tasks involving temporal location, object tracking, and anomaly detection; 2) Video-LMMs present inferior logical and relation reasoning abilities; 3) Open-source Video-LMMs' performance is significantly lower than GPT-4o and Gemini-1.5, lagging by 20 points.

Anomaly Detection Logical Reasoning +2

LAIP: Learning Local Alignment from Image-Phrase Modeling for Text-based Person Search

no code implementations16 Jun 2024 Haiguang Wang, Yu Wu, Mengxia Wu, Cao Min, Min Zhang

This paper proposes the Local Alignment from Image-Phrase modeling (LAIP) framework, with a Bidirectional Attention-weighted local alignment (BidirAtt) module and a Mask Phrase Modeling (MPM) module. BidirAtt goes beyond the typical forward attention by treating the gradient of the transformer as backward attention, utilizing two-sided information for local alignment.

Person Search Text based Person Search

TasTe: Teaching Large Language Models to Translate through Self-Reflection

1 code implementation12 Jun 2024 Yutong Wang, Jiali Zeng, Xuebo Liu, Fandong Meng, Jie zhou, Min Zhang

The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.

Instruction Following Machine Translation +2

On the Hallucination in Simultaneous Machine Translation

1 code implementation11 Jun 2024 Meizhi Zhong, Kehai Chen, Zhengshan Xue, Lemao Liu, Mingming Yang, Min Zhang

It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information.

Hallucination Machine Translation +1

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

1 code implementation11 Jun 2024 Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng

Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences.

Knowledge Distillation Machine Translation +2
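
Because the model is CTC-based, non-autoregressive decoding reduces to the standard CTC collapse rule: take the argmax label per frame, merge consecutive repeats, and drop blanks. The sketch below uses toy integer labels; in the paper's setting the labels would be discrete speech units.

```python
import numpy as np

def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = frame_scores.argmax(axis=-1)      # (T,)
    out, prev = [], None
    for label in best:
        if label != blank and label != prev:
            out.append(int(label))
        prev = label
    return out

# Toy example: 6 frames, 4 labels (0 is blank); per-frame argmax is
# [1, 1, 0, 2, 2, 3], which collapses to [1, 2, 3]
scores = np.array([[0.1, 0.8, 0.05, 0.05],
                   [0.1, 0.8, 0.05, 0.05],
                   [0.9, 0.05, 0.03, 0.02],
                   [0.1, 0.1, 0.7, 0.1],
                   [0.1, 0.1, 0.7, 0.1],
                   [0.2, 0.1, 0.1, 0.6]])
print(ctc_greedy_decode(scores))             # -> [1, 2, 3]
```

Since every frame is decoded in parallel and the collapse is linear-time, this avoids the slow token-by-token decoding that the abstract identifies as the bottleneck.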

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

no code implementations11 Jun 2024 Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results.

Contrastive Learning Speech Synthesis +7

Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

1 code implementation10 Jun 2024 Yan Gao, Zhiwei Cao, Zhongjian Miao, Baosong Yang, Shiyu Liu, Min Zhang, Jinsong Su

In this paper, we first conduct a preliminary study to reveal two key limitations of $k$NN-MT-AR: 1) the optimization gap leads to inaccurate estimation of $\lambda$ for determining $k$NN retrieval skipping, and 2) using a fixed threshold fails to accommodate the dynamic demands for $k$NN retrieval at different timesteps.

Domain Adaptation Machine Translation +3
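
For context, the sketch below shows the standard $k$NN-MT interpolation this work builds on, with retrieval skipped when the predicted weight $\lambda$ falls below a threshold. The `predict_lambda` callable is a placeholder for a learned estimator, and the fixed `skip_threshold` is exactly the design the paper argues against and replaces with a timestep-adaptive one.

```python
import numpy as np

def knn_mt_step(p_nmt, query, keys, values, k=8, temp=10.0,
                predict_lambda=None, skip_threshold=0.25):
    """One decoding step: interpolate the NMT distribution with a
    kNN distribution built from datastore neighbors of `query`."""
    lam = predict_lambda(query) if predict_lambda else 0.5
    if lam < skip_threshold:                 # skip costly retrieval
        return p_nmt
    d = ((keys - query) ** 2).sum(axis=-1)   # squared L2 to all keys
    idx = np.argsort(d)[:k]                  # k nearest neighbors
    w = np.exp(-d[idx] / temp)
    w /= w.sum()
    p_knn = np.zeros_like(p_nmt)
    np.add.at(p_knn, values[idx], w)         # scatter weights onto tokens
    return lam * p_knn + (1.0 - lam) * p_nmt
```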

AutoSurvey: Large Language Models Can Automatically Write Surveys

1 code implementation10 Jun 2024 Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang

This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence.

Retrieval Survey

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

1 code implementation5 Jun 2024 Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication.

 Ranked #1 on de-en on CVSS

Automatic Speech Recognition (ASR) de-en +11
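
To illustrate what "outputs target speech while receiving streaming inputs" means operationally, here is the classic wait-k read/write policy as a generic baseline. This is not StreamSpeech's method (which learns its policy through multi-task training); `translate_step` is a hypothetical stand-in for the translation model.

```python
EOS = "</s>"

def wait_k_decode(source_chunks, translate_step, k=3):
    """Wait for k source chunks, then emit one target token per extra
    chunk; once the source ends, finish decoding greedily."""
    src, tgt = [], []
    for chunk in source_chunks:
        src.append(chunk)                          # READ
        if len(src) >= k and (not tgt or tgt[-1] != EOS):
            tgt.append(translate_step(src, tgt))   # WRITE
    while not tgt or tgt[-1] != EOS:               # source exhausted: flush
        tgt.append(translate_step(src, tgt))
    return tgt

# Toy model that "translates" by copying, then stops
def toy_step(src, tgt):
    return src[len(tgt)] if len(tgt) < len(src) else EOS

print(wait_k_decode(["ich", "sehe", "das", "haus"], toy_step, k=3))
```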

Large Language Models as Evaluators for Recommendation Explanations

1 code implementation5 Jun 2024 XiaoYu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

We also provide further insights into combining human labels with the LLM evaluation process and utilizing ensembles of multiple heterogeneous LLM evaluators to enhance the accuracy and stability of evaluations.

Common Sense Reasoning Instruction Following +3

Multimodal Reasoning with Multimodal Knowledge Graph

no code implementations4 Jun 2024 Junlin Lee, Yequan Wang, Jing Li, Min Zhang

Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge within LLMs.

cross-modal alignment Graph Attention +3

Demonstration Augmentation for Zero-shot In-context Learning

1 code implementation3 Jun 2024 Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang

Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL), which enables them to acquire knowledge from textual demonstrations without the need for parameter updates.

In-Context Learning

Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias

1 code implementation31 May 2024 Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang

To alleviate popularity bias, existing efforts focus on emphasizing unpopular items or separating the correlation between item representations and their popularity.

Collaborative Filtering Contrastive Learning +1

ReChorus2.0: A Modular and Task-Flexible Recommendation Library

1 code implementation28 May 2024 Jiayu Li, Hanyu Li, Zhiyu He, Weizhi Ma, Peijie Sun, Min Zhang, Shaoping Ma

However, these libraries often impose certain restrictions on data and seldom allow the same model to perform different tasks or accept different input formats, limiting users' customized explorations.

Click-Through Rate Prediction Recommendation Systems +1

From Role-Play to Drama-Interaction: An LLM Solution

no code implementations23 May 2024 Weiqi Wu, Hongqiu Wu, Lai Jiang, XingYuan Liu, Jiale Hong, Hai Zhao, Min Zhang

Drama is a form of storytelling inspired by human creativity, proceeding with a predefined storyline, carrying emotions and thoughts.

Instruction Following

Why Not Transform Chat Large Language Models to Non-English?

1 code implementation22 May 2024 Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, ShuJian Huang

For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training to maintain the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters.

Knowledge Distillation
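
The "low-rank adaptation" named above is standard LoRA (Hu et al., 2021): freeze the pretrained weight and train only a low-rank additive update, which is how the original LLM parameters are maintained. Below is a minimal PyTorch sketch under common conventions (the rank, scaling, and zero-init of B are the usual defaults; the recovery-KD component is not shown).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x W^T + (alpha/r) * x A^T B^T, with W frozen; only A, B train."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # preserve original parameters
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r               # zero-init B: update starts as a no-op

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Wrap one projection of a hypothetical chat LLM layer
layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
```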
