no code implementations • WMT (EMNLP) 2021 • Yimeng Chen, Chang Su, Yingtao Zhang, Yuxia Wang, Xiang Geng, Hao Yang, Shimin Tao, Guo Jiaxin, Wang Minghan, Min Zhang, Yujia Liu, ShuJian Huang
This paper presents our work in WMT 2021 Quality Estimation (QE) Shared Task.
no code implementations • WMT (EMNLP) 2021 • Zongyao Li, Daimeng Wei, Hengchao Shang, Xiaoyu Chen, Zhanglin Wu, Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Lizhi Lei, Min Zhang, Hao Yang, Ying Qin
This paper presents the submission of Huawei Translation Service Center (HW-TSC) to WMT 2021 Triangular MT Shared Task.
no code implementations • WMT (EMNLP) 2021 • Zhengzhe Yu, Daimeng Wei, Zongyao Li, Hengchao Shang, Xiaoyu Chen, Zhanglin Wu, Jiaxin Guo, Minghan Wang, Lizhi Lei, Min Zhang, Hao Yang, Ying Qin
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Large-Scale Multilingual Translation Task.
no code implementations • WMT (EMNLP) 2021 • Hao Yang, Zhanglin Wu, Zhengzhe Yu, Xiaoyu Chen, Daimeng Wei, Zongyao Li, Hengchao Shang, Minghan Wang, Jiaxin Guo, Lizhi Lei, Chuanfei Xu, Min Zhang, Ying Qin
This paper describes the submission of Huawei Translation Service Center (HW-TSC) to WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (Our registered team name is HuaweiTSC).
no code implementations • COLING 2022 • Zijie Lin, Bin Liang, Yunfei Long, Yixue Dang, Min Yang, Min Zhang, Ruifeng Xu
This essentially allows the framework to understand the appropriate graph structures for learning intricate relations among different modalities.
no code implementations • INLG (ACL) 2021 • Minghan Wang, Guo Jiaxin, Yuxia Wang, Yimeng Chen, Su Chang, Daimeng Wei, Min Zhang, Shimin Tao, Hao Yang
Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved stunning performance among non-autoregressive NMT models, but we find that the mechanism of predicting all of the target words only depending on the hidden state of [MASK] is not effective and efficient in initial iterations of refinement, resulting in ungrammatical repetitions and slow convergence.
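A minimal sketch of the mask-predict refinement loop (Ghazvininejad et al., 2019) that this entry refers to, included only to illustrate the mechanism; `toy_predict` is a hypothetical stand-in for a real CMLM forward pass, not the paper's code.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "a"]

def toy_predict(tokens, masked_positions):
    """Stand-in for a real CMLM forward pass: returns (token, confidence) per masked slot."""
    return {i: (random.choice(VOCAB), random.random()) for i in masked_positions}

def mask_predict(target_len, iterations=4):
    tokens = ["[MASK]"] * target_len          # iteration 0: every target position is masked
    scores = [0.0] * target_len
    for t in range(iterations):
        # the number of re-masked (lowest-confidence) tokens decays linearly over iterations
        n_mask = max(1, int(target_len * (iterations - t) / iterations))
        masked = sorted(range(target_len), key=lambda i: scores[i])[:n_mask]
        preds = toy_predict(tokens, masked)
        for i, (tok, conf) in preds.items():
            tokens[i], scores[i] = tok, conf
    return tokens

print(mask_predict(6))
```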
no code implementations • MTSummit 2021 • Minghan Wang, Jiaxin Guo, Yimeng Chen, Chang Su, Min Zhang, Shimin Tao, Hao Yang
Multimodal translation (MMT) models built on large-scale pretrained networks are liable to overfit the limited labelled training data, which is a critical issue in MMT.
no code implementations • WMT (EMNLP) 2021 • Daimeng Wei, Zongyao Li, Zhanglin Wu, Zhengzhe Yu, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Minghan Wang, Lizhi Lei, Min Zhang, Hao Yang, Ying Qin
This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT 2021 News Translation Shared Task.
no code implementations • EMNLP 2020 • Lijie Wang, Ao Zhang, Kun Wu, Ke Sun, Zhenghua Li, Hua Wu, Min Zhang, Haifeng Wang
This paper describes in detail the construction process and data statistics of DuSQL.
1 code implementation • COLING 2022 • Nan Yu, Guohong Fu, Min Zhang
It is believed that speaker interactions are helpful for this task.
Ranked #2 on Discourse Parsing on STAC
1 code implementation • COLING 2022 • Zhongjian Miao, Xiang Li, Liyan Kang, Wen Zhang, Chulun Zhou, Yidong Chen, Bin Wang, Min Zhang, Jinsong Su
Most existing methods on robust neural machine translation (NMT) construct adversarial examples by injecting noise into authentic examples and indiscriminately exploit two types of examples.
no code implementations • ACL 2022 • Dengji Guo, Zhengrui Ma, Min Zhang, Yang Feng
Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years.
1 code implementation • ACL 2022 • Nan Yu, Meishan Zhang, Guohong Fu, Min Zhang
Pre-trained language models (PLMs) have shown great potential in natural language processing (NLP), including rhetorical structure theory (RST) discourse parsing. Current PLMs are obtained by sentence-level pre-training, which differs from the basic processing unit, i.e., the elementary discourse unit (EDU). To this end, we propose a second-stage EDU-level pre-training approach in this work, which presents two novel tasks to learn effective EDU representations continually based on well pre-trained language models. Concretely, the two tasks are (1) next EDU prediction (NEP) and (2) discourse marker prediction (DMP). We take a state-of-the-art transition-based neural parser as the baseline and adapt it with a light bi-gram EDU modification to effectively exploit the EDU-level pre-trained representations. Experimental results on a benchmark dataset show that our method is highly effective, leading to a 2.1-point improvement in F1-score. All codes and pre-trained models will be released publicly to facilitate future studies.
no code implementations • ACL 2022 • Ying Li, Shuaike Li, Min Zhang
To address this issue, we for the first time apply a dynamic matching network on the shared-private model for semi-supervised cross-domain dependency parsing.
no code implementations • Findings (ACL) 2022 • Kehai Chen, Masao Utiyama, Eiichiro Sumita, Rui Wang, Min Zhang
Machine translation typically adopts an encoder-to-decoder framework, in which the decoder generates the target sentence word-by-word in an auto-regressive manner.
no code implementations • Findings (ACL) 2022 • Yuxia Wang, Minghan Wang, Yimeng Chen, Shimin Tao, Jiaxin Guo, Chang Su, Min Zhang, Hao Yang
Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels due to the subjectivity of the task.
no code implementations • IWSLT (ACL) 2022 • Minghan Wang, Jiaxin Guo, Yinglu Li, Xiaosong Qiao, Yuxia Wang, Zongyao Li, Chang Su, Yimeng Chen, Min Zhang, Shimin Tao, Hao Yang, Ying Qin
The cascade system is composed of a chunking-based streaming ASR model and the SimulMT model used in the T2T track.
no code implementations • IWSLT (ACL) 2022 • Minghan Wang, Jiaxin Guo, Xiaosong Qiao, Yuxia Wang, Daimeng Wei, Chang Su, Yimeng Chen, Min Zhang, Shimin Tao, Hao Yang, Ying Qin
For machine translation part, we pretrained three translation models on WMT21 dataset and fine-tuned them on in-domain corpora.
Automatic Speech Recognition (ASR), +5
no code implementations • EMNLP (BlackboxNLP) 2021 • Minghan Wang, Guo Jiaxin, Yuxia Wang, Yimeng Chen, Su Chang, Hengchao Shang, Min Zhang, Shimin Tao, Hao Yang
Length prediction is a special task in a series of NAT models where target length has to be determined before generation.
1 code implementation • EMNLP 2021 • Xincheng Ju, Dong Zhang, Rong Xiao, Junhui Li, Shoushan Li, Min Zhang, Guodong Zhou
Therefore, in this paper, we are the first to jointly perform multi-modal ATE (MATE) and multi-modal ASC (MASC), and we propose a multi-modal joint learning approach with auxiliary cross-modal relation detection for multi-modal aspect-level sentiment analysis (MALSA).
no code implementations • EMNLP 2021 • Xinglin Lyu, Junhui Li, ZhengXian Gong, Min Zhang
In this paper we apply “one translation per discourse” in NMT, and aim to encourage lexical translation consistency for document-level NMT.
no code implementations • CCL 2021 • Mingyue Zhou, Chen Gong, Zhenghua Li, Min Zhang
The most important considerations in data annotation are data quality and annotation cost. Our survey finds that data annotation work in NLP usually adopts a machine-annotation-then-human-correction approach to reduce cost; meanwhile, few studies have rigorously compared different annotation methods to examine their impact on annotation quality and cost. With the help of a mature annotation team, and taking dependency syntax annotation as a case study, this paper experimentally compares machine annotation with human correction, independent double annotation, and a newly proposed human-machine independent annotation method that combines the former two, and draws some preliminary conclusions.
1 code implementation • CoNLL (EMNLP) 2021 • Yang Hou, Houquan Zhou, Zhenghua Li, Yu Zhang, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan
In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword).
1 code implementation • Findings (NAACL) 2022 • Huan Lin, Baosong Yang, Liang Yao, Dayiheng Liu, Haibo Zhang, Jun Xie, Min Zhang, Jinsong Su
Diverse NMT aims at generating multiple diverse yet faithful translations given a source sentence.
no code implementations • IWSLT (ACL) 2022 • Jiaxin Guo, Yinglu Li, Minghan Wang, Xiaosong Qiao, Yuxia Wang, Hengchao Shang, Chang Su, Yimeng Chen, Min Zhang, Shimin Tao, Hao Yang, Ying Qin
The paper presents the HW-TSC’s pipeline and results of Offline Speech to Speech Translation for IWSLT 2022.
no code implementations • SemEval (NAACL) 2022 • Yinglu Li, Min Zhang, Xiaosong Qiao, Minghan Wang
In order to verify whether our model could also perform better in subtask 2 (the regression subtask), the ranking score is transformed into classification labels by an up-sampling strategy.
no code implementations • SemEval (NAACL) 2022 • Xiaosong Qiao, Yinglu Li, Min Zhang, Minghan Wang, Hao Yang, Shimin Tao, Qin Ying
This paper describes the system for identifying Plausible Clarifications of Implicit and Underspecified Phrases.
1 code implementation • Findings (EMNLP) 2021 • Qingrong Xia, Zhenghua Li, Rui Wang, Min Zhang
In particular, one recent seq-to-seq work directly fine-tunes AMR graph sequences on the encoder-decoder pre-trained language model and achieves new state-of-the-art results, outperforming previous works by a large margin.
no code implementations • Findings (EMNLP) 2021 • Ying Li, Meishan Zhang, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan
Thanks to the strong representation learning capability of deep learning, especially pre-training techniques with language model loss, dependency parsing has achieved great performance boost in the in-domain scenario with abundant labeled training data for target domains.
1 code implementation • 12 Jun 2025 • Haoyuan Shi, Yunxin Li, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang
Despite rapid advancements in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging.
no code implementations • 11 Jun 2025 • Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang
However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance.
1 code implementation • 11 Jun 2025 • Zhenran Xu, Yiyu Wang, Xue Yang, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
Starting with our curated dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning data, including node selection, workflow planning, and code-level workflow representation.
1 code implementation • 5 Jun 2025 • Zhenran Xu, Xue Yang, Yiyu Wang, Qingli Hu, Zijiao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
We introduce ComfyUI-Copilot, a large language model-powered plugin designed to enhance the usability and efficiency of ComfyUI, an open-source platform for AI-driven art creation.
no code implementations • 2 Jun 2025 • Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang
Stickers, though small, are a highly condensed form of visual expression, ubiquitous across messaging platforms and embraced by diverse cultures, genders, and age groups.
no code implementations • 28 May 2025 • Haomiao Qiu, Miao Zhang, Ziyue Qiao, Weili Guan, Min Zhang, Liqiang Nie
Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks.
no code implementations • 28 May 2025 • Yifan Lu, Jing Li, Yigeng Zhou, Yihui Zhang, Wenya Wang, Xiucheng Li, Meishan Zhang, Fangming Liu, Jun Yu, Min Zhang
Experimental results on multiple LLMs demonstrate that our ToxEdit outperforms previous state-of-the-art methods in both detoxification performance and safeguarding general capabilities of LLMs.
no code implementations • 26 May 2025 • Yu Shang, Peijie Liu, Yuwei Yan, Zijing Wu, Leheng Sheng, Yuanqing Yu, Chumeng Jiang, An Zhang, Fengli Xu, Yu Wang, Min Zhang, Yong Li
The emergence of agentic recommender systems powered by Large Language Models (LLMs) represents a paradigm shift in personalized recommendations, leveraging LLMs' advanced reasoning and role-playing capabilities to enable autonomous, adaptive decision-making.
1 code implementation • 26 May 2025 • Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jun Rao, Min Zhang
Meanwhile, online reinforcement learning mainly adopts a length reward to encourage short reasoning responses, but tends to lose the reflection ability and harm the performance.
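A generic illustration of the kind of length reward mentioned above, not the paper's formulation: correctness is traded off against response length, with `alpha` an assumed penalty weight.

```python
def length_shaped_reward(is_correct, n_tokens, alpha=0.001):
    """Correctness reward minus a per-token penalty; alpha is an assumed weight."""
    return (1.0 if is_correct else 0.0) - alpha * n_tokens

print(length_shaped_reward(True, 800))   # long but correct response
print(length_shaped_reward(True, 200))   # shorter correct response scores higher
```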
1 code implementation • 25 May 2025 • Yunxin Li, Xinyu Chen, Zitao Li, Zhenyu Liu, Longyue Wang, Wenhan Luo, Baotian Hu, Min Zhang
Applying Reinforcement Learning (RL) to Video Large Language Models (Video-LLMs) shows significant promise for complex video reasoning.
no code implementations • 24 May 2025 • Guodong Du, Zitao Fang, Jing Li, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Honghai Liu, Min Zhang
Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models.
1 code implementation • 24 May 2025 • Guodong Du, Xuanning Zhou, Junlin Li, Zhuo Li, Zesheng Shi, WanYu Lin, Ho-Kin Tang, Xiucheng Li, Fangming Liu, Wenya Wang, Min Zhang, Jing Li
The resulting SkillPack serves as a compact and transferable knowledge carrier, ideal for heterogeneous model fusion and continual learning.
1 code implementation • 22 May 2025 • Weiyang Guo, Jing Li, Wenya Wang, Yu Li, Daojing He, Jun Yu, Min Zhang
In the adversarial iterative optimization stage, the red-team model and the target model continuously improve their respective capabilities in interaction.
no code implementations • 22 May 2025 • Jun Rao, Xuebo Liu, Hexuan Deng, Zepeng Lin, Zixiong Yu, Jiansheng Wei, Xiaojun Meng, Min Zhang
In the realm of data selection for reasoning tasks, existing approaches predominantly rely on externally predefined static metrics such as difficulty and diversity, which are often designed for supervised fine-tuning (SFT) and lack adaptability to continuous training processes.
1 code implementation • 22 May 2025 • Bin Xie, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, Liqiang Nie
GUI automation faces critical challenges in dynamic environments.
no code implementations • 21 May 2025 • Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, YaoWei Wang, Min Zhang
By subtracting the machine-like patterns from the human-like distribution during the decoding process, CoPA is able to produce sentences that are less discernible by text detectors.
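A minimal numerical sketch of the contrastive idea described above: a "machine-like" next-token distribution is subtracted (in log space) from a "human-like" one before sampling. The weight `alpha` and all logits are illustrative, not taken from the paper.

```python
import numpy as np

def contrastive_sample(human_logits, machine_logits, alpha=0.5, rng=np.random.default_rng(0)):
    adjusted = human_logits - alpha * machine_logits      # down-weight machine-favoured tokens
    probs = np.exp(adjusted - adjusted.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

human = np.array([2.0, 1.0, 0.5, 0.1])       # toy next-token logits from a "human-like" model
machine = np.array([2.5, 0.2, 0.4, 0.1])     # toy logits capturing machine-like patterns
print(contrastive_sample(human, machine))
```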
no code implementations • 19 May 2025 • Han Sun, Zhen Sun, Zongmin Zhang, Linzhao Jia, Wei Shao, Min Zhang
Large Language Models (LLMs) are emerging as dominant forces for textual style transfer.
1 code implementation • 19 May 2025 • Zhengrui Ma, Yang Feng, Chenze Shao, Fandong Meng, Jie zhou, Min Zhang
We introduce SLED, an alternative approach to speech language modeling by encoding speech waveforms into sequences of continuous latent representations and modeling them autoregressively using an energy distance objective.
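For reference, the standard squared energy distance between two distributions, the general form of the objective family named above (the paper's exact variant may differ):

```latex
% X, X' ~ P and Y, Y' ~ Q are drawn independently.
\[
  D_E^2(P, Q) \;=\; 2\,\mathbb{E}\lVert X - Y\rVert \;-\; \mathbb{E}\lVert X - X'\rVert \;-\; \mathbb{E}\lVert Y - Y'\rVert .
\]
```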
no code implementations • 19 May 2025 • Jikai Wang, Zhenxu Tian, Juntao Li, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai, Min Zhang
The success of these methods relies on the alignment between draft candidates and the sampled outputs of the target model.
1 code implementation • 18 May 2025 • Yuyang Ding, Dan Qiao, Juntao Li, Jiajie Xu, Pingfu Chao, Xiaofang Zhou, Min Zhang
Distantly supervised named entity recognition (DS-NER) has emerged as a cheap and convenient alternative to traditional human annotation methods, enabling the automatic generation of training data by aligning text with external resources.
1 code implementation • 16 May 2025 • Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang
KV Cache quantization presents a promising solution, striking a good balance between memory usage and accuracy.
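A generic per-row int8 quantize/dequantize sketch for a KV-cache tensor, included only to illustrate the memory/accuracy trade-off the excerpt refers to; it is not the paper's quantization scheme.

```python
import numpy as np

def quantize_kv(kv, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(kv).max(axis=-1, keepdims=True) / qmax   # one scale per row (token)
    scale[scale == 0] = 1.0
    q = np.clip(np.round(kv / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q, scale):
    return q.astype(np.float32) * scale

kv = np.random.randn(4, 16).astype(np.float32)               # toy [tokens, head_dim] cache slice
q, s = quantize_kv(kv)
print("max abs reconstruction error:", np.abs(kv - dequantize_kv(q, s)).max())
```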
no code implementations • 12 May 2025 • Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang
Notably, our fine-tuned LeaP-T-7B matches the performance of DeepSeek-R1-Distill-Qwen-14B on AIME 2024.
1 code implementation • 8 May 2025 • Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang
Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integrating modalities such as text, images, audio, and video to support complex reasoning capabilities and aiming to achieve comprehensive perception, precise understanding, and deep reasoning.
no code implementations • 7 May 2025 • Yanyu Li, Pencheng Wan, Liang Han, YaoWei Wang, Liqiang Nie, Min Zhang
Stable Diffusion has advanced text-to-image synthesis, but training models to generate images with accurate object quantity is still difficult due to the high computational cost and the challenge of teaching models the abstract concept of quantity.
no code implementations • 5 May 2025 • Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He
This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking" - a reasoning process inspired by human cognition, as described in Kahneman's Thinking, Fast and Slow.
1 code implementation • 28 Apr 2025 • Ranran Zhen, Juntao Li, Yixin Ji, Zhenlin Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai, Min Zhang
Finally, we outline potential research directions to further advance the field of LLM inference serving.
1 code implementation • 27 Apr 2025 • Jikai Wang, Juntao Li, Lijun Wu, Min Zhang
The proposed thinking behavior alignment improves the efficiency of drafting and the draft selection strategy maintains the prediction accuracy for complex problems.
1 code implementation • 23 Apr 2025 • Xinyu Chen, Yunxin Li, Haoyuan Shi, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang
Assessing the video comprehension capabilities of multimodal AI systems can effectively measure their understanding and reasoning abilities.
1 code implementation • 9 Apr 2025 • Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, YaoWei Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks.
1 code implementation • 7 Apr 2025 • Xinglin Lyu, Wei Tang, Yuang Li, Xiaofeng Zhao, Ming Zhu, Junhui Li, Yunfei Lu, Daimeng Wei, Hao Yang, Min Zhang
Despite the increased discourse challenges introduced by noise from automatic speech recognition (ASR), the integration of document-level context in speech translation (ST) remains insufficiently explored.
Automatic Speech Recognition (ASR), +8
1 code implementation • 5 Apr 2025 • Zhiyu He, Zhixin Ling, Jiayu Li, Zhiqiang Guo, Weizhi Ma, Xinchen Luo, Min Zhang, Guorui Zhou
In contrast, our research focuses on segment-level user interest modeling, which is crucial for understanding how users' preferences evolve during video browsing.
no code implementations • 1 Apr 2025 • Min Zhang, Yuzhe Lu, Yun Zhou, Panpan Xu, Lin Lee Cheong, Chang-Tien Lu, Haozhu Wang
Furthermore, our method improves accuracy by 16.2%-43.6% while reducing data leakage by 2.3%-44.6% compared to existing data protection approaches.
no code implementations • 31 Mar 2025 • Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are accessible for verification.
no code implementations • 30 Mar 2025 • Miaomiao Cai, Lei Chen, Yifan Wang, Zhiyong Cheng, Min Zhang, Meng Wang
Existing supervised alignment and reweighting methods mitigate this bias but have key limitations: (1) ignoring inherent variability across Graph Convolutional Networks (GCNs) layers, causing negative effects in deeper layers; (2) reliance on fixed hyperparameters to balance item popularity, restricting adaptability and increasing complexity.
1 code implementation • CVPR 2025 • Yunhong Lu, Qichao Wang, Hengyuan Cao, Xierui Wang, Xiaoyin Xu, Min Zhang
To address these limitations, we introduce DDIM-InPO, an efficient method for direct preference alignment of diffusion models.
1 code implementation • 24 Mar 2025 • Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, Min Zhang
Multi-agent systems (MAS) based on large language models (LLMs) have demonstrated significant potential in collaborative problem-solving.
no code implementations • 18 Mar 2025 • Zhengsheng Guo, Linwei Zheng, Xinyang Chen, Xuefeng Bai, Kehai Chen, Min Zhang
While human cognition inherently retrieves information from diverse and specialized knowledge sources during decision-making processes, current Retrieval-Augmented Generation (RAG) systems typically operate through single-source knowledge retrieval, leading to a cognitive-algorithmic discrepancy.
no code implementations • 13 Mar 2025 • Henglyu Liu, Andong Chen, Kehai Chen, Xuefeng Bai, Meizhi Zhong, Yuan Qiu, Min Zhang
Recent advancement of large language models (LLMs) has led to significant breakthroughs across various tasks, laying the foundation for the development of LLM-based speech translation systems.
no code implementations • 13 Mar 2025 • Qiyuan Deng, Xuefeng Bai, Kehai Chen, YaoWei Wang, Liqiang Nie, Min Zhang
Reinforcement Learning (RL) algorithms for safety alignment of Large Language Models (LLMs), such as Direct Preference Optimization (DPO), encounter the challenge of distribution shift.
1 code implementation • 13 Mar 2025 • Zhenyu Liu, Dongfang Li, Xinshuo Hu, Xinping Zhao, Yibin Chen, Baotian Hu, Min Zhang
We find that the transformer embeds the task function learned from demonstrations into the separator token representation, which plays an important role in the generation of prior response tokens.
no code implementations • 10 Mar 2025 • Zhenyu Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yaoyin Zhang, Xuchen Wei, Juntao Li, Min Zhang
Large Language Models (LLMs) have demonstrated remarkable instruction-following capabilities across various applications.
no code implementations • 6 Mar 2025 • Yu Pan, Chaozheng Wang, Zekai Wu, Qifan Wang, Min Zhang, Zenglin Xu
Addressing this concern, we introduce fully identical initialization (IDInit), a novel method that preserves identity in both the main and sub-stem layers of residual networks.
1 code implementation • 4 Mar 2025 • Xingzuo Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yong Xu, Min Zhang
Large language model (LLM) agents typically adopt a step-by-step reasoning framework, in which they interleave the processes of thinking and acting to accomplish the given task.
no code implementations • 28 Feb 2025 • Yihong Tang, Kehai Chen, Xuefeng Bai, ZhengYu Niu, Bo wang, Jie Liu, Min Zhang
Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations.
1 code implementation • 27 Feb 2025 • Zhenyu Liu, Yunxin Li, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang
Specifically, our approach consists of 1) an image information quantification method via visual agents collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.
1 code implementation • 26 Feb 2025 • Jingtao Zhan, Jiahao Zhao, Jiayu Li, Yiqun Liu, Bo Zhang, Qingyao Ai, Jiaxin Mao, Hongning Wang, Min Zhang, Shaoping Ma
When the expectation and variance of failure counts are both finite, it signals the ability to consistently find solutions to new challenges, which we define as the Autonomous Level of intelligence.
no code implementations • 26 Feb 2025 • Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Qinglang Guo, Min Zhang
Our in-depth experiments, both quantitative and qualitative, demonstrate the dataset's potential in modeling user behavior and personalized recommendation systems, opening up new possibilities for research in personalized retrieval and conversational AI.
1 code implementation • 25 Feb 2025 • Zhuocheng Zhang, Yang Feng, Min Zhang
In LevelRAG, the high-level searcher orchestrates the retrieval logic, while the low-level searchers (sparse, web, and dense) refine the queries for optimal retrieval.
no code implementations • 25 Feb 2025 • Zhiyu Yin, Kehai Chen, Xuefeng Bai, Ruili Jiang, Juntao Li, Hongdong Li, Jin Liu, Yang Xiang, Jun Yu, Min Zhang
Video generation, by leveraging a dynamic visual generation method, pushes the boundaries of Artificial Intelligence Generated Content (AIGC).
1 code implementation • 21 Feb 2025 • Houquan Zhou, Bo Zhang, Zhenghua Li, Ming Yan, Min Zhang
To address this issue, we introduce the task of General Chinese Character Error Correction (C2EC), which focuses on all three types of character errors.
no code implementations • 21 Feb 2025 • Zetian Sun, Dongfang Li, Baotian Hu, Jun Yu, Min Zhang
In the Large Language Model (LLM) reasoning scenario, state values are often estimated via Monte Carlo sampling.
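A minimal sketch of the Monte Carlo state-value estimation mentioned above: sample several rollouts from a state and average their returns. `toy_rollout` is a hypothetical stand-in for sampling one reasoning trajectory and scoring it, not the paper's implementation.

```python
import random

def toy_rollout(state):
    """Stand-in for sampling one reasoning trajectory from `state` and scoring it (1 = solved)."""
    return 1.0 if random.random() < 0.3 else 0.0

def mc_state_value(state, n_samples=64):
    return sum(toy_rollout(state) for _ in range(n_samples)) / n_samples

print(mc_state_value("partial solution ..."))
```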
no code implementations • 21 Feb 2025 • Weiqiao Shan, Yuang Li, Yuhao Zhang, Yingfeng Luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu
Connecting audio encoders with large language models (LLMs) allows the LLM to perform various audio understanding tasks, such as automatic speech recognition (ASR) and audio captioning (AC).
1 code implementation • 20 Feb 2025 • Pinzheng Wang, Zecheng Tang, Keyan Zhou, Juntao Li, Qiaoming Zhu, Min Zhang
Large Language Models have demonstrated superior performance across a wide range of tasks, but they still exhibit undesirable errors due to incorrect knowledge learned from the training data.
1 code implementation • 18 Feb 2025 • Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang
To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.
1 code implementation • 18 Feb 2025 • Xin Zhang, Ziqi Dai, Yongqi Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Jun Yu, Wenjie Li, Min Zhang
In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interleaved context for effective retrieval.
1 code implementation • 18 Feb 2025 • Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, YaoWei Wang, Min Zhang, Liqiang Nie
Then, we conduct extensive experiments with the baseline within each class, covering models with various sizes (7B-70B), bitwidths, training levels (LLaMA1/2/3/3.1), architectures (Mixtral, DeepSeekMoE and Mamba) and modalities (LLaVA1.5 and VILA1.5) on a wide range of evaluation metrics. Through comparative analysis of the results, we summarize the strengths of each PTQ strategy and the model size-bitwidth trade-off in terms of performance.
no code implementations • 17 Feb 2025 • Hongbin Zhang, Kehai Chen, Xuefeng Bai, Xiucheng Li, Min Zhang
Large language models (LLMs) have succeeded remarkably in multilingual translation tasks.
no code implementations • 17 Feb 2025 • Andong Chen, Yuchen Song, Wenxin Zhu, Kehai Chen, Muyun Yang, Tiejun Zhao, Min Zhang
The o1-Like LLMs are transforming AI by simulating human cognitive processes, but their performance in multilingual machine translation (MMT) remains underexplored.
no code implementations • 16 Feb 2025 • Yuxin Liu, Zhenxi Song, Guoyang Xu, ZiRui Wang, Feng Wan, Yong Hu, Min Zhang, Zhiguo Zhang
Brain-computer interface (BCI) based on steady-state visual evoked potentials (SSVEP) is a popular paradigm for its simplicity and high information transfer rate (ITR).
no code implementations • 14 Feb 2025 • Dilrukshi Gamage, Dilki Sewwandi, Min Zhang, Arosha Bandara
However, their trust in the label varied significantly based on the label design.
1 code implementation • 9 Feb 2025 • Huiyao Chen, Meishan Zhang, Jing Li, Min Zhang, Lilja Øvrelid, Jan Hajič, Hao Fei
Semantic role labeling (SRL) is a central natural language processing (NLP) task aiming to understand the semantic roles within texts, facilitating a wide range of downstream applications.
1 code implementation • 22 Jan 2025 • Zhenran Xu, Longyue Wang, Jifang Wang, Zhouyi Li, Senbao Shi, Xue Yang, Yiyu Wang, Baotian Hu, Jun Yu, Min Zhang
Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions.
1 code implementation • 14 Jan 2025 • Weiqiao Shan, Yuhao Zhang, Yuchen Han, Bei Li, Xiaofeng Zhao, Yuang Li, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu
Recent advancements have highlighted the efficacy of self-supervised learning (SSL) features in various speech-related tasks, providing lightweight and versatile multi-view speech representations.
no code implementations • 14 Jan 2025 • ZiRui Wang, Zhenxi Song, Yi Guo, Yuxin Liu, Guoyang Xu, Min Zhang, Zhiguo Zhang
The development of EEG decoding algorithms confronts challenges such as data sparsity, subject variability, and the need for precise annotations, all of which are vital for advancing brain-computer interfaces and enhancing the diagnosis of diseases.
2 code implementations • 9 Jan 2025 • Xiaojie Li, Yibo Yang, Jianlong Wu, David A. Clifton, Yue Yu, Bernard Ghanem, Min Zhang
To this end, we propose Continuous Knowledge-Preserving Decomposition for FSCIL (CKPD-FSCIL), a framework that decomposes a model's weights into two parts: one that compacts existing knowledge (knowledge-sensitive components) and another that carries redundant capacity to accommodate new abilities (redundant-capacity components).
Class-Incremental Learning, Few-Shot Class-Incremental Learning, +1
no code implementations • 9 Jan 2025 • Wei Tang, Jiawei Yu, Yuang Li, Yanqing Zhao, Weidong Zhang, Wei Feng, Min Zhang, Hao Yang
The inaccurate translation of numbers can lead to significant security issues, ranging from financial setbacks to medical inaccuracies.
no code implementations • 6 Jan 2025 • Qingyao Ai, Zhicheng Dou, Min Zhang
In this chapter, we discuss how to improve the GenIR systems based on user feedback.
1 code implementation • 5 Jan 2025 • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang
In System-2 models, it enhances the model's reasoning ability to solve complex problems through repeated sampling, self-correction, and tree search.
1 code implementation • 2 Jan 2025 • Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu, Haofen Wang, Jun Yu, Min Zhang
As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial.
no code implementations • CVPR 2025 • Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang
Last, we provide in-depth analyses of model scaling and training strategies, and perform ablation studies on both the model and synthetic data.
no code implementations • 30 Dec 2024 • Min Zhang, Zilin Wang, Liyan Chen, KunHong Liu, Juncong Lin
Recent advances in AI-driven storytelling have enhanced video generation and story visualization.
no code implementations • 28 Dec 2024 • Zhaohui Wang, Jingran Yang, Bojie Shao, Min Zhang
Some efficient white-box fairness testing methods about individual fairness have been proposed.
no code implementations • 26 Dec 2024 • Jiawei Yu, Xiang Geng, Yuang Li, Mengxin Ren, Wei Tang, Jiahuan Li, Zhibin Lan, Min Zhang, Hao Yang, ShuJian Huang, Jinsong Su
Spoken named entity recognition (NER) aims to identify named entities from speech, playing an important role in speech processing.
1 code implementation • 24 Dec 2024 • Xinping Zhao, Baotian Hu, Yan Zhong, Shouzheng Huang, Zihao Zheng, Meng Wang, Haofen Wang, Min Zhang
Although prevailing supervised and self-supervised learning (SSL)-augmented sequential recommendation (SeRec) models have achieved improved performance with powerful neural network architectures, we argue that they still suffer from two limitations: (1) Preference Drift, where models trained on past data can hardly accommodate evolving user preference; and (2) Implicit Memory, where head patterns dominate parametric learning, making it harder to recall long tails.
Ranked #2 on Sequential Recommendation on Amazon-Beauty
no code implementations • 22 Dec 2024 • Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang
Last, we provide in-depth analyses of model scaling and training strategies, and perform ablation studies on both the model and synthetic data.
no code implementations • 19 Dec 2024 • Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding
Efficient KV cache management in LLMs is crucial for long-context tasks like RAG and summarization.
1 code implementation • 18 Dec 2024 • Yifan Lu, Yigeng Zhou, Jing Li, Yequan Wang, Xuebo Liu, Daojing He, Fangming Liu, Min Zhang
Multi-hop question answering (MHQA) poses a significant challenge for large language models (LLMs) due to the extensive knowledge demands involved.
1 code implementation • 18 Dec 2024 • Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Min Zhang
To study the reason behind these limitations, we propose VGCure, a comprehensive benchmark covering 22 tasks for examining the fundamental graph understanding and reasoning capacities of LVLMs.
no code implementations • 17 Dec 2024 • Jiaqi Wang, Liutao Yu, Liwei Huang, Chenlin Zhou, Han Zhang, Zhenxi Song, Min Zhang, Zhengyu Ma, Zhiguo Zhang
Furthermore, by executing KDCL, we reduce the number of time steps by 60% and decrease energy consumption by 54.8% while maintaining comparable performance to recent SOTA results.
no code implementations • 17 Dec 2024 • Ziheng Qiao, Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang
One key characteristic of the Chinese spelling check (CSC) task is that incorrect characters are usually similar to the correct ones in either phonetics or glyph.
no code implementations • 17 Dec 2024 • Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang
Large language models (LLMs) have demonstrated impressive multilingual understanding and reasoning capabilities, driven by extensive pre-training multilingual corpora and fine-tuning instruction data.
no code implementations • 17 Dec 2024 • Andong Chen, Yuchen Song, Kehai Chen, Muyun Yang, Tiejun Zhao, Min Zhang
Visual information has been introduced for enhancing machine translation (MT), and its effectiveness heavily relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations.
no code implementations • 17 Dec 2024 • Mufan Xu, Kehai Chen, Xuefeng Bai, Muyun Yang, Tiejun Zhao, Min Zhang
Large language models (LLMs) based on generative pre-trained Transformer have achieved remarkable performance on knowledge graph question-answering (KGQA) tasks.
1 code implementation • 12 Dec 2024 • Pan Zhang, Xiaoyi Dong, Yuhang Cao, Yuhang Zang, Rui Qian, Xilin Wei, Lin Chen, Yifei Li, Junbo Niu, Shuangrui Ding, Qipeng Guo, Haodong Duan, Xin Chen, Han Lv, Zheng Nie, Min Zhang, Bin Wang, Wenwei Zhang, Xinyue Zhang, Jiaye Ge, Wei Li, Jingwen Li, Zhongying Tu, Conghui He, Xingcheng Zhang, Kai Chen, Yu Qiao, Dahua Lin, Jiaqi Wang
Recent advancements in multimodal large language models (MLLMs) have made significant strides in open-world understanding.
no code implementations • 12 Dec 2024 • Meizhi Zhong, Xikai Liu, Chen Zhang, Yikun Lei, Yan Gao, Yao Hu, Kehai Chen, Min Zhang
To accelerate the inference of LLMs, storing computed caches in memory has become the standard technique.
1 code implementation • 10 Dec 2024 • Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Rongxiang Weng, Muyun Yang, Tiejun Zhao, Min Zhang
This vulnerability poses significant risks to real-world applications.
no code implementations • 10 Dec 2024 • Dongfang Li, Zetian Sun, Xinshuo Hu, Baotian Hu, Min Zhang
Large Language Models (LLMs) need to adapt to the continuous changes in data, tasks, and user preferences.
1 code implementation • 4 Dec 2024 • Yunkai Dang, Min Zhang, Zhengyu Chen, Xinliang Zhang, Zheng Wang, Meijun Sun, Donglin Wang
In this paper, we argue that measure at such a level may not be effective enough to generalize from base to novel classes when using only a few images.
1 code implementation • 3 Dec 2024 • Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Erik Cambria, Min Zhang, Hao Fei
T3DEM is the most crucial step in determining the quality of Emo3D generation and encompasses three key challenges: Expression Diversity, Emotion-Content Consistency, and Expression Fluidity.
1 code implementation • 26 Nov 2024 • Zhengrui Ma, Yang Feng, Min Zhang
Streaming generation models are increasingly utilized across various fields, with the Transducer architecture being particularly popular in industrial applications.
1 code implementation • 21 Nov 2024 • Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu
To address this, we propose DRPruning, which incorporates distributionally robust optimization to restore balanced performance across domains, along with further improvements to enhance robustness.
no code implementations • 20 Nov 2024 • Jiawei Yu, Yuang Li, Xiaosong Qiao, Huan Zhao, Xiaofeng Zhao, Wei Tang, Min Zhang, Hao Yang, Jinsong Su
Existing research primarily utilizes additional text data and predefined speech styles supported by TTS models.
Automatic Speech Recognition (ASR), +4
1 code implementation • 9 Nov 2024 • Jiayin Wang, XiaoYu Zhang, Weizhi Ma, Min Zhang
Firstly, we train an autoencoder with sparsity constraints to reconstruct internal activations of recommendation models, making the RecSAE latents more interpretable and monosemantic than the original neuron activations.
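A generic sparse-autoencoder sketch of the kind referred to above (reconstruct activations with an L1 sparsity penalty on the latent code); layer sizes and the penalty weight are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=64, d_latent=256):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, acts):
        latents = torch.relu(self.encoder(acts))   # non-negative code, kept sparse by the L1 term
        return self.decoder(latents), latents

sae = SparseAutoencoder()
acts = torch.randn(32, 64)                          # toy batch of internal activations
recon, latents = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * latents.abs().mean()   # reconstruction + sparsity
loss.backward()
print(float(loss))
```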
1 code implementation • 1 Nov 2024 • Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang
We intend our evaluation framework and observations to benefit future research on the use of LLMs as recommenders.
1 code implementation • 29 Oct 2024 • Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang
The online retrieval part follows the paradigm of relevant recall and personalized ranking, supported by the offline pre-calculation parts, which are sticker semantic understanding, utility evaluation and personalization modules.
1 code implementation • 28 Oct 2024 • Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu
Despite their remarkable abilities in various tasks, large language models (LLMs) still struggle with real-time information (e.g., new facts and terms) due to the knowledge cutoff in their development process.
1 code implementation • 24 Oct 2024 • Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang
To overcome the GPU memory-bound issue caused by the long sequence, LOGO employs a reference-free preference optimization strategy and adopts a position synthesis method to construct the training data.
1 code implementation • 24 Oct 2024 • Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Qiaoming Zhu, Min Zhang
The availability of high-quality data is one of the most important factors in improving the reasoning capability of LLMs.
no code implementations • 23 Oct 2024 • Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang
As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency.
1 code implementation • 21 Oct 2024 • Wangjie You, Zecheng Tang, Juntao Li, Lili Yao, Min Zhang
Large language models (LLMs) have advanced significantly due to the attention mechanism, but their quadratic complexity and linear memory demands limit their performance on long-context tasks.
no code implementations • 20 Oct 2024 • Yu Zhao, Hao Fei, Xiangtai Li, Libo Qin, Jiayi Ji, Hongyuan Zhu, Meishan Zhang, Min Zhang, Jianguo Wei
In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form.
no code implementations • 19 Oct 2024 • Jilong Li, Zhenxi Song, Jiaqi Wang, Meishan Zhang, Honghai Liu, Min Zhang, Zhiguo Zhang
Current EEG/MEG-to-text decoding systems suffer from three key limitations: (1) reliance on teacher-forcing methods, which compromises robustness during inference, (2) sensitivity to session-specific noise, hindering generalization across subjects, and (3) misalignment between brain signals and linguistic representations due to pre-trained language model over-dominance.
no code implementations • 18 Oct 2024 • Chen Zhang, Meizhi Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, Chengqiang Lu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song
Long-context efficiency has recently become a trending topic in serving large language models (LLMs).
no code implementations • 16 Oct 2024 • Andong Chen, Kehai Chen, Yang Xiang, Xuefeng Bai, Muyun Yang, Yang Feng, Tiejun Zhao, Min Zhang
The remarkable understanding and generation capabilities of large language models (LLMs) have greatly improved translation performance.
no code implementations • 16 Oct 2024 • Junjie Chen, Weihang Su, Zhumin Chu, Haitao Li, Qinyao Ai, Yiqun Liu, Min Zhang, Shaoping Ma
Moreover, our study highlights the impact of prompt strategies and evaluation formats on evaluation performance, offering guidance for method optimization in the future.
no code implementations • 15 Oct 2024 • Xinping Zhao, Dongfang Li, Yan Zhong, Boren Hu, Yibin Chen, Baotian Hu, Min Zhang
Recent studies in Retrieval-Augmented Generation (RAG) have investigated extracting evidence from retrieved passages to reduce computational costs and enhance the final RAG performance, yet it remains challenging.
no code implementations • 14 Oct 2024 • Xinping Zhao, Jindi Yu, Zhenyu Liu, Jifang Wang, Dongfang Li, Yibin Chen, Baotian Hu, Min Zhang
Therefore, it is necessary to resort to external knowledge to detect and correct the hallucinated content.
no code implementations • 14 Oct 2024 • Xinping Zhao, Yan Zhong, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Dongfang Li, Baotian Hu, Min Zhang
In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, so as to balance effectiveness and efficiency.
1 code implementation • 10 Oct 2024 • Yutong Wang, Jiali Zeng, Xuebo Liu, Derek F. Wong, Fandong Meng, Jie zhou, Min Zhang
Large language models (LLMs) have achieved reasonable quality improvements in machine translation (MT).
1 code implementation • 10 Oct 2024 • Yuanqing Yu, Zhefan Wang, Weizhi Ma, Zhicheng Guo, Jingtao Zhan, Shuai Wang, Chuhan Wu, Zhiqiang Guo, Min Zhang
Despite having powerful reasoning and inference capabilities, Large Language Models (LLMs) still need external tools to acquire real-time information retrieval or domain-specific expertise to solve complex tasks, which is referred to as tool learning.
no code implementations • 8 Oct 2024 • Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang
The scaling of large language models (LLMs) is a critical research area for the efficiency and effectiveness of model training and deployment.
1 code implementation • 5 Oct 2024 • Houquan Zhou, Zhenghua Li, Bo Zhang, Chen Li, Shaopeng Lai, Ji Zhang, Fei Huang, Min Zhang
This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task, which is totally different from all previous CSC approaches.
1 code implementation • 4 Oct 2024 • Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, DaCheng Tao, Min Zhang
This study aims to refine the use of speech datasets for LSM training by addressing the limitations of vanilla instruction tuning.
1 code implementation • 4 Oct 2024 • Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang
With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere to commands.
1 code implementation • 3 Oct 2024 • Guodong Du, Junlin Lee, Jing Li, Runhua Jiang, Yifei Guo, Shuyang Yu, Hanting Liu, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Min Zhang
Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model.
2 code implementations • 3 Oct 2024 • Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang
Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization.
1 code implementation • 2 Oct 2024 • Yu Zhang, Kehai Chen, Xuefeng Bai, Zhao Kang, Quanjiang Guo, Min Zhang
Knowledge graph question answering (KGQA) involves answering natural language questions by leveraging structured information stored in a knowledge graph.
1 code implementation • 1 Oct 2024 • Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Xinbei Ma, Muyun Yang, Tiejun Zhao, Min Zhang
However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, though planning have been widely recognized as effective for decomposing complex tasks into a series of steps.
no code implementations • 1 Oct 2024 • Yu Zhao, Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua
Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics.
1 code implementation • 30 Sep 2024 • Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang
To address the occlusion issues in person Re-Identification (ReID) tasks, many methods have been proposed to extract part features by introducing external spatial information.
no code implementations • 20 Sep 2024 • Yuang Li, Xiaosong Qiao, Xiaofeng Zhao, Huan Zhao, Wei Tang, Min Zhang, Hao Yang
Large language models can enhance automatic speech recognition systems through generative error correction.
no code implementations • 19 Sep 2024 • Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, DaCheng Tao, Min Zhang
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
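For illustration, a standard temperature-scaled knowledge-distillation loss (the generic technique named above, not necessarily the paper's variant): the student matches the teacher's softened distribution via KL divergence.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

print(float(kd_loss(torch.randn(8, 10), torch.randn(8, 10))))
```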
no code implementations • 13 Sep 2024 • Shaojun Li, Hengchao Shang, Daimeng Wei, Jiaxin Guo, Zongyao Li, Xianghui He, Min Zhang, Hao Yang
Recent advancements in integrating speech information into large language models (LLMs) have significantly improved automatic speech recognition (ASR) accuracy.
Automatic Speech Recognition (ASR), +5
1 code implementation • 11 Sep 2024 • Yang Liu, Pengxiang Ding, Siteng Huang, Min Zhang, Han Zhao, Donglin Wang
Fueled by the Large Language Models (LLMs) wave, Large Visual-Language Models (LVLMs) have emerged as a pivotal advancement, bridging the gap between image and text.
1 code implementation • 30 Aug 2024 • Guoyang Xu, Junqi Xue, Yuxin Liu, ZiRui Wang, Min Zhang, Zhenxi Song, Zhiguo Zhang
Multimodal sentiment analysis aims to learn representations from different modalities to identify human emotions.
1 code implementation • 30 Aug 2024 • Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang
This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context language modeling by utilizing an external retriever for historical information retrieval.
no code implementations • 30 Aug 2024 • Fengyuan Dai, Siteng Huang, Min Zhang, Biao Gong, Donglin Wang
To transfer knowledge from seen attribute-object compositions to recognize unseen ones, recent compositional zero-shot learning (CZSL) methods mainly discuss the optimal classification branches to identify the elements, leading to the popularity of employing a three-branch architecture.
no code implementations • 26 Aug 2024 • Zelin Li, Kehai Chen, Lemao Liu, Xuefeng Bai, Mingming Yang, Yang Xiang, Min Zhang
In this paper, we analyze the core mechanisms of previous predominant adversarial attack methods, revealing that 1) the distributions of importance scores differ markedly among victim models, restricting transferability; 2) the sequential attack process induces substantial time overheads.
1 code implementation • 25 Aug 2024 • Xingzuo Li, Kehai Chen, Yunfei Long, Min Zhang
Large language models (LLMs) have created a new paradigm for natural language processing.
1 code implementation • 22 Aug 2024 • Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng
Only a few research explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge.
no code implementations • 19 Aug 2024 • Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang
Different from the traditional translation tasks, classical Chinese poetry translation requires both adequacy and fluency in translating culturally and historically significant content and linguistic poetic elegance.
1 code implementation • 19 Aug 2024 • Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang
These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions and images of the appearing character and setting.
1 code implementation • 16 Aug 2024 • Peiming Guo, Sinuo Liu, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang
We propose the first end-to-end model for photo-sharing multi-modal dialogue generation, which integrates an image perceptron and an image generator with a large language model.
no code implementations • 4 Aug 2024 • Zheng Li, Siyuan Wu, Ruichuan Chen, Paarijaat Aditya, Istemi Ekin Akkus, Manohar Vanga, Min Zhang, Hao Li, Yang Zhang
Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition.
no code implementations • 29 Jul 2024 • Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang
We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than 512 of previous multilingual encoders).
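A minimal rotary position embedding (RoPE) sketch, included only to illustrate the mechanism named above; the dimensions and base frequency (10000) follow the common convention and are assumptions, not details from the paper.

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """x: [seq_len, dim] with even dim; rotates each (even, odd) channel pair by a
    position-dependent angle, following the common RoPE convention."""
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]                  # [seq_len, 1]
    freqs = base ** (-np.arange(0, dim, 2) / dim)      # [dim/2]
    angles = pos * freqs                               # [seq_len, dim/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

print(apply_rope(np.random.randn(4, 8)).shape)
```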
1 code implementation • 21 Jul 2024 • Hao Li, Zheng Li, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang
Building upon this signal, we introduce a novel attack method called Sequential-metric based Membership Inference Attack (SeqMIA).
1 code implementation • 19 Jul 2024 • Changyue Wang, Weihang Su, Hu Yiran, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma
Existing benchmarks for evaluating knowledge update methods are mostly designed for the open domain and cannot address the specific challenges of the legal domain, such as the nuanced application of new legal knowledge, the complexity and lengthiness of legal regulations, and the intricate nature of legal reasoning.
1 code implementation • 8 Jul 2024 • Xiaojie Li, Yibo Yang, Jianlong Wu, Bernard Ghanem, Liqiang Nie, Min Zhang
The dual design enables the model to maintain the robust features of base classes, while adaptively learning distinctive feature shifts for novel classes.
Class-Incremental Learning, Few-Shot Class-Incremental Learning, +3
1 code implementation • 6 Jul 2024 • Min Zhang, Xian Fu, Jianye Hao, Peilong Han, Hao Zhang, Lei Shi, Hongyao Tang, Yan Zheng
To this end, based on the characteristics of embodied task planning, we first develop a systematic evaluation framework, which encapsulates four crucial capabilities of MFMs: object understanding, spatio-temporal perception, task understanding, and embodied reasoning.
1 code implementation • 3 Jul 2024 • Zhibin Lan, LiQiang Niu, Fandong Meng, Jie zhou, Min Zhang, Jinsong Su
Among them, the target text decoder is used to alleviate the language alignment burden, and the image tokenizer converts long sequences of pixels into shorter sequences of visual tokens, preventing the model from focusing on low-level visual features.
no code implementations • 3 Jul 2024 • Shengkun Wang, Taoran Ji, Jianfeng He, Mariam Almutairi, Dan Wang, Linhan Wang, Min Zhang, Chang-Tien Lu
This confirms the value of adversarial training in reducing stochasticity and bias for stock volatility prediction tasks.
no code implementations • 3 Jul 2024 • Zhongli Jiang, Dabao Zhang, Min Zhang
Tree ensemble methods provide promising predictions, but the resulting models are difficult to interpret.
no code implementations • 2 Jul 2024 • Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che
However, when compressing tool documentation, existing methods suffer from the weaknesses of key information loss (specifically, tool/parameter name errors) and difficulty in adjusting the length of compressed sequences based on documentation lengths.
no code implementations • 27 Jun 2024 • Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan
Then, an SG-based framework is built, where the textual SG (TSG) is encoded with a graph Transformer, while the video dynamic SG (DSG) and the HSG are modeled with a novel recurrent graph Transformer for spatial and temporal feature propagation.
no code implementations • 26 Jun 2024 • Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang
We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios.
1 code implementation • 25 Jun 2024 • Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang
It searches the optimal tree structure that maximizes the mathematical expectation of the acceptance length in each decoding step.
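A small sketch of the quantity referenced above, for the simpler case of a draft chain (the paper searches over tree structures): assuming each draft token carries an independent acceptance probability, a token is kept only if every earlier draft token was accepted.

```python
def expected_acceptance_length(accept_probs):
    """Draft tokens p_1..p_n in a chain; token k is kept only if all earlier drafts were
    accepted, so E[length] = sum_k prod_{i<=k} p_i (independence assumed)."""
    expectation, prefix = 0.0, 1.0
    for p in accept_probs:
        prefix *= p
        expectation += prefix
    return expectation

print(expected_acceptance_length([0.9, 0.8, 0.7, 0.5]))   # ~2.38 accepted tokens on average
```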
1 code implementation • 25 Jun 2024 • Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang
Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently.
1 code implementation • 20 Jun 2024 • Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng
Therefore, we propose a crucial question: Can we build a universal framework to handle a variety of temporal reasoning tasks?
no code implementations • 19 Jun 2024 • Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang
Enabling LLMs to handle lengthy context is currently a research hotspot.
no code implementations • 17 Jun 2024 • Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
1 code implementation • 17 Jun 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang
Through a comprehensive and quantitative evaluation of cutting-edge models, we reveal that: 1) Video-LMMs face difficulties in fine-grained video tasks involving temporal location, object tracking, and anomaly detection; 2) Video-LMMs present inferior logical and relation reasoning abilities; 3) Open-source Video-LMMs' performance is significantly lower than GPT-4o and Gemini-1.5, lagging by 20 points.
no code implementations • 16 Jun 2024 • Haiguang Wang, Yu Wu, Mengxia Wu, Cao Min, Min Zhang
This paper proposes the Local Alignment from Image-Phrase modeling (LAIP) framework, with a Bidirectional Attention-weighted local alignment (BidirAtt) module and a Mask Phrase Modeling (MPM) module. BidirAtt goes beyond the typical forward attention by considering the gradient of the transformer as backward attention, utilizing two-sided information for local alignment.
1 code implementation • 13 Jun 2024 • Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min Zhang
Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world.
1 code implementation • 12 Jun 2024 • Yutong Wang, Jiali Zeng, Xuebo Liu, Fandong Meng, Jie zhou, Min Zhang
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
1 code implementation • 11 Jun 2024 • Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence.
1 code implementation • 11 Jun 2024 • Meizhi Zhong, Kehai Chen, Zhengshan Xue, Lemao Liu, Mingming Yang, Min Zhang
It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information.
1 code implementation • 11 Jun 2024 • Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang
Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation.
1 code implementation • 11 Jun 2024 • Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang
Simultaneous translation models play a crucial role in facilitating communication.
1 code implementation • 11 Jun 2024 • Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang
Large language models (LLMs) have showcased impressive multilingual machine translation ability.
1 code implementation • 11 Jun 2024 • Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng
Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences.
no code implementations • 11 Jun 2024 • Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results.
1 code implementation • 10 Jun 2024 • Yan Gao, Zhiwei Cao, Zhongjian Miao, Baosong Yang, Shiyu Liu, Min Zhang, Jinsong Su
In this paper, we first conduct a preliminary study to reveal two key limitations of $k$NN-MT-AR: 1) the optimization gap leads to inaccurate estimation of $\lambda$ for determining $k$NN retrieval skipping, and 2) using a fixed threshold fails to accommodate the dynamic demands for $k$NN retrieval at different timesteps.
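A generic sketch of the standard kNN-MT interpolation that the lambda in the excerpt controls, with retrieval skipped when lambda falls below a fixed threshold; the threshold value and distributions are illustrative only, not the paper's dynamic scheme.

```python
import numpy as np

def knn_mt_step(p_nmt, lam, retrieve_knn, threshold=0.25):
    """Interpolate p = lam * p_kNN + (1 - lam) * p_NMT; skip the costly kNN lookup when
    lam falls below an (assumed) fixed threshold."""
    if lam < threshold:
        return p_nmt
    p_knn = retrieve_knn()
    return lam * p_knn + (1.0 - lam) * p_nmt

p_nmt = np.array([0.6, 0.3, 0.1])
print(knn_mt_step(p_nmt, lam=0.4, retrieve_knn=lambda: np.array([0.1, 0.8, 0.1])))
```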
1 code implementation • 10 Jun 2024 • Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence.
1 code implementation • 5 Jun 2024 • Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng
Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication.
Ranked #1 on de-en on CVSS
1 code implementation • 5 Jun 2024 • XiaoYu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang
We also provide further insights into combining human labels with the LLM evaluation process and utilizing ensembles of multiple heterogeneous LLM evaluators to enhance the accuracy and stability of evaluations.
no code implementations • 4 Jun 2024 • Junlin Lee, Yequan Wang, Jing Li, Min Zhang
Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge within LLMs.
1 code implementation • 3 Jun 2024 • Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang
Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL), which enables them to acquire knowledge from textual demonstrations without the need for parameter updates.
1 code implementation • 31 May 2024 • Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang
To alleviate popularity bias, existing efforts focus on emphasizing unpopular items or separating the correlation between item representations and their popularity.
1 code implementation • 28 May 2024 • Jiayu Li, Hanyu Li, Zhiyu He, Weizhi Ma, Peijie Sun, Min Zhang, Shaoping Ma
However, these libraries often impose certain restrictions on data and seldom support the same model to perform different tasks and input formats, limiting users from customized explorations.
no code implementations • 23 May 2024 • Weiqi Wu, Hongqiu Wu, Lai Jiang, XingYuan Liu, Jiale Hong, Hai Zhao, Min Zhang
Drama is a form of storytelling inspired by human creativity, proceeding with a predefined storyline, carrying emotions and thoughts.
1 code implementation • 22 May 2024 • Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, ShuJian Huang
For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training to maintain the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters.