no code implementations • EMNLP 2020 • Dongfang Li, Baotian Hu, Qingcai Chen, Weihua Peng, Anqi Wang
Machine reading comprehension (MRC) has achieved significant progress on the open domain in recent years, mainly due to large-scale pre-trained language models.
no code implementations • LT4HALA (LREC) 2022 • Wei Xinyuan, Liu Weihao, Qing Zong, Zhang Shaoqing, Baotian Hu
We participated in the LT4HALA 2022 shared task EvaHan.
1 code implementation • 8 May 2025 • Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang
Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integrating modalities such as text, images, audio, and video to support complex reasoning, with the aim of achieving comprehensive perception, precise understanding, and deep reasoning.
no code implementations • 23 Apr 2025 • Xinyu Chen, Yunxin Li, Haoyuan Shi, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang
Assessing the video comprehension capabilities of multimodal AI systems can effectively measure their understanding and reasoning abilities.
1 code implementation • 9 Apr 2025 • Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, YaoWei Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks.
1 code implementation • 13 Mar 2025 • Zhenyu Liu, Dongfang Li, Xinshuo Hu, Xinping Zhao, Yibin Chen, Baotian Hu, Min Zhang
We find that the transformer embeds the task function learned from demonstrations into the separator token representation, which plays an important role in the generation of prior response tokens.
1 code implementation • 27 Feb 2025 • Zhenyu Liu, Yunxin Li, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang
Specifically, our approach consists of 1) an image information quantification method via visual agent collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.
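In outline, the two-stage selection might look like the sketch below, with image_score and instruction_score as hypothetical stand-ins for the paper's agent-based scorers:

```python
def select_pairs(pairs, image_score, instruction_score, t_img=0.7, t_inst=0.7):
    """Stage 1: keep information-rich images; stage 2: keep only
    high-quality instructions attached to those surviving images."""
    rich = [p for p in pairs if image_score(p["image"]) >= t_img]
    return [p for p in rich
            if instruction_score(p["instruction"], p["image"]) >= t_inst]
```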
no code implementations • 21 Feb 2025 • Zetian Sun, Dongfang Li, Baotian Hu, Jun Yu, Min Zhang
In the Large Language Model (LLM) reasoning scenario, state values are often estimated via Monte Carlo sampling.
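A minimal sketch of Monte Carlo state-value estimation in this setting, with rollout_policy and reward_fn as hypothetical stand-ins (not the paper's code):

```python
def mc_state_value(state, rollout_policy, reward_fn, n_samples=16):
    """Estimate V(state) as the mean reward of sampled completions.

    rollout_policy(state) samples one full reasoning trajectory from the
    partial state; reward_fn scores its final answer (e.g., 1.0 if correct,
    else 0.0). Both are assumed interfaces for illustration only.
    """
    returns = [reward_fn(rollout_policy(state)) for _ in range(n_samples)]
    return sum(returns) / n_samples
```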
1 code implementation • 22 Jan 2025 • Zhenran Xu, Longyue Wang, Jifang Wang, Zhouyi Li, Senbao Shi, Xue Yang, Yiyu Wang, Baotian Hu, Jun Yu, Min Zhang
Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions.
1 code implementation • 2 Jan 2025 • Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu, Haofen Wang, Jun Yu, Min Zhang
As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial.
1 code implementation • 24 Dec 2024 • Xinping Zhao, Baotian Hu, Yan Zhong, Shouzheng Huang, Zihao Zheng, Meng Wang, Haofen Wang, Min Zhang
Although prevailing supervised and self-supervised learning (SSL)-augmented sequential recommendation (SeRec) models have achieved improved performance with powerful neural network architectures, we argue that they still suffer from two limitations: (1) Preference Drift, where models trained on past data can hardly accommodate evolving user preferences; and (2) Implicit Memory, where head patterns dominate parametric learning, making it harder to recall long tails.
Ranked #2 on Sequential Recommendation on Amazon-Beauty
no code implementations • 10 Dec 2024 • Dongfang Li, Zetian Sun, Xinshuo Hu, Baotian Hu, Min Zhang
Large Language Models (LLMs) need to adapt to the continuous changes in data, tasks, and user preferences.
no code implementations • 15 Oct 2024 • Xinping Zhao, Dongfang Li, Yan Zhong, Boren Hu, Yibin Chen, Baotian Hu, Min Zhang
Recent studies in Retrieval-Augmented Generation (RAG) have investigated extracting evidence from retrieved passages to reduce computational costs and enhance the final RAG performance, yet it remains challenging.
1 code implementation • 14 Oct 2024 • Xinping Zhao, Chaochao Chen, Jiajie Su, Yizhao Zhang, Baotian Hu
In this paper, we propose a model-agnostic framework, named AttrGAU (Attributed Graph Networks with Alignment and Uniformity Constraints), to bring the MIA's superiority into existing attribute-agnostic models, to improve their accuracy and robustness for recommendation.
no code implementations • 14 Oct 2024 • Xinping Zhao, Yan Zhong, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Dongfang Li, Baotian Hu, Min Zhang
In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, so as to balance effectiveness and efficiency.
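As an illustration of the coarse-to-fine idea (not FunnelRAG's exact pipeline), here is a sketch with hypothetical coarse_score/fine_score functions and cluster objects assumed to expose a passages attribute:

```python
def funnel_retrieve(query, clusters, coarse_score, fine_score,
                    k_coarse=4, k_fine=8):
    """Cheap scoring over coarse units first; expensive scoring only
    inside the surviving candidates."""
    # Stage 1 (coarse granularity): rank large units, e.g., document clusters.
    top = sorted(clusters, key=lambda c: coarse_score(query, c), reverse=True)
    # Stage 2 (fine granularity): rank individual passages within survivors.
    passages = [p for c in top[:k_coarse] for p in c.passages]
    return sorted(passages, key=lambda p: fine_score(query, p),
                  reverse=True)[:k_fine]
```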
no code implementations • 14 Oct 2024 • Xinping Zhao, Jindi Yu, Zhenyu Liu, Jifang Wang, Dongfang Li, Yibin Chen, Baotian Hu, Min Zhang
Therefore, it is necessary to resort to external knowledge to detect and correct the hallucinated content.
1 code implementation • 19 Aug 2024 • Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang
These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions with images of the appearing characters and settings.
no code implementations • 17 Jun 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang
Through a comprehensive and quantitative evaluation of cutting-edge models, we reveal that: 1) Video-LMMs face difficulties in fine-grained video tasks involving temporal location, object tracking, and anomaly detection; 2) Video-LMMs present inferior logical and relational reasoning abilities; 3) open-source Video-LMMs perform significantly worse than GPT-4o and Gemini-1.5, lagging by 20 points.
1 code implementation • 18 May 2024 • Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang
Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities.
Ranked #178 on Visual Question Answering on MM-Vet
1 code implementation • 8 May 2024 • Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang
Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context.
1 code implementation • 17 Apr 2024 • Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang
In this paper, we address this gap by presenting a comprehensive analysis of these compressed vectors, drawing parallels to the parameters trained with gradient descent, and introduce the concept of state vector.
no code implementations • 27 Mar 2024 • Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang
Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content.
1 code implementation • 26 Feb 2024 • Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang
In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself.
no code implementations • 22 Feb 2024 • Xinshuo Hu, Baotian Hu, Dongfang Li, Xiaoguang Li, Lifeng Shang
The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context.
1 code implementation • 21 Feb 2024 • Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang
For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language model-based decoder to generate the product description.
1 code implementation • 21 Feb 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang
Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely used visual-language projection approaches (e.g., Q-former or MLP) focus on the alignment of image-text descriptions yet ignore visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge.
no code implementations • 29 Dec 2023 • Dongfang Li, Baotian Hu, Qingcai Chen, Shan He
Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI.
no code implementations • 27 Nov 2023 • Yunxin Li, Baotian Hu, Wei Wang, Xiaochun Cao, Min Zhang
These models predominantly map visual information into language representation space, leveraging the vast knowledge and powerful text generation abilities of LLMs to produce multimodal instruction-following responses.
1 code implementation • 15 Nov 2023 • Ziyang Chen, Dongfang Li, Xiang Zhao, Baotian Hu, Min Zhang
In this study, we address the challenge of enhancing temporal knowledge reasoning in Large Language Models (LLMs).
Ranked #3 on Question Answering on MultiTQ
1 code implementation • 14 Nov 2023 • Zhenran Xu, Senbao Shi, Baotian Hu, Jindi Yu, Dongfang Li, Min Zhang, Yuxiang Wu
Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks.
1 code implementation • 13 Nov 2023 • Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang
The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA).
1 code implementation • 7 Nov 2023 • Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang
Open-domain generative systems have gained significant attention in the field of conversational AI (e.g., generative search engines).
1 code implementation • 19 Oct 2023 • Yulin Chen, Zhenran Xu, Baotian Hu, Min Zhang
Entity linking aims to link ambiguous mentions to their corresponding entities in a knowledge base.
1 code implementation • 19 Oct 2023 • Zhenran Xu, Yulin Chen, Baotian Hu, Min Zhang
Zero-shot entity linking (EL) aims at aligning entity mentions to unseen entities, which challenges a model's generalization ability.
1 code implementation • 16 Aug 2023 • Xinshuo Hu, Dongfang Li, Baotian Hu, Zihao Zheng, Zhenyu Liu, Min Zhang
To evaluate the effectiveness of our approach in terms of truthfulness and detoxification, we conduct extensive experiments on LLMs, encompassing additional abilities such as language modeling and mathematical reasoning.
1 code implementation • 22 Jun 2023 • Senbao Shi, Zhenran Xu, Baotian Hu, Min Zhang
Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to the referent entities from a knowledge base.
1 code implementation • 22 May 2023 • Dongfang Li, Jindi Yu, Baotian Hu, Zhenran Xu, Min Zhang
As ChatGPT and GPT-4 spearhead the development of Large Language Models (LLMs), more researchers are investigating their performance across various tasks.
1 code implementation • 8 May 2023 • Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang
This makes the language model well suited to such multimodal reasoning scenarios over joint textual and visual clues.
1 code implementation • 5 May 2023 • Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang
LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term dynamic visual information interaction.
1 code implementation • 3 May 2023 • Yunxin Li, Baotian Hu, Yuxin Ding, Lin Ma, Min Zhang
Inspired by the Divide-and-Conquer algorithm and dual-process theory, in this paper, we regard linguistically complex texts as compound proposition texts composed of multiple simple proposition sentences and propose an end-to-end Neural Divide-and-Conquer Reasoning framework, dubbed NDCR.
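Schematically, the framework follows this divide-and-conquer shape, with hypothetical split/solve/combine functions standing in for NDCR's trained neural modules:

```python
def divide_and_conquer_reason(compound_text, split_fn, solve_fn, combine_fn):
    simple_props = split_fn(compound_text)          # divide: simple propositions
    verdicts = [solve_fn(p) for p in simple_props]  # conquer: solve each one
    return combine_fn(verdicts)                     # merge partial conclusions
```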
1 code implementation • 16 Dec 2022 • Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang
Moreover, the pipelined approaches of retrieval and generation might result in poor generation performance when retrieval performance is low.
no code implementations • COLING 2022 • Dongfang Li, Baotian Hu, Qingcai Chen
To address these challenges, we propose Prompt-based Text Entailment (PTE) for low-resource named entity recognition, which better leverages knowledge in the PLMs.
1 code implementation • 6 Nov 2022 • Dongfang Li, Baotian Hu, Qingcai Chen
We conduct extensive experiments on six datasets with two popular pre-trained language models in the in-domain and out-of-domain settings.
1 code implementation • 30 Oct 2022 • Yuxiang Wu, Yu Zhao, Baotian Hu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel
Experiments on various knowledge-intensive tasks such as question answering and dialogue show that simply augmenting parametric models (T5-base) with our method produces more accurate results (e.g., 25.8 -> 44.3 EM on NQ) while retaining high throughput (e.g., 1000 queries/s on NQ).
Ranked #4 on Question Answering on KILT: ELI5
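For reference, the EM metric cited above (25.8 -> 44.3 on NQ) is conventionally computed with SQuAD-style answer normalization; a standard sketch, assumed rather than taken from the paper's code:

```python
import re
import string

def _normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())                # collapse whitespace

def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction matches any normalized gold answer."""
    return float(any(_normalize(prediction) == _normalize(g)
                     for g in gold_answers))
```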
1 code implementation • 26 Jul 2022 • Zhenran Xu, Zifei Shan, Yuxin Li, Baotian Hu, Bing Qin
We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset.
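R@1 here is standard candidate recall at rank 1; a minimal sketch, assuming each example carries a ranked candidate list and a single gold entity:

```python
def recall_at_k(examples, k=1):
    """Fraction of examples whose gold entity is in the top-k candidates."""
    hits = sum(ex["gold"] in ex["ranked_candidates"][:k] for ex in examples)
    return hits / len(examples)
```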
1 code implementation • 23 Jul 2022 • Qian Yang, Yunxin Li, Baotian Hu, Lin Ma, Yuxing Ding, Min Zhang
The model consists of a semantic interactor (abbr. CSI), a relation inferrer, and a Lexical Constraint-aware Generator.
no code implementations • 17 Jun 2022 • Yu Zhao, Yunxin Li, Yuxiang Wu, Baotian Hu, Qingcai Chen, Xiaolong Wang, Yuxin Ding, Min Zhang
To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator.
no code implementations • 17 Jun 2022 • Yu Zhao, Xinshuo Hu, Yunxin Li, Baotian Hu, Dongfang Li, Sichao Chen, Xiaolong Wang
In this paper, we propose a general Multi-Skill Dialog Framework, namely MSDF, which can be applied to different dialog tasks (e.g., knowledge-grounded dialog and persona-based dialog).
1 code implementation • 20 Dec 2021 • Dongfang Li, Baotian Hu, Qingcai Chen, Tujie Xu, Jingcong Tao, Yunan Zhang
Recent works have shown explainability and robustness are two crucial ingredients of trustworthy and reliable text classification.
no code implementations • 4 Jul 2021 • Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding
Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.
no code implementations • 1 Jul 2021 • Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma
Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.
1 code implementation • ACL 2021 • Shuoran Jiang, Qingcai Chen, Xin Liu, Baotian Hu, Lisai Zhang
In this study, we define the spectral graph convolutional network with the high-order dynamic Chebyshev approximation (HDGCN), which augments the multi-hop graph reasoning by fusing messages aggregated from direct and long-term dependencies into one convolutional layer.
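For background, here is a plain ChebNet-style sketch of the static K-order Chebyshev approximation that HDGCN extends (scalar coefficients assumed; the paper's dynamic variant goes beyond this):

```python
def chebyshev_conv(X, L_tilde, thetas):
    """Compute sum_k theta_k * T_k(L_tilde) @ X, where T_0 = I,
    T_1 = L_tilde, and T_k = 2 * L_tilde @ T_{k-1} - T_{k-2}.

    X: node features (n, d) as an array supporting @;
    L_tilde: rescaled Laplacian 2L/lambda_max - I;
    thetas: K >= 2 scalar coefficients.
    """
    T_prev, T_curr = X, L_tilde @ X            # T_0(L)X and T_1(L)X
    out = thetas[0] * T_prev + thetas[1] * T_curr
    for theta in thetas[2:]:
        T_prev, T_curr = T_curr, 2 * (L_tilde @ T_curr) - T_prev
        out = out + theta * T_curr
    return out
```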
no code implementations • 27 Mar 2021 • Dongfang Li, Jingcong Tao, Qingcai Chen, Baotian Hu
The experimental results show that the proposed approach can generate reasonable explanations for its predictions even with a small-scale training corpus.
no code implementations • 1 Jan 2021 • Zhaobin Xu, Baotian Hu, Buzhou Tang
It has two major parts.
no code implementations • COLING 2020 • Youcheng Pan, Qingcai Chen, Weihua Peng, Xiaolong Wang, Baotian Hu, Xin Liu, Junying Chen, Wenxiu Zhou
Exploiting domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially for highly professional domains such as medicine.
no code implementations • 16 Apr 2020 • Kai Chen, Fayuan Li, Baotian Hu, Weihua Peng, Qingcai Chen, Hong Yu
We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text.
1 code implementation • 7 Apr 2020 • Lisai Zhang, Qingcai Chen, Baotian Hu, Shuoran Jiang
To fulfill such a task, we propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet).
no code implementations • 7 Apr 2020 • Xin Liu, Qingcai Chen, Yan Liu, Joanna Siebert, Baotian Hu, Xiang-Ping Wu, Buzhou Tang
We propose a Capsule network-based method to Decompose the unsupervised word Embedding of an ambiguous word into context-specific Sense embeddings, called CapsDecE2S.
no code implementations • WS 2019 • Dongfang Li, Ying Xiong, Baotian Hu, Hanyang Du, Buzhou Tang, Qingcai Chen
In this paper, we present our approaches for trigger word detection (task 1) and the identification of its thematic role (task 2) in AGAC track of BioNLP Open Shared Task 2019.
no code implementations • NAACL 2018 • Tu Vu, Baotian Hu, Tsendsuren Munkhdalai, Hong Yu
Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications.
Ranked #1 on Text Simplification on PWKP / WikiSmall
no code implementations • 19 Apr 2018 • Yuxiang Wu, Baotian Hu
As an effort towards extracting coherent summaries, we propose a neural coherence model to capture the cross-sentence semantic and syntactic coherence patterns.
Ranked #3 on Text Summarization on CNN / Daily Mail (Anonymized)
no code implementations • 13 Oct 2016 • Baotian Hu, Xin Liu, Xiang-Ping Wu, Qingcai Chen
In this paper, we propose a novel model, named Stroke Sequence-dependent Deep Convolutional Neural Network (SSDCNN), using the stroke sequence information and eight-directional features for Online Handwritten Chinese Character Recognition (OLHCCR).
no code implementations • IJCNLP 2015 • Xiaoqiang Zhou, Baotian Hu, Qingcai Chen, Buzhou Tang, Xiaolong Wang
In this paper, the answer selection problem in community question answering (CQA) is regarded as an answer sequence labeling task, and a novel approach is proposed based on the recurrent architecture for this problem.
3 code implementations • EMNLP 2015 • Baotian Hu, Qingcai Chen, Fangze Zhu
Automatic text summarization is widely regarded as a highly difficult problem, partially because of the lack of large text summarization datasets.
Ranked #1 on Text Summarization on LCSTS
2 code implementations • NeurIPS 2014 • Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen
Semantic matching is of central importance to many natural language tasks.
Ranked #3 on Question Answering on SemEvalCQA
no code implementations • IJCNLP 2015 • Zhaopeng Tu, Baotian Hu, Zhengdong Lu, Hang Li
We propose a novel method for translation selection in statistical machine translation, in which a convolutional neural network is employed to judge the similarity between a phrase pair in two languages.