Search Results for author: Zhenghao Liu

Found 57 papers, 50 papers with code

RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

1 code implementation • 25 Feb 2025 • Mingyan Wu, Zhenghao Liu, Yukun Yan, Xinze Li, Shi Yu, Zheni Zeng, Yu Gu, Ge Yu

Retrieval-Augmented Generation (RAG) enhances the performance of Large Language Models (LLMs) by incorporating external knowledge.

RAG Retrieval

Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts

1 code implementation • 24 Feb 2025 • Zhenghao Liu, Xingsheng Zhu, Tianshuo Zhou, Xinyi Zhang, Xiaoyuan Yi, Yukun Yan, Yu Gu, Ge Yu, Maosong Sun

This paper introduces Multi-Modal Retrieval-Augmented Generation (M^2RAG), a benchmark designed to evaluate the effectiveness of Multi-modal Large Language Models (MLLMs) in leveraging knowledge from multi-modal retrieval documents.

Benchmarking Fact Verification +4

LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences

1 code implementation • 24 Feb 2025 • Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu

Query expansion plays a crucial role in information retrieval, which aims to bridge the semantic gap between queries and documents to improve matching performance.

Hallucination Information Retrieval +1

PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning

1 code implementation • 21 Feb 2025 • Pengcheng Huang, Zhenghao Liu, Yukun Yan, Xiaoyuan Yi, Hao Chen, Zhiyuan Liu, Maosong Sun, Tong Xiao, Ge Yu, Chenyan Xiong

Knowledge-Augmented Generation (KAG) has shown great promise in updating the internal memory of Large Language Models (LLMs) by integrating external knowledge.

Hallucination

Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search

1 code implementation • 18 Feb 2025 • Yifan Ji, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shi Yu, Yishan Li, Zhiyuan Liu, Yu Gu, Ge Yu, Maosong Sun

Recent dense retrievers usually thrive on the emergent capabilities of Large Language Models (LLMs), using them to encode queries and documents into an embedding space for retrieval.

Retrieval

PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths

1 code implementation • 18 Feb 2025 • BoYu Chen, Zirui Guo, Zidan Yang, Yuluo Chen, Junze Chen, Zhenghao Liu, Chuan Shi, Cheng Yang

Typical RAG approaches split the text database into chunks, organizing them in a flat structure for efficient searches.

RAG Retrieval

KBAlign: Efficient Self Adaptation on Specific Knowledge Bases

1 code implementation • 22 Nov 2024 • Zheni Zeng, Yuxuan Chen, Shi Yu, Ruobing Wang, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun

Humans can use techniques to quickly acquire knowledge from specific materials in advance, such as creating self-assessment questions, enabling us to accomplish related tasks more efficiently.

Self-Learning

Building A Coding Assistant via the Retrieval-Augmented Language Model

1 code implementation • 21 Oct 2024 • Xinze Li, Hanbin Wang, Zhenghao Liu, Shi Yu, Shuo Wang, Yukun Yan, Yukai Fu, Yu Gu, Ge Yu

Specifically, it consists of a code structure aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G).

Code Completion Code Generation +4

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

1 code implementation • 17 Oct 2024 • Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen, Ge Yu, Zhiyuan Liu, Maosong Sun, Chenyan Xiong

Our experiments on various knowledge-intensive tasks demonstrate that DDR significantly outperforms the SFT method, particularly for LLMs with smaller-scale parameters that depend more on the retrieved knowledge.

RAG Retrieval

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

1 code implementation • 14 Oct 2024 • Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun

In this pipeline, instead of first parsing the document to obtain text, the document is embedded directly as an image using a VLM and then retrieved to enhance the generation of a VLM (a minimal sketch of this pipeline follows below).

RAG Retrieval
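
The core idea can be captured in a few lines. The sketch below is an illustrative assumption of how a VisRAG-style pipeline scores page images against a query; `vlm_embed_page` and `vlm_embed_query` are hypothetical stand-ins for the paper's VLM encoders, not its released code.

```python
# A minimal, illustrative sketch of a VisRAG-style retrieval step (assumed
# names, not the released code): document pages are embedded directly as
# images by a VLM, and retrieval is a similarity search over those vectors.
import torch
import torch.nn.functional as F

def vlm_embed_page(page_image: torch.Tensor) -> torch.Tensor:
    # Placeholder for the VLM's image encoder.
    return torch.randn(768)

def vlm_embed_query(query: str) -> torch.Tensor:
    # Placeholder for the VLM's text encoder.
    return torch.randn(768)

# Offline: embed every page image once and build an index.
pages = [torch.rand(3, 224, 224) for _ in range(100)]          # raw page images
index = F.normalize(torch.stack([vlm_embed_page(p) for p in pages]), dim=-1)

# Online: rank pages by cosine similarity; top pages go to the generator VLM.
q = F.normalize(vlm_embed_query("total revenue in 2023"), dim=0)
top_pages = (index @ q).topk(5).indices
```

Because pages are never parsed into text, layout and figures survive into retrieval, which is the pipeline's main selling point.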

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

1 code implementation • 11 Oct 2024 • Ruobing Wang, Daren Zha, Shi Yu, Qingfei Zhao, Yuxuan Chen, YiXuan Wang, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs produced by Large Language Models (LLMs) in open-domain question answering (OpenQA) by introducing external knowledge.

Open-Domain Question Answering RAG +1

COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis

1 code implementation • 9 Aug 2024 • Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process.

Code Generation Code Repair

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

1 code implementation • 2 Aug 2024 • Kunlun Zhu, Yifan Luo, Dingling Xu, Ruobing Wang, Shi Yu, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

Retrieval-Augmented Generation (RAG) is a powerful approach that enables large language models (LLMs) to incorporate external knowledge.

Benchmarking Dataset Generation +5

PersLLM: A Personified Training Approach for Large Language Models

1 code implementation • 17 Jul 2024 • Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems.

Prompt Engineering

Node-Time Conditional Prompt Learning In Dynamic Graphs

no code implementations • 22 May 2024 • Xingtong Yu, Zhenghao Liu, Xinming Zhang, Yuan Fang

To bridge the gap, prompt-based learning has gained traction on graphs, but most existing efforts focus on static graphs, neglecting the evolution of dynamic graphs.

Link Prediction Node Classification

Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

1 code implementation • 17 May 2024 • Yuqing Lan, Zhenghao Liu, Yu Gu, Xiaoyuan Yi, Xiaohua LI, Liner Yang, Ge Yu

Nevertheless, the noisy nodes usually propagate their semantics via the edges of the reasoning graph, which misleads the semantic representations of other nodes and amplifies the noise signals.

Fact Verification Graph Attention +1

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

1 code implementation • 25 Feb 2024 • Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu

It finetunes the compression plugin module and uses the representations of gist tokens to emulate the raw prompts in the vanilla language model.

Decoder Language Modeling +1

Cleaner Pretraining Corpus Curation with Neural Web Scraping

1 code implementation • 22 Feb 2024 • Zhipeng Xu, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Ge Yu, Chenyan Xiong

The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans.

Language Modeling Language Modelling

OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models

1 code implementation • 21 Feb 2024 • Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Yang Liu, Maosong Sun, Erhong Yang

We evaluate several representative multilingual LLMs on the proposed OMGEval, which we believe will provide a valuable reference for the community to further understand and improve the multilingual capability of LLMs.

General Knowledge Logical Reasoning

ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents

1 code implementation • 21 Feb 2024 • Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Chaojun Xiao, Zhiyuan Liu, Ge Yu, Chenyan Xiong

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge, enhancing their performance on knowledge-intensive tasks.

Active Learning Position +3

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

1 code implementation • 18 Feb 2024 • Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.

Code Generation Data Visualization

UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

1 code implementation • 7 Feb 2024 • Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun

Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs.

Cross-Lingual Transfer Data Augmentation

LegalDuet: Learning Effective Representations for Legal Judgment Prediction through a Dual-View Legal Clue Reasoning

1 code implementation • 27 Jan 2024 • Pengjie Liu, Zhenghao Liu, Xiaoyuan Yi, Liner Yang, Shuo Wang, Yu Gu, Ge Yu, Xing Xie, Shuang-Hua Yang

It proposes a dual-view legal clue reasoning mechanism, which derives from two reasoning chains of judges: 1) Law Case Reasoning, which makes legal judgments according to the judgment experiences learned from analogous/confusing legal cases; 2) Legal Ground Reasoning, which matches the legal clues between criminal cases and legal decisions.

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

1 code implementation • 16 Nov 2023 • Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu

INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.

Code Repair Code Translation

Modeling User Viewing Flow Using Large Language Models for Article Recommendation

no code implementations • 12 Nov 2023 • Zhenghao Liu, Zulong Chen, Moufeng Zhang, Shaoyang Duan, Hong Wen, Liangyue Li, Nan Li, Yu Gu, Ge Yu

This paper proposes the User Viewing Flow Modeling (SINGLE) method for the article recommendation task, which models the user's constant preference and instant interest from user-clicked articles.

Enhancing Dense Retrievers' Robustness with Group-level Reweighting

2 code implementations • 25 Oct 2023 • Peixuan Han, Zhenghao Liu, Zhiyuan Liu, Chenyan Xiong

In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and optimizing group weights to enhance the robustness of dense retrieval models (a rough sketch of the reweighting step follows below).

Clustering Link Prediction +2
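
For intuition, here is a rough sketch of the group-level reweighting idea under stated assumptions: examples are clustered into groups (WebDRO clusters via the web graph), and group weights are pushed toward high-loss groups with an exponentiated update in the style of standard Group-DRO. The losses and step size are illustrative, not the paper's values.

```python
# Rough sketch of group-level reweighting (Group-DRO-style update); the
# web-graph clustering and exact training recipe are the paper's own.
import torch

n_groups, eta = 4, 0.1
w = torch.full((n_groups,), 1.0 / n_groups)        # start from uniform weights

group_losses = torch.tensor([0.9, 0.4, 1.3, 0.6])  # mean loss per group this step
w = w * torch.exp(eta * group_losses)              # up-weight the harder groups
w = w / w.sum()                                    # keep weights on the simplex
weighted_loss = (w * group_losses).sum()           # objective to backpropagate
```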

MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin

1 code implementation • 21 Oct 2023 • Tianshuo Zhou, Sen Mei, Xinze Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Ge Yu

To facilitate the multi-modal retrieval tasks, we build the ClueWeb22-MM dataset based on the ClueWeb22 dataset, which regards anchor texts as queries, and extracts the related text and image documents from anchor-linked web pages.

Language Modelling Text Retrieval

Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model

1 code implementation • 8 Oct 2023 • Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu

We first validate the efficacy of Toolink in harnessing the model's creativity and CoS ability on ChatGPT.

valid

Text Matching Improves Sequential Recommendation by Reducing Popularity Biases

1 code implementation • 27 Aug 2023 • Zhenghao Liu, Sen Mei, Chenyan Xiong, Xiaohua LI, Shi Yu, Zhiyuan Liu, Yu Gu, Ge Yu

TASTE alleviates the cold start problem by representing long-tail items using full-text modeling and bringing the benefits of pretrained language models to recommendation systems.

Sequential Recommendation Text Matching

MCTS: A Multi-Reference Chinese Text Simplification Dataset

1 code implementation • 5 Jun 2023 • Ruining Chong, Luming Lu, Liner Yang, Jinran Nie, Zhenghao Liu, Shuo Wang, Shuhan Zhou, Yaoxin Li, Erhong Yang

We hope to build a basic understanding of Chinese text simplification through this foundational work and to provide references for future research.

Machine Translation Text Simplification

Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data

2 code implementations • 31 May 2023 • Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yu Gu, Zhiyuan Liu, Ge Yu

SANTA proposes two pretraining methods to make language models structure-aware and learn effective representations for structured data: 1) Structured Data Alignment, which utilizes the natural alignment relations between structured data and unstructured data for structure-aware pretraining.

Code Search Language Modeling +2

Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval

1 code implementation • 24 May 2023 • Shi Yu, Chenghao Fan, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu

Common document ranking pipelines in search systems are cascade systems that involve multiple ranking layers to integrate different information step-by-step.

Document Ranking Information Retrieval +3

CHGNN: A Semi-Supervised Contrastive Hypergraph Learning Network

no code implementations • 10 Mar 2023 • Yumeng Song, Yu Gu, Tianyi Li, Jianzhong Qi, Zhenghao Liu, Christian S. Jensen, Ge Yu

However, recent studies on hypergraph learning that extend graph convolutional networks to hypergraphs cannot learn effectively from features of unlabeled data.

Contrastive Learning Node Classification

Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval

1 code implementation • 1 Sep 2022 • Zhenghao Liu, Chenyan Xiong, Yuanhuiyi Lv, Zhiyuan Liu, Ge Yu

To learn a unified embedding space for multi-modal retrieval, UniVL-DR proposes two techniques: 1) Universal embedding optimization strategy, which contrastively optimizes the embedding space using the modality-balanced hard negatives; 2) Image verbalization method, which bridges the modality gap between images and texts in the raw data space (a minimal sketch of the contrastive step follows below).

Image Retrieval Open-Domain Question Answering +1
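
As a reading aid, the following is a minimal sketch of contrastive training with modality-balanced hard negatives; the shapes and names are illustrative assumptions, not the paper's code. Each query is contrasted against equal numbers of text and image negatives so neither modality dominates the shared space.

```python
# Minimal sketch of modality-balanced contrastive training (illustrative only):
# the candidate list mixes one positive with equal numbers of text and image
# hard negatives, and a softmax over similarities pulls the positive to the top.
import torch
import torch.nn.functional as F

def balanced_contrastive_loss(q, pos, text_negs, image_negs, tau=0.05):
    # q, pos: (d,); text_negs, image_negs: (k, d) each
    cands = torch.cat([pos.unsqueeze(0), text_negs, image_negs])  # (1 + 2k, d)
    sims = F.normalize(cands, dim=-1) @ F.normalize(q, dim=0)     # cosine sims
    target = torch.zeros(1, dtype=torch.long)                     # positive is index 0
    return F.cross_entropy((sims / tau).unsqueeze(0), target)

d, k = 768, 4
loss = balanced_contrastive_loss(torch.randn(d), torch.randn(d),
                                 torch.randn(k, d), torch.randn(k, d))
```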

Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder

1 code implementation • 6 May 2022 • Zhenghao Liu, Han Zhang, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Xiaohua LI

These embeddings need to be high-dimensional to fit training signals and guarantee the retrieval effectiveness of dense retrievers (a sketch of autoencoder-based compression follows below).

Dimensionality Reduction Information Retrieval +1
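
A sketch of the dimension-reduction idea follows. Note that ConAE's actual conditional objective is more involved than this; the snippet below only illustrates compressing cached retrieval embeddings with an autoencoder, using assumed sizes and a plain reconstruction loss.

```python
# Illustrative autoencoder that compresses cached 768-d retrieval embeddings
# to 128-d; ConAE's real conditional objective differs, this is the bare idea.
import torch
import torch.nn as nn

class EmbeddingAE(nn.Module):
    def __init__(self, dim: int = 768, low_dim: int = 128):
        super().__init__()
        self.enc = nn.Linear(dim, low_dim)   # used at index/search time
        self.dec = nn.Linear(low_dim, dim)   # used only while training

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

ae = EmbeddingAE()
embs = torch.randn(1024, 768)                # cached dense-retriever vectors
z, recon = ae(embs)
loss = nn.functional.mse_loss(recon, embs)   # train; then index/search with z
```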

P^3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning

1 code implementation • 4 May 2022 • Xiaomeng Hu, Shi Yu, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu, Ge Yu

In this paper, we identify and study the two mismatches between pre-training and ranking fine-tuning: the training schema gap regarding the differences in training objectives and model architectures, and the task knowledge gap considering the discrepancy between the knowledge needed in ranking and that learned during pre-training.

YACLC: A Chinese Learner Corpus with Multidimensional Annotation

1 code implementation • 30 Dec 2021 • Yingying Wang, Cunliang Kong, Liner Yang, Yijun Wang, Xiaorong Lu, Renfen Hu, Shan He, Zhenghao Liu, Yun Chen, Erhong Yang, Maosong Sun

This resource is of great relevance for second language acquisition research, foreign-language teaching, and automatic grammatical error correction.

Grammatical Error Correction Language Acquisition +1

More Robust Dense Retrieval with Contrastive Dual Learning

1 code implementation • 16 Jul 2021 • Yizhi Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu

With contrastive learning, the dual training objective of DANCE learns more tailored representations for queries and documents to keep the embedding space smooth and uniform, improving the ranking performance of DANCE on the MS MARCO document retrieval task.

Contrastive Learning Information Retrieval +1

Few-Shot Conversational Dense Retrieval

1 code implementation • 10 May 2021 • Shi Yu, Zhenghao Liu, Chenyan Xiong, Tao Feng, Zhiyuan Liu

In this paper, we present a Conversational Dense Retrieval system, ConvDR, that learns contextualized embeddings for multi-turn conversational queries and retrieves documents solely using embedding dot products (sketched below).

Conversational Search Retrieval
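
The retrieval step itself reduces to a dot product. The sketch below assumes a stand-in encoder and illustrative sizes, and shows how a multi-turn conversation can be folded into a single contextualized query vector; it is not the trained ConvDR model.

```python
# Minimal sketch of ConvDR-style retrieval (stand-in encoder, assumed sizes):
# the conversation so far becomes one query vector, and documents are ranked
# purely by dot product, with no cascade of rerankers behind it.
import torch

def encode(text: str) -> torch.Tensor:
    return torch.randn(768)                   # placeholder for the trained encoder

history = ["who wrote dune?", "when was it first published?"]
q = encode(" [SEP] ".join(history))           # contextualized multi-turn query
doc_index = torch.randn(10_000, 768)          # precomputed document embeddings
top_docs = (doc_index @ q).topk(10).indices   # retrieval = one dot product + top-k
```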

OpenMatch: An Open Source Library for Neu-IR Research

1 code implementation • 30 Jan 2021 • Zhenghao Liu, Kaitao Zhang, Chenyan Xiong, Zhiyuan Liu, Maosong Sun

OpenMatch is a Python-based library that supports Neural Information Retrieval (Neu-IR) research.

Document Ranking Information Retrieval +1

Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

1 code implementation • ACL 2021 • Si Sun, Yingzhuo Qian, Zhenghao Liu, Chenyan Xiong, Kaitao Zhang, Jie Bao, Zhiyuan Liu, Paul Bennett

To democratize the benefits of Neu-IR, this paper presents MetaAdaptRank, a domain adaptive learning method that generalizes Neu-IR models from label-rich source domains to few-shot target domains.

Information Retrieval Learning-To-Rank +1

Capturing Global Informativeness in Open Domain Keyphrase Extraction

2 code implementations • 28 Apr 2020 • Si Sun, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Jie Bao

Open-domain KeyPhrase Extraction (KPE) aims to extract keyphrases from documents without domain or quality restrictions, e.g., web pages with varying domains and quality.

Chunking Informativeness +1

Selective Weak Supervision for Neural Information Retrieval

1 code implementation • 28 Jan 2020 • Kaitao Zhang, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu

This paper democratizes neural information retrieval to scenarios where large scale relevance training signals are not available.

Information Retrieval Learning-To-Rank +1

Fine-grained Fact Verification with Kernel Graph Attention Network

1 code implementation • ACL 2020 • Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu

Fact Verification requires fine-grained natural language inference capability that finds subtle clues to identify syntactically and semantically correct but not well-supported claims.

Fact Verification Graph Attention +1

Explore Entity Embedding Effectiveness in Entity Retrieval

no code implementations • 28 Aug 2019 • Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu

Entity embedding captures rich semantic information from the knowledge graph and represents entities with a low-dimensional representation, which provides an opportunity to establish interactions between query-related entities and candidate entities for entity retrieval.

Entity Retrieval Learning-To-Rank +1
