Search Results for author: Zhicheng Dou

Found 61 papers, 33 papers with code

Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder

1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.

Language Modelling News Recommendation +4

Search Result Diversification Framework Based on Dual Star-shaped Self-Attention Network

no code implementations • CCL 2021 • Xubo Qin, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen

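Prior research indicates that the queries users submit to search engines are usually short. Owing to the nature of natural language, short queries are often ambiguous: the same query can refer to different things, or to different aspects of the same thing. To satisfy users' diverse information needs as fully as possible, search engines need to rank the returned results for diversity, which gave rise to search result diversification techniques. Existing diversification methods based on global interaction use fully connected self-attention networks to capture the interactions among all candidate documents and have achieved good results. However, because such methods consider only the relationships between documents and not whether a document contains useful information relevant to the query, they are relatively inefficient when training data is limited. This paper proposes a search result diversification method based on a dual star-shaped self-attention network, which replaces the fully connected structure with a star topology and embeds query information to efficiently extract query-related global interaction features of the documents. Experimental results show that the model has a significant performance advantage over diversification methods based on fully connected self-attention networks.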

An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models

no code implementations • 20 Mar 2024 • Qi Liu, Gang Guo, Jiaxin Mao, Zhicheng Dou, Ji-Rong Wen, Hao Jiang, Xinyu Zhang, Zhao Cao

Based on these findings, we then propose several simple document pruning methods to reduce the storage overhead and compare the effectiveness of different pruning methods on different late-interaction models.

Retrieval

UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models

1 code implementation • 22 Feb 2024 • Zhaoheng Huang, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen

To address these challenges, we categorize four available fact sources: human-written evidence, reference documents, search engine results, and LLM knowledge, along with five text generation tasks containing six representative datasets.

Hallucination Retrieval +1

Interpreting Conversational Dense Retrieval by Rewriting-Enhanced Inversion of Session Embedding

no code implementations • 20 Feb 2024 • Yiruo Cheng, Kelong Mao, Zhicheng Dou

Such transformation is achieved by training a recently proposed Vec2Text model based on the ad-hoc query encoder, leveraging the fact that the session and query embeddings share the same space in existing conversational dense retrieval.

Conversational Search Retrieval

Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs

1 code implementation • 19 Feb 2024 • Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen

The integration of large language models (LLMs) and search engines represents a significant evolution in knowledge acquisition methodologies.

Question Answering

BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence

no code implementations • 19 Feb 2024 • Jiajie Jin, Yutao Zhu, Yujia Zhou, Zhicheng Dou

Retrieval-augmented large language models (LLMs) have demonstrated efficacy in knowledge-intensive tasks such as open-domain QA, addressing inherent challenges in knowledge update and factual inadequacy.

Question Answering Retrieval

Metacognitive Retrieval-Augmented Large Language Models

1 code implementation • 18 Feb 2024 • Yujia Zhou, Zheng Liu, Jiajie Jin, Jian-Yun Nie, Zhicheng Dou

Drawing from cognitive psychology, metacognition allows an entity to self-reflect and critically evaluate its cognitive processes.

Response Generation Retrieval

Cognitive Personalized Search Integrating Large Language Models with an Efficient Memory Mechanism

no code implementations • 16 Feb 2024 • Yujia Zhou, Qiannan Zhu, Jiajie Jin, Zhicheng Dou

To counter this limitation, personalized search has been developed to re-rank results based on user preferences derived from query logs.

Grounding Language Model with Chunking-Free In-Context Retrieval

no code implementations • 15 Feb 2024 • Hongjin Qian, Zheng Liu, Kelong Mao, Yujia Zhou, Zhicheng Dou

These strategies not only improve the efficiency of the retrieval process but also ensure that the fidelity of the generated grounding text evidence is maintained.

Chunking Language Modelling +2

Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation

no code implementations • 11 Feb 2024 • Haonan Chen, Zhicheng Dou, Kelong Mao, Jiongnan Liu, Ziliang Zhao

Conversational search utilizes multi-turn natural language contexts to retrieve relevant passages.

Contrastive Learning Conversational Search +2

Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training

no code implementations • 11 Feb 2024 • Haonan Chen, Zhicheng Dou, Xuetong Hao, Yunhao Tao, Shiren Song, Zhenli Sheng

Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems.

Data Augmentation

Towards a Unified Language Model for Knowledge-Intensive Tasks Utilizing External Corpus

no code implementations • 2 Feb 2024 • Xiaoxi Li, Zhicheng Dou, Yujia Zhou, Fangchao Liu

Through the generative retrieval (GR) approach, language models can achieve superior retrieval performance by directly generating relevant document identifiers (DocIDs).

Language Modelling Retrieval

Session-level Normalization and Click-through Data Enhancement for Session-based Evaluation

no code implementations • 23 Jan 2024 • Haonan Chen, Zhicheng Dou, Jiaxin Mao

Besides, it infers session-level relevance labels based on implicit feedback.

Session Search

Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis

no code implementations • 16 Jan 2024 • Zhicheng Dou, Yuchen Guo, Ching-Chun Chang, Huy H. Nguyen, Isao Echizen

In this paper, we present a comprehensive analysis of the impact of prompts on the text generated by LLMs and highlight the potential lack of robustness in one of the current state-of-the-art GPT detectors.

INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning

1 code implementation • 12 Jan 2024 • Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zhicheng Dou, Zheng Liu, Ji-Rong Wen

Despite this, their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.

document understanding Information Retrieval +2

182

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

1 code implementation • 7 Jan 2024 • Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

Although the context window can be extended through fine-tuning, it will result in a considerable cost at both training and inference time, and exert an unfavorable impact on the LLM's original capabilities.

4k Language Modelling

4,739

UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models

no code implementations • 18 Dec 2023 • Xiaoxi Li, Yujia Zhou, Zhicheng Dou

Generative information retrieval, encompassing two major tasks of Generative Document Retrieval (GDR) and Grounded Answer Generation (GAR), has gained significant attention in the area of information retrieval and natural language processing.

Answer Generation Information Retrieval +2

Retrieve Anything To Augment Large Language Models

1 code implementation • 11 Oct 2023 • Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, Jian-Yun Nie

On the other hand, the task-specific retrievers lack the required versatility, hindering their performance across the diverse retrieval augmentation scenarios.

Knowledge Distillation Retrieval

4,739

Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection

no code implementations • 30 Aug 2023 • Hongjin Qian, Zhicheng Dou, Jiejun Tan, Haonan Chen, Haoqi Gu, Ruofei Lai, Xinyu Zhang, Zhao Cao, Ji-Rong Wen

Previous methods use external knowledge as references for text generation to enhance factuality but often struggle with the knowledge mix-up (e.g., entity mismatch) of irrelevant references.

Text Generation

Large Language Models for Information Retrieval: A Survey

1 code implementation • 14 Aug 2023 • Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zhicheng Dou, Ji-Rong Wen

This evolution requires a combination of both traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding capacity).

Information Retrieval Question Answering +2

301

Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community

no code implementations • 19 Jul 2023 • Qingyao Ai, Ting Bai, Zhao Cao, Yi Chang, Jiawei Chen, Zhumin Chen, Zhiyong Cheng, Shoubin Dong, Zhicheng Dou, Fuli Feng, Shen Gao, Jiafeng Guo, Xiangnan He, Yanyan Lan, Chenliang Li, Yiqun Liu, Ziyu Lyu, Weizhi Ma, Jun Ma, Zhaochun Ren, Pengjie Ren, Zhiqiang Wang, Mingwen Wang, Ji-Rong Wen, Le Wu, Xin Xin, Jun Xu, Dawei Yin, Peng Zhang, Fan Zhang, Weinan Zhang, Min Zhang, Xiaofei Zhu

The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs.

Information Retrieval Retrieval

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

1 code implementation • 8 Jun 2023 • Jiongnan Liu, Jiajie Jin, Zihan Wang, Jiehan Cheng, Zhicheng Dou, Ji-Rong Wen

To support research in this area and facilitate the development of retrieval-augmented LLM systems, we develop RETA-LLM, a RETrieval-Augmented LLM toolkit.

Answer Generation Fact Checking +5

200

User Behavior Simulation with Large Language Model based Agents

1 code implementation • 5 Jun 2023 • Lei Wang, Jingsen Zhang, Hao Yang, ZhiYuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen

Simulating high-quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of the human decision process.

Language Modelling Large Language Model +2

206

JDsearch: A Personalized Product Search Dataset with Real Queries and Full Interactions

1 code implementation • 24 May 2023 • Jiongnan Liu, Zhicheng Dou, Guoyu Tang, Sulong Xu

To evaluate the effectiveness of these models, previous studies mainly utilize the simulated Amazon recommendation dataset, which contains automatically generated queries and excludes cold users and tail products.

Generative Retrieval via Term Set Generation

1 code implementation • 23 May 2023 • Peitian Zhang, Zheng Liu, Yujia Zhou, Zhicheng Dou, Fangchao Liu, Zhao Cao

On top of the term-set DocID, we propose a permutation-invariant decoding algorithm, with which the term set can be generated in any permutation yet will always lead to the corresponding document.

Information Retrieval Natural Questions +1

WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus

1 code implementation • 10 Apr 2023 • Hongjin Qian, Yutao Zhu, Zhicheng Dou, Haoqi Gu, Xinyu Zhang, Zheng Liu, Ruofei Lai, Zhao Cao, Jian-Yun Nie, Ji-Rong Wen

In this paper, we introduce a new NLP task -- generating short factual articles with references for queries by mining supporting evidence from the Web.

Retrieval Text Generation

Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search

2 code implementations • 12 Mar 2023 • Kelong Mao, Zhicheng Dou, Fengran Mo, Jiewen Hou, Haonan Chen, Hongjin Qian

Precisely understanding users' contextual search intent has been an important challenge for conversational search.

Conversational Search Text Generation

CDSM: Cascaded Deep Semantic Matching on Textual Graphs Leveraging Ad-hoc Neighbor Selection

1 code implementation • 30 Nov 2022 • Jing Yao, Zheng Liu, Junhan Yang, Zhicheng Dou, Xing Xie, Ji-Rong Wen

In the first stage, a lightweight CNN-based ad-hoc neighbor selector is deployed to filter useful neighbors for the matching task with a small computation cost.

MCP: Self-supervised Pre-training for Personalized Chatbots with Multi-level Contrastive Sampling

no code implementations • 17 Oct 2022 • Zhaoheng Huang, Zhicheng Dou, Yutao Zhu, Zhengyi Ma

To tackle these problems, we propose a self-supervised learning framework MCP for capturing better representations from users' dialogue history for personalized chatbots.

Response Generation Self-Supervised Learning

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

1 code implementation • 11 Oct 2022 • Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

Based on comprehensive experiments on popular retrieval benchmarks, we verify that clusters and terms indeed complement each other, enabling HI$^2$ to achieve lossless retrieval quality with competitive efficiency across various index settings.

Knowledge Distillation Quantization +1

Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?

no code implementations • 14 Sep 2022 • Jiawen Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Zikai Guo, Zhaoye Fei, Ruofei Lai, Yongkang Wu, Zhao Cao, Zhicheng Dou

Hyperlinks, which are commonly used in Web pages, have been leveraged for designing pre-training objectives.

Information Retrieval Question Answering +1

Enhancing User Behavior Sequence Modeling by Generative Tasks for Session Search

1 code implementation • 23 Aug 2022 • Haonan Chen, Zhicheng Dou, Yutao Zhu, Zhao Cao, Xiaohua Cheng, Ji-Rong Wen

To help the encoding of the current user behavior sequence, we propose to use a decoder and the information of future sequences and a supplemental query.

Session Search

From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking

1 code implementation • 22 Aug 2022 • Yutao Zhu, Jian-Yun Nie, Yixuan Su, Haonan Chen, Xinyu Zhang, Zhicheng Dou

In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns matching signals between the search context and the candidate document in an easy-to-hard manner.

Document Ranking

Ultron: An Ultimate Retriever on Corpus with a Model-based Indexer

no code implementations • 19 Aug 2022 • Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Peitian Zhang, Ji-Rong Wen

In order to unify these two stages, we explore a model-based indexer for document retrieval.

Retrieval

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding

no code implementations • COLING 2022 • Zhaoye Fei, Yu Tian, Yongkang Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Jiawen Wu, Dejiang Kong, Ruofei Lai, Zhao Cao, Zhicheng Dou, Xipeng Qiu

Our experiments on 13 benchmark datasets across five natural language understanding tasks demonstrate the superiority of our method.

Multi-Task Learning Natural Language Understanding

Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation

no code implementations • NAACL 2022 • Hanxun Zhong, Zhicheng Dou, Yutao Zhu, Hongjin Qian, Ji-Rong Wen

Existing personalized dialogue systems have tried to extract user profiles from dialogue history to guide personalized response generation.

Dialogue Generation Response Generation

DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index

no code implementations • 1 Mar 2022 • Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Ji-Rong Wen

Web search provides a promising way for people to obtain information and has been extensively studied.

Information Retrieval Retrieval

KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models

no code implementations • 28 Feb 2022 • Daniel Gao, Yantao Jia, Lei LI, Chengzhen Fu, Zhicheng Dou, Hao Jiang, Xinyu Zhang, Lei Chen, Zhao Cao

However, to figure out whether PLMs can be reliable knowledge sources and used as alternative knowledge bases (KBs), we need to further explore some critical features of PLMs.

General Knowledge Memorization +1

Socialformer: Social Network Inspired Long Document Modeling for Document Ranking

1 code implementation • 22 Feb 2022 • Yujia Zhou, Zhicheng Dou, Huaying Yuan, Zhengyi Ma

In this paper, we propose the model Socialformer, which introduces the characteristics of social networks into designing sparse attention patterns for long document modeling in document ranking.

Document Ranking

PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling

1 code implementation • 24 Nov 2021 • Yujia Zhou, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen

Personalized search plays a crucial role in improving user search experience owing to its ability to build user profiles based on historical behaviors.

Self-Supervised Learning Sentence

Group based Personalized Search by Integrating Search Behaviour and Friend Network

1 code implementation • 24 Nov 2021 • Yujia Zhou, Zhicheng Dou, Bingzheng Wei, Ruobing Xie, Ji-Rong Wen

Specifically, we propose a friend network enhanced personalized search model, which groups the user into multiple friend circles based on search behaviours and friend relations respectively.

Re-Ranking

Towards More Effective and Economic Sparsely-Activated Model

no code implementations • 14 Oct 2021 • Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao

To increase the number of activated experts without an increase in computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts on the same device (GPU).

Learning to Select Historical News Articles for Interaction based Neural News Recommendation

no code implementations • 13 Oct 2021 • Peitian Zhang, Zhicheng Dou, Jing Yao

The key to personalized news recommendation is to match the user's interests with the candidate news precisely and efficiently.

News Recommendation

USER: A Unified Information Search and Recommendation Model based on Integrated Behavior Sequence

no code implementations • 30 Sep 2021 • Jing Yao, Zhicheng Dou, Ruobing Xie, Yanxiong Lu, Zhiping Wang, Ji-Rong Wen

Search and recommendation are the two most common approaches used by people to obtain information.

YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker

no code implementations • 14 Sep 2021 • Ruizhi Pu, Xinyu Zhang, Ruofei Lai, Zikai Guo, Yinxia Zhang, Hao Jiang, Yongkang Wu, Yantao Jia, Zhicheng Dou, Zhao Cao

Finally, the supervisory signal in the rear compressor is computed based on conditional probability and can thus control sample dynamics and further enhance model performance.

Document Ranking Information Retrieval +1

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking

1 code implementation • 24 Aug 2021 • Yutao Zhu, Jian-Yun Nie, Zhicheng Dou, Zhengyi Ma, Xinyu Zhang, Pan Du, Xiaochen Zuo, Hao Jiang

To learn a more robust representation of the user behavior sequence, we propose a method based on contrastive learning, which takes into account the possible variations in user's behavior sequences.

Contrastive Learning Data Augmentation +1

Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need

1 code implementation • 20 Aug 2021 • Zhengyi Ma, Zhicheng Dou, Wei Xu, Xinyu Zhang, Hao Jiang, Zhao Cao, Ji-Rong Wen

In this paper, we propose to leverage the large-scale hyperlinks and anchor texts to pre-train the language model for ad-hoc retrieval.

Language Modelling Retrieval

One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles

1 code implementation • 20 Aug 2021 • Zhengyi Ma, Zhicheng Dou, Yutao Zhu, Hanxun Zhong, Ji-Rong Wen

Specifically, leveraging the benefits of Transformer on language understanding, we train a personalized language model to construct a general user profile from the user's historical responses.

Chatbot Language Modelling

Learning Implicit User Profiles for Personalized Retrieval-Based Chatbot

1 code implementation • 18 Aug 2021 • Hongjin Qian, Zhicheng Dou, Yutao Zhu, Yueyuan Ma, Ji-Rong Wen

To learn a user's personalized language style, we elaborately build language models from shallow to deep using the user's historical responses; to model a user's personalized preferences, we explore the conditional relations underneath each post-response pair of the user.

Chatbot Retrieval

Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals

1 code implementation • 18 Jul 2021 • Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Hao Jiang, Zhicheng Dou

The final response is selected according to the predicted knowledge, the goal to achieve, and the context.

Multi-Task Learning Retrieval

Answer Complex Questions: Path Ranker Is All You Need

3 code implementations • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 • Xinyu Zhang, Ke Zhan, Enrui Hu, Chengzhen Fu, Lan Luo, Hao Jiang, Yantao Jia, Fan Yu, Zhicheng Dou, Zhao Cao, Lei Chen

Currently, the most popular method for open-domain Question Answering (QA) adopts a "Retriever and Reader" pipeline, in which the retriever extracts a list of candidate documents from a large set of documents, followed by a ranker to rank the most relevant documents, and the reader extracts the answer from the candidates.

Open-Domain Question Answering

334

Emotion Eliciting Machine: Emotion Eliciting Conversation Generation based on Dual Generator

no code implementations • 18 May 2021 • Hao Jiang, Yutao Zhu, Xinyu Zhang, Zhicheng Dou, Pan Du, Te Pi, Yantao Jia

Then we propose a dual encoder-decoder structure to model the generation of responses on both the positive and negative sides based on the changes in the user's emotional status during the conversation.

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

2 code implementations • 11 Mar 2021 • Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, ShiZhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen

We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model.

Ranked #1 on Image Retrieval on RUC-CAS-WenLan

Contrastive Learning Image Captioning +2

273

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

1 code implementation • 18 Feb 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.

Language Modelling News Recommendation +3

Neural Sentence Ordering Based on Constraint Graphs

1 code implementation • 27 Jan 2021 • Yutao Zhu, Kun Zhou, Jian-Yun Nie, Shengchao Liu, Zhicheng Dou

Our experiments on five benchmark datasets show that our method outperforms all the existing baselines significantly, achieving a new state-of-the-art performance.

Sentence Sentence Ordering

Content Selection Network for Document-grounded Retrieval-based Chatbots

1 code implementation • 21 Jan 2021 • Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Zhicheng Dou

It is thus crucial to select the part of document content relevant to the current conversation context.

Retrieval

Pchatbot: A Large-Scale Dataset for Personalized Chatbot

2 code implementations • 28 Sep 2020 • Hongjin Qian, Xiaohe Li, Hanxun Zhong, Yu Guo, Yueyuan Ma, Yutao Zhu, Zhanliang Liu, Zhicheng Dou, Ji-Rong Wen

This enables the development of personalized dialogue models that directly learn implicit user personality from the user's dialogue history.

Chatbot

ScriptWriter: Narrative-Guided Script Generation

1 code implementation • ACL 2020 • Yutao Zhu, Ruihua Song, Zhicheng Dou, Jian-Yun Nie, Jin Zhou

In dialogue systems, it would also be useful to drive dialogues by a dialogue plan.

Personalizing Search Results Using Hierarchical RNN with Query-aware Attention

no code implementations • 20 Aug 2019 • Songwei Ge, Zhicheng Dou, Zhengbao Jiang, Jian-Yun Nie, Ji-Rong Wen

Our analysis reveals that the attention model is able to attribute higher weights to more related past sessions after fine training.

Attribute

Improving Web Search Ranking by Incorporating Structured Annotation of Queries

no code implementations • EMNLP 2013 • Xiao Ding, Zhicheng Dou, Bing Qin, Ting Liu, Ji-Rong Wen

Information Retrieval
