1 code implementation • 30 Jan 2025 • Yiteng Tu, Weihang Su, Yujia Zhou, Yiqun Liu, Qingyao Ai
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieved from a knowledge base.
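A minimal sketch of the retrieve-then-generate loop this line describes, assuming hypothetical `retrieve` and `generate` callables for the knowledge-base retriever and the LLM call (not the paper's implementation):

```python
# Minimal retrieval-augmented generation loop (illustrative sketch only).
# `retrieve` and `generate` are hypothetical placeholders for a retriever
# over a knowledge base and an LLM call, respectively.
from typing import Callable, List

def rag_answer(query: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               k: int = 3) -> str:
    passages = retrieve(query, k)                       # top-k external evidence
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Answer the question using the passages below.\n\n"
              f"{context}\n\nQuestion: {query}\nAnswer:")
    return generate(prompt)                             # LLM conditions on retrieved knowledge
```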
1 code implementation • 27 Jan 2025 • Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu
To this end, we introduce Parametric retrieval-augmented generation (Parametric RAG), a new RAG paradigm that integrates external knowledge directly into the parameters of feed-forward networks (FFN) of an LLM through document parameterization.
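A rough sketch of the Parametric RAG idea, under the illustrative assumption that each document is parameterized as a low-rank update merged into an FFN weight matrix; the shapes, the low-rank form, and the random stand-in for offline parameterization are assumptions, not the paper's procedure:

```python
# Illustrative sketch of Parametric RAG's core idea: external knowledge is
# injected by updating FFN weights rather than by extending the prompt.
# The low-rank (A, B) document parameterization below is assumed for
# illustration only.
import numpy as np

d_model, d_ff, rank = 768, 3072, 8

W_ffn = np.random.randn(d_ff, d_model) * 0.02           # base FFN weight of the LLM

def parameterize_document(doc_id: int) -> tuple:
    """Stand-in for offline document parameterization (training omitted)."""
    rng = np.random.default_rng(doc_id)
    A = rng.normal(scale=0.01, size=(d_ff, rank))
    B = rng.normal(scale=0.01, size=(rank, d_model))
    return A, B

def merge_document(W: np.ndarray, doc_id: int) -> np.ndarray:
    """Merge a retrieved document's parametric form into the FFN weights."""
    A, B = parameterize_document(doc_id)
    return W + A @ B                                     # knowledge lives in the parameters

W_augmented = merge_document(W_ffn, doc_id=42)
```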
1 code implementation • 9 Jan 2025 • Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou
To address this limitation, we introduce \textbf{Search-o1}, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents.
Ranked #1 on Mathematical Reasoning on MATH500
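A schematic of the agentic retrieval loop such a framework implies: the reasoner may emit a search request mid-reasoning, and retrieved documents pass through a refinement step before rejoining the trace. `reason_step`, `search`, and `refine_documents` are hypothetical placeholders, not the paper's interface:

```python
# Schematic agentic RAG loop (illustrative only): the reasoner may emit a
# search request mid-reasoning; retrieved documents are condensed by a
# refinement step before being inserted back into the reasoning trace.
from typing import Callable, List

def agentic_reasoning(question: str,
                      reason_step: Callable[[str], str],
                      search: Callable[[str], List[str]],
                      refine_documents: Callable[[List[str], str], str],
                      max_steps: int = 8) -> str:
    trace = f"Question: {question}\n"
    for _ in range(max_steps):
        step = reason_step(trace)                        # model continues its reasoning
        trace += step
        if step.startswith("SEARCH:"):                   # model decided it needs evidence
            query = step[len("SEARCH:"):].strip()
            docs = search(query)
            trace += "\nEvidence: " + refine_documents(docs, query) + "\n"
        elif step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
    return trace                                         # fall back to the full trace
```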
1 code implementation • 16 Dec 2024 • Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou
Moreover, to mitigate false pruning in the process of constrained evidence generation, we introduce (1) hierarchical FM-Index constraints, which generate corpus-constrained clues to identify a subset of relevant documents before evidence generation, reducing irrelevant decoding space; and (2) a forward-looking constrained decoding strategy, which considers the relevance of future sequences to improve evidence accuracy.
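Corpus-constrained decoding can be pictured with a simple prefix trie standing in for the FM-Index: at each step, only tokens that extend some sequence actually present in the corpus are allowed. The paper's hierarchical and forward-looking constraints are not reproduced in this sketch:

```python
# Simplified illustration of corpus-constrained decoding: allowed next tokens
# are read off a prefix trie built over corpus sequences.
from typing import Dict, List, Tuple

def build_trie(sequences: List[List[str]]) -> Dict[Tuple[str, ...], set]:
    trie: Dict[Tuple[str, ...], set] = {}
    for seq in sequences:
        for i in range(len(seq)):
            trie.setdefault(tuple(seq[:i]), set()).add(seq[i])
    return trie

def allowed_next_tokens(trie: Dict[Tuple[str, ...], set],
                        prefix: List[str]) -> set:
    """Tokens that keep the generated prefix inside the corpus."""
    return trie.get(tuple(prefix), set())

corpus = [["the", "capital", "of", "france", "is", "paris"],
          ["the", "capital", "of", "italy", "is", "rome"]]
trie = build_trie(corpus)
print(allowed_next_tokens(trie, ["the", "capital", "of"]))  # {'france', 'italy'}
```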
1 code implementation • 7 Dec 2024 • Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, Yiqun Liu
Finally, we provide a detailed analysis of the limitations of LLM judges and discuss potential future directions.
no code implementations • 30 Nov 2024 • Yan Wang, Jimin Huang, Huan He, Vincent Zhang, Yujia Zhou, Xubing Hao, Pritham Ram, Lingfei Qian, Qianqian Xie, Ruey-Ling Weng, Fongci Lin, Yan Hu, Licong Cui, Xiaoqian Jiang, Hua Xu, Na Hong
We propose CDEMapper, a large language model (LLM) powered mapping tool designed to assist in mapping local data elements to NIH CDEs.
1 code implementation • 15 Nov 2024 • Yan Hu, Xu Zuo, Yujia Zhou, Xueqing Peng, Jimin Huang, Vipina K. Keloth, Vincent J. Zhang, Ruey-Ling Weng, Qingyu Chen, Xiaoqian Jiang, Kirk E. Roberts, Hua Xu
On unseen i2b2 data, LLaMA-3-70B outperformed BERT by 7% (F1) on NER and 4% on RE.
1 code implementation • 11 Nov 2024 • Yujia Zhou, Zheng Liu, Zhicheng Dou
The emergence of Large Language Models (LLMs) has significantly advanced natural language processing, but these models often generate factually incorrect information, known as "hallucination".
1 code implementation • 20 Oct 2024 • Haitao Li, Junjie Chen, Qingyao Ai, Zhumin Chu, Yujia Zhou, Qian Dong, Yiqun Liu
The use of large language models (LLMs) as automated evaluation tools to assess the quality of generated natural language, known as LLMs-as-Judges, has demonstrated promising capabilities and is rapidly gaining widespread attention.
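A bare-bones sketch of the LLMs-as-Judges pattern the survey covers: a rubric prompt asks a judge model for a numeric score, which is then parsed. `call_judge_model` is a hypothetical placeholder for any LLM API call:

```python
# Bare-bones LLM-as-judge sketch (illustrative): format a rubric prompt,
# ask a judge model for a 1-5 score, and parse the number from its reply.
import re
from typing import Callable, Optional

JUDGE_PROMPT = """Rate the response to the instruction on a 1-5 scale
for helpfulness and factual accuracy. Reply with the number only.

Instruction: {instruction}
Response: {response}
Score:"""

def judge(instruction: str, response: str,
          call_judge_model: Callable[[str], str]) -> Optional[int]:
    reply = call_judge_model(JUDGE_PROMPT.format(instruction=instruction,
                                                 response=response))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None         # None if the judge went off-script
```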
no code implementations • 1 Oct 2024 • Ziyi Ye, Xiangsheng Li, Qiuchi Li, Qingyao Ai, Yujia Zhou, Wei Shen, Dong Yan, Yiqun Liu
Conventionally, preference data is encoded into a scalar reward model, which attaches a value head to an LLM to produce a scalar score as the preference or reward.
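A minimal PyTorch sketch of such a scalar reward model; a tiny GRU stands in for the LLM backbone so the example runs, whereas in practice the value head sits on a pretrained LLM's final hidden state:

```python
# Minimal sketch of a scalar reward model: the backbone's last hidden state
# is projected by a value head to a single preference score.
import torch
import torch.nn as nn

class ScalarRewardModel(nn.Module):
    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for an LLM
        self.value_head = nn.Linear(hidden, 1)                    # scalar reward per sequence

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        states, _ = self.backbone(self.embed(token_ids))
        return self.value_head(states[:, -1]).squeeze(-1)         # score from last position

reward_model = ScalarRewardModel()
scores = reward_model(torch.randint(0, 1000, (2, 16)))            # one scalar per sequence
```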
1 code implementation • 16 Sep 2024 • Yujia Zhou, Yan Liu, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Zheng Liu, Chaozhuo Li, Zhicheng Dou, Tsung-Yi Ho, Philip S. Yu
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
no code implementations • 24 May 2024 • Hongjin Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Yujia Zhou, Xu Chen, Zhicheng Dou
The learning and deployment of long-LLMs remain challenging despite recent progress.
1 code implementation • 23 Apr 2024 • Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou
We summarize the advancements in GR regarding model training, document identifiers, incremental learning, downstream task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation covering internal knowledge memorization, external knowledge augmentation, response generation with citations, and personal information assistance.
2 code implementations • 11 Mar 2024 • Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu
Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate.
no code implementations • 19 Feb 2024 • Jiajie Jin, Yutao Zhu, Yujia Zhou, Zhicheng Dou
Retrieval-augmented large language models (LLMs) have demonstrated efficacy in knowledge-intensive tasks such as open-domain QA, addressing inherent challenges in knowledge update and factual inadequacy.
1 code implementation • 18 Feb 2024 • Yujia Zhou, Zheng Liu, Jiajie Jin, Jian-Yun Nie, Zhicheng Dou
Drawing from cognitive psychology, metacognition allows an entity to self-reflect and critically evaluate its cognitive processes.
no code implementations • 16 Feb 2024 • Yujia Zhou, Qiannan Zhu, Jiajie Jin, Zhicheng Dou
To counter this limitation, personalized search has been developed to re-rank results based on user preferences derived from query logs.
no code implementations • 15 Feb 2024 • Hongjin Qian, Zheng Liu, Kelong Mao, Yujia Zhou, Zhicheng Dou
These strategies not only improve the efficiency of the retrieval process but also ensure that the fidelity of the generated grounding text evidence is maintained.
no code implementations • 2 Feb 2024 • Xiaoxi Li, Zhicheng Dou, Yujia Zhou, Fangchao Liu
We design the following mechanisms to facilitate effective retrieval and generation, and improve the end-to-end effectiveness of KI tasks: (1) We develop a ranking-oriented DocID list generation strategy, which refines GR by directly learning from a DocID ranking list, to improve retrieval quality.
no code implementations • 18 Dec 2023 • Xiaoxi Li, Yujia Zhou, Zhicheng Dou
Generative information retrieval, encompassing two major tasks of Generative Document Retrieval (GDR) and Grounded Answer Generation (GAR), has gained significant attention in the area of information retrieval and natural language processing.
1 code implementation • 23 May 2023 • Peitian Zhang, Zheng Liu, Yujia Zhou, Zhicheng Dou, Fangchao Liu, Zhao Cao
On top of the term-set DocID, we propose a permutation-invariant decoding algorithm, with which the term set can be generated in any permutation yet will always lead to the corresponding document.
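The permutation invariance can be pictured as resolving a document from the set of generated terms rather than from their order; a toy lookup sketch (the paper's constrained decoding, which keeps every permutation decodable, is not reproduced):

```python
# Toy illustration of a term-set DocID: the document is identified by the
# set of generated terms, so any generation order resolves to the same doc.
from typing import List, Optional

doc_term_sets = {
    frozenset({"neural", "retrieval", "index"}): "doc_17",
    frozenset({"bubble", "wake", "instability"}): "doc_42",
}

def resolve_docid(generated_terms: List[str]) -> Optional[str]:
    """Look up the document whose term-set DocID matches the generated terms."""
    return doc_term_sets.get(frozenset(generated_terms))

assert resolve_docid(["index", "neural", "retrieval"]) == "doc_17"   # any order works
assert resolve_docid(["retrieval", "index", "neural"]) == "doc_17"
```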
1 code implementation • 29 Mar 2023 • Yan Hu, Qingyu Chen, Jingcheng Du, Xueqing Peng, Vipina Kuttichi Keloth, Xu Zuo, Yujia Zhou, Zehan Li, Xiaoqian Jiang, Zhiyong Lu, Kirk Roberts, Hua Xu
Results: Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804 on MTSamples, and 0.301 and 0.593 on VAERS.
no code implementations • 19 Aug 2022 • Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Peitian Zhang, Ji-Rong Wen
In order to unify these two stages, we explore a model-based indexer for document retrieval.
no code implementations • 1 Mar 2022 • Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Ji-Rong Wen
Web search provides a promising way for people to obtain information and has been extensively studied.
1 code implementation • 22 Feb 2022 • Yujia Zhou, Zhicheng Dou, Huaying Yuan, Zhengyi Ma
In this paper, we propose the model Socialformer, which introduces the characteristics of social networks into designing sparse attention patterns for long document modeling in document ranking.
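An illustrative sparse attention mask in this spirit: each token attends to a local window plus a few long-range positions, loosely mirroring close and distant friends; Socialformer's actual graph-based pattern construction differs from this sketch:

```python
# Illustrative sparse attention mask: local window ("close friends") plus a
# few random long-range links ("distant friends"); only the mask shape is shown.
import numpy as np

def social_sparse_mask(seq_len: int, window: int = 4,
                       long_links: int = 2, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True                                  # local neighbourhood
        mask[i, rng.choice(seq_len, size=long_links)] = True   # sparse long-range edges
    return mask

mask = social_sparse_mask(seq_len=16)
print(mask.sum(), "of", mask.size, "attention entries kept")
```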
1 code implementation • 24 Nov 2021 • Yujia Zhou, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen
Personalized search plays a crucial role in improving user search experience owing to its ability to build user profiles based on historical behaviors.
1 code implementation • 24 Nov 2021 • Yujia Zhou, Zhicheng Dou, Bingzheng Wei, Ruobing Xie, Ji-Rong Wen
Specifically, we propose a friend network enhanced personalized search model, which groups the user into multiple friend circles based on search behaviours and friend relations, respectively.
1 code implementation • 3 Oct 2021 • Laila Rasmy, Jie Zhu, Zhiheng Li, Xin Hao, Hong Thoai Tran, Yujia Zhou, Firat Tiryaki, Yang Xiang, Hua Xu, Degui Zhi
As a result, deep learning models developed for sequence modeling, such as recurrent neural networks (RNNs), have become a common architecture for EHR-based clinical event prediction.
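A minimal sketch of the kind of RNN-based EHR predictive model this refers to: a patient's sequence of medical-code embeddings is fed to a GRU, and the final state predicts a clinical event; the sizes and single-label setup are illustrative, not the paper's configuration:

```python
# Minimal RNN-style EHR predictive model (illustrative): a patient's visit
# history is a sequence of medical-code embeddings fed to a GRU, and the
# final state predicts a binary clinical outcome.
import torch
import torch.nn as nn

class EHRSequenceModel(nn.Module):
    def __init__(self, n_codes: int = 5000, dim: int = 128):
        super().__init__()
        self.code_embed = nn.Embedding(n_codes, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, code_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(self.code_embed(code_ids))        # summarize the visit history
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)  # event probability

model = EHRSequenceModel()
prob = model(torch.randint(0, 5000, (4, 30)))             # batch of 4 patients, 30 codes each
```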
no code implementations • journal 2021 • Yujia Zhou, Chenru Zhao, Bingqiang Ji, Hanliang Bo
Large bubbles always follow asymmetric paths due to wake instability, an effect rarely considered in simulations of bubbly flow in complex flow systems such as bubble columns.
1 code implementation • IEEE Transactions on Medical Imaging 2020 • Shumao Pang, Chunlan Pang, Lei Zhao, Yangfan Chen, Zhihai Su, Yujia Zhou, Meiyan Huang, Wei Yang, Hai Lu, Qianjin Feng
The SpineParseNet consists of a 3D graph convolutional segmentation network (GCSN) for 3D coarse segmentation and a 2D residual U-Net (ResUNet) for 2D segmentation refinement.
no code implementations • 13 Jul 2020 • Jingqi Wang, Noor Abu-el-rub, Josh Gray, Huy Anh Pham, Yujia Zhou, Frank Manion, Mei Liu, Xing Song, Hua Xu, Masoud Rouhizadeh, Yaoyun Zhang
To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text.
no code implementations • 16 Apr 2020 • Yujia Zhou, Shumao Pang, Jun Cheng, Yuhang Sun, Yi Wu, Lei Zhao, Yaqin Liu, Zhentai Lu, Wei Yang, Qianjin Feng
In fact, due to its limited receptive field, a 3×3 kernel has difficulty covering the corresponding features at high/original resolution.