1 code implementation • 8 Feb 2024 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.
1 code implementation • 31 Dec 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
no code implementations • 23 Oct 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others.
1 code implementation • 24 May 2023 • Hao Sun, Xiao Liu, Yeyun Gong, Yan Zhang, Daxin Jiang, Linjun Yang, Nan Duan
With the advance of large language models (LLMs), research on LLM applications has become increasingly popular, and the idea of constructing pipelines that accomplish complex tasks by stacking LLM API calls has become reality.
1 code implementation • 10 Apr 2023 • Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei
We propose LLMA, an accelerator that losslessly speeds up large language model (LLM) inference with references.
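The core idea of reference-based lossless acceleration can be illustrated with a toy sketch: copy candidate spans from a reference text, verify each copied token against the model's own greedy choice, and keep only the matching prefix, so the output is identical to plain greedy decoding. All names and the stand-in "model" below are illustrative assumptions, not the paper's code; in a real system the verification is one batched forward pass, which is where the speedup comes from.

```python
def greedy_next(prefix):
    # Stand-in for an LLM's greedy next-token choice: a deterministic toy
    # rule that spells out a fixed character sequence.
    target = list("the quick brown fox jumps over the lazy dog")
    return target[len(prefix)] if len(prefix) < len(target) else None

def naive_decode():
    # Plain greedy decoding: one sequential model call per token.
    out = []
    while (t := greedy_next(out)) is not None:
        out.append(t)
    return "".join(out)

def reference_decode(reference, span=8):
    # Copy candidate spans from the reference, then verify each copied
    # token against the model; mismatching tokens are rejected, so the
    # output matches naive_decode exactly (losslessness).
    out, steps = [], 0
    while (t := greedy_next(out)) is not None:
        out.append(t)
        steps += 1  # count sequential proposal steps only
        tail = "".join(out[-4:])
        i = reference.find(tail)
        if i >= 0:
            start = i + len(tail)
            for c in reference[start:start + span]:
                if greedy_next(out) == c:  # verify the copied token
                    out.append(c)
                else:
                    break
    return "".join(out), steps
```

With a reference that overlaps the output, the sketch produces the same string as greedy decoding in far fewer sequential steps; with an unrelated reference it degrades gracefully to plain decoding.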
1 code implementation • 10 Dec 2022 • Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan
Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model.
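As background for the teacher-student setup, a minimal sketch of the standard Hinton-style distillation objective: KL divergence between temperature-softened teacher and student distributions. This is the generic formulation, an assumption about the setup rather than this paper's exact loss.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax; subtracting the max is for stability.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```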
1 code implementation • 7 Dec 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.
Ranked #11 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
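Embedding models in this family are typically trained with a contrastive InfoNCE objective over text pairs, treating in-batch examples as negatives. A minimal numpy sketch under that assumption (batch positives on the diagonal; not the paper's training code):

```python
import numpy as np

def info_nce(q, p, temperature=0.05):
    # q: (B, d) query embeddings, p: (B, d) passage embeddings;
    # the matching positive for q[i] is p[i], i.e. the diagonal.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                 # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # cross-entropy to diagonal
```

The loss is small when each query is closest to its own passage and large when positives are mismatched.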
1 code implementation • 31 Aug 2022 • Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang
In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency.
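Scoring in the lexicon-weighting paradigm reduces to a sparse dot product over shared vocabulary terms. A toy sketch with hand-picked term weights, where a learned model would normally supply them (illustrative only):

```python
def sparse_score(query_weights, doc_weights):
    # Dot product over the vocabulary terms the query and document share.
    return sum(w * doc_weights.get(t, 0.0) for t, w in query_weights.items())

def search(query_weights, index):
    # Score every document's sparse vector and return them best-first.
    scored = [(doc_id, sparse_score(query_weights, dw))
              for doc_id, dw in index.items()]
    return sorted(scored, key=lambda x: -x[1])
```

In production the low latency comes from an inverted index that only visits documents containing a query term; the brute-force loop here just shows the scoring.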
1 code implementation • 6 Jul 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei
It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training.
no code implementations • 21 Jan 2022 • Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang
This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways.
no code implementations • ACL 2021 • Nan Yang, Furu Wei, Binxing Jiao, Daxin Jiang, Linjun Yang
Dense passage retrieval has been shown to be an effective approach for information retrieval tasks such as open domain question answering.
2 code implementations • 20 Jun 2020 • Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, Linjun Yang
In this paper, we discuss the techniques for applying EBR to a Facebook Search system.
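At its core, embedding-based retrieval (EBR) scores candidates by similarity between query and document embeddings. A brute-force cosine-similarity sketch with dummy vectors; production systems like the one described use approximate nearest-neighbor indexes instead of exhaustive search:

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so dot products become cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve(query_vec, doc_vecs, k=2):
    # Cosine similarity of the query against every document, top-k indices.
    sims = normalize(doc_vecs) @ normalize(query_vec)
    return np.argsort(-sims)[:k]
```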
no code implementations • 14 Feb 2018 • Houdong Hu, Yan Wang, Linjun Yang, Pavel Komlev, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Meenaz Merchant, Arun Sacheti
In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing.
3 code implementations • CVPR 2018 • Kuang-Huei Lee, Xiaodong He, Lei Zhang, Linjun Yang
We demonstrate the effectiveness of the proposed algorithm on both the label-noise detection task and the task of image classification on noisy data, across several large-scale datasets.
Ranked #2 on Image Classification on Food-101N (using extra training data)
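One simple way to flag label noise, in the spirit of prototype-based verification, is to mark a sample as mislabeled when its feature is dissimilar to its class prototype. The features, prototypes, and threshold below are toy assumptions, not the paper's method in detail:

```python
import numpy as np

def flag_noisy(features, labels, prototypes, threshold=0.5):
    # Cosine similarity between each sample's feature and the prototype
    # of its assigned class; low similarity suggests a wrong label.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = np.einsum("nd,nd->n", feats, protos[labels])
    return sims < threshold  # True = likely mislabeled
```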