Search Results for author: Linjun Yang

Found 17 papers, 11 papers with code

Multilingual E5 Text Embeddings: A Technical Report

1 code implementation • 8 Feb 2024 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.
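
As a usage illustration (not part of the report itself), here is a minimal sketch of embedding text with a released multilingual E5 checkpoint, assuming the intfloat/multilingual-e5-base model on Hugging Face and the "query:"/"passage:" prefix convention the E5 models document:

```python
# Minimal sketch: embed a query and a passage with a multilingual E5 model.
# Mean pooling over non-padding tokens plus L2 normalization follows the
# models' documented usage; the checkpoint name is assumed from the release.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
model = AutoModel.from_pretrained("intfloat/multilingual-e5-base")

texts = [
    "query: how do multilingual embeddings work?",
    "passage: E5 models map text from many languages into one vector space.",
]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = model(**batch).last_hidden_state  # (batch, seq, hidden)

# Mean-pool over non-padding tokens, then L2-normalize.
mask = batch["attention_mask"].unsqueeze(-1)
emb = (out * mask).sum(dim=1) / mask.sum(dim=1)
emb = F.normalize(emb, p=2, dim=1)

print(emb[0] @ emb[1])  # cosine similarity between query and passage
```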

Improving Text Embeddings with Large Language Models

2 code implementations • 31 Dec 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.

Decoder, Diversity
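
The fine-tuning stage the abstract alludes to is typically a contrastive objective over (query, positive passage) pairs. Below is a hedged sketch assuming the standard InfoNCE loss with in-batch negatives, not the paper's exact recipe:

```python
# Sketch of InfoNCE with in-batch negatives, the contrastive loss commonly
# used to fine-tune embedding models on (query, positive passage) pairs such
# as the synthetic data described above. Inputs are L2-normalized batches.
import torch
import torch.nn.functional as F

def info_nce(q: torch.Tensor, p: torch.Tensor, temperature: float = 0.02) -> torch.Tensor:
    # q, p: (batch, dim); row i of p is the positive for row i of q,
    # and every other row in the batch serves as a negative.
    logits = q @ p.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

q = F.normalize(torch.randn(8, 768), dim=1)
p = F.normalize(torch.randn(8, 768), dim=1)
print(info_nce(q, p).item())
```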

Large Search Model: Redefining Search Stack in the Era of LLMs

no code implementations • 23 Oct 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others.

Language Modelling, Large Language Model, +3

Allies: Prompting Large Language Model with Beam Search

1 code implementation • 24 May 2023 • Hao Sun, Xiao Liu, Yeyun Gong, Yan Zhang, Daxin Jiang, Linjun Yang, Nan Duan

With the advance of large language models (LLMs), research on LLM applications has become increasingly popular, and the idea of constructing pipelines that accomplish complex tasks by stacking LLM API calls has become a reality.

Language Modelling, Large Language Model, +3
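
As an illustration of the general idea, here is a minimal, hypothetical sketch of beam search over iterative LLM query reformulations; generate_variants and score stand in for LLM API calls and are placeholders, not the paper's actual interface:

```python
# Generic beam search over iterative query reformulations, in the spirit the
# abstract describes. The two callables are hypothetical stand-ins for LLM
# API calls: one proposes variants of a query, the other scores a query.
from typing import Callable

def beam_search(question: str,
                generate_variants: Callable[[str], list[str]],
                score: Callable[[str], float],
                beam_width: int = 3,
                depth: int = 2) -> str:
    beam = [question]
    for _ in range(depth):
        # Expand every candidate, then keep only the top-scoring beam_width.
        candidates = [v for q in beam for v in generate_variants(q)]
        candidates += beam  # allow keeping an existing query unchanged
        beam = sorted(set(candidates), key=score, reverse=True)[:beam_width]
    return beam[0]
```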

Inference with Reference: Lossless Acceleration of Large Language Models

2 code implementations • 10 Apr 2023 • Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei

We propose LLMA, an accelerator that losslessly speeds up Large Language Model (LLM) inference by exploiting references available in many real-world scenarios.

Decoder, Language Modelling, +1
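
A minimal sketch of the span-copy step that such reference-based acceleration relies on, assuming token-ID lists and illustrative parameter names; the parallel verification pass that keeps decoding lossless is omitted:

```python
# Sketch of the span-copy step: if the last few generated tokens match a
# position in a reference document, propose the tokens that follow there as
# draft tokens to be verified by the LLM in one parallel forward pass.
def propose_from_reference(generated: list[int],
                           reference: list[int],
                           match_len: int = 4,
                           copy_len: int = 8) -> list[int]:
    if len(generated) < match_len:
        return []
    suffix = generated[-match_len:]
    for i in range(len(reference) - match_len):
        if reference[i:i + match_len] == suffix:
            # Copy the continuation after the matched span as draft tokens.
            return reference[i + match_len : i + match_len + copy_len]
    return []  # no match: fall back to ordinary one-token decoding
```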

LEAD: Liberal Feature-based Distillation for Dense Retrieval

1 code implementation • 10 Dec 2022 • Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model.

Document Ranking, Knowledge Distillation, +2
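
For context, here is a minimal sketch of the classic logit-based distillation objective; LEAD itself proposes a feature-based variant, which this does not reproduce:

```python
# Standard knowledge-distillation loss: the student matches the teacher's
# softened output distribution via a temperature-scaled KL divergence.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```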

LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

1 code implementation • 31 Aug 2022 • Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang

In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency.

Decoder, Language Modelling, +2
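
As an illustration of the lexicon-weighting paradigm, here is a SPLADE-style sketch, plainly not LexMAE's own lexicon-bottlenecked architecture: token states are projected onto the vocabulary with an MLM head, then max-pooled into one sparse weight per vocabulary entry:

```python
# SPLADE-style lexicon weighting: log-saturated ReLU activations over the
# vocabulary, max-pooled across token positions, yield a sparse weighted
# representation in vocabulary space. Illustrative only.
import torch

def lexicon_weights(mlm_logits: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # mlm_logits: (batch, seq, vocab); attention_mask: (batch, seq)
    acts = torch.log1p(torch.relu(mlm_logits))
    acts = acts * attention_mask.unsqueeze(-1)   # zero out padding positions
    return acts.max(dim=1).values                # (batch, vocab) sparse weights
```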

SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

1 code implementation • 6 Jul 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training.

Language Modelling, Passage Retrieval, +1
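
A generic, hedged sketch of such a representation bottleneck: a deliberately shallow decoder must reconstruct the tokens from the encoder's single [CLS]-style vector, forcing the encoder to compress the passage into it. SimLM's actual pretraining objective differs in detail:

```python
# Generic representation-bottleneck autoencoder for dense retrieval
# pretraining (an illustration of the idea, not SimLM's exact objective).
import torch
import torch.nn as nn

class BottleneckAE(nn.Module):
    def __init__(self, vocab: int, hidden: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True), 6)
        # A single shallow layer: too weak to reconstruct on its own, so the
        # encoder must pack the passage into the bottleneck vector.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True), 1)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(tokens))
        cls = h[:, :1]                      # bottleneck: one dense vector
        # The decoder sees only the bottleneck plus raw token embeddings.
        dec_in = torch.cat([cls, self.embed(tokens)[:, 1:]], dim=1)
        return self.lm_head(self.decoder(dec_in))  # (batch, seq, vocab) logits
```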

Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

no code implementations • 21 Jan 2022 • Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang

This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways.

Information Retrieval, Retrieval

Web-Scale Responsive Visual Search at Bing

no code implementations • 14 Feb 2018 • Houdong Hu, Yan Wang, Linjun Yang, Pavel Komlev, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Meenaz Merchant, Arun Sacheti

In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing.

Learning-To-Rank

CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

3 code implementations • CVPR 2018 • Kuang-Huei Lee, Xiaodong He, Lei Zhang, Linjun Yang

We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of image classification on noisy data, using several large-scale datasets.

Ranked #2 on Image Classification on Food-101N (using extra training data)

Classification, General Classification, +2
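
A hedged sketch of the similarity test at the core of this style of label noise detection, with the threshold and the learning of class-level reference embeddings simplified away:

```python
# Flag a sample as likely mislabeled when its feature embedding sits far
# from its class's reference embedding (a simplified view of the idea; how
# the class embeddings are learned is the substance of the paper).
import torch
import torch.nn.functional as F

def is_label_noisy(sample_emb: torch.Tensor,
                   class_emb: torch.Tensor,
                   threshold: float = 0.5) -> bool:
    sim = F.cosine_similarity(sample_emb, class_emb, dim=0)
    return sim.item() < threshold  # low similarity -> likely wrong label
```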
