Search Results for author: Xiaolong Huang

Found 10 papers, 8 with code

Multilingual E5 Text Embeddings: A Technical Report

1 code implementation • 8 Feb 2024 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.
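Embedding models like E5 map queries and passages into a shared vector space and rank passages by cosine similarity. A minimal sketch of that retrieval step, using hypothetical pre-computed vectors in place of a real encoder:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings; a real system would produce these with an
# encoder such as multilingual-e5, which expects "query: " / "passage: "
# prefixes on its inputs.
query_vec = [0.9, 0.1, 0.2]
passages = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.1, 0.9, 0.3],
}

ranked = sorted(passages, key=lambda d: cosine(query_vec, passages[d]), reverse=True)
print(ranked)  # doc_a ranks first: its vector points in the query's direction
```

The vectors and document names above are made up for illustration; only the ranking-by-cosine-similarity pattern reflects how such embedding models are typically used.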

One Step Learning, One Step Review

1 code implementation • 19 Jan 2024 • Xiaolong Huang, Qiankun Li, Xueran Li, Xuesong Gao

Visual fine-tuning has garnered significant attention with the rise of pre-trained vision models.

Tasks: Image Classification, Instance Segmentation, +3

Improving Text Embeddings with Large Language Models

2 code implementations • 31 Dec 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and fewer than 1k training steps.

Tasks: Decoder, Diversity

Large Search Model: Redefining Search Stack in the Era of LLMs

no code implementations • 23 Oct 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others.

Tasks: Language Modelling, Large Language Model, +3

Effective and Efficient Query-aware Snippet Extraction for Web Search

1 code implementation • 17 Oct 2022 • Jingwei Yi, Fangzhao Wu, Chuhan Wu, Xiaolong Huang, Binxing Jiao, Guangzhong Sun, Xing Xie

In this paper, we propose an effective query-aware webpage snippet extraction method named DeepQSE, which selects the few sentences that best summarize the webpage content in the context of the input query.

Tasks: Sentence
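DeepQSE itself is a neural model, but the task it addresses — query-aware snippet extraction — can be illustrated with a much simpler lexical baseline: score each sentence by its query-term overlap and keep the top few. Everything below is an illustrative sketch, not the paper's method:

```python
def extract_snippet(query, sentences, k=2):
    """Pick the k sentences with the highest query-term overlap.

    A crude lexical baseline for query-aware snippet extraction;
    DeepQSE replaces this scoring with a learned, context-aware model.
    """
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(s.lower().split())), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]  # document order

# Hypothetical webpage sentences for illustration.
doc = [
    "Our store sells hiking boots and trail shoes.",
    "Free shipping is available on all orders.",
    "The boots are waterproof and built for winter hiking.",
]
snippet = extract_snippet("winter hiking boots", doc)
print(snippet)  # keeps the two boot-related sentences, drops the shipping one
```

A real snippet extractor would also handle tokenization, stemming, and semantic matches that pure term overlap misses — which is exactly the gap neural models like DeepQSE target.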

2nd Place Solution to Google Universal Image Embedding

1 code implementation • 17 Oct 2022 • Xiaolong Huang, Qiankun Li

Image representations are a critical building block of computer vision applications.

Tasks: Fine-Grained Image Classification

LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

1 code implementation • 31 Aug 2022 • Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang

In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency.

Tasks: Decoder, Language Modelling, +2
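In the lexicon-weighting paradigm, each text becomes a sparse vector over the vocabulary, so relevance reduces to a dot product over shared terms — which an inverted index can serve with low latency. A minimal sketch with made-up term weights (a model like LexMAE learns such weights; the documents and numbers here are illustrative):

```python
def sparse_dot(query_weights, doc_weights):
    """Relevance score: dot product over terms present in both sparse vectors."""
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)

# Hypothetical learned term weights; only non-zero vocabulary entries are stored.
query = {"passage": 1.2, "retrieval": 1.5}
docs = {
    "doc_a": {"passage": 0.9, "retrieval": 1.1, "dense": 0.4},
    "doc_b": {"image": 1.3, "retrieval": 0.2},
}

scores = {d: sparse_dot(query, w) for d, w in docs.items()}
print(scores)  # doc_a shares both query terms and scores far higher
```

Because each document stores only its non-zero terms, scoring touches just the postings of the query's terms — the efficiency property the abstract's "high quality and low latency" claim refers to.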

SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

1 code implementation • 6 Jul 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

SimLM employs a simple bottleneck architecture that learns to compress passage information into a dense vector through self-supervised pre-training.

Tasks: Language Modelling, Passage Retrieval, +1
