Search Results for author: Wenqi Jiang

Found 6 papers, 1 paper with code

PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design

no code implementations • 8 Mar 2024 • Wenqi Jiang, Shuai Zhang, Boran Han, Jie Wang, Bernie Wang, Tim Kraska

Retrieval-augmented generation (RAG) can enhance the generation quality of large language models (LLMs) by incorporating external token databases.

Retrieval

Chameleon: a heterogeneous and disaggregated accelerator system for retrieval-augmented language models

no code implementations • 15 Oct 2023 • Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso

The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements.

Language Modelling, Retrieval, +1

Co-design Hardware and Algorithm for Vector Search

1 code implementation • 19 Jun 2023 • Wenqi Jiang, Shigang Li, Yu Zhu, Johannes De Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.

Information Retrieval, Retrieval

Switch Spaces: Learning Product Spaces with Sparse Gating

no code implementations • 17 Feb 2021 • Shuai Zhang, Yi Tay, Wenqi Jiang, Da-Cheng Juan, Ce Zhang

In order for learned representations to be effective and efficient, it is ideal that the geometric inductive bias aligns well with the underlying structure of the data.

Inductive Bias, Knowledge Graph Completion, +1

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

no code implementations • 12 Oct 2020 • Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso

MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups.

Recommendation Systems

Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

no code implementations • 21 Mar 2020 • Shaoxiong Ji, Wenqi Jiang, Anwar Walid, Xue Li

Federated learning (FL) is a novel machine learning setting that enables on-device intelligence via decentralized training and federated optimization.

Federated Learning, Image Classification, +1
