no code implementations • 19 Dec 2024 • Zhen Zheng, Xiaonan Song, Chuanjie Liu
We present the sweet spot of quantization configuration of algorithm-system co-design that leads to high accuracy and system efficiency.
no code implementations • 29 Nov 2024 • Zhen Zheng, Xin Ji, Taosong Fang, Fanghao Zhou, Chuanjie Liu, Gang Peng
The existing solutions use the LRU-based cache to reuse the KV context of common prefix between requests.
1 code implementation • 13 May 2024 • Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, Harsha Vardhan Simhadri, Manik Varma, Yujing Wang, Linjun Yang, Mao Yang, Ce Zhang
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals.
1 code implementation • NeurIPS 2021 • Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, Jingdong Wang
It stores the centroid points of the posting lists in the memory and the large posting lists in the disk.
1 code implementation • NeurIPS 2021 • Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, Jingdong Wang
It stores the centroid points of the posting lists in the memory and the large posting lists in the disk.
1 code implementation • 10 Jul 2020 • Xuan Shan, Chuanjie Liu, Yiqian Xia, Qi Chen, Yusi Zhang, Kaize Ding, Yaobo Liang, Angen Luo, Yuxiang Luo
Deep matching models aim to facilitate search engines retrieving more relevant documents by mapping queries and documents into semantic vectors in the first-stage retrieval.
no code implementations • 3 Jul 2020 • Yusi Zhang, Chuanjie Liu, Angen Luo, Hui Xue, Xuan Shan, Yuxiang Luo, Yiqian Xia, Yuanchi Yan, Haidong Wang
The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space.