Search Results for author: Zhaozhuo Xu

Found 23 papers, 3 papers with code

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

1 code implementation • 2 Mar 2024 • Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

Large language model inference on Central Processing Units (CPUs) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations.

16k · Language Modelling +1
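
The idea of trading multiply-adds for lookups can be illustrated with product quantization: keys are pre-quantized against small codebooks, and each query builds per-sub-space lookup tables so that query-key dot products reduce to table lookups and additions. The sketch below is illustrative only (all names are invented here, and the codebooks are random rather than learned); it omits the paper's central contribution of fitting these lookups into SIMD registers:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_sub, n_codes = 64, 8, 16        # dim, sub-quantizers, codes per sub-quantizer
sub_d = d // n_sub

# Codebooks: per sub-space, n_codes centroids (learned offline in practice).
codebooks = rng.normal(size=(n_sub, n_codes, sub_d))

def encode(key):
    """Quantize a key to one code id per sub-space (nearest centroid)."""
    parts = key.reshape(n_sub, sub_d)
    return np.array([np.argmin(((codebooks[s] - parts[s]) ** 2).sum(-1))
                     for s in range(n_sub)])

def build_luts(query):
    """Per sub-space lookup tables of <query_part, centroid> dot products."""
    parts = query.reshape(n_sub, sub_d)
    return np.einsum('sd,skd->sk', parts, codebooks)   # (n_sub, n_codes)

def lut_dot(luts, codes):
    """Approximate query·key using only table lookups and additions."""
    return sum(luts[s, codes[s]] for s in range(n_sub))

key, query = rng.normal(size=d), rng.normal(size=d)
print(query @ key, lut_dot(build_luts(query), encode(key)))  # exact vs. estimate
```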

LLM Multi-Agent Systems: Challenges and Open Problems

no code implementations • 5 Feb 2024 • Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, Chaoyang He

This paper explores existing work on multi-agent systems and identifies challenges that remain inadequately addressed.

Management

LETA: Learning Transferable Attribution for Generic Vision Explainer

no code implementations • 23 Dec 2023 • Guanchu Wang, Yu-Neng Chuang, Fan Yang, Mengnan Du, Chia-Yuan Chang, Shaochen Zhong, Zirui Liu, Zhaozhuo Xu, Kaixiong Zhou, Xuanting Cai, Xia Hu

To address this problem, we develop a pre-trained, DNN-based, generic explainer on large-scale image datasets, and leverage its transferability to explain various vision models for downstream tasks.

Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training

no code implementations • 23 Sep 2023 • Zhuang Wang, Zhaozhuo Xu, Anshumali Shrivastava, T. S. Eugene Ng

We then systematically explore the design space of communication schemes for sparse tensors and find the optimal one.
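
For context, the baseline object these communication schemes operate on is a coordinate-style encoding of a sparse gradient: shipping (index, value) pairs instead of the dense tensor. A minimal sketch of that baseline encoding (not Zen's near-optimal scheme, which the paper derives):

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=1_000_000).astype(np.float32)
grad[rng.random(grad.size) > 0.01] = 0.0      # ~1% non-zeros, as in sparse gradients

# Coordinate encoding: send (index, value) pairs instead of the dense tensor.
idx = np.flatnonzero(grad).astype(np.int32)
val = grad[idx]
print(f"dense: {grad.nbytes} B, sparse: {idx.nbytes + val.nbytes} B "
      f"({grad.nbytes / (idx.nbytes + val.nbytes):.0f}x smaller)")

# A receiver reconstructs the tensor before (or after) aggregation.
recon = np.zeros_like(grad)
recon[idx] = val
assert np.array_equal(recon, grad)
```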

Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

1 code implementation • NeurIPS 2023 • Zirui Liu, Guanchu Wang, Shaochen Zhong, Zhaozhuo Xu, Daochen Zha, Ruixiang Tang, Zhimeng Jiang, Kaixiong Zhou, Vipin Chaudhary, Shuai Xu, Xia Hu

While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as activations, as they are crucial for gradient calculation.

Language Modelling · Stochastic Optimization
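
The column-row sampling idea behind this line of work can be sketched directly: A @ B equals the sum of outer products of A's columns with B's rows, so sampling a few column-row pairs with importance weights gives an unbiased, memory-cheap estimate. The snippet below is the standard CRS baseline, not the paper's winner-take-all variant, which is designed to reduce this estimator's variance:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(128, 512)), rng.normal(size=(512, 64))

def crs_matmul(A, B, k, rng):
    """Unbiased column-row sampling estimate of A @ B using k of A.shape[1] pairs."""
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()                                 # norm-proportional sampling
    idx = rng.choice(A.shape[1], size=k, p=p)
    scale = 1.0 / (k * p[idx])                   # importance-sampling correction
    return (A[:, idx] * scale) @ B[idx, :]

exact, approx = A @ B, crs_matmul(A, B, k=128, rng=rng)
err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error with 128/512 pairs: {err:.2f}")
```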

A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near Neighbor Graph

no code implementations • 10 Mar 2023 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu

The current theoretical literature focuses on greedy search on the exact near neighbor graph, while practitioners use approximate near neighbor graphs (ANN-Graphs) to reduce preprocessing time.
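
The greedy search being analyzed is easy to state: starting from some node, repeatedly hop to the neighbor closest to the query until no neighbor improves. A minimal sketch, built here on an exact kNN graph for simplicity (the paper's contribution is the analysis for the approximate case):

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.normal(size=(200, 16))          # toy database

# Build an exact kNN graph; the paper analyzes its approximate counterpart.
d2 = ((vecs[:, None] - vecs[None]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
graph = {i: list(np.argsort(d2[i])[:8]) for i in range(len(vecs))}

def greedy_search(graph, vecs, query, start=0):
    """Hop to the neighbor closest to the query until no neighbor improves."""
    cur = start
    while True:
        neighbors = graph[cur]
        dists = np.linalg.norm(vecs[neighbors] - query, axis=1)
        if dists.min() >= np.linalg.norm(vecs[cur] - query):
            return cur                     # local minimum reached
        cur = neighbors[int(np.argmin(dists))]

print(greedy_search(graph, vecs, query=rng.normal(size=16)))
```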

Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations

no code implementations • 21 Dec 2022 • Lianke Qin, Aravind Reddy, Zhao Song, Zhaozhuo Xu, Danyang Zhuo

In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation.
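
The problem being accelerated is estimating a pairwise sum such as a kernel density, f(q) = Σ_i w(q, x_i), faster than an O(n) scan over the dataset. A naive unbiased baseline is uniform subsampling, sketched below; hashing-based estimators like Adam-Hash aim for a better accuracy/time trade-off and, crucially, correctness under adaptive queries and dynamic updates:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 8))          # dataset
q = rng.normal(size=8)                     # query

def kernel(q, X, sigma=1.0):
    """Gaussian kernel weights w(q, x_i)."""
    return np.exp(-((X - q) ** 2).sum(-1) / (2 * sigma ** 2))

exact = kernel(q, X).sum()                 # O(n) per query

# Naive unbiased estimator: uniform subsample of m points, rescaled by n/m.
m = 2_000
sub = X[rng.choice(len(X), size=m, replace=False)]
estimate = kernel(q, sub).sum() * len(X) / m
print(exact, estimate)
```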

Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

no code implementations • 8 Aug 2022 • Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries.

Density Estimation

Proximity Graph Maintenance for Fast Online Nearest Neighbor Search

no code implementations • 22 Jun 2022 • Zhaozhuo Xu, Weijie Zhao, Shulong Tan, Zhixin Zhou, Ping Li

Given a vertex deletion request, we thoroughly investigate solutions to update the connections of the vertex.

Quantization · Recommendation Systems
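
A toy version of the deletion problem: removing a vertex from an undirected proximity graph leaves a hole in the search paths, and one simple policy is to reconnect the deleted vertex's former neighbors pairwise. The sketch below shows only this naive policy; the paper investigates and compares such update strategies in depth:

```python
def delete_vertex(graph, v):
    """Remove v and patch the hole by linking its former neighbors.

    graph: dict mapping node -> set of neighbor nodes (undirected).
    This is one simple 'local reconnect' policy, not the paper's method.
    """
    neighbors = graph.pop(v)
    for u in neighbors:
        graph[u].discard(v)
    # Reconnect former neighbors pairwise so searches can still pass through.
    for u in neighbors:
        for w in neighbors:
            if u != w:
                graph[u].add(w)

g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
delete_vertex(g, 2)
print(g)   # {0: {1, 3}, 1: {0, 3}, 3: {0, 1}}
```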

Locality Sensitive Teaching

no code implementations • NeurIPS 2021 • Zhaozhuo Xu, Beidi Chen, Chaojian Li, Weiyang Liu, Le Song, Yingyan Lin, Anshumali Shrivastava

However, as one of the most influential and practical MT paradigms, iterative machine teaching (IMT) is prohibitively expensive on IoT devices due to its inefficient and unscalable algorithms.

Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing

no code implementations • 18 May 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu

We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions.

Reinforcement Learning (RL)
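
With linear function approximation, Q(s, a) = φ(s, a)·θ, so the per-state max over actions is exactly a maximum inner product search (MIPS). The sketch below shows the brute-force reduction with made-up shapes; the paper's result comes from replacing the linear scan with an LSH-based MIPS index:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, dim = 10_000, 32
Phi = rng.normal(size=(n_actions, dim))   # feature rows phi(s, a) for a fixed state s
theta = rng.normal(size=dim)              # LSVI weight vector

# Q(s, a) = phi(s, a) . theta, so max_a Q(s, a) is a maximum inner product search.
scores = Phi @ theta                      # exact MIPS: one linear scan over actions
best = int(np.argmax(scores))
print(best, scores[best])
# The paper replaces this O(n_actions) scan with an LSH-based MIPS index,
# making the per-iteration argmax sublinear in the number of actions.
```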

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training

no code implementations • ICLR 2021 • Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re

Recent advances by practitioners in the deep learning community have breathed new life into Locality Sensitive Hashing (LSH), using it to reduce memory and time bottlenecks in neural network (NN) training.

Efficient Neural Network · Language Modelling +2
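
The SLIDE-style idea MONGOOSE builds on can be sketched with a static SimHash: hash the layer's weight rows into buckets, and for each input evaluate only the neurons whose hash collides with the input's, since collisions correlate with large inner products. Everything below is a simplified, static illustration; MONGOOSE's contribution is making the hash functions learnable and scheduling when to re-hash:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim, n_bits = 4096, 256, 8
W = rng.normal(size=(n_neurons, dim))      # one weight row per neuron
planes = rng.normal(size=(n_bits, dim))    # random hyperplanes for SimHash

def simhash(vs):
    """Sign-pattern hash codes; similar directions tend to collide."""
    bits = (vs @ planes.T > 0).astype(np.uint32)          # (n, n_bits)
    return bits @ (1 << np.arange(n_bits, dtype=np.uint32))

# Pre-bucket all neurons by the hash of their weight vector.
buckets = {}
for i, code in enumerate(simhash(W)):
    buckets.setdefault(int(code), []).append(i)

x = rng.normal(size=dim)
active = buckets.get(int(simhash(x[None])[0]), [])
out = W[active] @ x                        # evaluate only colliding neurons
print(f"evaluated {len(active)} of {n_neurons} neurons")
```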

Climbing the WOL: Training for Cheaper Inference

no code implementations • 2 Jul 2020 • Zichang Liu, Zhaozhuo Xu, Alan Ji, Jonathan Li, Beidi Chen, Anshumali Shrivastava

Efficient inference for wide output layers (WOLs) is an essential yet challenging task in large-scale machine learning.

Retrieval

On Efficient Retrieval of Top Similarity Vectors

no code implementations • IJCNLP 2019 • Shulong Tan, Zhixin Zhou, Zhaozhuo Xu, Ping Li

Retrieval of relevant vectors produced by representation learning critically influences the efficiency in natural language processing (NLP) tasks.

BIG-bench Machine Learning · Representation Learning +1

Fast Binary Functional Search on Graph

no code implementations • 27 Sep 2018 • Shulong Tan, Zhixin Zhou, Zhaozhuo Xu, Ping Li

Because Approximate Nearest Neighbor Search (ANNS) techniques are designed around metric distances, efficient search under more advanced measures remains an open question.

Open-Ended Question Answering
