Search Results for author: Ofir Zafrir

Found 4 papers, 4 papers with code

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

1 code implementation • 28 Jun 2023 • Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat

We apply our sparse accelerator to widely used Transformer-based language models including BERT-Mini, DistilBERT, BERT-Base, and BERT-Large.

Model Compression
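
The accelerator's core idea is exploiting block-structured sparsity so that pruned weight blocks are skipped outright on the CPU, while surviving blocks stay dense and SIMD-friendly. Below is a minimal NumPy sketch of that principle; it is an illustration of block-sparse matrix-vector multiply, not the paper's engine, and the data layout and block size are assumptions:

```python
# Minimal sketch of block-sparse matvec: pruned blocks cost nothing,
# surviving blocks are dense contiguous multiplies.
import numpy as np

def block_sparse_matvec(blocks, block_index, x, out_dim, block_size):
    """Multiply a block-sparse matrix by a dense vector.

    blocks:      list of dense (block_size x block_size) arrays (nonzero blocks)
    block_index: list of (row_block, col_block) coordinates, one per block
    """
    y = np.zeros(out_dim, dtype=x.dtype)
    for blk, (bi, bj) in zip(blocks, block_index):
        r, c = bi * block_size, bj * block_size
        # Only surviving blocks contribute; zeroed blocks are never visited.
        y[r:r + block_size] += blk @ x[c:c + block_size]
    return y
```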

Fast DistilBERT on CPUs

2 code implementations • 27 Oct 2022 • Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat

In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators.

Knowledge Distillation, Model Compression +2
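
A minimal sketch of the three model-compression stages the abstract names (pruning, knowledge distillation, and int8 quantization), expressed with stock PyTorch utilities. The paper's hardware-aware sparsity patterns and custom runtime engine are not reproduced here; the pruning amount and distillation temperature are assumptions:

```python
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soft-target KL divergence between teacher and student distributions.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def prune_linear_layers(model, amount=0.8):
    # Magnitude-prune a fraction of the weights in every Linear layer.
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    return model

def quantize_for_cpu(model):
    # Post-training dynamic int8 quantization of Linear layers.
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```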

Prune Once for All: Sparse Pre-Trained Language Models

2 code implementations • 10 Nov 2021 • Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat

We show how the compressed sparse pre-trained models transfer their knowledge to five different downstream natural language tasks with minimal accuracy loss.

Natural Language Inference, Quantization +3
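
The "prune once" idea is that the sparsity pattern learned during pre-training is frozen, and each downstream fine-tuning run updates only the surviving weights. Here is a minimal PyTorch illustration of freezing such a mask; it sketches the mechanism, not the paper's exact training recipe:

```python
import torch

def freeze_sparsity_pattern(model):
    """Register gradient hooks that keep pruned (zero) weights at zero."""
    for name, param in model.named_parameters():
        if param.dim() >= 2:  # weight matrices only
            mask = (param != 0).float()
            # Zero the gradient wherever the weight was pruned, so the
            # pre-trained sparsity pattern survives downstream fine-tuning.
            param.register_hook(lambda g, m=mask: g * m)
    return model
```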

Q8BERT: Quantized 8Bit BERT

5 code implementations • 14 Oct 2019 • Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat

Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvements in many Natural Language Processing (NLP) tasks.

Linguistic Acceptability, Natural Language Inference +3
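
Q8BERT quantizes weights and activations to 8 bits using symmetric linear quantization, trained with quantization-aware training via a straight-through estimator. A minimal sketch of that fake-quantization mechanism follows; the per-tensor scale calibration shown here is a simplified assumption:

```python
import torch

def fake_quantize_int8(x: torch.Tensor) -> torch.Tensor:
    # Symmetric scale: map the largest magnitude to the int8 limit 127.
    scale = 127.0 / x.abs().max().clamp(min=1e-8)
    q = torch.clamp(torch.round(x * scale), -127, 127) / scale
    # Straight-through estimator: forward uses the quantized value,
    # backward passes gradients through as if quantization were identity.
    return x + (q - x).detach()
```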
