TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Machine Translation	WMT2014 English-French	Hardware Aware Transformer	BLEU score	41.8	# 21
Machine Translation	WMT2014 English-German	Hardware Aware Transformer	BLEU score	28.4	# 44
Machine Translation	WMT2014 English-German	Hardware Aware Transformer	Number of Params	48M	# 13

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

ACL 2020 · Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to deploy on hardware because of their intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with $\textit{arbitrary encoder-decoder attention}$ and $\textit{heterogeneous layers}$. Then we train a $\textit{SuperTransformer}$ that covers all candidates in the design space and efficiently produces many $\textit{SubTransformers}$ with weight sharing. Finally, we perform an evolutionary search with a hardware latency constraint to find a specialized $\textit{SubTransformer}$ dedicated to running fast on the target hardware. Extensive experiments on four machine translation tasks demonstrate that HAT can discover efficient models for different hardware (CPU, GPU, IoT device). When running the WMT'14 translation task on a Raspberry Pi-4, HAT can achieve a $\textbf{3}\times$ speedup and $\textbf{3.7}\times$ smaller size over the baseline Transformer, and a $\textbf{2.7}\times$ speedup and $\textbf{3.6}\times$ smaller size over the Evolved Transformer, with $\textbf{12,041}\times$ less search cost and no performance loss. HAT code is available at https://github.com/mit-han-lab/hardware-aware-transformers.git
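
The final step of the abstract, an evolutionary search under a hardware latency constraint, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper or the HAT repository: the design-space keys and ranges in `SPACE`, the toy `latency_ms` proxy, the toy `proxy_score` (standing in for the quality of a SubTransformer evaluated with weights inherited from the SuperTransformer), and all hyperparameters.

```python
import random

# Hypothetical design space; names and ranges are illustrative only.
SPACE = {
    "embed_dim": [512, 640],
    "ffn_dim": [1024, 2048, 3072],
    "decoder_layers": [1, 2, 3, 4, 5, 6],
    "heads": [4, 8],
}


def sample(rng):
    """Draw one random candidate configuration from the design space."""
    return {key: rng.choice(choices) for key, choices in SPACE.items()}


def latency_ms(cfg):
    """Toy latency proxy (assumption): grows with depth and width."""
    return cfg["decoder_layers"] * (cfg["embed_dim"] + cfg["ffn_dim"]) / 100.0


def proxy_score(cfg):
    """Toy quality proxy (assumption): larger candidates score higher."""
    return (cfg["decoder_layers"] + cfg["ffn_dim"] / 1024
            + cfg["embed_dim"] / 512 + cfg["heads"] / 8)


def mutate(cfg, rng, prob=0.3):
    """Resample each field independently with probability `prob`."""
    child = dict(cfg)
    for key, choices in SPACE.items():
        if rng.random() < prob:
            child[key] = rng.choice(choices)
    return child


def crossover(a, b, rng):
    """Pick each field from one of the two parents at random."""
    return {key: rng.choice([a[key], b[key]]) for key in SPACE}


def evolutionary_search(max_latency_ms, population=20, parents=5,
                        iterations=15, seed=0):
    """Return the best-scoring candidate that meets the latency limit."""
    rng = random.Random(seed)

    def feasible(cfg):
        return latency_ms(cfg) <= max_latency_ms

    # Build an initial population of candidates that satisfy the constraint.
    pop, tries = [], 0
    while len(pop) < population:
        tries += 1
        if tries > 100_000:
            raise ValueError("latency constraint too tight for this space")
        cand = sample(rng)
        if feasible(cand):
            pop.append(cand)

    for _ in range(iterations):
        # Keep the top candidates as parents (they survive unchanged).
        pop.sort(key=proxy_score, reverse=True)
        top = pop[:parents]
        children = list(top)
        # Refill the population; infeasible children are discarded.
        while len(children) < population:
            if rng.random() < 0.5:
                cand = mutate(rng.choice(top), rng)
            else:
                cand = crossover(rng.choice(top), rng.choice(top), rng)
            if feasible(cand):
                children.append(cand)
        pop = children

    return max(pop, key=proxy_score)


if __name__ == "__main__":
    best = evolutionary_search(max_latency_ms=100.0)
    print(best, latency_ms(best))
```

In this sketch the constraint is enforced by rejection: a candidate whose estimated latency exceeds the limit never enters the population, so the returned configuration always satisfies it. A real search would replace the two toy functions with a latency measurement or estimate for the target device and an actual validation metric.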

Code

mit-han-lab/hardware-aware-transfor… official

322

Luccadoremi/Model-Compression-DAQ

aaditkapoor/PDFExtract

mlatsjsu/PDFextract

Tasks

Decoder

Machine Translation

Neural Architecture Search

Translation

Datasets

WMT 2014

Results from the Paper

Ranked #21 on Machine Translation on WMT2014 English-French

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer