Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients

11 Nov 2020 · Huahuan Zheng, Keyu An, Zhijian Ou

Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step toward advancing end-to-end Automatic Speech Recognition (ASR): it replaces expert-designed networks with learned, task-specific architectures. Compared with early, computationally demanding NAS methods, recent gradient-based NAS methods, e.g., DARTS (Differentiable ARchiTecture Search), SNAS (Stochastic NAS) and ProxylessNAS, significantly improve NAS efficiency. In this paper, we make two contributions. First, we rigorously develop an efficient NAS method based on Straight-Through (ST) gradients, called ST-NAS. ST-NAS uses the loss from SNAS but optimizes it by back-propagating gradients through the discrete variables with the ST estimator, a point not made explicit in ProxylessNAS. Using ST gradients to support sub-graph sampling is a core element that lets ST-NAS achieve efficient NAS beyond DARTS and SNAS. Second, we successfully apply ST-NAS to end-to-end ASR. Experiments on the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that the architectures found by ST-NAS significantly outperform the human-designed architecture on both datasets. We also report further strengths of ST-NAS, such as architecture transferability and low memory and time cost.
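
To make the straight-through idea concrete, here is a minimal, hypothetical sketch in plain Python of the textbook ST estimator on a single searchable edge. The three candidate ops (identity, doubling, negation), the squared-error loss, the input, the target and the learning rate are all assumptions chosen for illustration. The forward pass uses only the sampled one-hot choice z (sub-graph sampling); the backward pass approximates dL/dπ by dL/dz and chains through the softmax to the architecture logits α. This is not the authors' implementation in thu-spmi/ST-NAS: for clarity it evaluates every candidate op to form dL/dz, and it leaves out network weights, the full SNAS loss and the ASR model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_one_hot(probs, rng):
    """Draw a one-hot vector z ~ Categorical(probs)."""
    r = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return [1.0 if j == k else 0.0 for j in range(len(probs))]
    # Guard against floating-point round-off: fall back to the last category.
    return [0.0] * (len(probs) - 1) + [1.0]


def st_step(alpha, candidate_ops, x, target, rng):
    """One forward/backward pass through a single searchable edge.

    Forward: only the sampled op contributes to y (sub-graph sampling).
    Backward: the ST estimator treats dz/dpi as the identity, so dL/dpi_k
    is approximated by dL/dz_k, then chained through the softmax.
    """
    probs = softmax(alpha)
    z = sample_one_hot(probs, rng)
    outputs = [op(x) for op in candidate_ops]  # toy: evaluate every candidate
    y = sum(zk * ok for zk, ok in zip(z, outputs))
    loss = (y - target) ** 2
    dl_dy = 2.0 * (y - target)
    dl_dz = [dl_dy * ok for ok in outputs]  # because y = sum_k z_k * o_k
    # Softmax Jacobian: dL/dalpha_j = pi_j * (dL/dz_j - sum_k pi_k * dL/dz_k)
    mean = sum(p * g for p, g in zip(probs, dl_dz))
    dl_dalpha = [p * (g - mean) for p, g in zip(probs, dl_dz)]
    return loss, dl_dalpha


# Hypothetical toy candidates: identity, doubling, negation.
CANDIDATE_OPS = [lambda v: v, lambda v: 2.0 * v, lambda v: -v]


def search(steps=200, lr=0.1, seed=0):
    """Plain gradient descent on the architecture logits alpha."""
    rng = random.Random(seed)
    alpha = [0.0] * len(CANDIDATE_OPS)
    for _ in range(steps):
        _, grad = st_step(alpha, CANDIDATE_OPS, x=1.0, target=2.0, rng=rng)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


alpha = search()
best = max(range(len(alpha)), key=lambda k: alpha[k])
print("architecture logits:", alpha, "selected op index:", best)
```

Because the softmax-Jacobian gradient sums to zero across candidates, the logits keep their initial sum, while the logit of the op that matches the target (doubling, index 1) only ever grows; the final architecture is read off with an argmax.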

Code

thu-spmi/ST-NAS official

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Graph Sampling

Neural Architecture Search

speech-recognition

Speech Recognition

Results from the Paper

Ranked #1 on Speech Recognition on WSJ dev93

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Speech Recognition	WSJ dev93	CTC-CRF ST-NAS	Word Error Rate (WER, %)	5.68	#1
Speech Recognition	WSJ eval92	CTC-CRF ST-NAS	Word Error Rate (WER, %)	2.77	#5
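
For reference, WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, and is usually reported as a percentage. The sketch below is a generic illustration of that definition, not the scoring pipeline used for the numbers above; the function name word_error_rate is hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (rolling row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {100 * wer:.2f}%")
```

Here "sat" becomes "sit" (one substitution) and the second "the" is dropped (one deletion), giving 2 errors over 6 reference words.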

Methods

Adam • Cutout • DARTS • DropPath • ProxylessNAS • REINFORCE
