TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Dependency Parsing	Penn Treebank	Distilled neural FOG	POS	97.44	# 2
Dependency Parsing	Penn Treebank	Distilled neural FOG	UAS	94.26	# 18
Dependency Parsing	Penn Treebank	Distilled neural FOG	LAS	92.06	# 18

Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser

EMNLP 2016 · Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith

We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a "distillation" of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
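
The consensus idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each ensemble member outputs one head index per token (0 standing for the root), and the function names `vote_fractions`, `expected_hamming_cost`, and `majority_heads` are hypothetical. The sketch computes, for each token, the fraction of parsers voting for each head, and the expected Hamming cost of a candidate parse when the ensemble is treated as a uniform distribution over its members' outputs. The per-token majority vote shown here minimizes that cost only when the tree constraint is ignored; a first-order graph-based parser would instead run a maximum spanning tree algorithm (for example Chu-Liu-Edmonds) over the same vote scores, which is omitted here.

```python
from collections import Counter


def vote_fractions(ensemble_heads):
    """Return, for each token, a dict mapping head index -> fraction of votes.

    ensemble_heads: list of head lists, one per parser. heads[j] is the
    predicted head of token j (assumption: 0 denotes the root).
    """
    n_parsers = len(ensemble_heads)
    n_tokens = len(ensemble_heads[0])
    fractions = []
    for j in range(n_tokens):
        counts = Counter(heads[j] for heads in ensemble_heads)
        fractions.append({h: c / n_parsers for h, c in counts.items()})
    return fractions


def expected_hamming_cost(candidate_heads, fractions):
    """Expected number of wrong attachments of a candidate parse, taking
    the ensemble members' outputs as a uniform distribution over parses."""
    return sum(1.0 - q.get(h, 0.0) for h, q in zip(candidate_heads, fractions))


def majority_heads(fractions):
    """Per-token majority vote. This ignores the tree constraint, so the
    result is not guaranteed to be a well-formed dependency tree."""
    return [max(q, key=q.get) for q in fractions]


# Three hypothetical parsers on a three-token sentence.
ensemble = [
    [2, 0, 2],
    [2, 0, 2],
    [3, 0, 2],
]
fractions = vote_fractions(ensemble)
consensus = majority_heads(fractions)
print(consensus, expected_hamming_cost(consensus, fractions))
```

In this toy example the parsers disagree only on the first token, so the vote fractions there are lower than for the other tokens. This is the kind of weaker consensus that the abstract describes as a signal of difficulty or ambiguity, and per-attachment fractions like these are one plausible form of the uncertainty estimates that a distillation cost could draw on.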