TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Constituency Grammar Induction	Penn Treebank (PTB)	S-DIORA	Max F1 (WSJ)	63.96	# 3
Constituency Grammar Induction	Penn Treebank (PTB)	S-DIORA	Max F1 (WSJ10)	71.8	# 1
Constituency Grammar Induction	Penn Treebank (PTB)	S-DIORA	Mean F1 (WSJ)	57.6	# 8

Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders

EMNLP 2020 · Andrew Drozdov, Subendhu Rongali, Yi-Pei Chen, Tim O'Gorman, Mohit Iyyer, Andrew McCallum

The deep inside-outside recursive autoencoder (DIORA; Drozdov et al. 2019) is a self-supervised neural model that learns to induce syntactic tree structures for input sentences *without access to labeled training data*. In this paper, we discover that while DIORA exhaustively encodes all possible binary trees of a sentence with a soft dynamic program, its vector averaging approach is locally greedy and cannot recover from errors when computing the highest scoring parse tree in bottom-up chart parsing. To fix this issue, we introduce S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees by employing a hard argmax operation and a beam at each cell in the chart. Our experiments show that through *fine-tuning* a pre-trained DIORA with our new algorithm, we improve the state of the art in *unsupervised* constituency parsing on the English WSJ Penn Treebank by 2.2-6% F1, depending on the data used for fine-tuning.
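
The contrast between a softly-weighted mixture and a single hard-selected tree can be illustrated with a toy chart. The sketch below is not the authors' implementation: `compose` (element-wise mean) and `dot` (dot-product compatibility) are hypothetical stand-ins for the learned composition and scoring functions, and the beam is fixed at size 1 for brevity. It only shows how each chart cell either averages over all split points or commits to the best one and keeps a backpointer.

```python
import math


def compose(left, right):
    # Stand-in composition (assumption): element-wise mean of the child vectors.
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def dot(u, v):
    # Stand-in compatibility score (assumption): plain dot product.
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fill_chart(leaves, hard):
    """Fill a bottom-up chart over spans (i, j), with j exclusive.

    Each cell holds (vector, score, split). With hard=True the cell keeps
    only the best split (argmax, beam size 1); with hard=False it stores a
    softmax-weighted mixture over all splits and no backpointer.
    """
    n = len(leaves)
    dim = len(leaves[0])
    chart = {(i, i + 1): (leaves[i], 0.0, None) for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cands = []
            for k in range(i + 1, j):
                lv, ls, _ = chart[(i, k)]
                rv, rs, _ = chart[(k, j)]
                cands.append((compose(lv, rv), dot(lv, rv) + ls + rs, k))
            scores = [s for _, s, _ in cands]
            if hard:
                best = max(range(len(cands)), key=lambda idx: scores[idx])
                chart[(i, j)] = cands[best]
            else:
                w = softmax(scores)
                vec = [sum(wk * c[0][d] for wk, c in zip(w, cands))
                       for d in range(dim)]
                score = sum(wk * s for wk, s in zip(w, scores))
                chart[(i, j)] = (vec, score, None)
    return chart


def read_tree(chart, i, j):
    """Follow backpointers of a hard chart; leaves are token indices."""
    if j - i == 1:
        return i
    split = chart[(i, j)][2]
    if split is None:
        raise ValueError("soft chart cells carry no backpointers")
    return (read_tree(chart, i, split), read_tree(chart, split, j))


if __name__ == "__main__":
    toy_leaves = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # made-up token vectors
    hard_chart = fill_chart(toy_leaves, hard=True)
    print(read_tree(hard_chart, 0, len(toy_leaves)))
```

In the hard variant every cell vector encodes exactly one subtree, so the tree read off the backpointers is the same tree the root vector was built from. In the soft variant the root vector blends all bracketings, which is the mismatch the abstract describes.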
