Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders
We introduce the deep inside-outside recursive autoencoder (DIORA), a fully unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence. During training we use dynamic programming to consider all possible binary trees over the sentence, and for inference we use the CKY algorithm to extract the highest-scoring parse. DIORA outperforms previously reported results for unsupervised binary constituency parsing on the benchmark WSJ dataset.
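The CKY extraction step mentioned in the abstract can be illustrated with a minimal sketch. It is not DIORA's implementation: the function name `cky_best_tree` and the callable `split_score(i, k, j)` are hypothetical, and the scores are hand-picked toy values standing in for whatever learned scores a trained model would supply. The sketch only shows how dynamic programming over spans recovers the highest-scoring binary tree.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

# A tree is either a word (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def cky_best_tree(
    words: List[str],
    split_score: Callable[[int, int, int], float],
) -> Tuple[Tree, float]:
    """Return the highest-scoring binary tree over `words` and its score.

    `split_score(i, k, j)` is an assumed scoring function: the score of
    building span words[i:j] from children words[i:k] and words[k:j].
    A tree's score is the sum of the scores of all its splits.
    """
    n = len(words)
    if n == 0:
        raise ValueError("cannot parse an empty sentence")

    # chart[(i, j)] = (best score for span words[i:j], best split point)
    chart: Dict[Tuple[int, int], Tuple[float, Optional[int]]] = {}
    for i in range(n):
        chart[(i, i + 1)] = (0.0, None)  # leaves carry no split score

    # Fill the chart bottom-up, from short spans to the full sentence.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best_score, best_k = float("-inf"), None
            for k in range(i + 1, j):
                score = chart[(i, k)][0] + chart[(k, j)][0] + split_score(i, k, j)
                if score > best_score:  # ties keep the leftmost split
                    best_score, best_k = score, k
            chart[(i, j)] = (best_score, best_k)

    def build(i: int, j: int) -> Tree:
        # Follow the stored split points back down from the root.
        if j - i == 1:
            return words[i]
        k = chart[(i, j)][1]
        return (build(i, k), build(k, j))

    return build(0, n), chart[(0, n)][0]


# Toy scores (invented for illustration) that favor grouping "the cat".
toy_scores = {(0, 1, 2): 2.0, (1, 2, 3): 1.0, (0, 1, 3): 0.5, (0, 2, 3): 0.5}
tree, score = cky_best_tree(
    ["the", "cat", "sat"], lambda i, k, j: toy_scores[(i, k, j)]
)
print(tree, score)
```

The chart has O(n^2) spans and each span tries O(n) split points, so the sketch runs in O(n^3) time, which is the usual cost of CKY over binary trees.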
Results from the Paper
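Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA | Max F1 (WSJ) | 49.6 | # 13 |
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA | Mean F1 (WSJ10) | 67.7 | # 2 |
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA | Max F1 (WSJ10) | 68.5 | # 3 |
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA | Mean F1 (WSJ) | 48.9 | # 17 |
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA (+PP) | Max F1 (WSJ) | 56.2 | # 9 |
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA (+PP) | Max F1 (WSJ10) | 60.55 | # 5 |
Constituency Grammar Induction | Penn Treebank (PTB) | DIORA (+PP) | Mean F1 (WSJ) | 55.7 | # 13 |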