The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities.
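A toy sketch of why recurrence helps here (my illustration, not from the source; PyTorch, all names and sizes hypothetical): the LSTM's cell state is threaded through every time step, so information about the first token can in principle survive an arbitrarily long run of distractor tokens.

```python
import torch
import torch.nn as nn

# Toy illustration: an LSTM can carry a token seen at position 0
# across a long stretch of distractors, because its cell state is
# propagated through every time step.
torch.manual_seed(0)
vocab_size, embed_dim, hidden_dim, gap = 10, 16, 32, 50

embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
readout = nn.Linear(hidden_dim, vocab_size)

# Sequence: one informative first token, then `gap` distractors.
first_token = torch.tensor([[3]])
distractors = torch.randint(4, vocab_size, (1, gap))
sequence = torch.cat([first_token, distractors], dim=1)

outputs, _ = lstm(embed(sequence))
# After training, the final hidden state can still encode token 3;
# a model with only a short fixed context window could not.
logits = readout(outputs[:, -1, :])
print(logits.shape)  # torch.Size([1, 10])
```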
Distributional models that learn rich semantic word representations are a success story of recent NLP research.
The BiLSTM is trained jointly with the parser objective, resulting in very effective feature extractors for parsing.
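A minimal sketch of this joint-training idea, assuming a PyTorch setup with hypothetical names (this is not the paper's implementation): an arc scorer reads pairs of BiLSTM states, and the parsing loss backpropagates through the scorer into the BiLSTM, so the feature extractor is shaped directly by the parser objective.

```python
import torch
import torch.nn as nn

# Sketch: a BiLSTM produces one contextual vector per word, and an MLP
# scores candidate head-dependent arcs from pairs of those vectors.
embed_dim, hidden_dim = 64, 128
embed = nn.Embedding(1000, embed_dim)
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
arc_scorer = nn.Sequential(
    nn.Linear(4 * hidden_dim, 100), nn.Tanh(), nn.Linear(100, 1)
)

words = torch.randint(0, 1000, (1, 6))   # one toy 6-word sentence
feats, _ = bilstm(embed(words))          # (1, 6, 2 * hidden_dim)

# Score every (head, dependent) pair of contextual vectors.
n = feats.size(1)
heads = feats.unsqueeze(2).expand(-1, n, n, -1)   # heads[i, j] = feats[i]
deps = feats.unsqueeze(1).expand(-1, n, n, -1)    # deps[i, j]  = feats[j]
scores = arc_scorer(torch.cat([heads, deps], dim=-1)).squeeze(-1)

# Cross-entropy against toy gold heads: one row per dependent, one
# column per candidate head. Gradients flow back into the BiLSTM,
# training the feature extractor jointly with the parser objective.
gold_heads = torch.tensor([0, 0, 1, 1, 3, 3])
loss = nn.functional.cross_entropy(scores.squeeze(0).t(), gold_heads)
loss.backward()
```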
Efficient methods for storing and querying language models are critical for scaling high-order n-gram models to large corpora.
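As a rough illustration only (not how production toolkits such as KenLM actually lay out memory), the sketch below stores n-gram log-probabilities under fixed-width hashed keys, so the table holds 64-bit integers rather than full strings, and backs off to shorter histories on a miss. The helper names and the fixed log-space backoff penalty are assumptions for the example.

```python
import hashlib

# Sketch: n-gram log-probs keyed by a 64-bit hash of the token
# sequence. Real toolkits use tries or sorted arrays with
# bit-packed records instead of a Python dict.
def ngram_key(tokens):
    digest = hashlib.md5(" ".join(tokens).encode()).digest()
    return int.from_bytes(digest[:8], "big")

table = {}

def add_ngram(tokens, logprob):
    table[ngram_key(tokens)] = logprob

def score(tokens, backoff_penalty=-1.0):
    # Back off to shorter histories when the full n-gram is unseen,
    # paying a fixed log-space penalty per dropped history word.
    penalty = 0.0
    while tokens:
        key = ngram_key(tokens)
        if key in table:
            return table[key] + penalty
        tokens = tokens[1:]          # drop the oldest history word
        penalty += backoff_penalty
    return float("-inf")

add_ngram(["the", "cat", "sat"], -0.5)
add_ngram(["cat", "sat"], -1.2)
print(score(["the", "cat", "sat"]))   # -0.5 (exact trigram hit)
print(score(["a", "cat", "sat"]))     # -2.2 (backed off to the bigram)
```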
We propose three attention schemes that integrate mutual influence between sentences into CNNs, so that the representation of each sentence takes its counterpart into account.
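A minimal sketch in the spirit of one such scheme, assuming simple scaled dot-product attention and hypothetical dimensions (not the paper's exact formulation): an attention matrix relates every word pair across the two sentences, and each sentence is augmented with a counterpart-aware view before convolution.

```python
import torch
import torch.nn as nn

# Sketch: attention scores compare every word of sentence 0 with
# every word of sentence 1; each sentence then mixes in a weighted
# view of its counterpart before the convolutional layer.
d, n0, n1 = 50, 7, 9
s0 = torch.randn(1, n0, d)   # sentence 0 word embeddings
s1 = torch.randn(1, n1, d)   # sentence 1 word embeddings

scores = torch.bmm(s0, s1.transpose(1, 2)) / d ** 0.5   # (1, n0, n1)
attn0 = scores.softmax(dim=-1)   # each s0 word attends over s1
attn1 = scores.softmax(dim=1)    # each s1 word attends over s0

s0_aware = torch.bmm(attn0, s1)                  # (1, n0, d)
s1_aware = torch.bmm(attn1.transpose(1, 2), s0)  # (1, n1, d)

# Convolve the original and counterpart-aware channels together;
# the same conv would be applied to the s1 branch symmetrically.
conv = nn.Conv1d(2 * d, 100, kernel_size=3, padding=1)
feats0 = conv(torch.cat([s0, s0_aware], dim=-1).transpose(1, 2))
print(feats0.shape)  # torch.Size([1, 100, 7])
```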