Dialogue summarization comes with its own peculiar challenges compared with news or scientific article summarization.
This calls for a study of the impact of pretraining data size on the knowledge the models acquire.
Relating entities and events in text is a key component of natural language understanding.
The adaptation of pretrained language models to solve supervised tasks has become standard practice in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations.
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm.
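The method above builds on non-parametric streaming K-means; as a rough, generic sketch of that family of algorithms (a single-pass, threshold-based clusterer, not the paper's exact procedure; the Euclidean distance and the threshold value are assumptions):

    import numpy as np

    def stream_cluster(doc_vectors, threshold=1.0):
        """Streaming K-means sketch: assign each incoming document to its
        nearest centroid, or open a new cluster when the nearest centroid
        is farther away than `threshold`."""
        centroids, counts, assignments = [], [], []
        for x in doc_vectors:
            if centroids:
                dists = [np.linalg.norm(x - c) for c in centroids]
                k = int(np.argmin(dists))
            if not centroids or dists[k] > threshold:
                # No sufficiently close cluster: start a new one.
                centroids.append(x.astype(float))
                counts.append(1)
                assignments.append(len(centroids) - 1)
            else:
                # Fold the document into the winning centroid (running mean).
                counts[k] += 1
                centroids[k] += (x - centroids[k]) / counts[k]
                assignments.append(k)
        return assignments, centroids

    docs = np.random.rand(20, 8)        # e.g. 20 documents as 8-dim embeddings
    labels, centers = stream_cluster(docs, threshold=1.0)

Because new clusters are opened on demand rather than fixed in advance, the number of clusters grows with the stream, which is what makes the scheme non-parametric.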
Leveraging large amounts of unlabeled data with Transformer-like architectures such as BERT has gained popularity in recent times, owing to their effectiveness in learning general representations that can then be fine-tuned for downstream tasks with considerable success.
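As a minimal illustration of the pretrain-then-fine-tune pattern this refers to, a hedged sketch using the Hugging Face transformers API (the checkpoint name, toy batch, labels, and learning rate are illustrative, not taken from any paper listed here):

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load a pretrained encoder and attach a fresh classification head.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # One toy fine-tuning step on labeled downstream data.
    batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()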
Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts.
(2) Capturing long-range dependencies, specifically the connection between an event trigger and a distant event argument.
The phenomenon that a quantum particle propagating in a detector, such as a Wilson cloud chamber, leaves a track close to a classical trajectory is analyzed.
Mathematical Physics; Quantum Physics
In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations.
Syntactic parsing using dependency structures has become a standard technique in natural language processing with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora.
Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs.
Ranked #8 on AMR Parsing on LDC2017T10
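The entry above trains the parser with policy learning rewarded by the Smatch score of sampled graphs; the toy REINFORCE loop below only illustrates the general shape of such an objective (the categorical policy, the fake reward standing in for Smatch, and the constant baseline are all assumptions, not the authors' setup):

    import torch

    # Toy stand-ins: a categorical "policy" over 4 candidate actions at 5 steps,
    # and a placeholder reward where Smatch F1 would go in the real parser.
    logits = torch.nn.Parameter(torch.zeros(5, 4))
    gold = torch.tensor([0, 1, 2, 3, 0])
    optimizer = torch.optim.Adam([logits], lr=0.1)

    def fake_smatch(actions, gold_actions):
        return (actions == gold_actions).float().mean()   # placeholder reward in [0, 1]

    for _ in range(100):
        dist = torch.distributions.Categorical(logits=logits)
        actions = dist.sample()                            # sample one action per step
        log_prob = dist.log_prob(actions).sum()
        reward = fake_smatch(actions, gold)
        loss = -(reward - 0.5) * log_prob                  # REINFORCE with a constant baseline
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()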
We deploy the methods of controlled psycholinguistic experimentation to shed light on the extent to which the behavior of neural network language models reflects incremental representations of syntactic state.
State-of-the-art LSTM language models trained on large corpora learn sequential contingencies in impressive detail and have been shown to acquire a number of non-local grammatical dependencies with some success.
When ablating the forward LSTM, performance drops less dramatically and composition recovers a substantial part of the gap, indicating that a forward LSTM and composition capture similar information.
This paper presents the IBM Research AI submission to the CoNLL 2018 Shared Task on Parsing Universal Dependencies.
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018.
Neural encoder-decoder models of machine translation have achieved impressive results, while learning linguistic knowledge of both the source and target languages in an implicit end-to-end manner.
Videogame streaming platforms have become a paramount example of noisy user-generated text.
During training, dynamic oracles alternate between sampling parser states from the training data and from the model as it is being learned, making the model more robust to the kinds of errors that will be made at test time.
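A toy sketch of that exploration schedule, assuming a trivial left-to-right transition system with a perceptron-style update (a real dynamic oracle would additionally compute the best transitions reachable from erroneous states, which this simplification glosses over):

    import random

    ACTIONS = [0, 1, 2]
    gold_sequences = [[0, 1, 2, 1], [2, 0, 1, 0]]   # hypothetical gold action sequences
    weights = {}                                     # one weight per (position, previous action, candidate)

    def score(pos, prev, action):
        return weights.get((pos, prev, action), 0.0)

    def best_action(pos, prev):
        return max(ACTIONS, key=lambda a: score(pos, prev, a))

    def update(pos, prev, oracle_action, lr=1.0):
        predicted = best_action(pos, prev)
        if predicted != oracle_action:               # perceptron-style update toward the oracle action
            weights[(pos, prev, oracle_action)] = score(pos, prev, oracle_action) + lr
            weights[(pos, prev, predicted)] = score(pos, prev, predicted) - lr

    for epoch in range(5):
        p_explore = 0.0 if epoch == 0 else 0.9       # follow the model only after the first epoch
        for gold in gold_sequences:
            prev = -1                                # dummy start action
            for i, gold_action in enumerate(gold):
                update(i, prev, gold_action)
                # Exploration: continue from the state reached by the model's own
                # (possibly wrong) prediction rather than the gold one, so training
                # visits the kinds of states the parser will face at test time.
                prev = best_action(i, prev) if random.random() < p_explore else gold_action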
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
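For concreteness, the static-declaration pattern described here looks roughly like the following TensorFlow 1.x-style snippet (shapes and model are illustrative only); dynamic declaration, as in DyNet, instead builds a fresh computation graph for each example as the program runs:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Static declaration: the computation graph is defined symbolically, once, up front.
    x = tf.placeholder(tf.float32, shape=[None, 3])
    W = tf.Variable(tf.zeros([3, 2]))
    b = tf.Variable(tf.zeros([2]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)

    # Only afterwards are concrete examples fed into an engine that executes the graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))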
We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection.
Ranked #17 on Constituency Parsing on Penn Treebank
We introduce two first-order graph-based dependency parsers achieving a new state of the art.
Ranked #16 on Dependency Parsing on Penn Treebank
Experiments on five languages show that feature selection can result in more compact models as well as higher accuracy under all conditions, but also that a dynamic ordering works better than a static ordering and that joint systems benefit more than standalone taggers.
We adapt the greedy Stack-LSTM dependency parser of Dyer et al. (2015) to support a training-with-exploration procedure using dynamic oracles (Goldberg and Nivre, 2013) instead of cross-entropy minimization.
Ranked #2 on Chinese Dependency Parsing on Penn Chinese Treebank
State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available.
Ranked #6 on Named Entity Recognition on CoNLL++
We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure.
Ranked #22 on Constituency Parsing on Penn Treebank
We train one multilingual model for dependency parsing and use it to parse sentences in several languages.
We present extensions to a continuous-state dependency parsing method that make it applicable to morphologically rich languages.
We propose a technique for learning representations of parser states in transition-based dependency parsers.
Freely available statistical parsers often require careful optimization to produce state-of-the-art results, which can be a non-trivial task especially for application developers who are not interested in parsing research for its own sake.