no code implementations • EMNLP 2020 • Anders Søgaard
Cross-language differences in (universal) dependency parsing performance are mostly attributed to treebank size, average sentence length, average dependency length, morphological complexity, and domain differences.
1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Terne Sasha Thorn Jakobsen, Maria Barrett, Anders Søgaard
Recent work in cross-topic argument mining attempts to learn models that generalise across topics rather than merely relying on within-topic spurious correlations.
no code implementations • ACL 2021 • Mark Anderson, Anders Søgaard, Carlos Gómez-Rodríguez
Søgaard (2020) obtained results suggesting that the fraction of trees in the test data that are isomorphic to trees in the training set accounts for a non-trivial amount of variation in parser performance.
1 code implementation • EACL 2021 • Marcel Bollmann, Anders Søgaard
We evaluate two common conjectures in error analysis of NLP models: (i) Morphology is predictive of errors; and (ii) the importance of morphology increases with the morphological complexity of a language.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Anders Søgaard
Several approaches to neural speed reading have been presented at major NLP and machine learning conferences in 2017–20, i.e., "human-inspired" recurrent network architectures that learn to "read" text faster by skipping irrelevant words, typically optimizing the joint objective of minimizing classification error rate and FLOPs used at inference time.
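The joint objective described above can be sketched in a few lines. This is a toy illustration, not the method of any particular paper: the per-token relevance scores, the skip rule, and the compute penalty are all made-up stand-ins for what a trained recurrent skim-reader would learn.

```python
# Toy sketch of a speed reader: skip low-relevance tokens, then score the
# model on classification error plus a FLOPs-style compute penalty.
# All names, scores, and the lam weight are illustrative assumptions.

def skim_read(tokens, relevance, threshold):
    """Return the subset of tokens the model actually 'reads'."""
    return [t for t, r in zip(tokens, relevance) if r >= threshold]

def joint_objective(error_rate, tokens_read, tokens_total, lam=0.5):
    """Classification error plus a compute penalty, weighted by lam."""
    compute_cost = tokens_read / tokens_total  # proxy for FLOPs used
    return error_rate + lam * compute_cost

tokens = ["the", "movie", "was", "absolutely", "wonderful", "today"]
relevance = [0.1, 0.6, 0.1, 0.7, 0.9, 0.2]  # assumed per-token scores

read = skim_read(tokens, relevance, threshold=0.5)
print(read)  # ['movie', 'absolutely', 'wonderful']
print(round(joint_objective(0.10, len(read), len(tokens)), 4))  # 0.35
```

A real speed reader would learn both the relevance scores and the skip policy end to end; the point here is only that reading fewer tokens trades task error against compute.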
no code implementations • LREC 2020 • Rasmus Hvingelby, Amalie Brogaard Pauli, Maria Barrett, Christina Rosted, Lasse Malm Lidegaard, Anders Søgaard
We present a named entity annotation for the Danish Universal Dependencies treebank using the CoNLL-2003 annotation scheme: DaNE.
no code implementations • LREC 2020 • Cezar Sas, Meriem Beloucif, Anders Søgaard
Frame-semantic annotations exist for only a tiny fraction of the world's languages. Wikidata, however, links knowledge base triples to texts in many languages, providing a common, distant supervision signal for semantic parsers.
no code implementations • IJCNLP 2019 • Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard
Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on.
no code implementations • RANLP 2019 • Meriem Beloucif, Ana Valeria Gonzalez, Marcel Bollmann, Anders Søgaard
Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios.
no code implementations • WS 2019 • Simon Flachs, Ophélie Lacroix, Anders Søgaard
This paper describes our contribution to the low-resource track of the BEA 2019 shared task on Grammatical Error Correction (GEC).
no code implementations • ACL 2019 • Sebastian Ruder, Anders Søgaard, Ivan Vulić
In this tutorial, we provide a comprehensive survey of the exciting recent work on cutting-edge weakly-supervised and unsupervised cross-lingual word representations.
no code implementations • ACL 2019 • Simon Flachs, Marcel Bollmann, Anders Søgaard
Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models.
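The token-level log-likelihood objective mentioned above is simple to state concretely. The sketch below is an illustration under assumed toy distributions, not the paper's model: a real system would produce the per-step output probabilities with a sequence-to-sequence network over characters or tokens.

```python
# Minimal sketch of the token-level (negative) log-likelihood objective
# for normalization: score a modern target form under the model's
# per-step output distributions. The distributions are made-up toy values.
import math

def token_nll(target, per_step_probs):
    """Average negative log-likelihood of the target tokens under the
    model's per-step output distributions."""
    nll = 0.0
    for tok, dist in zip(target, per_step_probs):
        nll -= math.log(dist[tok])
    return nll / len(target)

# Normalizing a historical spelling 'vn' to modern 'un' character by
# character, with assumed model probabilities at each step:
target = ["u", "n"]
probs = [{"u": 0.8, "v": 0.2}, {"n": 0.9, "m": 0.1}]
print(round(token_nll(target, probs), 4))
```

Training minimizes this quantity over the corpus; phrase-based models, by contrast, score and recombine multi-character segments rather than individual output steps.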
no code implementations • SEMEVAL 2019 • Ana Valeria González, Victor Petrén Bach Hansen, Joachim Bingel, Anders Søgaard
This work describes the system presented by the CoAStaL Natural Language Processing group at the University of Copenhagen.
no code implementations • NAACL 2019 • Simon Flachs, Ophélie Lacroix, Marek Rei, Helen Yannakoudakis, Anders Søgaard
While rule-based detection of subject-verb agreement (SVA) errors is sensitive to syntactic parsing errors and to irregularities and exceptions to the main rules, neural sequence labelers have a tendency to overfit their training data.
no code implementations • WS 2018 • Emma Kerinec, Chloé Braud, Anders Søgaard
This work aims to contribute to our understanding of when multi-task learning through parameter sharing in deep neural networks leads to improvements over single-task learning.
no code implementations • WS 2018 • Ola Rønning, Daniel Hardt, Anders Søgaard
Sluicing resolution is the task of identifying the antecedent to a question ellipsis.
1 code implementation • CONLL 2018 • Maria Barrett, Joachim Bingel, Nora Hollenstein, Marek Rei, Anders Søgaard
Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior; in this paper, we show that human attention really does provide a good inductive bias for many attention functions in NLP.
no code implementations • WS 2018 • Jan Lukes, Anders Søgaard
Sentiment analysis models often rely on training data that is several years old.
no code implementations • COLING 2018 • Joachim Bingel, Gustavo Paetzold, Anders Søgaard
Most previous research in text simplification has aimed to develop generic solutions, assuming very homogeneous target audiences with consistent intra-group simplification needs.
no code implementations • WS 2018 • Marcel Bollmann, Anders Søgaard, Joachim Bingel
Historical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models.
no code implementations • WS 2018 • Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, Anders Søgaard
Neural part-of-speech (POS) taggers are known to perform poorly with little training data.
no code implementations • NAACL 2018 • Ola Rønning, Daniel Hardt, Anders Søgaard
Sluice resolution in English is the problem of finding antecedents of wh-fronted ellipses.
no code implementations • NAACL 2018 • Maria Barrett, Ana Valeria González-Garduño, Lea Frermann, Anders Søgaard
Even small dictionaries can improve the performance of unsupervised induction algorithms.
1 code implementation • WS 2017 • Anders Søgaard
Syntactic annotation is costly and not available for the vast majority of the world's languages.
no code implementations • WS 2017 • Ana Valeria González-Garduño, Anders Søgaard
We show that text readability prediction improves significantly with hard parameter sharing with models predicting first-pass duration, total fixation duration, and regression duration.
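Hard parameter sharing, as used above, means one shared feature extractor with a separate small "head" per task, so that every task's training signal updates the shared parameters. The sketch below is purely illustrative: the fixed toy weights and scalar heads stand in for jointly trained neural layers, and the task names follow the gaze measures mentioned in the abstract.

```python
# Illustrative sketch of hard parameter sharing: one shared encoder
# feeding separate linear heads for the main task (readability) and an
# auxiliary gaze task. Weights and features are toy values; the real
# models are neural networks trained jointly on all tasks.

def shared_encoder(features):
    # Shared layer: during joint training, gradients from every task
    # head would update these (here fixed, toy) weights.
    w = [0.5, -0.2, 0.1]
    return sum(f * wi for f, wi in zip(features, w))

def task_head(hidden, w, b):
    # Each task keeps its own small head on top of the shared encoding.
    return w * hidden + b

x = [1.0, 2.0, 3.0]    # toy input features for one sentence
h = shared_encoder(x)  # 0.5 - 0.4 + 0.3 = 0.4

readability = task_head(h, w=2.0, b=0.1)  # main task prediction
first_pass = task_head(h, w=1.5, b=0.0)   # auxiliary gaze prediction
print(round(readability, 4), round(first_pass, 4))
```

The design choice is that the auxiliary gaze tasks never need to be predicted at test time; they exist only to shape the shared encoder during training.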
no code implementations • EMNLP 2017 • Manaal Faruqui, Anders Søgaard, Ivan Vulić
With the increasing use of monolingual word vectors, there is a need for word vectors that can be used as efficiently across multiple languages as monolingually.
no code implementations • WS 2017 • Chloé Braud, Anders Søgaard
The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators.
1 code implementation • EMNLP 2017 • Chloé Braud, Ophélie Lacroix, Anders Søgaard
Discourse segmentation is the first step in building discourse parsers.
no code implementations • WS 2017 • Bahar Salehi, Dirk Hovy, Eduard Hovy, Anders Søgaard
Geolocation is the task of identifying a social media user's primary location, and there is a growing literature in natural language processing on the extent to which automated analysis of social media posts can help.
no code implementations • WS 2017 • Bahar Salehi, Anders Søgaard
Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from.
no code implementations • ACL 2017 • Marcel Bollmann, Joachim Bingel, Anders Søgaard
Automated processing of historical texts often relies on pre-normalization to modern word forms.
no code implementations • ACL 2017 • Chloé Braud, Ophélie Lacroix, Anders Søgaard
Discourse segmentation is a crucial step in building end-to-end discourse parsers.
1 code implementation • EACL 2017 • Michael Schlichtkrull, Anders Søgaard
In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding.
no code implementations • EACL 2017 • Željko Agić, Barbara Plank, Anders Søgaard
We address the challenge of cross-lingual POS tagger evaluation in absence of manually annotated test data.
1 code implementation • COLING 2016 • Chloé Braud, Barbara Plank, Anders Søgaard
We experiment with different ways of training LSTM networks to predict RST discourse trees.
Ranked #5 on Discourse Parsing on RST-DT (RST-Parseval (Full) metric)
no code implementations • COLING 2016 • Maria Barrett, Frank Keller, Anders Søgaard
Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models.
no code implementations • LREC 2016 • Bolette Pedersen, Anna Braasch, Anders Johannsen, Héctor Martínez Alonso, Sanni Nimb, Sussi Olsen, Anders Søgaard, Nicolai Hartvig Sørensen
The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish.
no code implementations • TACL 2016 • Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, Anders Søgaard
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages.
no code implementations • LREC 2014 • Hege Fromreide, Dirk Hovy, Anders Søgaard
We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa = 0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010).
no code implementations • LREC 2014 • Dirk Hovy, Barbara Plank, Anders Søgaard
We present a systematic study of several Twitter POS datasets and the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reductions of up to 21%.
no code implementations • LREC 2012 • Sigrid Klerke, Anders Søgaard
We compare DSim to different examples of monolingual parallel corpora, and we argue that this corpus is a promising basis for future development of automatic data-driven text simplification systems in Danish.