Search Results for author: Anders Søgaard

Found 91 papers, 8 papers with code

Some Languages Seem Easier to Parse Because Their Treebanks Leak

no code implementations EMNLP 2020 Anders Søgaard

Cross-language differences in (universal) dependency parsing performance are mostly attributed to treebank size, average sentence length, average dependency length, morphological complexity, and domain differences.

Dependency Parsing Test

Replicating and Extending "Because Their Treebanks Leak": Graph Isomorphism, Covariants, and Parser Performance

no code implementations ACL 2021 Mark Anderson, Anders Søgaard, Carlos Gómez-Rodríguez

Søgaard (2020) obtained results suggesting the fraction of trees occurring in the test data isomorphic to trees in the training set accounts for a non-trivial variation in parser performance.

Spurious Correlations in Cross-Topic Argument Mining

1 code implementation Joint Conference on Lexical and Computational Semantics 2021 Terne Sasha Thorn Jakobsen, Maria Barrett, Anders Søgaard

Recent work in cross-topic argument mining attempts to learn models that generalise across topics rather than merely relying on within-topic spurious correlations.

Argument Mining Topic Models

Error Analysis and the Role of Morphology

1 code implementation EACL 2021 Marcel Bollmann, Anders Søgaard

We evaluate two common conjectures in error analysis of NLP models: (i) Morphology is predictive of errors; and (ii) the importance of morphology increases with the morphological complexity of a language.

Neural Speed Reading Audited

no code implementations Findings of the Association for Computational Linguistics 2020 Anders Søgaard

Several approaches to neural speed reading have been presented at major NLP and machine learning conferences in 2017–20; i.e., "human-inspired" recurrent network architectures that learn to "read" text faster by skipping irrelevant words, typically optimizing the joint objective of minimizing classification error rate and FLOPs used at inference time.

Classification Document Classification

WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic Parsing

no code implementations LREC 2020 Cezar Sas, Meriem Beloucif, Anders Søgaard

Frame-semantic annotations exist for a tiny fraction of the world's languages. Wikidata, however, links knowledge base triples to texts in many languages, providing a common, distant supervision signal for semantic parsers.

Cross-Lingual Transfer Semantic Parsing

Adversarial Removal of Demographic Attributes Revisited

no code implementations IJCNLP 2019 Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard

Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on.

Noisy Channel for Low Resource Grammatical Error Correction

no code implementations WS 2019 Simon Flachs, Ophélie Lacroix, Anders Søgaard

This paper describes our contribution to the low-resource track of the BEA 2019 shared task on Grammatical Error Correction (GEC).

Grammatical Error Correction Language Modelling

Historical Text Normalization with Delayed Rewards

no code implementations ACL 2019 Simon Flachs, Marcel Bollmann, Anders Søgaard

Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models.

reinforcement-learning Reinforcement Learning (RL)

Unsupervised Cross-Lingual Representation Learning

no code implementations ACL 2019 Sebastian Ruder, Anders Søgaard, Ivan Vulić

In this tutorial, we provide a comprehensive survey of the exciting recent work on cutting-edge weakly-supervised and unsupervised cross-lingual word representations.

Representation Learning Structured Prediction

A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors

no code implementations NAACL 2019 Simon Flachs, Ophélie Lacroix, Marek Rei, Helen Yannakoudakis, Anders Søgaard

While rule-based detection of subject-verb agreement (SVA) errors is sensitive to syntactic parsing errors and irregularities and exceptions to the main rules, neural sequential labelers have a tendency to overfit their training data.

When does deep multi-task learning work for loosely related document classification tasks?

no code implementations WS 2018 Emma Kerinec, Chloé Braud, Anders Søgaard

This work aims to contribute to our understanding of when multi-task learning through parameter sharing in deep neural networks leads to improvements over single-task learning.

Document Classification General Classification +5

Sequence Classification with Human Attention

1 code implementation CONLL 2018 Maria Barrett, Joachim Bingel, Nora Hollenstein, Marek Rei, Anders Søgaard

Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior, and in this paper, we show that human attention really does provide a good inductive bias on many attention functions in NLP.

Abusive Language Classification +4

Lexi: A tool for adaptive, personalized text simplification

no code implementations COLING 2018 Joachim Bingel, Gustavo Paetzold, Anders Søgaard

Most previous research in text simplification has aimed to develop generic solutions, assuming very homogeneous target audiences with consistent intra-group simplification needs.

Lexical Simplification Text Simplification

Multi-task learning for historical text normalization: Size matters

no code implementations WS 2018 Marcel Bollmann, Anders Søgaard, Joachim Bingel

Historical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models.

Grammatical Error Correction Multi-Task Learning +1

Using hyperlinks to improve multilingual partial parsers

1 code implementation WS 2017 Anders Søgaard

Syntactic annotation is costly and not available for the vast majority of the world's languages.

Machine Translation Speech Synthesis

Cross-Lingual Word Representations: Induction and Evaluation

no code implementations EMNLP 2017 Manaal Faruqui, Anders Søgaard, Ivan Vulić

With the increasing use of monolingual word vectors, there is a need for word vectors that can be used as efficiently across multiple languages as monolingually.

Multilingual Word Embeddings

Using Gaze to Predict Text Readability

no code implementations WS 2017 Ana Valeria González-Garduño, Anders Søgaard

We show that text readability prediction improves significantly from hard parameter sharing with models predicting first pass duration, total fixation duration and regression duration.

Machine Translation Multi-Task Learning +3

Is writing style predictive of scientific fraud?

no code implementations WS 2017 Chloé Braud, Anders Søgaard

The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators.

Logical Reasoning

Evaluating hypotheses in geolocation on a very large sample of Twitter

no code implementations WS 2017 Bahar Salehi, Anders Søgaard

Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from.

Fraud Detection

Huntsville, hospitals, and hockey teams: Names can reveal your location

no code implementations WS 2017 Bahar Salehi, Dirk Hovy, Eduard Hovy, Anders Søgaard

Geolocation is the task of identifying a social media user's primary location, and in natural language processing, there is a growing literature on to what extent automated analysis of social media posts can help.

Knowledge Base Population Recommendation Systems +1

Cross-lingual tagger evaluation without test data

no code implementations EACL 2017 Željko Agić, Barbara Plank, Anders Søgaard

We address the challenge of cross-lingual POS tagger evaluation in absence of manually annotated test data.

POS Test

Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages

1 code implementation EACL 2017 Michael Schlichtkrull, Anders Søgaard

In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding.

Dependency Parsing

Cross-lingual Transfer of Correlations between Parts of Speech and Gaze Features

no code implementations COLING 2016 Maria Barrett, Frank Keller, Anders Søgaard

Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models.

Cross-Lingual Transfer POS +1

The SemDaX Corpus – Sense Annotations with Scalable Sense Inventories

no code implementations LREC 2016 Bolette Pedersen, Anna Braasch, Anders Johannsen, Héctor Martínez Alonso, Sanni Nimb, Sussi Olsen, Anders Søgaard, Nicolai Hartvig Sørensen

The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish.


When POS data sets don't add up: Combatting sample bias

no code implementations LREC 2014 Dirk Hovy, Barbara Plank, Anders Søgaard

We present a systematic study of several Twitter POS data sets, the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reduction of up to 21%.


Crowdsourcing and annotating NER for Twitter #drift

no code implementations LREC 2014 Hege Fromreide, Dirk Hovy, Anders Søgaard

We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010).


DSim, a Danish Parallel Corpus for Text Simplification

no code implementations LREC 2012 Sigrid Klerke, Anders Søgaard

We compare DSim to different examples of monolingual parallel corpora, and we argue that this corpus is a promising basis for future development of automatic data-driven text simplification systems in Danish.

Machine Translation Text Generation +1
