Search Results for author: ra

Found 173 papers, 4 papers with code

Integrating Ethics into the NLP Curriculum

no code implementations ACL 2020 Emily M. Bender, Dirk Hovy, Alexandra Schofield

To raise awareness among future NLP practitioners and prevent inertia in the field, we need to place ethics in the curriculum for all NLP students, not as an elective, but as a core part of their education.


SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese

no code implementations WS 2020 Nathan Hartmann, Gustavo Henrique Paetzold, Sandra Aluísio

Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit.

Lexical Simplification

Findings of the Fourth Workshop on Neural Generation and Translation

no code implementations WS 2020 Kenneth Heafield, Hiroaki Hayashi, Yusuke Oda, Ioannis Konstas, Andrew Finch, Graham Neubig, Xi-An Li, Alexandra Birch

We describe the findings of the Fourth Workshop on Neural Generation and Translation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2020).

Machine Translation NMT +1

Segmentation de texte non-supervisée pour la détection de thématiques à l'aide de plongements lexicaux (Unsupervised text segmentation for topic detection using embeddings)

no code implementations JEPTALNRECITAL 2020 Alexandra Benamar

Among the neural approaches used, we are particularly interested in those that use word embeddings to represent sentences and define topical segments.

Segmentation SENTER +1

PATE: A Corpus of Temporal Expressions for the In-car Voice Assistant Domain

no code implementations LREC 2020 Alessandra Zarcone, Touhidul Alam, Zahra Kolagar

The recognition and automatic annotation of temporal expressions (e.g. "Add an event for tomorrow evening at eight to my calendar") is a key module for AI voice assistants, allowing them to interact with apps (for example, a calendar app).

Building a Universal Dependencies Treebank for Occitan

no code implementations LREC 2020 Aleksandra Miletic, Myriam Bras, Marianne Vergez-Couret, Louise Esher, Clamença Poujade, Jean Sibille

This paper outlines the ongoing effort of creating the first treebank for Occitan, a low-resourced regional language spoken mainly in the south of France.


HBCP Corpus: A New Resource for the Analysis of Behavioural Change Intervention Reports

no code implementations LREC 2020 Francesca Bonin, Martin Gleize, Ailbhe Finnerty, Candice Moore, Charles Jochim, Emma Norris, Yufang Hou, Alison J. Wright, Debasis Ganguly, Emily Hayes, Silje Zink, Alessandra Pascale, Pol Mac Aonghusa, Susan Michie

Due to the fast pace at which research reports in behaviour change are published, researchers, consultants and policymakers would benefit from more automatic ways to process these reports.

Towards the First Dyslexic Font in Russian

no code implementations LREC 2020 Svetlana Alexeeva, Aleksandra Dobrego, Vladislav Zubov

However, the text design process often focuses on font size rather than font type, which may be crucial, especially for people with reading disabilities.


Multiword Expression aware Neural Machine Translation

no code implementations LREC 2020 Andrea Zaninello, Alexandra Birch

Multiword Expressions (MWEs) are a frequently occurring phenomenon found in all natural languages that is of great importance to linguistic theory, natural language processing applications, and machine translation systems.

Data Augmentation Machine Translation +2

Investigating Multilingual Abusive Language Detection: A Cautionary Tale

no code implementations RANLP 2019 Kenneth Steimel, Daniel Dakota, Yue Chen, Sandra Kübler

Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.

Abusive Language

Reading KITTY: Pitch Range as an Indicator of Reading Skill

no code implementations WS 2019 Alfredo Gomez, Alicia Ngo, Alessandra Otondo, Julie Medero

While affective outcomes are generally positive for the use of eBooks and computer-based reading tutors in teaching children to read, learning outcomes are often poorer (Korat and Shamir, 2004).

Participation d'EDF R&D à DEFT 2019 : des vecteurs et des règles ! (EDF R&D submission to DEFT 2019)

no code implementations JEPTALNRECITAL 2019 Philippe Suignard, Meryl Bothua, Alexandra Benamar

The proposed methods are easily transferable to other indexing and similarity-detection tasks that may be of interest to several entities of the EDF group.

Vers la traduction automatique d'adverbiaux temporels du français vers la langue des signes française (Towards the automatic translation of temporal adverbials from French to French sign language)

no code implementations JEPTALNRECITAL 2019 Sandra Bellato

We present initial work addressing the question of transfer rules between two formalisms that describe the semantics of temporal adverbials, for French and for French Sign Language (LSF) respectively.

A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation

no code implementations NAACL 2019 Massimo Poesio, Jon Chamberlain, Silviu Paun, Juntao Yu, Alexandra Uma, Udo Kruschwitz

The corpus, containing annotations for about 108,000 markables, is one of the largest corpora for coreference for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total.

Coherence models in schizophrenia

no code implementations WS 2019 Sandra Just, Erik Haegert, Nora Kořánová, Anna-Lena Bröcker, Ivan Nenchev, Jakob Funcke, Christiane Montag, Manfred Stede

Speech samples were obtained from healthy controls and patients with a diagnosis of schizophrenia or schizoaffective disorder and different severity of positive formal thought disorder.

Sentence Word Embeddings

Multilingual prediction of Alzheimer's disease through domain adaptation and concept-based language modelling

no code implementations NAACL 2019 Kathleen C. Fraser, Nicklas Linz, Bai Li, Kristina Lundholm Fors, Frank Rudzicz, Alexandra König, Jan Alexandersson, Philippe Robert, Dimitrios Kokkinakis

There is growing evidence that changes in speech and language may be early markers of dementia, but much of the previous NLP work in this area has been limited by the size of the available datasets.

Domain Adaptation Language Modelling

Suicide Risk Assessment with Multi-level Dual-Context Language and BERT

no code implementations WS 2019 Matthew Matero, Akash Idnani, Youngseo Son, Salvatore Giorgi, Huy Vu, Mohammad Zamani, Parth Limbachiya, Sharath Chandra Guntuku, H. Andrew Schwartz

Mental health predictive systems typically model language as if from a single context (e.g. Twitter posts, status updates, or forum posts) and are often limited to a single level of analysis (e.g. either the message-level or user-level).

Readability of Twitter Tweets for Second Language Learners

no code implementations ALTA 2019 Patrick Jacob, Alexandra Uitdenbogerd

Optimal language acquisition via reading requires the learners to read slightly above their current language skill level.

Language Acquisition

Measuring English Readability for Vietnamese Speakers

no code implementations ALTA 2019 Phuoc Nguyen, Alexandra Uitdenbogerd

This study introduces a first approximation to readability of English text for VL1, with suggestions for further improvements.

Multi-source synthetic treebank creation for improved cross-lingual dependency parsing

2 code implementations WS 2018 Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, Konstantin Vinogorodskiy

This paper describes a method of creating synthetic treebanks for cross-lingual dependency parsing using a combination of machine translation (including pivot translation), annotation projection and the spanning tree algorithm.

Dependency Parsing Machine Translation +2

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

1 code implementation CONLL 2018 Tiberiu Boros, Stefan Daniel Dumitrescu, Ruxandra Burtica

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.

Lemmatization Sentence

Current and Future Psychological Health Prediction using Language and Socio-Demographics of Children for the CLPysch 2018 Shared Task

no code implementations WS 2018 Sharath Chandra Guntuku, Salvatore Giorgi, Lyle Ungar

The goal of the shared task was to use childhood language as a marker for both current and future psychological health over individual lifetimes.


Cross-corpus Native Language Identification via Statistical Embedding

no code implementations WS 2018 Francisco Rangel, Paolo Rosso, Julian Brooke, Alexandra Uitdenbogerd

In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained with available data and has to predict the native language from data of a different corpus.

Cross-corpus Native Language Identification

Anaphora Resolution with the ARRAU Corpus

no code implementations WS 2018 Massimo Poesio, Yulia Grishina, Varada Kolhatkar, Nafise Moosavi, Ina Roesiger, Adam Roussel, Fabian Simonjetz, Alexandra Uma, Olga Uryupina, Juntao Yu, Heike Zinsmeister

The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference).

Towards Replicability in Parsing

no code implementations RANLP 2017 Daniel Dakota, Sandra Kübler

We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation, the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies.

Quantifying the Effects of Text Duplication on Semantic Models

no code implementations EMNLP 2017 Alexandra Schofield, Laure Thompson, David Mimno

Duplicate documents are a pervasive problem in text datasets and can have a strong effect on unsupervised models.

Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing

no code implementations RANLP 2017 Atreyee Mukherjee, Sandra Kübler

The results show that the choice of similarity metric has an effect on results and that we can reach comparable accuracies to the joint topic modeling in POS tagging and dependency parsing, thus providing a viable and efficient approach to POS tagging and parsing a sentence by its genre expert.

Dependency Parsing Domain Adaptation +3

Non-Deterministic Segmentation for Chinese Lattice Parsing

no code implementations RANLP 2017 Hai Hu, Daniel Dakota, Sandra Kübler

Parsing Chinese critically depends on correct word segmentation for the parser since incorrect segmentation inevitably causes incorrect parses.

Morphological Analysis Segmentation +1

Investigating Diatopic Variation in a Historical Corpus

no code implementations WS 2017 Stefanie Dipper, Sandra Waldenberger

This paper investigates diatopic variation in a historical corpus of German.

Creating POS Tagging and Dependency Parsing Experts via Topic Modeling

no code implementations EACL 2017 Atreyee Mukherjee, Sandra Kübler, Matthias Scheutz

Part of speech (POS) taggers and dependency parsers tend to work well on homogeneous datasets but their performance suffers on datasets containing data from different genres.

Dependency Parsing Domain Adaptation +3

Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering

no code implementations WS 2017 Lilian Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order.

Clustering Question Answering +1

Combining Heterogeneous User Generated Data to Sense Well-being

no code implementations COLING 2016 Adam Tsakalidis, Maria Liakata, Theo Damoulas, Brigitte Jellinek, Weisi Guo, Alexandra Cristea

In this paper we address a new problem of predicting affect and well-being scales in a real-world setting of heterogeneous, longitudinal and non-synchronous textual as well as non-linguistic data that can be harvested from on-line media and mobile phones.

Emotion Recognition

Mise au point d'une méthode d'annotation morphosyntaxique fine du serbe (Developping a method for detailed morphosyntactic tagging of Serbian)

no code implementations JEPTALNRECITAL 2016 Aleksandra Miletic, Cécile Fabre, Dejan Stosic

This article presents an experiment in fine-grained morphosyntactic annotation of the Serbian portion of the ParCoLab parallel corpus (a Serbian-French-English corpus).


Relation- and Phrase-level Linking of FrameNet with Sar-graphs

no code implementations LREC 2016 Aleksandra Gabryszak, Sebastian Krause, Leonhard Hennig, Feiyu Xu, Hans Uszkoreit

Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data.

Knowledge Graphs Relation +1

A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge

no code implementations LREC 2016 Lilian D. A. Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing).


The COPLE2 corpus: a learner corpus for Portuguese

no code implementations LREC 2016 Amália Mendes, Sandra Antunes, Maarten Janssen, Anabela Gonçalves

We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language.

Lemmatization POS

A Sequence Model Approach to Relation Extraction in Portuguese

no code implementations LREC 2016 Sandra Collovini, Gabriel Machado, Renata Vieira

The task of Relation Extraction from texts is one of the main challenges in the area of Information Extraction, considering the required linguistic knowledge and the sophistication of the language processing techniques employed.

Relation Relation Extraction

Rule-based Automatic Multi-word Term Extraction and Lemmatization

no code implementations LREC 2016 Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac

In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms.

LEMMA Lemmatization +2

Un système expert fondé sur une analyse sémantique pour l'identification de menaces d'ordre biologique (An expert system based on semantic analysis for the identification of biological threats)

no code implementations JEPTALNRECITAL 2015 Cédric Lopez, Aleksandra Ponomareva, Cécile Robin, André Bittar, Xabier Larrucea, Frédérique Segond, Marie-Hélène Metzger

The European project TIER (Integrated strategy for CBRN [Chemical, Biological, Radiological and Nuclear] Threat Identification and Emergency Response) aims to build a complete, integrated strategy for emergency response in the context of biological, chemical, radiological, nuclear, or explosives-related hazards, based on threat identification and risk assessment.

Classification d'entités nommées de type film (Classification of film-type named entities)

no code implementations JEPTALNRECITAL 2015 Olivier Collin, Aleksandra Guerraz

To do this, we combine two approaches: we start from a rule-based system, which offers good precision, and couple it with a language model to increase recall.

Classification General Classification +1

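Entre écrit et oral ? Analyse comparée de conversations de type tchat et de conversations téléphoniques dans un centre de contact client (Between written and spoken? A comparative analysis of chat conversations and telephone conversations in a customer contact center)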

no code implementations JEPTALNRECITAL 2015 Géraldine Damnati, Aleksandra Guerraz, Delphine Charlet

The parallel study of transcripts of telephone conversations from a call center in the same assistance domain makes it possible to draw comparisons between these two modes of interaction.


Discosuite - A parser test suite for German discontinuous structures

no code implementations LREC 2014 Wolfgang Maier, Miriam Kaeshammer, Peter Baumann, Sandra Kübler

However, for the evaluation of parser performance concerning a particular phenomenon, a test suite of sentences is needed in which this phenomenon has been identified.

Benchmarking Constituency Parsing +1

Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners

no code implementations LREC 2014 Lianet Sepúlveda Torres, Magali Sanches Duran, Sandra Aluísio

Aiming to inform a module of a system designed to support scientific written production of Spanish native speakers learning Portuguese, we developed an approach to automatically generate a lexicon of wrong words, reproducing language transfer errors made by such foreign learners.


An evaluation of the role of statistical measures and frequency for MWE identification

no code implementations LREC 2014 Sandra Antunes, Amália Mendes

We report on an experiment to evaluate the role of statistical association measures and frequency for the identification of MWE.