no code implementations • WS 2020 • Nathan Hartmann, Gustavo Henrique Paetzold, Sandra Aluísio
Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit.
no code implementations • WS 2020 • Kenneth Heafield, Hiroaki Hayashi, Yusuke Oda, Ioannis Konstas, Andrew Finch, Graham Neubig, Xi-An Li, Alexandra Birch
We describe the findings of the Fourth Workshop on Neural Generation and Translation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2020).
no code implementations • ACL 2020 • Emily M. Bender, Dirk Hovy, Alexandra Schofield
To raise awareness among future NLP practitioners and prevent inertia in the field, we need to place ethics in the curriculum for all NLP students, not as an elective, but as a core part of their education.
no code implementations • JEPTALNRECITAL 2020 • Danrun Cao, Alexandra Benamar, Manel Boumghar, Meryl Bothua, Lydia Ould Ouali, Philippe Suignard
This paper describes the participation of EDF R&D in the DEFT 2020 evaluation campaign.
no code implementations • JEPTALNRECITAL 2020 • Alexandra Benamar
Among the neural approaches used, we are particularly interested in those that use word embeddings to represent sentences and define thematic segments.
no code implementations • LREC 2020 • Susie Coleman, Andrew Secker, Rachel Bawden, Barry Haddow, Alexandra Birch
A growth in news sources makes this increasingly challenging and time-consuming but MT can help automate some aspects of this process.
no code implementations • LREC 2020 • Alessandra Zarcone, Touhidul Alam, Zahra Kolagar
The recognition and automatic annotation of temporal expressions (e.g. "Add an event for tomorrow evening at eight to my calendar") is a key module for AI voice assistants, in order to allow them to interact with apps (for example, a calendar app).
no code implementations • LREC 2020 • Zuoyu Tian, Sandra Kübler
In this study, we investigate the use of Brown clustering for offensive language detection.
no code implementations • LREC 2020 • Aleksandra Miletic, Myriam Bras, Marianne Vergez-Couret, Louise Esher, Clamença Poujade, Jean Sibille
This paper outlines the ongoing effort of creating the first treebank for Occitan, a low-resourced regional language spoken mainly in the south of France.
no code implementations • LREC 2020 • Rosa Estopà, Alejandra López-Fuentes, Jorge M. Porras-Garzon
In this paper we focus on the patient's lack of comprehension of medical reports.
no code implementations • LREC 2020 • Edresson Casanova, Marcos Treviso, Lilian Hübner, Sandra Aluísio
Automatic analysis of connected speech by natural language processing techniques is a promising direction for diagnosing cognitive impairments.
no code implementations • LREC 2020 • Svetlana Alexeeva, Aleksandra Dobrego, Vladislav Zubov
However, the text design process often focuses on font size rather than font type, which may be crucial, especially for people with reading disabilities.
no code implementations • LREC 2020 • María José Díaz-Torres, Paulina Alejandra Morán-Méndez, Luis Villasenor-Pineda, Manuel Montes-y-Gómez, Juan Aguilera, Luis Meneses-Lerín
For this purpose, a Mexican Spanish Twitter corpus was compiled and analyzed.
no code implementations • LREC 2020 • Andrea Zaninello, Alexandra Birch
Multiword Expressions (MWEs) are a frequently occurring phenomenon found in all natural languages that is of great importance to linguistic theory, natural language processing applications, and machine translation systems.
no code implementations • LREC 2020 • Francesca Bonin, Martin Gleize, Ailbhe Finnerty, Candice Moore, Charles Jochim, Emma Norris, Yufang Hou, Alison J. Wright, Debasis Ganguly, Emily Hayes, Silje Zink, Alessandra Pascale, Pol Mac Aonghusa, Susan Michie
Due to the fast pace at which research reports in behaviour change are published, researchers, consultants and policymakers would benefit from more automatic ways to process these reports.
no code implementations • RANLP 2019 • Kenneth Steimel, Daniel Dakota, Yue Chen, Sandra Kübler
Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.
no code implementations • WS 2019 • Alexandra Birch, Barry Haddow, Ivan Tito, Antonio Valerio Miceli Barone, Rachel Bawden, Felipe Sánchez-Martínez, Mikel L. Forcada, Miquel Esplà-Gomis, Víctor Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Wilker Aziz, Andrew Secker, Peggy van der Kreeft
no code implementations • WS 2019 • Alfredo Gomez, Alicia Ngo, Alessandra Otondo, Julie Medero
While affective outcomes are generally positive for the use of eBooks and computer-based reading tutors in teaching children to read, learning outcomes are often poorer (Korat and Shamir, 2004).
no code implementations • JEPTALNRECITAL 2019 • Philippe Suignard, Meryl Bothua, Alexandra Benamar
The proposed methods can easily be transferred to other indexing and similarity detection tasks that may be of interest to several entities of the EDF group.
no code implementations • JEPTALNRECITAL 2019 • Aleksandra Miletić, Delphine Bernhard, Myriam Bras, Anne-Laure Ligozat, Marianne Vergez-Couret
This article reports on our experience converting annotated corpora for Alsatian and Occitan to the CONLL-U format defined in the Universal Dependencies project.
no code implementations • JEPTALNRECITAL 2019 • Sandra Bellato
We present initial work on mapping rules between two formalisms that describe the semantics of temporal adverbials, one for French and one for French Sign Language (Langue des Signes Française, LSF).
no code implementations • NAACL 2019 • Massimo Poesio, Jon Chamberlain, Silviu Paun, Juntao Yu, Alexandra Uma, Udo Kruschwitz
The corpus, containing annotations for about 108,000 markables, is one of the largest corpora for coreference for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total.
no code implementations • WS 2019 • Kathleen C. Fraser, Nicklas Linz, Hali Lindsay, Alexandra König
Increased access to large datasets has driven progress in NLP.
no code implementations • WS 2019 • Matthew Matero, Akash Idnani, Youngseo Son, Salvatore Giorgi, Huy Vu, Mohammad Zamani, Parth Limbachiya, Sharath Chandra Guntuku, H. Andrew Schwartz
Mental health predictive systems typically model language as if from a single context (e.g. Twitter posts, status updates, or forum posts) and are often limited to a single level of analysis (e.g. either the message-level or user-level).
no code implementations • SEMEVAL 2019 • Waleed Ragheb, Jérôme Azé, Sandra Bringay, Maximilien Servajean
This paper addresses the problem of modeling textual conversations and detecting emotions.
no code implementations • WS 2019 • Sandra Just, Erik Haegert, Nora Kořánová, Anna-Lena Bröcker, Ivan Nenchev, Jakob Funcke, Christiane Montag, Manfred Stede
Speech samples were obtained from healthy controls and patients with a diagnosis of schizophrenia or schizoaffective disorder and different severity of positive formal thought disorder.
no code implementations • NAACL 2019 • Kathleen C. Fraser, Nicklas Linz, Bai Li, Kristina Lundholm Fors, Frank Rudzicz, Alexandra König, Jan Alexandersson, Philippe Robert, Dimitrios Kokkinakis
There is growing evidence that changes in speech and language may be early markers of dementia, but much of the previous NLP work in this area has been limited by the size of the available datasets.
no code implementations • SEMEVAL 2019 • Mario Graff, Sabino Miranda-Jiménez, Eric Tellez, Daniela Alejandra Ochoa
This paper describes our participation in the HatEval and OffensEval challenges for English and Spanish.
no code implementations • ALTA 2019 • Phuoc Nguyen, Alexandra Uitdenbogerd
This study introduces a first approximation to readability of English text for VL1, with suggestions for further improvements.
no code implementations • ALTA 2019 • Patrick Jacob, Alexandra Uitdenbogerd
Optimal language acquisition via reading requires the learners to read slightly above their current language skill level.
2 code implementations • WS 2018 • Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, Konstantin Vinogorodskiy
This paper describes a method of creating synthetic treebanks for cross-lingual dependency parsing using a combination of machine translation (including pivot translation), annotation projection and the spanning tree algorithm.
1 code implementation • CONLL 2018 • Tiberiu Boros, Stefan Daniel Dumitrescu, Ruxandra Burtica
We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.
no code implementations • COLING 2018 • Tiberiu Boros, Ruxandra Burtica
This paper addresses the issue of multi-word expression (MWE) detection by employing a new decoding strategy inspired by graph-based parsing.
1 code implementation • COLING 2018 • Lori Moon, Christos Christodoulopoulos, Cynthia Fisher, Sandra Franco, Dan Roth
Inter-annotator agreement is given separately for prepositions and verbs, and for adult speech and child speech.
no code implementations • WS 2018 • Massimo Poesio, Yulia Grishina, Varada Kolhatkar, Nafise Moosavi, Ina Roesiger, Adam Roussel, Fabian Simonjetz, Alexandra Uma, Olga Uryupina, Juntao Yu, Heike Zinsmeister
The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference).
no code implementations • WS 2018 • Francisco Rangel, Paolo Rosso, Julian Brooke, Alexandra Uitdenbogerd
In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained with available data and has to predict the native language from data of a different corpus.
no code implementations • WS 2018 • Sharath Chandra Guntuku, Salvatore Giorgi, Lyle Ungar
The goal of the shared task was to use childhood language as a marker for both current and future psychological health over individual lifetimes.
no code implementations • SEMEVAL 2018 • Ramona-Andreea Turcu, Sandra Maria Amarandei, Iuliana-Alexandra Flescan-Lovin-Arseni, Daniela Gifu, Diana Trandabat
The "Affect in Tweets" task centers on emotion categorization and an evaluation matrix, using multi-language tweets (English and Spanish).
no code implementations • NAACL 2018 • Alexandra Lavrentovich
We focus on the use of English grammatical morphemes across four proficiency levels.
no code implementations • JEPTALNRECITAL 2018 • Waleed Mohamed Azmy, Bilel Moulahi, Sandra Bringay, Maximilien Servajean
In this paper, we describe our participation in the DEFT 2018 text analysis challenge.
no code implementations • WS 2017 • Alexandra Balahur
Emotions can be triggered by various factors.
no code implementations • RANLP 2017 • Daniel Dakota, Sandra Kübler
We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation, the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies.
no code implementations • RANLP 2017 • Hai Hu, Daniel Dakota, Sandra Kübler
Parsing Chinese critically depends on correct word segmentation for the parser since incorrect segmentation inevitably causes incorrect parses.
no code implementations • WS 2017 • Charese Smiley, Sandra Kübler
In this paper, we discuss the results of the IUCL system in the NLI Shared Task 2017.
no code implementations • EMNLP 2017 • Alexandra Schofield, Laure Thompson, David Mimno
Duplicate documents are a pervasive problem in text datasets and can have a strong effect on unsupervised models.
no code implementations • RANLP 2017 • Atreyee Mukherjee, Sandra Kübler
The results show that the choice of similarity metric has an effect on results and that we can reach comparable accuracies to the joint topic modeling in POS tagging and dependency parsing, thus providing a viable and efficient approach to POS tagging and parsing a sentence by its genre expert.
no code implementations • SEMEVAL 2017 • Iuliana Alexandra Fleșcan-Lovin-Arseni, Ramona Andreea Turcu, Cristina Sîrbu, Larisa Alexa, Sandra Maria Amarandei, Nichita Herciu, Constantin Scutaru, Diana Trandabăț, Adrian Iftene
This paper presents the participation of #WarTeam in Task 6 of SemEval2017 with a system classifying humor by comparing and ranking tweets.
no code implementations • SEMEVAL 2017 • Waleed Ammar, Matthew Peters, Chandra Bhagavatula, Russell Power
This paper describes our submission for the ScienceIE shared task (SemEval-2017 Task 10) on entity and relation extraction from scientific papers.
no code implementations • ACL 2017 • Leandro Santos, Edilson Anselmo Corrêa Júnior, Osvaldo Oliveira Jr, Diego Amancio, Letícia Mansur, Sandra Aluísio
The approach using linguistic features yielded higher accuracy if the transcriptions of the Cinderella dataset were manually revised.
no code implementations • JEPTALNRECITAL 2017 • Max Beligné, Aleksandra Campar, Jean-Hugues Chauchat, Melanie Lefeuvre, Isabelle Lefort, Sabine Loudcher, Julien Velcin
This article is part of a collaborative project that aims to carry out a longitudinal analysis of academic output in Geography.
no code implementations • EACL 2017 • Alexandra Schofield, Måns Magnusson, David Mimno
It is often assumed that topic models benefit from the use of a manually curated stopword list.
no code implementations • WS 2017 • Stefanie Dipper, Sandra Waldenberger
This paper investigates diatopic variation in a historical corpus of German.
no code implementations • EACL 2017 • Atreyee Mukherjee, Sandra Kübler, Matthias Scheutz
Part of speech (POS) taggers and dependency parsers tend to work well on homogeneous datasets but their performance suffers on datasets containing data from different genres.
no code implementations • WS 2017 • Lilian Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal
We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order.
no code implementations • EACL 2017 • Hans Uszkoreit, Aleksandra Gabryszak, Leonhard Hennig, Jörg Steffen, Renlong Ai, Stephan Busemann, Jon Dehdari, Josef van Genabith, Georg Heigold, Nils Rethmeier, Raphael Rubino, Sven Schmeier, Philippe Thomas, He Wang, Feiyu Xu
Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
no code implementations • COLING 2016 • Adam Tsakalidis, Maria Liakata, Theo Damoulas, Brigitte Jellinek, Weisi Guo, Alexandra Cristea
In this paper we address a new problem of predicting affect and well-being scales in a real-world setting of heterogeneous, longitudinal and non-synchronous textual as well as non-linguistic data that can be harvested from on-line media and mobile phones.
no code implementations • JEPTALNRECITAL 2016 • Aleksandra Miletic, Cécile Fabre, Dejan Stosic
This article presents an experiment in fine-grained morphosyntactic annotation of the Serbian part of the ParCoLab parallel corpus (a Serbian-French-English corpus).
no code implementations • LREC 2016 • Sandra Collovini, Gabriel Machado, Renata Vieira
The task of Relation Extraction from texts is one of the main challenges in the area of Information Extraction, considering the required linguistic knowledge and the sophistication of the language processing techniques employed.
no code implementations • LREC 2016 • Géraldine Damnati, Aleksandra Guerraz, Delphine Charlet
In this article we propose a descriptive study of a chat conversations corpus from an assistance contact center.
no code implementations • LREC 2016 • Amália Mendes, Sandra Antunes, Maarten Janssen, Anabela Gonçalves
We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language.
no code implementations • LREC 2016 • Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms.
no code implementations • LREC 2016 • Lilian D. A. Wanzare, Alessandra Zarcone, Stefan Thater, Manfred Pinkal
Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing).
no code implementations • LREC 2016 • Evandro Fonseca, André Antonitsch, Sandra Collovini, Daniela Amaral, Renata Vieira, Anny Figueira
This paper presents Summ-it++, an enriched version of the Summ-it corpus.
no code implementations • LREC 2016 • Alexandra Balahur, Hristo Tanev
Emotions are an important part of the human experience.
no code implementations • LREC 2016 • Aleksandra Gabryszak, Sebastian Krause, Leonhard Hennig, Feiyu Xu, Hans Uszkoreit
Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data.
no code implementations • TACL 2016 • Alexandra Schofield, David Mimno
Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling.
no code implementations • JEPTALNRECITAL 2015 • Cédric Lopez, Aleksandra Ponomareva, Cécile Robin, André Bittar, Xabier Larrucea, Frédérique Segond, Marie-Hélène Metzger
The European project TIER (Integrated strategy for CBRN (Chemical, Biological, Radiological and Nuclear) Threat Identification and Emergency Response) aims to deliver a complete and integrated strategy for emergency response in the context of biological, chemical, radiological, nuclear, or explosives-related hazards, based on threat identification and risk assessment.
no code implementations • JEPTALNRECITAL 2015 • Olivier Collin, Aleksandra Guerraz
To do so, we combine two approaches: we start from a rule-based system, which offers good precision, and couple it with a language model that increases recall.
no code implementations • JEPTALNRECITAL 2015 • Géraldine Damnati, Aleksandra Guerraz, Delphine Charlet
The parallel study of transcripts of telephone conversations from a call center in the same assistance domain makes it possible to draw comparisons between these two modes of interaction.
no code implementations • LREC 2014 • Renlong Ai, Marcela Charfuelan, Walter Kasper, Tina Klüwer, Hans Uszkoreit, Feiyu Xu, Sandra Gasber, Philip Gienandt
Modern language learning courses are no longer exclusively based on books or face-to-face lectures.
no code implementations • LREC 2014 • Wolfgang Maier, Miriam Kaeshammer, Peter Baumann, Sandra Kübler
However, for the evaluation of parser performance concerning a particular phenomenon, a test suite of sentences is needed in which this phenomenon has been identified.
no code implementations • LREC 2014 • Nathan Hartmann, Lucas Avanço, Pedro Balage, Magali Duran, Maria das Graças Volpe Nunes, Thiago Pardo, Sandra Aluísio
Web 2.0 has enabled a previously unimagined communication boom.
no code implementations • LREC 2014 • Alexandra Balahur, Marco Turchi, Ralf Steinberger, Jose-Manuel Perea-Ortega, Guillaume Jacquet, Dilek Küçük, Vanni Zavarella, Adil El Ghali
We show that using machine-translated data yields results similar to those obtained with native-speaker translations of the same data.
no code implementations • LREC 2014 • Antonio Balvet, Dejan Stosic, Aleksandra Miletic
In this paper, we present a parallel literary corpus for Serbian, English and French, the TALC-sef corpus.
no code implementations • LREC 2014 • Timur Gilmanov, Olga Scrivner, Sandra Kübler
It is well known that word-aligned parallel corpora are valuable linguistic resources.
no code implementations • LREC 2014 • Sandra Antunes, Amália Mendes
We report on an experiment to evaluate the role of statistical association measures and frequency for the identification of MWE.
no code implementations • LREC 2014 • Lianet Sepúlveda Torres, Magali Sanches Duran, Sandra Aluísio
Aiming to inform a module of a system designed to support scientific written production of Spanish native speakers learning Portuguese, we developed an approach to automatically generate a lexicon of wrong words, reproducing language transfer errors made by such foreign learners.
no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie