no code implementations • 22 Oct 2023 • Anthi Papadopoulou, Pierre Lison, Mark Anderson, Lilja Øvrelid, Ildikó Pilán
The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information.
1 code implementation • 27 Sep 2023 • Ildikó Pilán, Laurent Prévot, Hendrik Buschmeier, Pierre Lison
Scripted dialogues such as movie and TV subtitles constitute a widespread source of training data for conversational NLP models.
1 code implementation • LREC 2022 • Anthi Papadopoulou, Pierre Lison, Lilja Øvrelid, Ildikó Pilán
Instead of requiring manually labeled training data, the approach relies on a knowledge graph expressing the background information assumed to be publicly available about various individuals.
2 code implementations • 25 Jan 2022 • Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Papadopoulou, David Sánchez, Montserrat Batet
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.
no code implementations • 6 Apr 2020 • Ildikó Pilán, Pål H. Brekke, Lilja Øvrelid
We present a large Norwegian lexical resource of categorized medical terms.
no code implementations • 12 Jun 2017 • Ildikó Pilán, Elena Volodina, Lars Borin
We present a framework, along with its implementation based on Natural Language Processing methods, for identifying exercise item candidates from corpora.
no code implementations • 6 May 2016 • Ildikó Pilán
We explore the factors influencing the dependence of single sentences on their larger textual context in order to automatically identify, from corpora, candidate sentences for language learning exercises that are presentable in isolation.
1 code implementation • LREC 2016 • Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell
Inter-rater agreement is presented on the basis of the SW1203 subcorpus.
no code implementations • 29 Mar 2016 • Ildikó Pilán, Sowmya Vajjala, Elena Volodina
Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level.