no code implementations • LREC 2022 • Yuqian Dai, Marc de Kamps, Serge Sharoff
Pre-trained transformer-based models, such as BERT, have shown excellent performance in most natural language processing benchmark tests, but we still lack a good understanding of the linguistic knowledge of BERT in Neural Machine Translation (NMT).
no code implementations • LREC 2022 • Mikhail Lepekhin, Serge Sharoff
Genre identification is a kind of non-topic text classification.
1 code implementation • COLING 2022 • Valeriy Lobov, Alexandra Ivoylova, Serge Sharoff
In this study we test the possibility of (1) using natural annotation to build synthetic training sets from resources not initially designed for the target downstream task and (2) employing curriculum learning methods to select the most suitable examples from synthetic training sets.
1 code implementation • 27 Nov 2023 • Dmitri Roussinov, Serge Sharoff
While performance on many text classification tasks has recently improved thanks to Pre-trained Language Models (PLMs), in this paper we show that they still suffer from a performance gap when the underlying distribution of topics changes.
no code implementations • 18 Nov 2023 • Nurbanu Aksoy, Serge Sharoff, Selcuk Baser, Nishant Ravikumar, Alejandro F Frangi
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
no code implementations • 22 May 2023 • Yuqian Dai, Serge Sharoff, Marc de Kamps
Moreover, GAT is more competitive in training speed and syntactic dependency prediction than MT-B, which may reveal a better incorporation of modeling explicit syntactic knowledge and the possibility of combining GAT and BERT in the MT tasks.
no code implementations • 22 May 2023 • Yuqian Dai, Serge Sharoff, Marc de Kamps
Although the Transformer model can effectively acquire context features via a self-attention mechanism, deeper syntactic knowledge is still not effectively modeled.
no code implementations • 15 Jun 2022 • Mikhail Lepekhin, Serge Sharoff
We can evaluate robustness via the confidence gap between the correctly classified texts and the misclassified ones on a labeled test corpus; higher gaps make it easier to be confident that our classifier made the right decision.
no code implementations • 20 Apr 2022 • Nouran Khallaf, Serge Sharoff
This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system.
no code implementations • 5 Jul 2021 • Mikhail Lepekhin, Serge Sharoff
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks, including non-topical classification, such as genre identification.
no code implementations • EACL (WANLP) 2021 • Nouran Khallaf, Serge Sharoff
In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or a binary classification as simple or complex.
no code implementations • LREC 2020 • Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina
Current approaches to automatically telling if a relation exists between two given concepts X and Y can be grouped into two types: 1) those modeling word-paths connecting X and Y in text and 2) those modeling distributional properties of X and Y separately, not necessarily in proximity to each other.
no code implementations • LREC 2020 • Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
The shared task of the 13th Workshop on Building and Using Comparable Corpora was devoted to the induction of bilingual dictionaries from comparable rather than parallel corpora.
no code implementations • LREC 2020 • Yu Yuan, Serge Sharoff
This paper explores the use of Deep Learning methods for automatic estimation of quality of human translations.
1 code implementation • LREC 2020 • Serge Sharoff
This paper proposes a novel framework for digital curation of Web corpora in order to provide robust estimation of their parameters, such as their composition and the lexicon.
no code implementations • RANLP 2019 • Maria Kunilovskaya, Serge Sharoff
We exploit a text-external approach, based on a set of Functional Text Dimensions to model text functions, so that each text can be represented as a vector in a multidimensional space of text functions.
no code implementations • WS 2017 • Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp
We manually examined a small sample of the false negative sentence pairs for the most precise French-English runs and estimated the number of parallel sentence pairs not yet in the provided gold standard.
no code implementations • WS 2017 • Serge Sharoff
In this talk I will discuss a general approach, which can be called Language Adaptation, by analogy with Domain Adaptation.
no code implementations • LREC 2016 • Yu Yuan, Serge Sharoff, Bogdan Babych
We compare MoBiL with the QuEst baseline set by using them in classifiers trained with support vector machine and relevance vector machine learning algorithms on the same data set.
no code implementations • LREC 2014 • Noushin Rezapour Asheghi, Serge Sharoff, Katja Markert
In this paper, we present the first web genre corpus which is reliably annotated.
no code implementations • LREC 2012 • Reinhard Rapp, Serge Sharoff, Bogdan Babych
The extraction of dictionaries from parallel text corpora is an established technique.