1 code implementation • LREC 2022 • Yves Bestgen
This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performances instead of the state-of-the-art status (SOTA) and statistical significance testing.
no code implementations • EACL (VarDial) 2021 • Yves Bestgen
This paper describes the system developed by the Laboratoire d’analyse statistique des textes for the Dravidian Language Identification (DLI) shared task of VarDial 2021.
no code implementations • TRITON 2021 • Yves Bestgen
A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences (FSs), and more high-frequency FSs.
no code implementations • gwll (LREC) 2022 • Yves Bestgen
To produce new bilingual dictionaries from existing ones, an important task in the field of translation, a system based on a very classical supervised learning technique, with no other knowledge than the available bilingual dictionaries, is proposed.
no code implementations • 10 Jul 2023 • Yves Bestgen
The impact of text length on the estimation of lexical diversity has captured the attention of the scientific community for more than a century.
no code implementations • ParlaCLARIN (LREC) 2022 • Yves Bestgen
A recent study has shown that, compared to human translations, neural machine translations contain more strongly associated formulaic sequences made of relatively high-frequency words, but far fewer strongly associated formulaic sequences made of relatively rare words.
2 code implementations • 23 May 2022 • Yves Bestgen
This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performances instead of the state-of-the-art status (SOTA) and statistical significance testing.
no code implementations • SemEval (NAACL) 2022 • Yves Bestgen
A logistic regression model only fed with character and word n-grams is proposed for the SemEval-2022 Task 4 on Patronizing and Condescending Language Detection (PCL).
no code implementations • 5 Feb 2022 • Yves Bestgen
For automatically identifying hate speech and offensive content in tweets, a system based on a classical supervised algorithm only fed with character n-grams, and thus completely language-agnostic, is proposed by the SATLab team.
no code implementations • 8 Jul 2021 • Yves Bestgen
A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences, and more high-frequency formulaic sequences.
no code implementations • SEMEVAL 2021 • Yves Bestgen
This paper describes the system developed by the Laboratoire d'analyse statistique des textes (LAST) for the Lexical Complexity Prediction shared task at SemEval-2021.
no code implementations • 29 Apr 2021 • Yves Bestgen
To determine whether some often-used lexical association measures assign high scores to n-grams that chance could have produced as frequently as observed, we used an extension of Fisher's exact test to sequences longer than two words to analyse a corpus of four million words.
no code implementations • NAACL (CMCL) 2021 • Yves Bestgen
A LightGBM model fed with target word lexical characteristics and features obtained from word frequency lists, psychometric data and bigram association measures has been optimized for the 2021 CMCL Shared Task on Eye-Tracking Data Prediction.
no code implementations • SEMEVAL 2020 • Yves Bestgen
To select tokens to be emphasised in short texts, a system mainly based on precomputed embedding models, such as BERT and ELMo, and LightGBM is proposed.
no code implementations • LREC 2020 • Yves Bestgen
This study aims to reproduce the research of Vajjala and Rama (2018), which showed that it is possible to predict the quality of a text written by learners of a given language by means of a model built on the basis of texts written by learners of another language.
no code implementations • SEMEVAL 2019 • Yves Bestgen
This paper describes the system developed by the Centre for English Corpus Linguistics for the SemEval-2019 Task 3: EmoContext.
no code implementations • SEMEVAL 2019 • Yves Bestgen
Tintin, the system proposed by the CECL for the Hyperpartisan News Detection task of SemEval 2019, is exclusively based on the tokens that make up the documents and a standard supervised learning procedure.
no code implementations • WS 2018 • Yves Bestgen
This paper describes the system developed by the Centre for English Corpus Linguistics for the 2018 Duolingo SLAM challenge.
no code implementations • JEPTALNRECITAL 2017 • Yves Bestgen
The main objective of this research is to evaluate the usefulness of taking into account fully automatic measures of phraseological competence for estimating the quality of texts written by learners of English as a foreign language.
no code implementations • JEPTALNRECITAL 2017 • Yves Bestgen
To determine whether some lexical association measures frequently used in NLP assign high scores to n-grams that chance could have produced as often as observed, we used an extension of Fisher's exact test to sequences of more than two words.
no code implementations • WS 2017 • Yves Bestgen
This paper describes the system developed by the Centre for English Corpus Linguistics (CECL) to discriminate between similar languages, language varieties and dialects.
no code implementations • JEPTALNRECITAL 2016 • Marie-Aude Lefer, Yves Bestgen, Natalia Grabar
Then, for each length, the 1,000 most frequent n-grams in each language are processed with correspondence analysis (AFC) to determine which n-grams are particularly salient in the genres studied. Finally, the n-grams are manually categorised, distinguishing expressions of opinion and certainty, discourse markers and referential expressions.