Search Results for author: Yves Bestgen

Found 25 papers, 2 papers with code

Please, Don’t Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status

1 code implementation LREC 2022 Yves Bestgen

This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performances instead of the state-of-the-art status (SOTA) and statistical significance testing.

Optimizing a Supervised Classifier for a Difficult Language Identification Problem

no code implementations EACL (VarDial) 2021 Yves Bestgen

This paper describes the system developed by the Laboratoire d’analyse statistique des textes for the Dravidian Language Identification (DLI) shared task of VarDial 2021.

Language Identification regression

Using CollGram to Compare Formulaic Language in Human and Machine Translation

no code implementations TRITON 2021 Yves Bestgen

A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences (FSs), and more high-frequency FSs.

Machine Translation Translation

Creating Bilingual Dictionaries from Existing Ones by Means of Pivot-Oriented Translation Inference and Logistic Regression

no code implementations gwll (LREC) 2022 Yves Bestgen

To produce new bilingual dictionaries from existing ones, an important task in the field of translation, a system based on a very classical supervised learning technique, with no other knowledge than the available bilingual dictionaries, is proposed.

Translation

Measuring Lexical Diversity in Texts: The Twofold Length Problem

no code implementations 10 Jul 2023 Yves Bestgen

The impact of text length on the estimation of lexical diversity has captured the attention of the scientific community for more than a century.

Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus

no code implementations ParlaCLARIN (LREC) 2022 Yves Bestgen

A recent study has shown that, compared to human translations, neural machine translations contain more strongly-associated formulaic sequences made of relatively high-frequency words, but far less strongly-associated formulaic sequences made of relatively rare words.

Machine Translation Translation

Please, Don't Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status

2 code implementations 23 May 2022 Yves Bestgen

This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performances instead of the state-of-the-art status (SOTA) and statistical significance testing.

SATLab at SemEval-2022 Task 4: Trying to Detect Patronizing and Condescending Language with only Character and Word N-grams

no code implementations SemEval (NAACL) 2022 Yves Bestgen

A logistic regression model only fed with character and word n-grams is proposed for the SemEval-2022 Task 4 on Patronizing and Condescending Language Detection (PCL).

regression

A simple language-agnostic yet very strong baseline system for hate speech and offensive content identification

no code implementations 5 Feb 2022 Yves Bestgen

For automatically identifying hate speech and offensive content in tweets, a system based on a classical supervised algorithm only fed with character n-grams, and thus completely language-agnostic, is proposed by the SATLab team.

Using CollGram to Compare Formulaic Language in Human and Neural Machine Translation

no code implementations 8 Jul 2021 Yves Bestgen

A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences, and more high-frequency formulaic sequences.

Machine Translation Translation

LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures

no code implementations SEMEVAL 2021 Yves Bestgen

This paper describes the system developed by the Laboratoire d'analyse statistique des textes (LAST) for the Lexical Complexity Prediction shared task at SemEval-2021.

Lexical Complexity Prediction Sentence +1

Using Fisher's Exact Test to Evaluate Association Measures for N-grams

no code implementations 29 Apr 2021 Yves Bestgen

To determine whether some often-used lexical association measures assign high scores to n-grams that chance could have produced as frequently as observed, we used an extension of Fisher's exact test to sequences longer than two words to analyse a corpus of four million words.

LAST at CMCL 2021 Shared Task: Predicting Gaze Data During Reading with a Gradient Boosting Decision Tree Approach

no code implementations NAACL (CMCL) 2021 Yves Bestgen

A LightGBM model fed with target word lexical characteristics and features obtained from word frequency lists, psychometric data and bigram association measures has been optimized for the 2021 CMCL Shared Task on Eye-Tracking Data Prediction.

LAST at SemEval-2020 Task 10: Finding Tokens to Emphasise in Short Written Texts with Precomputed Embedding Models and LightGBM

no code implementations SEMEVAL 2020 Yves Bestgen

To select tokens to be emphasised in short texts, a system mainly based on precomputed embedding models, such as BERT and ELMo, and LightGBM is proposed.

Reproducing Monolingual, Multilingual and Cross-Lingual CEFR Predictions

no code implementations LREC 2020 Yves Bestgen

This study aims to reproduce the research of Vajjala and Rama (2018), which showed that it is possible to predict the quality of a text written by learners of a given language by means of a model built on the basis of texts written by learners of another language.

CECL at SemEval-2019 Task 3: Using Surface Learning for Detecting Emotion in Textual Conversations

no code implementations SEMEVAL 2019 Yves Bestgen

This paper describes the system developed by the Centre for English Corpus Linguistics for the SemEval-2019 Task 3: EmoContext.

Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens

no code implementations SEMEVAL 2019 Yves Bestgen

Tintin, the system proposed by the CECL for the Hyperpartisan News Detection task of SemEval 2019, is exclusively based on the tokens that make up the documents and a standard supervised learning procedure.

Predicting Second Language Learner Successes and Mistakes by Means of Conjunctive Features

no code implementations WS 2018 Yves Bestgen

This paper describes the system developed by the Centre for English Corpus Linguistics for the 2018 Duolingo SLAM challenge.

Language Acquisition

Utilisation d'indices phraséologiques pour évaluer des textes en langue étrangère : comparaison des bigrammes et des trigrammes (Collocation measures and automated scoring of foreign language texts: Comparing bigrams and trigrams)

no code implementations JEPTALNRECITAL 2017 Yves Bestgen

The main objective of this research is to evaluate the usefulness of taking into account fully automatic measures of phraseological competence when estimating the quality of texts written by learners of English as a foreign language.

SENTS

Évaluation de mesures d'association pour les bigrammes et les trigrammes au moyen du test exact de Fisher (Using Fisher's Exact Test to Evaluate Association Measures for Bigrams and Trigrams)

no code implementations JEPTALNRECITAL 2017 Yves Bestgen

To determine whether some lexical association measures frequently used in NLP assign high scores to n-grams that chance could have produced as often as observed, we used an extension of Fisher's exact test to sequences of more than two words.

Improving the Character Ngram Model for the DSL Task with BM25 Weighting and Less Frequently Used Feature Sets

no code implementations WS 2017 Yves Bestgen

This paper describes the system developed by the Centre for English Corpus Linguistics (CECL) to discriminate between similar languages, language varieties and dialects.

Dialect Identification

Vers une analyse des différences interlinguistiques entre les genres textuels : étude de cas basée sur les n-grammes et l'analyse factorielle des correspondances (Towards a cross-linguistic analysis of genres: A case study based on n-grams and Correspondence Analysis)

no code implementations JEPTALNRECITAL 2016 Marie-Aude Lefer, Yves Bestgen, Natalia Grabar

Then, for each length, the 1,000 most frequent n-grams in each language are processed by correspondence analysis to determine which n-grams are particularly salient in the genres studied. Finally, the n-grams are manually categorized, distinguishing expressions of opinion and certainty, discourse markers and referential expressions.
