Search Results for author: Serge Sharoff

Found 37 papers, 3 papers with code

Applying Natural Annotation and Curriculum Learning to Named Entity Recognition for Under-Resourced Languages

1 code implementation • COLING 2022 • Valeriy Lobov, Alexandra Ivoylova, Serge Sharoff

In this study we test the possibility of (1) using natural annotation to build synthetic training sets from resources not initially designed for the target downstream task and (2) employing curriculum learning methods to select the most suitable examples from synthetic training sets.

Tasks: Cross-Lingual Transfer, Machine Translation (+3 more)

BERTology for Machine Translation: What BERT Knows about Linguistic Difficulties for Translation

no code implementations • LREC 2022 • Yuqian Dai, Marc de Kamps, Serge Sharoff

Pre-trained transformer-based models, such as BERT, have shown excellent performance in most natural language processing benchmark tests, but we still lack a good understanding of the linguistic knowledge of BERT in Neural Machine Translation (NMT).

Tasks: Machine Translation, NMT (+1 more)

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task

no code implementations • LREC 2022 • Mikhail Lepekhin, Serge Sharoff

Genre identification is a kind of non-topical text classification.

Tasks: Genre classification, text-classification (+2 more)

BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification

1 code implementation • 27 Nov 2023 • Dmitri Roussinov, Serge Sharoff

While the performance of many text classification tasks has recently improved thanks to Pre-trained Language Models (PLMs), in this paper we show that they still suffer from a performance gap when the underlying distribution of topics changes.

Tasks: Genre classification, Sentiment Analysis (+3 more)

Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation

no code implementations • 18 Nov 2023 • Nurbanu Aksoy, Serge Sharoff, Selcuk Baser, Nishant Ravikumar, Alejandro F Frangi

Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.

GATology for Linguistics: What Syntactic Dependencies It Knows

no code implementations • 22 May 2023 • Yuqian Dai, Serge Sharoff, Marc de Kamps

GAT is more competitive than MT-B in training speed and syntactic dependency prediction, which may reflect better modeling of explicit syntactic knowledge and may open the possibility of combining GAT and BERT in MT tasks.

Tasks: Graph Attention, Machine Translation

Syntactic Knowledge via Graph Attention with BERT in Machine Translation

no code implementations • 22 May 2023 • Yuqian Dai, Serge Sharoff, Marc de Kamps

Although the Transformer model can effectively acquire context features via a self-attention mechanism, deeper syntactic knowledge is still not effectively modeled.

Tasks: Graph Attention, Machine Translation (+2 more)

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task

no code implementations • 15 Jun 2022 • Mikhail Lepekhin, Serge Sharoff

We can evaluate robustness via the confidence gap between correctly classified and misclassified texts on a labeled test corpus; the larger the gap, the easier it is to trust that the classifier made the right decision.

Tasks: Genre classification, text-classification (+1 more)
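
The confidence gap mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a classifier's confidence in a prediction is the highest softmax probability it assigns, and that the gap is the mean confidence on correctly classified texts minus the mean confidence on misclassified ones. The function names and the toy logits are hypothetical and are not taken from the paper, which may define confidence differently.

```python
import math
from typing import Sequence


def softmax_confidence(logits: Sequence[float]) -> float:
    """Highest softmax probability for one text (assumed confidence measure)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def confidence_gap(all_logits: Sequence[Sequence[float]],
                   gold_labels: Sequence[int]) -> float:
    """Mean confidence on correct predictions minus mean on misclassified ones."""
    correct_conf, wrong_conf = [], []
    for logits, gold in zip(all_logits, gold_labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        conf = softmax_confidence(logits)
        (correct_conf if predicted == gold else wrong_conf).append(conf)
    if not correct_conf or not wrong_conf:
        raise ValueError("need both correctly classified and misclassified texts")
    mean_correct = sum(correct_conf) / len(correct_conf)
    mean_wrong = sum(wrong_conf) / len(wrong_conf)
    return mean_correct - mean_wrong


# Hypothetical three-genre example: two confident correct predictions, one hesitant error.
gap = confidence_gap([[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [3.0, 0.0, 0.0]], [0, 1, 0])
print(f"confidence gap: {gap:.3f}")
```

Under this assumed measure, a positive gap means the classifier tends to be less sure exactly when it is wrong, which is what makes its confidence usable as a signal.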

Towards Arabic Sentence Simplification via Classification and Generative Approaches

no code implementations • 20 Apr 2022 • Nouran Khallaf, Serge Sharoff

This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system.

Tasks: Classification, Lexical Simplification (+2 more)

Experiments with adversarial attacks on text genres

no code implementations • 5 Jul 2021 • Mikhail Lepekhin, Serge Sharoff

Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, achieve state-of-the-art (SOTA) results in many NLP tasks, including non-topical classification such as genre identification.

Automatic Difficulty Classification of Arabic Sentences

no code implementations • EACL (WANLP) 2021 • Nouran Khallaf, Serge Sharoff

In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners either on the CEFR proficiency scale or as a binary simple/complex label.

Tasks: Binary Classification, Classification (+7 more)

Overview of the Fourth BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora

no code implementations • LREC 2020 • Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff

The shared task of the 13th Workshop on Building and Using Comparable Corpora was devoted to the induction of bilingual dictionaries from comparable rather than parallel corpora.

Recognizing Semantic Relations by Combining Transformers and Fully Connected Models

no code implementations • LREC 2020 • Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina

Current approaches to automatically determining whether a relation exists between two given concepts X and Y fall into two types: 1) those modeling word paths connecting X and Y in text and 2) those modeling distributional properties of X and Y separately, not necessarily in proximity to each other.

Tasks: Language Modelling, Relation

Know thy corpus! Robust methods for digital curation of Web corpora

1 code implementation • LREC 2020 • Serge Sharoff

This paper proposes a novel framework for digital curation of Web corpora in order to provide robust estimation of their parameters, such as their composition and the lexicon.

Tasks: Genre classification, Topic Models

Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks

no code implementations • LREC 2020 • Yu Yuan, Serge Sharoff

This paper explores the use of Deep Learning methods for automatically estimating the quality of human translations.

Tasks: Feature Engineering, Sentence (+1 more)

Towards Functionally Similar Corpus Resources for Translation

no code implementations • RANLP 2019 • Maria Kunilovskaya, Serge Sharoff

We exploit a text-external approach, based on a set of Functional Text Dimensions to model text functions, so that each text can be represented as a vector in a multidimensional space of text functions.

Tasks: Translation

Investigating the Influence of Bilingual MWU on Trainee Translation Quality

no code implementations • LREC 2018 • Yu Yuan, Serge Sharoff

Tasks: Machine Translation, Translation (+1 more)

Language adaptation experiments via cross-lingual embeddings for related languages

no code implementations • LREC 2018 • Serge Sharoff

Tasks: Domain Adaptation, Information Retrieval (+4 more)

Cross-lingual Terminology Extraction for Translation Quality Estimation

no code implementations • LREC 2018 • Yu Yuan, Yuze Gao, Yue Zhang, Serge Sharoff

Tasks: Machine Translation, Translation

A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora

no code implementations • LREC 2018 • Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp

Tasks: Machine Translation, Semantic Textual Similarity (+1 more)

Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora

no code implementations • WS 2017 • Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp

We manually examined a small sample of the false-negative sentence pairs for the most precise French-English runs and estimated the number of parallel sentence pairs not yet in the provided gold standard.

Tasks: Machine Translation, Sentence

Toward Pan-Slavic NLP: Some Experiments with Language Adaptation

no code implementations • WS 2017 • Serge Sharoff

In this talk I will discuss a general approach which can be called Language Adaptation, by analogy with Domain Adaptation.

Tasks: Domain Adaptation, Language Modelling (+7 more)

Genre classification for a corpus of academic webpages

no code implementations • WS 2016 • Erika Dalan, Serge Sharoff

Tasks: Classification, General Classification (+1 more)

MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment

no code implementations • LREC 2016 • Yu Yuan, Serge Sharoff, Bogdan Babych

We compare MoBiL with the QuEst baseline feature set by using both in classifiers trained with support vector machine and relevance vector machine learning algorithms on the same data set.

Tasks: feature selection, Language Modelling (+1 more)