Search Results for author: Andrey Kutuzov

Found 47 papers, 18 papers with code

Multilingual ELMo and the Effects of Corpus Sampling

no code implementations • NoDaLiDa 2021 • Vinit Ravishankar, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal

Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages.

Paper
Add Code

SemEval 2022 Task 10: Structured Sentiment Analysis

no code implementations • SemEval (NAACL) 2022 • Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity.

Sentiment Analysis

Paper
Add Code

Enriching Word Usage Graphs with Cluster Definitions

1 code implementation • 26 Mar 2024 • Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev, Dominik Schlechtweg

We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions.

Paper
Code

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations • 20 Mar 2024 • Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling Machine Translation +2

Paper
Add Code

Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

1 code implementation • 16 Sep 2023 • Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, Kenneth Heafield

Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants.

Instruction Following Large Language Model +3

Paper
Code

Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis

1 code implementation • 19 May 2023 • Mario Giulianelli, Iris Luden, Raquel Fernandez, Andrey Kutuzov

We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations.

Language Modelling Semantic Similarity +3

Paper
Code

NorBench -- A Benchmark for Norwegian Language Models

1 code implementation • 6 May 2023 • David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel, Anna Palatkina

We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics.

Paper
Code

Trained on 100 million words and still in shape: BERT meets British National Corpus

2 code implementations • 17 Mar 2023 • David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal

While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus.

Language Modelling

Paper
Code

RuDSI: graph-based word sense induction dataset for Russian

2 code implementations • COLING (TextGraphs) 2022 • Anna Aksenova, Ekaterina Gavrishina, Elisey Rykov, Andrey Kutuzov

We present RuDSI, a new benchmark for word sense induction (WSI) in Russian.

Clustering Graph Clustering +1

Paper
Code

Contextualized language models for semantic change detection: lessons learned

1 code implementation • 31 Aug 2022 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift in the lexicographic sense of the term (or at least the status of these shifts is questionable).

Change Detection

Paper
Code

Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change

no code implementations • LChange (ACL) 2022 • Mario Giulianelli, Andrey Kutuzov, Lidia Pivovarova

In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes.

Change Detection XLM-R

Paper
Add Code

NorDiaChange: Diachronic Semantic Change Dataset for Norwegian

1 code implementation • LREC 2022 • Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Ranveig Enstad, Alexandra Wittemann

We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian.

Paper
Code

Grammatical Profiling for Semantic Change Detection

1 code implementation • CoNLL (EMNLP) 2021 • Mario Giulianelli, Andrey Kutuzov, Lidia Pivovarova

Semantics, morphology and syntax are strongly interdependent.

Change Detection

Paper
Code

Three-part diachronic semantic change dataset for Russian

no code implementations • ACL (LChange) 2021 • Andrey Kutuzov, Lidia Pivovarova

We present a manually annotated lexical semantic change dataset for Russian: RuShiftEval.

Change Detection

Paper
Add Code

Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks

no code implementations • 3 May 2021 • Tatyana Iazykova, Denis Kapelyushnik, Olga Bystrova, Andrey Kutuzov

Often approaches based on simple rules outperform or come close to the results of the notorious pre-trained language models like GPT-3 or BERT.

Ranked #3 on Common Sense Reasoning on RWSD

Common Sense Reasoning Natural Language Inference +4

Paper
Add Code

Large-Scale Contextualised Language Modelling for Norwegian

2 code implementations • NoDaLiDa 2021 • Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training.

Language Modelling

Paper
Code

Representing ELMo embeddings as two-dimensional text online

no code implementations • EACL 2021 • Andrey Kutuzov, Elizaveta Kuzmenko

We describe a new addition to the WebVectors toolkit which is used to serve word embedding models over the Web.

Sentence Vocal Bursts Valence Prediction

Paper
Add Code

RuSemShift: a dataset of historical lexical semantic change in Russian

no code implementations • COLING 2020 • Julia Rodina, Andrey Kutuzov

We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times.

Sentence

Paper
Add Code

ELMo and BERT in semantic change detection for Russian

no code implementations • 7 Oct 2020 • Julia Rodina, Yuliya Trofimova, Andrey Kutuzov, Ekaterina Artemova

We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data.

Change Detection

Paper
Add Code

UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection

1 code implementation • SEMEVAL 2020 • Andrey Kutuzov, Mario Giulianelli

We apply contextualised word embeddings to lexical semantic change detection in the SemEval-2020 Shared Task 1.

Change Detection Word Embeddings

Paper
Code

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

no code implementations • LREC 2020 • Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.

Word Embeddings Word Sense Disambiguation

Paper
Add Code

\'UFAL-Oslo at MRP 2019: Garage Sale Semantic Parsing

no code implementations • CONLL 2019 • Kira Droganova, Andrey Kutuzov, Nikita Mediankin, Daniel Zeman

This paper describes the {\'U}FAL--Oslo system submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP, Oepen et al. 2019).

Semantic Parsing

Paper
Add Code

To lemmatize or not to lemmatize: how word normalisation affects ELMo performance in word sense disambiguation

no code implementations • WS 2019 • Andrey Kutuzov, Elizaveta Kuzmenko

Then, these models were evaluated on the word sense disambiguation task.

Lemmatization Word Sense Disambiguation

Paper
Add Code

Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian

no code implementations • WS 2019 • Julia Rodina, Baksh, Daria aeva, Vadim Fomin, Andrey Kutuzov, Samia Touileb, Erik Velldal

We measure the intensity of diachronic semantic shifts in adjectives in English, Norwegian and Russian across 5 decades.

Word Embeddings

Paper
Add Code

One-to-X analogical reasoning on word embeddings: a case for diachronic armed conflict prediction from news texts

1 code implementation • WS 2019 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists.

Word Embeddings

Paper
Code

Making Fast Graph-based Algorithms with Graph Metric Embeddings

1 code implementation • ACL 2019 • Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko

The computation of distance measures between nodes in graphs is inefficient and does not scale to large graphs.

Paper
Code

Tracing cultural diachronic semantic shifts in Russian using word embeddings: test sets and baselines

1 code implementation • 16 May 2019 • Vadim Fomin, Daria Bakshandaeva, Julia Rodina, Andrey Kutuzov

The paper introduces manually annotated test sets for the task of tracing diachronic (temporal) semantic shifts in Russian.

General Classification Word Embeddings

Paper
Code

Learning Graph Embeddings from WordNet-based Similarity Measures

no code implementations • SEMEVAL 2019 • Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko

We present path2vec, a new approach for learning graph embeddings that relies on structural measures of pairwise node similarities.

Graph Embedding Semantic Similarity +2

Paper
Add Code

Diachronic word embeddings and semantic shifts: a survey

no code implementations • COLING 2018 • Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, Erik Velldal

Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models.

Diachronic Word Embeddings Word Embeddings

Paper
Add Code

Russian word sense induction by clustering averaged word embeddings

1 code implementation • 6 May 2018 • Andrey Kutuzov

The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018).

Clustering Word Embeddings +1

Paper
Code

Size vs. Structure in Training Corpora for Word Embedding Models: Araneum Russicum Maximum and Russian National Corpus

1 code implementation • 19 Jan 2018 • Andrey Kutuzov, Maria Kunilovskaya

Aside from the already known fact that the RNC is generally a better training corpus than web corpora, we enumerate and explain fine differences in how the models process semantic similarity task, what parts of the evaluation set are difficult for particular models and why.

166

Paper
Code

Tracing armed conflicts with diachronic word embedding models

no code implementations • WS 2017 • Andrey Kutuzov, Erik Velldal, Lilja {\O}vrelid

Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts in particular words.

Word Embeddings

Paper
Add Code

Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants

no code implementations • EMNLP 2017 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words.

Word Embeddings

Paper
Add Code

Word vectors, reuse, and replicability: Towards a community repository of large-text resources

no code implementations • WS 2017 • Murhaf Fares, Andrey Kutuzov, Stephan Oepen, Erik Velldal

Semantic Textual Similarity Word Embeddings

Paper
Add Code

Redefining Context Windows for Word Embedding Models: An Experimental Study

no code implementations • WS 2017 • Pierre Lison, Andrey Kutuzov

Distributional semantic models learn vector representations of words through the contexts they occur in.

Paper
Add Code

Clustering of Russian Adjective-Noun Constructions using Word Embeddings

no code implementations • WS 2017 • Andrey Kutuzov, Elizaveta Kuzmenko, Lidia Pivovarova

This paper presents a method of automatic construction extraction from a large corpus of Russian.

Clustering Word Embeddings

Paper
Add Code

Building Web-Interfaces for Vector Semantic Models with the WebVectors Toolkit

no code implementations • EACL 2017 • Andrey Kutuzov, Elizaveta Kuzmenko

In this demo we present WebVectors, a free and open-source toolkit helping to deploy web services which demonstrate and visualize distributional semantic models (widely known as word embeddings).

Machine Translation Named Entity Recognition (NER) +2

Paper
Add Code

Universal Dependencies-based syntactic features in detecting human translation varieties

no code implementations • WS 2017 • Maria Kunilovskaya, Andrey Kutuzov

Machine Translation Text Classification +1

Paper
Add Code

Exploration of register-dependent lexical semantics using word embeddings

1 code implementation • WS 2016 • Andrey Kutuzov, Elizaveta Kuzmenko, Anna Marakasova

We present an approach to detect differences in lexical semantics across English language registers, using word embedding models from distributional semantics paradigm.

General Classification regression +1

Paper
Code

Redefining part-of-speech classes with distributional semantic models

no code implementations • CONLL 2016 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

This paper studies how word embeddings trained on the British National Corpus interact with part of speech boundaries.

POS TAG +1

Paper
Add Code

Neural Embedding Language Models in Semantic Clustering of Web Search Results

no code implementations • LREC 2016 • Andrey Kutuzov, Elizaveta Kuzmenko

In this paper, a new approach towards semantic clustering of the results of ambiguous search queries is presented.

Clustering

Paper
Add Code

Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints

no code implementations • 18 Apr 2016 • Andrey Kutuzov, Mikhail Kopotev, Tatyana Sviridenko, Lyubov Ivanova

We present our experience in applying distributional semantics (neural word embeddings) to the problem of representing and clustering documents in a bilingual comparable corpus.

Clustering Translation +1