Search Results for author: Andrey Kutuzov

Found 47 papers, 18 papers with code

Multilingual ELMo and the Effects of Corpus Sampling

no code implementations NoDaLiDa 2021 Vinit Ravishankar, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal

Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages.

SemEval 2022 Task 10: Structured Sentiment Analysis

no code implementations SemEval (NAACL) 2022 Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity.

Sentiment Analysis

Enriching Word Usage Graphs with Cluster Definitions

1 code implementation 26 Mar 2024 Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev, Dominik Schlechtweg

We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions.

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations 20 Mar 2024 Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling Machine Translation +2

Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

1 code implementation 16 Sep 2023 Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, Kenneth Heafield

Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants.

Instruction Following Large Language Model +3

Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis

1 code implementation 19 May 2023 Mario Giulianelli, Iris Luden, Raquel Fernandez, Andrey Kutuzov

We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations.

Language Modelling Semantic Similarity +3

NorBench -- A Benchmark for Norwegian Language Models

1 code implementation 6 May 2023 David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel, Anna Palatkina

We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics.

Trained on 100 million words and still in shape: BERT meets British National Corpus

2 code implementations 17 Mar 2023 David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal

While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus.

Language Modelling

Contextualized language models for semantic change detection: lessons learned

1 code implementation 31 Aug 2022 Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift in the lexicographic sense of the term (or at least the status of these shifts is questionable).

Change Detection

Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change

no code implementations LChange (ACL) 2022 Mario Giulianelli, Andrey Kutuzov, Lidia Pivovarova

In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes.

Change Detection XLM-R

Large-Scale Contextualised Language Modelling for Norwegian

2 code implementations NoDaLiDa 2021 Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training.

Language Modelling

RuSemShift: a dataset of historical lexical semantic change in Russian

no code implementations COLING 2020 Julia Rodina, Andrey Kutuzov

We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times.


ELMo and BERT in semantic change detection for Russian

no code implementations 7 Oct 2020 Julia Rodina, Yuliya Trofimova, Andrey Kutuzov, Ekaterina Artemova

We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data.

Change Detection

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

no code implementations LREC 2020 Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.

Word Embeddings Word Sense Disambiguation

ÚFAL-Oslo at MRP 2019: Garage Sale Semantic Parsing

no code implementations CONLL 2019 Kira Droganova, Andrey Kutuzov, Nikita Mediankin, Daniel Zeman

This paper describes the ÚFAL-Oslo system submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP, Oepen et al. 2019).

Semantic Parsing

One-to-X analogical reasoning on word embeddings: a case for diachronic armed conflict prediction from news texts

1 code implementation WS 2019 Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists.

Word Embeddings
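The one-to-X formulation can be sketched in a few lines: instead of returning the single nearest neighbour of the analogy offset b - a + c, return every word above a similarity threshold, so that zero answers (one-to-none) and multiple answers are both possible. The toy 2-d vectors, words, and threshold below are invented purely for illustration and are not from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def one_to_x_analogy(a, b, c, vocab, threshold=0.9):
    """Answer a : b :: c : X by vector offset, returning ALL words
    whose similarity to b - a + c clears the threshold.
    An empty list models the 'one-to-none' case."""
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    hits = []
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue
        sim = cosine(vec, target)
        if sim >= threshold:
            hits.append((word, sim))
    return sorted(hits, key=lambda t: -t[1])

# Toy 2-d embeddings, invented for illustration only.
vocab = {
    "syria":    [1.0, 0.0],
    "conflict": [1.0, 1.0],
    "norway":   [0.0, 0.2],
    "peace":    [0.05, 1.2],
}
print(one_to_x_analogy("syria", "conflict", "norway", vocab))
```

With real embeddings the threshold would be tuned on held-out analogy data; here it simply separates the lone in-threshold candidate from the rest.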

Diachronic word embeddings and semantic shifts: a survey

no code implementations COLING 2018 Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, Erik Velldal

Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models.

Diachronic Word Embeddings Word Embeddings

Russian word sense induction by clustering averaged word embeddings

1 code implementation 6 May 2018 Andrey Kutuzov

The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018).

Clustering Word Embeddings +1
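The core recipe named in the title (averaging context word embeddings per usage, then clustering the averages into senses) can be sketched as follows. The toy embeddings, the "bank" contexts, and the tiny deterministic k-means are all invented for illustration; the actual system used real pretrained vectors and standard clustering tooling.

```python
def average(vectors):
    """Average a list of equal-length vectors (one usage representation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic farthest-point initialisation,
    returning a cluster id per point."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: d2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = average(members)
    return labels

# Toy static embeddings, invented for illustration only.
emb = {
    "money": [1.0, 0.1], "loan": [0.9, 0.0],
    "river": [0.0, 1.0], "shore": [0.1, 0.9],
}
# Four usages of an ambiguous word, each represented by its context words.
usages = [["money", "loan"], ["loan", "money"], ["river", "shore"], ["shore", "river"]]
reps = [average([emb[w] for w in ctx]) for ctx in usages]
labels = kmeans(reps, k=2)  # induced sense ids, one per usage
```

The two financial usages and the two river usages end up in different clusters, which is exactly the induced sense inventory.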

Size vs. Structure in Training Corpora for Word Embedding Models: Araneum Russicum Maximum and Russian National Corpus

1 code implementation 19 Jan 2018 Andrey Kutuzov, Maria Kunilovskaya

Aside from the already known fact that the RNC is generally a better training corpus than web corpora, we enumerate and explain fine-grained differences in how the models handle the semantic similarity task, which parts of the evaluation set are difficult for particular models, and why.

Semantic Similarity Semantic Textual Similarity

Tracing armed conflicts with diachronic word embedding models

no code implementations WS 2017 Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts in particular words.

Word Embeddings

Redefining Context Windows for Word Embedding Models: An Experimental Study

no code implementations WS 2017 Pierre Lison, Andrey Kutuzov

Distributional semantic models learn vector representations of words through the contexts they occur in.
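The "contexts" a distributional model trains on are typically (target, context) pairs drawn from a symmetric window around each token; the window size is the parameter this paper experiments with. A minimal extraction sketch (toy sentence invented for illustration):

```python
def context_windows(tokens, window=2):
    """Collect (target, context_word) pairs from a symmetric window
    of the given size around every token."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sent = ["models", "learn", "vectors", "from", "contexts"]
print(context_windows(sent, window=1))
```

Widening `window` trades topical for syntactic similarity in the resulting vectors, which is why the choice of window definition is worth an experimental study.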

Building Web-Interfaces for Vector Semantic Models with the WebVectors Toolkit

no code implementations EACL 2017 Andrey Kutuzov, Elizaveta Kuzmenko

In this demo we present WebVectors, a free and open-source toolkit helping to deploy web services which demonstrate and visualize distributional semantic models (widely known as word embeddings).

Machine Translation Named Entity Recognition (NER) +2

Exploration of register-dependent lexical semantics using word embeddings

1 code implementation WS 2016 Andrey Kutuzov, Elizaveta Kuzmenko, Anna Marakasova

We present an approach to detect differences in lexical semantics across English language registers, using word embedding models from distributional semantics paradigm.

General Classification regression +1

Redefining part-of-speech classes with distributional semantic models

no code implementations CONLL 2016 Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

This paper studies how word embeddings trained on the British National Corpus interact with part of speech boundaries.

POS TAG +1

Neural Embedding Language Models in Semantic Clustering of Web Search Results

no code implementations LREC 2016 Andrey Kutuzov, Elizaveta Kuzmenko

In this paper, a new approach towards semantic clustering of the results of ambiguous search queries is presented.

Clustering

Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints

no code implementations 18 Apr 2016 Andrey Kutuzov, Mikhail Kopotev, Tatyana Sviridenko, Lyubov Ivanova

We present our experience in applying distributional semantics (neural word embeddings) to the problem of representing and clustering documents in a bilingual comparable corpus.

Clustering Translation +1

Texts in, meaning out: neural language models in semantic similarity task for Russian

no code implementations 30 Apr 2015 Andrey Kutuzov, Igor Andreev

Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics.

Semantic Similarity Semantic Textual Similarity
