no code implementations • SemEval (NAACL) 2022 • Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal
In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity.
no code implementations • NoDaLiDa 2021 • Vinit Ravishankar, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal
Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages.
1 code implementation • 10 Apr 2025 • Vladislav Mikhailov, Tita Enstad, David Samuel, Hans Christian Farsethås, Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
We describe the NorEval design and present the results of benchmarking 19 open-source pre-trained and instruction-tuned LMs for Norwegian in various scenarios.
no code implementations • 12 Dec 2024 • Javier de la Rosa, Vladislav Mikhailov, Lemei Zhang, Freddy Wetjen, David Samuel, Peng Liu, Rolv-Arild Braaten, Petter Mæhlum, Magnus Breder Birkenes, Andrey Kutuzov, Tita Enstad, Hans Christian Farsethås, Svein Arne Brygfjeld, Jon Atle Gulla, Stephan Oepen, Erik Velldal, Wilfred Østgulen, Lilja Øvrelid, Aslak Sira Myhre
The use of copyrighted materials in training language models raises critical legal and ethical questions.
no code implementations • 9 Dec 2024 • David Samuel, Vladislav Mikhailov, Erik Velldal, Lilja Øvrelid, Lucas Georges Gabriel Charpentier, Andrey Kutuzov, Stephan Oepen
Training large language models requires vast amounts of data, posing a challenge for less widely spoken languages like Norwegian and even more so for truly low-resource languages like Northern Sámi.
no code implementations • 4 Jul 2024 • Mariia Fedorova, Timothee Mickus, Niko Partanen, Janine Siewert, Elena Spaziani, Andrey Kutuzov
This paper describes the organization and findings of AXOLOTL'24, the first multilingual explainable semantic change modeling shared task.
1 code implementation • 20 Jun 2024 • Mariia Fedorova, Andrey Kutuzov, Yves Scherrer
We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD).
2 code implementations • 26 Mar 2024 • Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev, Dominik Schlechtweg
We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions.
no code implementations • 20 Mar 2024 • Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann
We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.
1 code implementation • 16 Sep 2023 • Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, Kenneth Heafield
Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants.
1 code implementation • 19 May 2023 • Mario Giulianelli, Iris Luden, Raquel Fernandez, Andrey Kutuzov
We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations.
1 code implementation • 6 May 2023 • David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel, Anna Palatkina
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics.
2 code implementations • 17 Mar 2023 • David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal
While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus.
2 code implementations • COLING (TextGraphs) 2022 • Anna Aksenova, Ekaterina Gavrishina, Elisey Rykov, Andrey Kutuzov
We present RuDSI, a new benchmark for word sense induction (WSI) in Russian.
1 code implementation • 31 Aug 2022 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift in the lexicographic sense of the term (or at least the status of these shifts is questionable).
no code implementations • LChange (ACL) 2022 • Mario Giulianelli, Andrey Kutuzov, Lidia Pivovarova
In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes.
1 code implementation • LREC 2022 • Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Ranveig Enstad, Alexandra Wittemann
We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian.
1 code implementation • CoNLL (EMNLP) 2021 • Mario Giulianelli, Andrey Kutuzov, Lidia Pivovarova
Semantics, morphology and syntax are strongly interdependent.
no code implementations • ACL (LChange) 2021 • Andrey Kutuzov, Lidia Pivovarova
We present a manually annotated lexical semantic change dataset for Russian: RuShiftEval.
no code implementations • 3 May 2021 • Tatyana Iazykova, Denis Kapelyushnik, Olga Bystrova, Andrey Kutuzov
Often approaches based on simple rules outperform or come close to the results of the notorious pre-trained language models like GPT-3 or BERT.
Ranked #3 on Common Sense Reasoning on RWSD
2 code implementations • NoDaLiDa 2021 • Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen
We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training.
no code implementations • EACL 2021 • Andrey Kutuzov, Elizaveta Kuzmenko
We describe a new addition to the WebVectors toolkit which is used to serve word embedding models over the Web.
no code implementations • COLING 2020 • Julia Rodina, Andrey Kutuzov
We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times.
no code implementations • 7 Oct 2020 • Julia Rodina, Yuliya Trofimova, Andrey Kutuzov, Ekaterina Artemova
We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data.
1 code implementation • SEMEVAL 2020 • Andrey Kutuzov, Mario Giulianelli
We apply contextualised word embeddings to lexical semantic change detection in the SemEval-2020 Shared Task 1.
no code implementations • LREC 2020 • Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.
no code implementations • CONLL 2019 • Kira Droganova, Andrey Kutuzov, Nikita Mediankin, Daniel Zeman
This paper describes the ÚFAL–Oslo system submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP, Oepen et al. 2019).
no code implementations • WS 2019 • Andrey Kutuzov, Elizaveta Kuzmenko
Then, these models were evaluated on the word sense disambiguation task.
no code implementations • WS 2019 • Julia Rodina, Daria Bakshandaeva, Vadim Fomin, Andrey Kutuzov, Samia Touileb, Erik Velldal
We measure the intensity of diachronic semantic shifts in adjectives in English, Norwegian and Russian across 5 decades.
1 code implementation • WS 2019 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists.
1 code implementation • ACL 2019 • Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko
The computation of distance measures between nodes in graphs is inefficient and does not scale to large graphs.
1 code implementation • 16 May 2019 • Vadim Fomin, Daria Bakshandaeva, Julia Rodina, Andrey Kutuzov
The paper introduces manually annotated test sets for the task of tracing diachronic (temporal) semantic shifts in Russian.
no code implementations • SEMEVAL 2019 • Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko
We present path2vec, a new approach for learning graph embeddings that relies on structural measures of pairwise node similarities.
no code implementations • COLING 2018 • Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, Erik Velldal
Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models.
1 code implementation • 6 May 2018 • Andrey Kutuzov
The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018).
1 code implementation • 19 Jan 2018 • Andrey Kutuzov, Maria Kunilovskaya
Aside from the already known fact that the RNC is generally a better training corpus than web corpora, we enumerate and explain fine differences in how the models process the semantic similarity task, which parts of the evaluation set are difficult for particular models, and why.
no code implementations • WS 2017 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts in particular words.
no code implementations • EMNLP 2017 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words.
no code implementations • WS 2017 • Pierre Lison, Andrey Kutuzov
Distributional semantic models learn vector representations of words through the contexts they occur in.
no code implementations • EACL 2017 • Andrey Kutuzov, Elizaveta Kuzmenko
In this demo we present WebVectors, a free and open-source toolkit helping to deploy web services which demonstrate and visualize distributional semantic models (widely known as word embeddings).
no code implementations • WS 2017 • Andrey Kutuzov, Elizaveta Kuzmenko, Lidia Pivovarova
This paper presents a method of automatic construction extraction from a large corpus of Russian.
1 code implementation • WS 2016 • Andrey Kutuzov, Elizaveta Kuzmenko, Anna Marakasova
We present an approach to detect differences in lexical semantics across English language registers, using word embedding models from the distributional semantics paradigm.
no code implementations • CONLL 2016 • Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
This paper studies how word embeddings trained on the British National Corpus interact with part of speech boundaries.
no code implementations • LREC 2016 • Andrey Kutuzov, Elizaveta Kuzmenko
In this paper, a new approach towards semantic clustering of the results of ambiguous search queries is presented.
no code implementations • 18 Apr 2016 • Andrey Kutuzov, Mikhail Kopotev, Tatyana Sviridenko, Lyubov Ivanova
We present our experience in applying distributional semantics (neural word embeddings) to the problem of representing and clustering documents in a bilingual comparable corpus.
no code implementations • 30 Apr 2015 • Andrey Kutuzov, Igor Andreev
Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics.
no code implementations • 4 Sep 2014 • Andrey Kutuzov
The paper deals with word sense induction from lexical co-occurrence graphs.