Search Results for author: Magnus Sahlgren

Found 35 papers, 9 papers with code

Navigating the Semantic Horizon using Relative Neighborhood Graphs

no code implementations EMNLP 2015 Amaru Cuba Gyllensten, Magnus Sahlgren

We also argue that the topology of the neighborhoods in semantic space can be used to determine the semantic horizon of a point, which we define as the set of neighbors that have a direct connection to the point (a toy construction is sketched below).

Word Sense Induction
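
The semantic horizon above is defined through direct connections in a relative neighborhood graph. As a rough illustration of that construction, and not the paper's actual implementation, the following sketch builds an RNG over toy vectors, assuming Euclidean distance: two points are connected iff no third point is strictly closer to both of them than they are to each other.

```python
import numpy as np

def relative_neighborhood_graph(X):
    """Connect p and q iff no third point r satisfies
    max(d(p, r), d(q, r)) < d(p, q) (naive O(n^3) version)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    edges = set()
    for p in range(n):
        for q in range(p + 1, n):
            if all(max(d[p, r], d[q, r]) >= d[p, q]
                   for r in range(n) if r not in (p, q)):
                edges.add((p, q))
    return edges

def semantic_horizon(point, edges):
    """Neighbors with a direct RNG connection to `point`."""
    return {b for a, b in edges if a == point} | {a for a, b in edges if b == point}

# Toy usage with random stand-ins for word vectors.
X = np.random.default_rng(0).standard_normal((20, 5))
print(semantic_horizon(0, relative_neighborhood_graph(X)))
```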

The Gavagai Living Lexicon

no code implementations LREC 2016 Magnus Sahlgren, Amaru Cuba Gyllensten, Fredrik Espinoza, Ola Hamfors, Jussi Karlgren, Fredrik Olsson, Per Persson, Akshay Viswanathan, Anders Holst

This paper presents the Gavagai Living Lexicon, which is an online distributional semantic model currently available in 20 different languages.

The Effects of Data Size and Frequency Range on Distributional Semantic Models

no code implementations EMNLP 2016 Magnus Sahlgren, Alessandro Lenci

This paper investigates the effects of data size and frequency range on distributional semantic models.

Active learning for detection of stance components

no code implementations WS 2016 Maria Skeppstedt, Magnus Sahlgren, Carita Paradis, Andreas Kerren

This larger variation is also reflected in the lower recall achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast, and condition.

Active Learning · Opinion Mining +2

Distributional Term Set Expansion

no code implementations LREC 2018 Amaru Cuba Gyllensten, Magnus Sahlgren

This paper is a short empirical study of the performance of centrality- and classification-based iterative term set expansion methods for distributional semantic models (a sketch of the former follows below).

Active Learning · Classification +1
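
To make the centrality-based variant concrete, here is a minimal sketch under my own reading of the method, not the paper's code: starting from a seed set, repeatedly add the vocabulary term whose vector is most cosine-similar to the centroid of the current set.

```python
import numpy as np

def expand_term_set(seeds, terms, vectors, n_new=5):
    """Centrality-based iterative expansion: at each step, add the term
    closest (cosine) to the centroid of the currently selected set."""
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    index = {t: i for i, t in enumerate(terms)}
    selected = list(seeds)
    for _ in range(n_new):
        centroid = V[[index[t] for t in selected]].mean(axis=0)
        sims = V @ (centroid / np.linalg.norm(centroid))
        best = max((i for i, t in enumerate(terms) if t not in selected),
                   key=lambda i: sims[i])
        selected.append(terms[best])
    return selected

# Toy usage; real use would plug in vectors from a distributional model.
rng = np.random.default_rng(0)
terms = [f"term{i}" for i in range(100)]
print(expand_term_set(["term0", "term1"], terms, rng.standard_normal((100, 50))))
```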

Monitoring Targeted Hate in Online Environments

no code implementations 13 Mar 2018 Tim Isbister, Magnus Sahlgren, Lisa Kaati, Milan Obaidi, Nazar Akrami

Hateful comments, swearwords and sometimes even death threats are becoming a reality for many people today in online environments.

Learning Representations for Detecting Abusive Language

no code implementations WS 2018 Magnus Sahlgren, Tim Isbister, Fredrik Olsson

This paper discusses whether it is possible to learn a generic representation that is useful for detecting various types of abusive language.

Abusive Language · Language Modelling +4

Text Categorization for Conflict Event Annotation

no code implementations LREC 2020 Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren, Kristine Eck

We cast the problem of event annotation as one of text categorization, and compare state-of-the-art text categorization techniques on event data produced within the Uppsala Conflict Data Program (UCDP).

Text Categorization

Data Readiness for Natural Language Processing

2 code implementations 4 Sep 2020 Fredrik Olsson, Magnus Sahlgren

This document concerns data readiness in the context of machine learning and Natural Language Processing.

BIG-bench Machine Learning

SenseCluster at SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection

no code implementations SEMEVAL 2020 Amaru Cuba Gyllensten, Evangelia Gogoulou, Ariel Ekgren, Magnus Sahlgren

We (Team Skurt) propose a simple method to detect lexical semantic change by clustering contextualized embeddings produced by XLM-R, using K-Means++ (see the sketch below).

Change Detection · Clustering +1
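
A minimal sketch of the clustering step, assuming the XLM-R vectors for occurrences of a target word in the two time periods have already been extracted; the number of clusters and the L1 comparison of cluster distributions are illustrative assumptions, not necessarily the team's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def change_score(emb_t1, emb_t2, k=4, seed=0):
    """Cluster usages from both periods jointly with K-Means++, then
    compare the periods' cluster distributions (L1 distance here)."""
    X = np.vstack([emb_t1, emb_t2])
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=seed).fit_predict(X)
    p = np.bincount(labels[:len(emb_t1)], minlength=k) / len(emb_t1)
    q = np.bincount(labels[len(emb_t1):], minlength=k) / len(emb_t2)
    return 0.5 * np.abs(p - q).sum()

# emb_t1 / emb_t2 would be XLM-R vectors for one target word's occurrences
# in the two corpora (extraction not shown); random stand-ins for demo.
rng = np.random.default_rng(0)
print(change_score(rng.normal(0, 1, (80, 16)), rng.normal(0.4, 1, (60, 16))))
```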

Semantic Re-Tuning with Contrastive Tension

1 code implementation ICLR 2021 Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, Magnus Sahlgren

Extracting semantically useful natural language sentence representations from pre-trained deep neural networks such as Transformers remains a challenge.

Semantic Similarity · Semantic Textual Similarity +3

The Singleton Fallacy: Why Current Critiques of Language Models Miss the Point

no code implementations 8 Feb 2021 Magnus Sahlgren, Fredrik Carlsson

By contrast, we will argue that there are many different types of language use, meaning, and understanding, and that (current) language models are built with the explicit purpose of acquiring and representing one type of structural understanding of language.

Natural Language Understanding · Position

Predicting Treatment Outcome from Patient Texts: The Case of Internet-Based Cognitive Behavioural Therapy

no code implementations EACL 2021 Evangelia Gogoulou, Magnus Boman, Fehmi ben Abdesslem, Nils Hentati Isacsson, Viktor Kaldo, Magnus Sahlgren

We investigate the feasibility of applying standard text categorisation methods to patient text in order to predict treatment outcome in Internet-based cognitive behavioural therapy.

Sentiment Analysis

Federated Word2Vec: Leveraging Federated Learning to Encourage Collaborative Representation Learning

no code implementations 19 Apr 2021 Daniel Garcia Bernal, Lodovico Giaretta, Sarunas Girdzijauskas, Magnus Sahlgren

The results show that neither representation quality nor convergence time deteriorates in Federated Word2Vec compared to centralised Word2Vec (a toy federated-averaging sketch follows below).

Federated Learning · Representation Learning
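
The core idea, federated averaging of locally trained embedding matrices, can be sketched as follows; the local skip-gram training is stubbed out, and all settings are toy assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, num_rounds = 1000, 50, 5                  # toy settings
client_corpora = [rng.integers(0, vocab_size, 5000) for _ in range(4)]

def train_locally(emb, corpus):
    """Stand-in for a few epochs of local skip-gram SGD (not shown)."""
    return emb + 0.01 * rng.standard_normal(emb.shape)

def federated_average(updates, sizes):
    """One FedAvg round: weight each client's matrix by its data size."""
    total = sum(sizes)
    return sum(u * (n / total) for u, n in zip(updates, sizes))

global_emb = 0.01 * rng.standard_normal((vocab_size, dim))
for _ in range(num_rounds):
    updates = [train_locally(global_emb.copy(), c) for c in client_corpora]
    global_emb = federated_average(updates, [len(c) for c in client_corpora])
```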

Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead?

1 code implementation NoDaLiDa 2021 Tim Isbister, Fredrik Carlsson, Magnus Sahlgren

We demonstrate empirically that a large English language model coupled with modern machine translation outperforms native language models in most Scandinavian languages (see the sketch below).

Language Modelling · Machine Translation +1
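
A minimal sketch of the translate-then-classify pipeline compared against native models, using Hugging Face pipelines; the checkpoints (Helsinki-NLP/opus-mt-sv-en and the default English sentiment model) are illustrative choices, not necessarily the paper's.

```python
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-sv-en")
classify = pipeline("sentiment-analysis")  # default English classifier

# Swedish input is translated to English, then classified in English.
swedish = "Filmen var fantastiskt bra."
english = translate(swedish)[0]["translation_text"]
print(classify(english))
```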

A comparative evaluation and analysis of three generations of Distributional Semantic Models

1 code implementation 20 May 2021 Alessandro Lenci, Magnus Sahlgren, Patrick Jeuniaux, Amaru Cuba Gyllensten, Martina Miliani

In this paper, we perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
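
A minimal sketch of the second kind of type vector, assuming bert-base-uncased and single-subtoken target words (multi-subtoken handling is omitted); it illustrates the averaging idea rather than reproducing the paper's evaluation pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def type_vector(word, contexts):
    """Average the contextualized vectors of `word` over its occurrences."""
    occurrences = []
    for sent in contexts:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        occurrences += [hidden[i] for i, t in enumerate(tokens) if t == word]
    return torch.stack(occurrences).mean(dim=0)

v = type_vector("bank", ["I sat by the bank of the river.",
                         "The bank raised its interest rates."])
```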

Cross-lingual Transfer of Monolingual Models

no code implementations LREC 2022 Evangelia Gogoulou, Ariel Ekgren, Tim Isbister, Magnus Sahlgren

Additionally, evaluating the transferred models on source-language tasks reveals that their performance in the source domain deteriorates after transfer.

Cross-Lingual Transfer · Domain Adaptation

We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing

2 code implementations 11 Oct 2021 Fredrik Olsson, Magnus Sahlgren

In this paper, we identify the state of data as being an important reason for failure in applied Natural Language Processing (NLP) projects.

The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling

no code implementations 30 Mar 2023 Joey Öhman, Severine Verlinden, Ariel Ekgren, Amaru Cuba Gyllensten, Tim Isbister, Evangelia Gogoulou, Fredrik Carlsson, Magnus Sahlgren

Pre-training Large Language Models (LLMs) requires massive amounts of text data, and the performance of the LLMs typically correlates with the scale and quality of the datasets.

Language Modelling

Fine-Grained Controllable Text Generation Using Non-Residual Prompting

1 code implementation ACL 2022 Fredrik Carlsson, Joey Öhman, Fangyu Liu, Severine Verlinden, Joakim Nivre, Magnus Sahlgren

We propose a resource-efficient method for converting a pre-trained CLM into this architecture, and demonstrate its potential through various experiments, including the novel task of contextualized word inclusion.

Text Generation

Cross-lingual and Multilingual CLIP

1 code implementation LREC 2022 Fredrik Carlsson, Philipp Eisen, Faton Rekathati, Magnus Sahlgren

The long-standing endeavor of relating the textual and the visual domains recently saw a pivotal breakthrough with OpenAI's release of CLIP.

Contrastive Learning · Machine Translation +3

It’s Basically the Same Language Anyway: the Case for a Nordic Language Model

no code implementations NoDaLiDa 2021 Magnus Sahlgren, Fredrik Carlsson, Fredrik Olsson, Love Börjeson

When is it beneficial for a research community to organize a broader collaborative effort on a topic, and when should we instead promote individual efforts?

Language Modelling

Decentralized Word2Vec Using Gossip Learning

no code implementations NoDaLiDa 2021 Abdul Aziz Alkathiri, Lodovico Giaretta, Sarunas Girdzijauskas, Magnus Sahlgren

Advanced NLP models require huge amounts of data from various domains to produce high-quality representations.

Gender Bias in Pretrained Swedish Embeddings

no code implementations WS (NoDaLiDa) 2019 Magnus Sahlgren, Fredrik Olsson

This paper investigates the presence of gender bias in pretrained Swedish embeddings.
