Search Results for author: Goran Glavaš

Found 65 papers, 45 papers with code

MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer

no code implementations • Findings (EMNLP) 2021 • Alan Ansell, Edoardo Maria Ponti, Jonas Pfeiffer, Sebastian Ruder, Goran Glavaš, Ivan Vulić, Anna Korhonen

While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board.

Dependency Parsing named-entity-recognition +4

Paper
Add Code

Training and Domain Adaptation for Supervised Text Segmentation

no code implementations • EACL (BEA) 2021 • Goran Glavaš, Ananya Ganesh, Swapna Somasundaran

In this work, we focus on the domain transfer performance of supervised neural text segmentation in the educational domain.

Domain Adaptation Hierarchical Text Segmentation +2

Paper
Add Code

DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog

1 code implementation • Findings (ACL) 2022 • Chia-Chien Hung, Anne Lauscher, Simone Ponzetto, Goran Glavaš

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD).

dialog state tracking Language Modelling +2

Paper
Code

BAD-X: Bilingual Adapters Improve Zero-Shot Cross-Lingual Transfer

1 code implementation • NAACL 2022 • Marinela Parović, Goran Glavaš, Ivan Vulić, Anna Korhonen

Adapter modules enable modular and efficient zero-shot cross-lingual transfer, where current state-of-the-art adapter-based approaches learn specialized language adapters (LAs) for individual languages.

Zero-Shot Cross-Lingual Transfer

Paper
Code

Climbing the Tower of Treebanks: Improving Low-Resource Dependency Parsing via Hierarchical Source Selection

1 code implementation • Findings (ACL) 2021 • Goran Glavaš, Ivan Vulić

Dependency Parsing

Paper
Code

Natural Language Processing for Multilingual Task-Oriented Dialogue

no code implementations • ACL 2022 • Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Edoardo Ponti, Ivan Vulić

In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current research gaps, challenges and initiatives related to multilingual ToD systems, with a particular focus on their connections to current research and challenges in multilingual and low-resource NLP.

Paper
Add Code

MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation

2 code implementations • 26 Mar 2024 • Andreea Iana, Goran Glavaš, Heiko Paulheim

Our findings reveal that (i) current NNRs, even when based on a multilingual language model, suffer from substantial performance losses under ZS-XLT and that (ii) inclusion of target-language data in FS-XLT training has limited benefits, particularly when combined with a bilingual news consumption.

Cross-Lingual Transfer Language Modelling +2

Paper
Code

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

1 code implementation • 6 Mar 2024 • Indraneil Paul, Goran Glavaš, Iryna Gurevych

In particular, most mainstream Code-LMs have been pre-trained on source code files alone.

Code Completion Code Generation +4

Paper
Code

On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning

1 code implementation • 21 Dec 2023 • Chengzu Li, Han Zhou, Goran Glavaš, Anna Korhonen, Ivan Vulić

Following the standard supervised fine-tuning (SFT) paradigm, in-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs), yielding promising performance across various tasks in few-shot data setups.

In-Context Learning

Paper
Code

SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU

1 code implementation • 16 Nov 2023 • Evgeniia Razumovskaia, Goran Glavaš, Anna Korhonen, Ivan Vulić

Task-oriented dialogue (ToD) systems help users execute well-defined tasks across a variety of domains (e. g., $\textit{flight booking}$ or $\textit{food ordering}$), with their Natural Language Understanding (NLU) components being dedicated to the analysis of user utterances, predicting users' intents ($\textit{Intent Detection}$, ID) and extracting values for informational slots ($\textit{Value Extraction}$, VE).

Intent Detection Natural Language Understanding +1

Paper
Code

To Translate or Not to Translate: A Systematic Investigation of Translation-Based Cross-Lingual Transfer to Low-Resource Languages

no code implementations • 15 Nov 2023 • Benedikt Ebing, Goran Glavaš

Perfect machine translation (MT) would render cross-lingual transfer (XLT) by means of multilingual language models (LMs) superfluous.

Cross-Lingual Transfer Machine Translation +2

Paper
Add Code

Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection

no code implementations • 3 Nov 2023 • Gretel Liz De la Peña Sarracén, Paolo Rosso, Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto

In this work, we resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection.

Abusive Language Cross-Lingual Transfer +3

Paper
Add Code

AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification

1 code implementation • 1 Nov 2023 • Yongxin Huang, Kexin Wang, Sourav Dutta, Raj Nath Patel, Goran Glavaš, Iryna Gurevych

As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM.

Classification Language Modelling +4

Paper
Code

Linking Surface Facts to Large-Scale Knowledge Graphs

1 code implementation • 23 Oct 2023 • Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš

Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples.

Knowledge Graphs Open Information Extraction

Paper
Code

One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer

1 code implementation • 16 Oct 2023 • Fabian David Schmidt, Ivan Vulić, Goran Glavaš

Because of this, model selection based on source-language validation is unreliable: it picks model snapshots with suboptimal target-language performance.

Model Selection NER +3

Paper
Code

NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation

1 code implementation • 2 Oct 2023 • Andreea Iana, Goran Glavaš, Heiko Paulheim

NewsRecLib is an open-source library based on Pytorch-Lightning and Hydra developed for training and evaluating neural news recommendation models.

Benchmarking News Recommendation +1

Paper
Code

Train Once, Use Flexibly: A Modular Framework for Multi-Aspect Neural News Recommendation

2 code implementations • 29 Jul 2023 • Andreea Iana, Goran Glavaš, Heiko Paulheim

Recent neural news recommenders (NNRs) extend content-based recommendation (1) by aligning additional aspects (e. g., topic, sentiment) between candidate news and user history or (2) by diversifying recommendations w. r. t.

News Recommendation

Paper
Code

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

1 code implementation • 13 Jul 2023 • Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš

To this end, we \textit{re-align} an image encoder previously tuned to an English LLM to a new, multilingual LLM -- for this, we leverage multilingual data from a mix of vision-and-language tasks, which we obtain by machine-translating high-quality English data to 95 languages.

Image Captioning

Paper
Code

Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations

1 code implementation • 14 Jun 2023 • Gregor Geigle, Radu Timofte, Goran Glavaš

We evaluate 8 different publicly available multilingual CLIP models on zero-shot image classification (ZS-IC) for each of the 92 Babel-ImageNet languages, demonstrating a significant gap between English ImageNet performance and that of high-resource languages (e. g., German or Chinese), and an even bigger gap for low-resource languages (e. g., Sinhala or Lao).

Image Classification Machine Translation +3

Paper
Code

Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging

1 code implementation • 26 May 2023 • Fabian David Schmidt, Ivan Vulić, Goran Glavaš

The results indicate that averaging model checkpoints yields systematic and consistent performance gains across diverse target languages in all tasks.

Cross-Lingual Transfer Model Selection +4

Paper
Code

Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection

1 code implementation • 23 May 2023 • David Dukić, Kiril Gashteovski, Goran Glavaš, Jan Šnajder

We address the problem of negative transfer in TD by coupling triggers between domains using subject-object relations obtained from a rule-based open information extraction (OIE) system.

Event Detection Language Modelling +2

Paper
Code

A General-Purpose Multilingual Document Encoder

1 code implementation • 11 May 2023 • Onur Galoğlu, Robert Litschko, Goran Glavaš

While a large body of work leveraged MMTs to mine parallel data and induce bilingual document embeddings, much less effort has been devoted to training general-purpose (massively) multilingual document encoder that can be used for both supervised and unsupervised document-level tasks.

Cross-Lingual Transfer Document Classification +3

Paper
Code

Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese

no code implementations • 18 Apr 2023 • Vésteinn Snæbjarnarson, Annika Simonsen, Goran Glavaš, Ivan Vulić

Multilingual language models have pushed state-of-the-art in cross-lingual NLP transfer.

named-entity-recognition Named Entity Recognition +5

Paper
Add Code

Simplifying Content-Based Neural News Recommendation: On User Modeling and Training Objectives

1 code implementation • 6 Apr 2023 • Andreea Iana, Goran Glavaš, Heiko Paulheim

Most neural news recommenders rely on user click behavior and typically introduce dedicated user encoders that aggregate the content of clicked news into user embeddings (early fusion).

News Recommendation

Paper
Code

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

1 code implementation • 13 Oct 2022 • Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models.

Language Modelling Multi-Task Learning +2

Paper
Code

SLICER: Sliced Fine-Tuning for Low-Resource Cross-Lingual Transfer for Named Entity Recognition

1 code implementation • Proceedings of the Conference on Empirical Methods in Natural Language Processing 2022 • Fabian David Schmidt, Ivan Vulić, Goran Glavaš

Large multilingual language models generally demonstrate impressive results in zero-shot cross-lingual transfer, yet often fail to successfully transfer to low-resource languages, even for token-level prediction tasks like named entity recognition (NER).

Multilingual text classification named-entity-recognition +3

Paper
Code

Massively Multilingual Lexical Specialization of Multilingual Transformers

no code implementations • 1 Aug 2022 • Tommaso Green, Simone Paolo Ponzetto, Goran Glavaš

While pretrained language models (PLMs) primarily serve as general-purpose text encoders that can be fine-tuned for a wide variety of downstream tasks, recent work has shown that they can also be rewired to produce high-quality word representations (i. e., static word embeddings) and yield good performance in type-level lexical tasks.

Bilingual Lexicon Induction Retrieval +4

Paper
Add Code

On the Limitations of Sociodemographic Adaptation with Transformers

1 code implementation • 1 Aug 2022 • Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

We adapt the language representations for the sociodemographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling with the prediction of a sociodemographic class.

Language Modelling Multi-Task Learning

Paper
Code

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

3 code implementations • NAACL (MIA) 2022 • Chia-Chien Hung, Tommaso Green, Robert Litschko, Tornike Tsereteli, Sotaro Takeshita, Marco Bombieri, Goran Glavaš, Simone Paolo Ponzetto

This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA).

Answer Generation Data Augmentation +4

Paper
Code

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

1 code implementation • NAACL 2022 • Chia-Chien Hung, Anne Lauscher, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

We then introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.

Cross-Lingual Transfer dialog state tracking +1

Paper
Code

Probing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders

no code implementations • 30 Apr 2022 • Ivan Vulić, Goran Glavaš, Fangyu Liu, Nigel Collier, Edoardo Maria Ponti, Anna Korhonen

In this work, we probe SEs for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.

Contrastive Learning Cross-Lingual Entity Linking +6

Paper
Add Code

Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval

1 code implementation • COLING 2022 • Robert Litschko, Ivan Vulić, Goran Glavaš

Current approaches therefore commonly transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders: they fine-tune all parameters of pretrained massively multilingual Transformers (MMTs, e. g., multilingual BERT) on English relevance judgments, and then deploy them in the target language(s).

Cross-Lingual Transfer Language Modelling +3

Paper
Code

Geographic Adaptation of Pretrained Language Models

no code implementations • 16 Mar 2022 • Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze

While pretrained language models (PLMs) have been shown to possess a plethora of linguistic knowledge, the existing body of research has largely neglected extralinguistic knowledge, which is generally difficult to obtain by pretraining on text alone.

Language Identification Language Modelling +2

Paper
Add Code

On Cross-Lingual Retrieval with Multilingual Text Encoders

1 code implementation • 21 Dec 2021 • Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs.

Re-Ranking Retrieval +2

Paper
Code

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

1 code implementation • 15 Oct 2021 • Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

dialog state tracking Language Modelling +2

Paper
Code

AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark

1 code implementation • ACL 2022 • Niklas Friedrich, Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš

Open Information Extraction (OIE) is the task of extracting facts from sentences in the form of relations and their corresponding arguments in schema-free manner.

Open Information Extraction Sentence

Paper
Code

BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation

1 code implementation • ACL 2022 • Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš

In this work, we introduce BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese, and German.

Ranked #1 on Open Information Extraction on BenchIE

Open Information Extraction

Paper
Code

Sustainable Modular Debiasing of Language Models

no code implementations • Findings (EMNLP) 2021 • Anne Lauscher, Tobias Lüken, Goran Glavaš

Unfair stereotypical biases (e. g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for widespread adoption of state-of-the-art language technology.

Fairness Language Modelling

Paper
Add Code

Diachronic Analysis of German Parliamentary Proceedings: Ideological Shifts through the Lens of Political Biases

1 code implementation • 13 Aug 2021 • Tobias Walter, Celina Kirschner, Steffen Eger, Goran Glavaš, Anne Lauscher, Simone Paolo Ponzetto

We analyze bias in historical corpora as encoded in diachronic distributional semantic models by focusing on two specific forms of bias, namely a political (i. e., anti-communism) and racist (i. e., antisemitism) one.

Diachronic Word Embeddings Word Embeddings

Paper
Code

Scientia Potentia Est -- On the Role of Knowledge in Computational Argumentation

no code implementations • 1 Jul 2021 • Anne Lauscher, Henning Wachsmuth, Iryna Gurevych, Goran Glavaš

Despite extensive research efforts in recent years, computational argumentation (CA) remains one of the most challenging areas of natural language processing.

Common Sense Reasoning Natural Language Understanding

Paper
Add Code

RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models

1 code implementation • ACL 2021 • Soumya Barikeri, Anne Lauscher, Ivan Vulić, Goran Glavaš

We use the evaluation framework to benchmark the widely used conversational DialoGPT model along with the adaptations of four debiasing methods.

Conversational Response Generation Response Generation

Paper
Code

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

no code implementations • 17 Apr 2021 • Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Edoardo M. Ponti, Anna Korhonen, Ivan Vulić

We find that the most critical factor preventing the creation of truly multilingual ToD systems is the lack of datasets in most languages for both training and evaluation.

Cross-Lingual Transfer Machine Translation +2

Paper
Add Code

DebIE: A Platform for Implicit and Explicit Debiasing of Word Embedding Spaces

2 code implementations • EACL 2021 • Niklas Friedrich, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

In this work, we present DebIE, the first integrated platform for (1) measuring and (2) mitigating bias in word embeddings.

Fairness Word Embeddings

Paper
Code

Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval

1 code implementation • 21 Jan 2021 • Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

Therefore, in this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs.

Cross-Lingual Word Embeddings Representation Learning +3

Paper
Code

Verb Knowledge Injection for Multilingual Event Processing

no code implementations • ACL 2021 • Olga Majewska, Ivan Vulić, Goran Glavaš, Edoardo M. Ponti, Anna Korhonen

We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers in event extraction tasks -- downstream tasks for which accurate verb processing is paramount.

Event Extraction Language Modelling

Paper
Add Code

Self-Supervised Learning for Visual Summary Identification in Scientific Publications

no code implementations • 21 Dec 2020 • Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

Providing visual summaries of scientific publications can increase information access for readers and thereby help deal with the exponential growth in the number of scientific publications.

Self-Supervised Learning

Paper
Add Code

Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer

no code implementations • 11 Dec 2020 • Marko Vidoni, Ivan Vulić, Goran Glavaš

Adapter modules, additional trainable parameters that enable efficient fine-tuning of pretrained transformers, have recently been used for language specialization of multilingual transformers, improving downstream zero-shot cross-lingual transfer.

NER POS +2

Paper
Add Code

AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings

no code implementations • COLING (WANLP) 2020 • Anne Lauscher, Rafik Takieddin, Simone Paolo Ponzetto, Goran Glavaš

Our analysis yields several interesting findings, e. g., that implicit gender bias in embeddings trained on Arabic news corpora steadily increases over time (between 2007 and 2017).

Word Embeddings

Paper
Add Code

Probing Pretrained Language Models for Lexical Semantics

no code implementations • EMNLP 2020 • Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen

The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture.

World Knowledge

Paper
Add Code

Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation

3 code implementations • 15 Aug 2020 • Goran Glavaš, Ivan Vulić

Traditional NLP has long held (supervised) syntactic parsing necessary for successful higher-level semantic language understanding (LU).

Language Modelling Natural Language Understanding

124,593

Paper
Code

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers

1 code implementation • EMNLP (DeeLIO) 2020 • Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš

Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work focused on injecting (structured) knowledge from external resources into these models.

Common Sense Reasoning World Knowledge

Paper
Code

On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation

1 code implementation • ACL 2020 • Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West, Steffen Eger

We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.

Language Modelling Machine Translation +4

Paper
Code

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

1 code implementation • EMNLP 2020 • Edoardo Maria Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen

In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects.

Ranked #3 on Cross-Lingual Transfer on XCOPA (using extra training data)

Cross-Lingual Transfer Translation +1

Paper
Code

From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers

no code implementations • 1 May 2020 • Anne Lauscher, Vinit Ravishankar, Ivan Vulić, Goran Glavaš

Massively multilingual transformers pretrained with language modeling objectives (e. g., mBERT, XLM-R) have become a de facto default transfer paradigm for zero-shot cross-lingual transfer in NLP, offering unmatched transfer performance.

Cross-Lingual Word Embeddings Dependency Parsing +6

Paper
Add Code

Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers

no code implementations • COLING 2020 • Robert Litschko, Ivan Vulić, Željko Agić, Goran Glavaš

Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level".

Cross-Lingual Transfer POS

Paper
Add Code

Windowing Models for Abstractive Summarization of Long Texts

no code implementations • 7 Apr 2020 • Leon Schüller, Florian Wilhelm, Nico Kreiling, Goran Glavaš

Neural summarization models suffer from the fixed-size input limitation: if text length surpasses the model's maximal number of input tokens, some document content (possibly summary-relevant) gets truncated Independently summarizing windows of maximal input size disallows for information flow between windows and leads to incoherent summaries.

Abstractive Text Summarization

Paper
Add Code

Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

1 code implementation • 3 Jan 2020 • Goran Glavaš, Swapna Somasundaran

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval.

Cross-Lingual Word Embeddings Multi-Task Learning +6

Paper
Code

A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

4 code implementations • 13 Sep 2019 • Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

Moreover, we successfully transfer debiasing models, by means of cross-lingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.

Word Embeddings

Paper
Code

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

1 code implementation • COLING 2020 • Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

In this work, we complement such distributional knowledge with external lexical knowledge, that is, we integrate the discrete knowledge on word-level semantic similarity into pretraining.

Language Modelling Lexical Simplification +7

Paper
Code

Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?

1 code implementation • IJCNLP 2019 • Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen

A series of bilingual lexicon induction (BLI) experiments with 15 diverse languages (210 language pairs) show that fully unsupervised CLWE methods still fail for a large number of language pairs (e. g., they yield zero BLI performance for 87/210 pairs).

Bilingual Lexicon Induction Self-Learning

Paper
Code

Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors

1 code implementation • SEMEVAL 2019 • Anne Lauscher, Goran Glavaš

In this work, we present a systematic study of biases encoded in distributional word vector spaces: we analyze how consistent the bias effects are across languages, corpora, and embedding models.

Cross-Lingual Transfer Word Embeddings

Paper
Code

Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization

1 code implementation • EMNLP 2018 • Edoardo Maria Ponti, Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen

Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space.

dialog state tracking Lexical Simplification +2

Paper
Code

Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources

1 code implementation • NAACL 2018 • Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen

Word vector specialisation (also known as retrofitting) is a portable, light-weight approach to fine-tuning arbitrary distributional word vector spaces by injecting external knowledge from rich lexical resources such as WordNet.

Dialogue State Tracking Text Simplification +1

Paper
Code

Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only

1 code implementation • 2 May 2018 • Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.

Cross-Lingual Information Retrieval Retrieval

Paper
Code

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

1 code implementation • 19 Jan 2018 • Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

In contrast, we propose an unsupervised and a very resource-light approach for measuring semantic similarity between texts in different languages.

Cross-Lingual Information Retrieval Cross-Lingual Semantic Textual Similarity +9

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.