Search Results for author: Goran Glavaš

Found 41 papers, 24 papers with code

Training and Domain Adaptation for Supervised Text Segmentation

no code implementations EACL (BEA) 2021 Goran Glavaš, Ananya Ganesh, Swapna Somasundaran

In this work, we focus on the domain transfer performance of supervised neural text segmentation in the educational domain.

Domain Adaptation Text Segmentation

Natural Language Processing for Multilingual Task-Oriented Dialogue

no code implementations ACL 2022 Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Edoardo Ponti, Ivan Vulić

In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current research gaps, challenges and initiatives related to multilingual ToD systems, with a particular focus on their connections to current research and challenges in multilingual and low-resource NLP.

DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog

1 code implementation Findings (ACL) 2022 Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD).

Language Modelling Masked Language Modeling +1

MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer

no code implementations Findings (EMNLP) 2021 Alan Ansell, Edoardo Maria Ponti, Jonas Pfeiffer, Sebastian Ruder, Goran Glavaš, Ivan Vulić, Anna Korhonen

While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board.

Dependency Parsing Named Entity Recognition +3
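
MAD-G generates language adapters rather than training one per language. Below is a minimal sketch of that contextual parameter generation idea, assuming a small hypernetwork that maps a typologically informed language embedding to the flattened weights of a single bottleneck adapter; all names and dimensions are illustrative, not MAD-G's actual configuration.

```python
import torch
import torch.nn as nn

class AdapterGenerator(nn.Module):
    """Hypothetical contextual parameter generator in the spirit of MAD-G:
    a language embedding (e.g., derived from typological features) is mapped
    to the weights of a bottleneck adapter, so a language unseen at training
    time can receive an adapter without any per-language training."""

    def __init__(self, lang_dim: int = 32, hidden: int = 768, bottleneck: int = 48):
        super().__init__()
        self.hidden, self.bottleneck = hidden, bottleneck
        # down: hidden -> bottleneck, up: bottleneck -> hidden, plus biases
        n_params = 2 * hidden * bottleneck + hidden + bottleneck
        self.generator = nn.Linear(lang_dim, n_params)

    def forward(self, lang_emb: torch.Tensor):
        flat = self.generator(lang_emb)
        h, b = self.hidden, self.bottleneck
        down_w = flat[: h * b].view(b, h)          # (bottleneck, hidden)
        up_w = flat[h * b : 2 * h * b].view(h, b)  # (hidden, bottleneck)
        down_b = flat[2 * h * b : 2 * h * b + b]
        up_b = flat[2 * h * b + b :]
        return down_w, down_b, up_w, up_b
```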

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

no code implementations 20 May 2022 Chia-Chien Hung, Anne Lauscher, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

We then introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.

Cross-Lingual Transfer Pretrained Language Models

Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders

no code implementations 30 Apr 2022 Ivan Vulić, Goran Glavaš, Fangyu Liu, Nigel Collier, Edoardo Maria Ponti, Anna Korhonen

Pretrained multilingual language models (LMs) can be successfully transformed into multilingual sentence encoders (SEs; e.g., LaBSE, xMPNET) via additional fine-tuning or model distillation on parallel data.

Contrastive Learning Cross-Lingual Entity Linking +5

Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval

1 code implementation 5 Apr 2022 Robert Litschko, Ivan Vulić, Goran Glavaš

Current approaches therefore typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders: they fine-tune all the parameters of a pretrained massively multilingual Transformer (MMT, e.g., multilingual BERT) on English relevance judgments and then deploy it in the target language.

Cross-Lingual Transfer Language Modelling +1
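
For context on the baseline described above, here is a minimal sketch of a pointwise cross-encoder reranker built on a massively multilingual Transformer; the model name is real, but treating MS MARCO-style English relevance judgments as the fine-tuning data is an illustrative assumption, not necessarily the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A massively multilingual Transformer used as a pointwise cross-encoder:
# the encoded "query [SEP] document" pair is mapped to one relevance score.
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

def relevance_score(query: str, document: str) -> float:
    """Score one query-document pair. After fine-tuning all parameters on
    English relevance judgments, the same model is deployed zero-shot on
    queries and documents in other languages."""
    inputs = tokenizer(query, document, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# Zero-shot cross-lingual use: English query, German document.
score = relevance_score("capital of France",
                        "Paris ist die Hauptstadt Frankreichs.")
```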

Geographic Adaptation of Pretrained Language Models

no code implementations 16 Mar 2022 Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze

Geographic linguistic features are commonly used to improve the performance of pretrained language models (PLMs) on NLP tasks where geographic knowledge is intuitively beneficial (e.g., geolocation prediction and dialect feature prediction).

Language Modelling Masked Language Modeling +1

On Cross-Lingual Retrieval with Multilingual Text Encoders

1 code implementation 21 Dec 2021 Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs.

Re-Ranking Zero-Shot Cross-Lingual Transfer

DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog

1 code implementation 15 Oct 2021 Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD).

Language Modelling Masked Language Modeling +1

AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark

1 code implementation ACL 2022 Niklas Friedrich, Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš

Open Information Extraction (OIE) is the task of extracting facts from sentences in the form of relations and their corresponding arguments in a schema-free manner.

Open Information Extraction

Sustainable Modular Debiasing of Language Models

no code implementations Findings (EMNLP) 2021 Anne Lauscher, Tobias Lüken, Goran Glavaš

Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for widespread adoption of state-of-the-art language technology.

Fairness Language Modelling +1

Diachronic Analysis of German Parliamentary Proceedings: Ideological Shifts through the Lens of Political Biases

1 code implementation 13 Aug 2021 Tobias Walter, Celina Kirschner, Steffen Eger, Goran Glavaš, Anne Lauscher, Simone Paolo Ponzetto

We analyze bias in historical corpora as encoded in diachronic distributional semantic models by focusing on two specific forms of bias, namely a political (i.e., anti-communism) and racist (i.e., antisemitism) one.

Diachronic Word Embeddings Word Embeddings

Scientia Potentia Est -- On the Role of Knowledge in Computational Argumentation

no code implementations 1 Jul 2021 Anne Lauscher, Henning Wachsmuth, Iryna Gurevych, Goran Glavaš

In this survey paper, we fill this gap by (1) proposing a pyramid of types of knowledge required in CA tasks, (2) analysing the state of the art with respect to the reliance and exploitation of these types of knowledge, for each of the four main research areas in CA, and (3) outlining and discussing directions for future research efforts in CA.

Common Sense Reasoning Natural Language Understanding

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

no code implementations 17 Apr 2021 Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Edoardo M. Ponti, Anna Korhonen, Ivan Vulić

We find that the most critical factor preventing the creation of truly multilingual ToD systems is the lack of datasets in most languages for both training and evaluation.

Cross-Lingual Transfer Machine Translation +2

Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval

1 code implementation 21 Jan 2021 Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

Therefore, in this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs.

Cross-Lingual Word Embeddings Word Embeddings

Verb Knowledge Injection for Multilingual Event Processing

no code implementations ACL 2021 Olga Majewska, Ivan Vulić, Goran Glavaš, Edoardo M. Ponti, Anna Korhonen

We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers in event extraction tasks -- downstream tasks for which accurate verb processing is paramount.

Event Extraction Language Modelling

Self-Supervised Learning for Visual Summary Identification in Scientific Publications

no code implementations 21 Dec 2020 Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

Providing visual summaries of scientific publications can increase information access for readers and thereby help deal with the exponential growth in the number of scientific publications.

Self-Supervised Learning

Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer

no code implementations 11 Dec 2020 Marko Vidoni, Ivan Vulić, Goran Glavaš

Adapter modules, additional trainable parameters that enable efficient fine-tuning of pretrained transformers, have recently been used for language specialization of multilingual transformers, improving downstream zero-shot cross-lingual transfer.

NER POS +1
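
For readers unfamiliar with the adapter modules mentioned above, a minimal sketch of the standard bottleneck design (down-projection, non-linearity, up-projection, residual connection) follows; the layer sizes are illustrative. In the zero-shot setup studied here, language and task adapters of roughly this form are inserted into an otherwise frozen multilingual transformer.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter: only these few parameters are trained,
    while the surrounding pretrained transformer stays frozen."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # down-projection
        self.up = nn.Linear(bottleneck, hidden_size)    # up-projection
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the pretrained representation intact
        # and lets the adapter learn a small language/task-specific shift.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```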

AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings

no code implementations COLING (WANLP) 2020 Anne Lauscher, Rafik Takieddin, Simone Paolo Ponzetto, Goran Glavaš

Our analysis yields several interesting findings, e.g., that implicit gender bias in embeddings trained on Arabic news corpora steadily increases over time (between 2007 and 2017).

Word Embeddings
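
AraWEAT builds on the Word Embedding Association Test (WEAT). A compact sketch of the WEAT effect size it measures is given below; the concrete target and attribute word sets (e.g., career vs. family terms, male vs. female terms) are left as inputs.

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size (Caliskan et al., 2017): differential association of
    two target sets X, Y with two attribute sets A, B; all arguments are
    lists of word vectors from the embedding space under analysis."""
    s = lambda w: np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    sx = [s(x) for x in X]
    sy = [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)
```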

Probing Pretrained Language Models for Lexical Semantics

no code implementations EMNLP 2020 Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen

The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture.

Pretrained Language Models

Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation

1 code implementation 15 Aug 2020 Goran Glavaš, Ivan Vulić

Traditional NLP has long held (supervised) syntactic parsing necessary for successful higher-level semantic language understanding (LU).

Language Modelling Natural Language Understanding

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers

1 code implementation EMNLP (DeeLIO) 2020 Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš

Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work focused on injecting (structured) knowledge from external resources into these models.

Common Sense Reasoning

On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation

1 code implementation ACL 2020 Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West, Steffen Eger

We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.

Language Modelling Machine Translation +4

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

1 code implementation EMNLP 2020 Edoardo Maria Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen

In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects.

Ranked #1 on Cross-Lingual Transfer on XCOPA (using extra training data)

Cross-Lingual Transfer Translation

From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers

no code implementations 1 May 2020 Anne Lauscher, Vinit Ravishankar, Ivan Vulić, Goran Glavaš

Massively multilingual transformers pretrained with language modeling objectives (e.g., mBERT, XLM-R) have become a de facto default transfer paradigm for zero-shot cross-lingual transfer in NLP, offering unmatched transfer performance.

Cross-Lingual Word Embeddings Dependency Parsing +5

Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers

no code implementations COLING 2020 Robert Litschko, Ivan Vulić, Željko Agić, Goran Glavaš

Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level".

Cross-Lingual Transfer +1

Windowing Models for Abstractive Summarization of Long Texts

no code implementations 7 Apr 2020 Leon Schüller, Florian Wilhelm, Nico Kreiling, Goran Glavaš

Neural summarization models suffer from the fixed-size input limitation: if text length surpasses the model's maximal number of input tokens, some document content (possibly summary-relevant) gets truncated. Independently summarizing windows of maximal input size disallows information flow between windows and leads to incoherent summaries.

Abstractive Text Summarization
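
A minimal sketch of the windowing step implied by the abstract: split a long token sequence into overlapping fixed-size windows so nothing is truncated. Window length and stride below are illustrative, and the sketch deliberately omits the paper's mechanisms for passing information between windows.

```python
def windows(token_ids, max_len=512, stride=256):
    """Split a token sequence into overlapping windows of at most max_len
    tokens; the overlap (max_len - stride) gives adjacent windows shared
    context, which summarizing each window independently would lack."""
    if len(token_ids) <= max_len:
        return [token_ids]
    return [token_ids[i:i + max_len]
            for i in range(0, len(token_ids) - max_len + stride, stride)]
```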

Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

1 code implementation 3 Jan 2020 Goran Glavaš, Swapna Somasundaran

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval.

Cross-Lingual Word Embeddings Multi-Task Learning +2

A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

3 code implementations 13 Sep 2019 Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

Moreover, we successfully transfer debiasing models, by means of cross-lingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.

Word Embeddings
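
As one concrete instance of explicit debiasing of a word vector space, here is a sketch of the familiar projection-based recipe: estimate a bias direction from definitional word pairs and remove each vector's component along it. This illustrates the general idea only; the paper's framework covers several implicit and explicit debiasing models beyond this one.

```python
import numpy as np

def remove_bias_direction(vectors, definitional_pairs):
    """Projection-based debiasing sketch: average the difference vectors of
    definitional pairs (e.g., ("he", "she")) into a bias direction, then
    project that direction out of every word vector."""
    diffs = np.stack([vectors[a] - vectors[b] for a, b in definitional_pairs])
    bias_dir = diffs.mean(axis=0)
    bias_dir /= np.linalg.norm(bias_dir)
    return {w: v - (v @ bias_dir) * bias_dir for w, v in vectors.items()}
```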

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

1 code implementation COLING 2020 Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

In this work, we complement such distributional knowledge with external lexical knowledge, that is, we integrate the discrete knowledge on word-level semantic similarity into pretraining.

Language Modelling Lexical Simplification +6

Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?

1 code implementation IJCNLP 2019 Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen

A series of bilingual lexicon induction (BLI) experiments with 15 diverse languages (210 language pairs) show that fully unsupervised CLWE methods still fail for a large number of language pairs (e.g., they yield zero BLI performance for 87/210 pairs).

Bilingual Lexicon Induction Self-Learning
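
For context, a minimal sketch of how BLI performance of this kind is typically scored: for each entry of a gold translation dictionary, retrieve the nearest target-language word by cosine similarity in the shared cross-lingual space and compute precision-at-one. Function and variable names are illustrative.

```python
import numpy as np

def bli_precision_at_1(src_emb, tgt_emb, gold_pairs):
    """src_emb, tgt_emb: dicts from words to L2-normalised vectors in a
    shared cross-lingual space; gold_pairs: (source, gold translation)."""
    tgt_words = list(tgt_emb)
    tgt_matrix = np.stack([tgt_emb[w] for w in tgt_words])  # (|V_tgt|, d)
    hits = 0
    for src_word, gold_tgt in gold_pairs:
        if src_word not in src_emb:
            continue  # out-of-vocabulary source words count as misses
        sims = tgt_matrix @ src_emb[src_word]  # cosine (pre-normalised)
        if tgt_words[int(np.argmax(sims))] == gold_tgt:
            hits += 1
    return hits / len(gold_pairs)
```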

Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors

1 code implementation SEMEVAL 2019 Anne Lauscher, Goran Glavaš

In this work, we present a systematic study of biases encoded in distributional word vector spaces: we analyze how consistent the bias effects are across languages, corpora, and embedding models.

Cross-Lingual Transfer Word Embeddings

Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources

1 code implementation NAACL 2018 Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen

Word vector specialisation (also known as retrofitting) is a portable, light-weight approach to fine-tuning arbitrary distributional word vector spaces by injecting external knowledge from rich lexical resources such as WordNet.

Dialogue State Tracking Text Simplification +1
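
For background, a compact sketch of classic retrofitting (in the style of Faruqui et al., 2015), which pulls each vector toward its lexical-resource neighbours while anchoring it to its original distributional position. The paper itself goes a step further: it learns a post-specialisation mapping that generalises this effect to words unseen in the lexical resource.

```python
import numpy as np

def retrofit(vectors, lexicon, iterations=10, alpha=1.0, beta=1.0):
    """vectors: dict word -> np.ndarray (original distributional vectors);
    lexicon: dict word -> list of neighbours (e.g., WordNet synonyms)."""
    new_vectors = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            neighbours = [n for n in neighbours if n in new_vectors]
            if word not in vectors or not neighbours:
                continue
            # Coordinate update: weighted mean of the original vector and
            # the current vectors of the word's lexicon neighbours.
            total = alpha * vectors[word] + beta * sum(new_vectors[n] for n in neighbours)
            new_vectors[word] = total / (alpha + beta * len(neighbours))
    return new_vectors
```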

Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only

1 code implementation 2 May 2018 Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.

Information Retrieval

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

1 code implementation 19 Jan 2018 Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

In contrast, we propose an unsupervised and very resource-light approach for measuring semantic similarity between texts in different languages.

Cross-Lingual Semantic Textual Similarity Information Retrieval +6
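
A minimal sketch of one resource-light baseline in this spirit: represent each sentence as the average of its word vectors in a shared cross-lingual embedding space and compare by cosine. This illustrates the general recipe only, not the paper's exact aggregation method.

```python
import numpy as np

def sentence_vector(tokens, embeddings):
    """Average the shared-space vectors of a sentence's in-vocabulary tokens."""
    return np.mean([embeddings[t] for t in tokens if t in embeddings], axis=0)

def cross_lingual_similarity(tokens_l1, tokens_l2, embeddings):
    """Cosine similarity of two sentences in different languages, assuming
    'embeddings' maps words of both languages into one cross-lingual space."""
    v1 = sentence_vector(tokens_l1, embeddings)
    v2 = sentence_vector(tokens_l2, embeddings)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```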
