Search Results for author: Silvia Severini

Found 10 papers, 4 papers with code

Don’t Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations • LREC (BUCC) 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold.
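
The two signals named in this excerpt can be illustrated with a small sketch. It is not the authors' implementation: the excerpt does not state which romanization tool or edit distance threshold the paper uses, so the `TOY_ROMANIZATION` table, the `romanize` helper, and the default `max_dist=1` are hypothetical choices made only for illustration.

```python
# Hypothetical toy romanization table (a few Cyrillic letters only).
TOY_ROMANIZATION = {
    "б": "b", "е": "e", "р": "r", "л": "l", "и": "i",
    "н": "n", "а": "a", "д": "d", "о": "o", "м": "m",
}


def romanize(word: str) -> str:
    """Map each character through the toy table; unknown characters pass through."""
    return "".join(TOY_ROMANIZATION.get(ch, ch) for ch in word)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def identical_word_pairs(src_vocab, tgt_vocab):
    """Signal 1: words spelled identically in both vocabularies."""
    tgt = set(tgt_vocab)
    return [(w, w) for w in src_vocab if w in tgt]


def romanized_pairs(src_vocab, tgt_vocab, max_dist=1):
    """Signal 2: pairs whose romanized forms are within max_dist edits."""
    tgt_rom = [(t, romanize(t)) for t in tgt_vocab]
    pairs = []
    for s in src_vocab:
        rs = romanize(s)
        for t, rt in tgt_rom:
            if edit_distance(rs, rt) <= max_dist:
                pairs.append((s, t))
    return pairs


def build_seed_lexicon(src_vocab, tgt_vocab, max_dist=1):
    """Combine both signals into one de-duplicated, sorted seed lexicon."""
    combined = set(identical_word_pairs(src_vocab, tgt_vocab))
    combined.update(romanized_pairs(src_vocab, tgt_vocab, max_dist))
    return sorted(combined)


src_vocab = ["berlin", "2022", "radio"]
tgt_vocab = ["берлин", "2022", "радио", "дом"]
seed = build_seed_lexicon(src_vocab, tgt_vocab)
```

The all-pairs comparison in `romanized_pairs` is quadratic in vocabulary size, which is fine for a toy example but would need pruning (for instance, by word length) on realistic vocabularies.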

Cross-Lingual Transfer Word Embeddings


Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

no code implementations • 21 Nov 2023 • Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze

In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach that incorporates intermediate related languages to bridge the gap between the distant source and target.

Bilingual Lexicon Induction Multilingual NLP +1


Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages

1 code implementation • 20 May 2023 • Ayyoob Imani, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André F. T. Martins, François Yvon, Hinrich Schütze

The NLP community has mainly focused on scaling Large Language Models (LLMs) vertically, i.e., making them better for about 100 languages.

XLM-R


Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

1 code implementation • 18 Oct 2022 • Ayyoob Imani, Silvia Severini, Masoud Jalili Sabet, François Yvon, Hinrich Schütze

An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages.

Part-Of-Speech Tagging POS +3


SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment

1 code implementation • 12 Oct 2022 • Abdullatif Köksal, Silvia Severini, Hinrich Schütze

Word alignments are essential for a variety of NLP tasks.

Machine Translation Translation +2


Don't Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations • 31 May 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

Cross-Lingual Transfer Word Embeddings


Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages

no code implementations • LREC 2022 • Silvia Severini, Ayyoob Imani, Philipp Dufter, Hinrich Schütze

Prior work on extracting MNE datasets from parallel corpora required resources such as large monolingual corpora or word aligners that are unavailable or perform poorly for underresourced languages.

Bilingual Lexicon Induction Transliteration


CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

1 code implementation • 6 Apr 2021 • Ahmed Elnaggar, Wei Ding, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Silvia Severini, Florian Matthes, Burkhard Rost

Simultaneously, the transformer model, especially its combination with transfer learning, has been proven to be a powerful technique for natural language processing tasks.

Ranked #1 on Code Documentation Generation on CodeSearchNet - Java

API Sequence Recommendation Code Comment Generation +5


Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction

no code implementations • COLING 2020 • Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze

In this paper, we enrich BWE-based BDI with transliteration information by using Bilingual Orthography Embeddings (BOEs).

Translation Transliteration +1


LMU Bilingual Dictionary Induction System with Word Surface Similarity Scores for BUCC 2020

no code implementations • LREC 2020 • Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze

We participate in both the open and closed tracks of the shared task and we show improved results of our method compared to simple vector similarity based approaches.

Machine Translation Translation +2

