Search Results for author: Masoud Jalili Sabet

Found 16 papers, 6 papers with code

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

3 code implementations • Findings of the Association for Computational Linguistics 2020 • Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze

We find that alignments created from embeddings are superior to those produced by traditional statistical aligners for four language pairs and comparable for two, even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than that of eflomal, a high-quality statistical aligner, trained on 100k parallel sentences.
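
To make the idea of reading alignments off embeddings concrete, here is a minimal sketch that links two tokens when each is the other's most similar token under cosine similarity (a mutual-argmax rule). The toy 2-d vectors stand in for static or contextualized embeddings, which in practice would come from a pretrained model not shown here. The function names are hypothetical, and this is not the SimAlign implementation.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_argmax_align(src: List[Vector], tgt: List[Vector]) -> Set[Tuple[int, int]]:
    """Link source token i and target token j when each is the other's most similar token.

    Assumes both sentences are non-empty.
    """
    sim = [[cosine(s, t) for t in tgt] for s in src]
    # Row-wise argmax: most similar target position for each source token.
    best_tgt = [max(range(len(tgt)), key=lambda j: row[j]) for row in sim]
    # Column-wise argmax: most similar source position for each target token.
    best_src = [max(range(len(src)), key=lambda i: sim[i][j]) for j in range(len(tgt))]
    # Keep only pairs on which both directions agree.
    return {(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i}


# Toy 2-d "embeddings"; the third source token has no mutual partner.
src_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
tgt_vecs = [[0.0, 1.0], [1.0, 0.0]]
print(sorted(mutual_argmax_align(src_vecs, tgt_vecs)))  # [(0, 1), (1, 0)]
```

The mutual check trades recall for precision: the third source token is left unaligned because its best target prefers a different source token.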

Machine Translation, Multilingual Word Embeddings (+3 more)

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages

1 code implementation • 20 May 2023 • Ayyoob Imani, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André F. T. Martins, François Yvon, Hinrich Schütze

The NLP community has mainly focused on scaling Large Language Models (LLMs) vertically, i.e., making them better for about 100 languages.

XLM-R

TMop: a Tool for Unsupervised Translation Memory Cleaning

1 code implementation • ACL 2016 • Masoud Jalili Sabet, Matteo Negri, Marco Turchi, José G. C. de Souza, Marcello Federico

Machine Translation, Translation

Graph Algorithms for Multiparallel Word Alignment

1 code implementation • EMNLP 2021 • Ayyoob Imani, Masoud Jalili Sabet, Lütfi Kerem Şenel, Philipp Dufter, François Yvon, Hinrich Schütze

With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; more recently, however, they have again become a focus of research.

Link Prediction, Machine Translation (+3 more)

CaMEL: Case Marker Extraction without Labels

1 code implementation • ACL 2022 • Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze

We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages.

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

1 code implementation • 18 Oct 2022 • Ayyoob Imani, Silvia Severini, Masoud Jalili Sabet, François Yvon, Hinrich Schütze

An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages.

Part-Of-Speech Tagging, POS (+3 more)

Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

no code implementations • 31 Oct 2018 • Nina Poerner, Masoud Jalili Sabet, Benjamin Roth, Hinrich Schütze

Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora.

Cross-Lingual Word Embeddings, Word Alignment (+1 more)

An Unsupervised Method for Automatic Translation Memory Cleaning

no code implementations • ACL 2016 • Masoud Jalili Sabet, Matteo Negri, Marco Turchi, Eduard Barbu

Machine Translation, Translation

Learning to Weight Translations using Ordinal Linear Regression and Query-generated Training Data for Ad-hoc Retrieval with Long Queries

no code implementations • COLING 2016 • Javid Dadashkarimi, Masoud Jalili Sabet, Azadeh Shakery

To this end, we first build query-generated training data using pseudo-relevant documents for the query and all translation candidates.

Document Ranking, Information Retrieval (+4 more)

Improving Word Alignment of Rare Words with Word Embeddings

no code implementations • COLING 2016 • Masoud Jalili Sabet, Heshaam Faili, Gholamreza Haffari

We address the problem of inducing word alignments for language pairs by developing an unsupervised model that can be applied to other generative alignment models.

Machine Translation, Sentence (+2 more)

Subword Sampling for Low Resource Word Alignment

no code implementations • 21 Dec 2020 • Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze

The method's hypothesis is that, for certain language pairs, aggregating text at different granularities can help word-level alignment.

Bayesian Optimization, Machine Translation (+1 more)

ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus

no code implementations • ACL 2021 • Ayyoob Imani, Masoud Jalili Sabet, Philipp Dufter, Michael Cysouw, Hinrich Schütze

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential from both an academic and a commercial perspective.

Multilingual NLP, Transfer Learning

Graph Neural Networks for Multiparallel Word Alignment

no code implementations • Findings (ACL) 2022 • Ayyoob Imani, Lütfi Kerem Şenel, Masoud Jalili Sabet, François Yvon, Hinrich Schütze

First, we create a multiparallel word alignment graph, joining all bilingual word alignment pairs in one graph.
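
The graph-construction step can be pictured with a small sketch: each node is a (language, token position) pair, and every bilingual link becomes an undirected edge in one shared graph. The node encoding and function names are hypothetical, and the common-neighbor count is only a simple link-prediction heuristic for illustration, not the paper's graph neural network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical node encoding: (language code, token position in that language's sentence).
Node = Tuple[str, int]


def build_alignment_graph(
    bilingual: Dict[Tuple[str, str], Iterable[Tuple[int, int]]],
) -> Dict[Node, Set[Node]]:
    """Merge the bilingual alignment links of one multiparallel sentence into one undirected graph."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for (lang_a, lang_b), links in bilingual.items():
        for i, j in links:
            a, b = (lang_a, i), (lang_b, j)
            graph[a].add(b)  # store each edge in both directions
            graph[b].add(a)
    return dict(graph)


def common_neighbors(graph: Dict[Node, Set[Node]], u: Node, v: Node) -> int:
    """Count nodes linked to both u and v, a simple cue for a link that may be missing."""
    return len(graph.get(u, set()) & graph.get(v, set()))


links = {
    ("en", "de"): [(0, 0), (1, 1)],
    ("en", "fr"): [(0, 0), (1, 2)],
    ("de", "fr"): [(1, 2)],
}
g = build_alignment_graph(links)
print(common_neighbors(g, ("de", 0), ("fr", 0)))  # 1 (both are linked to ("en", 0))
```

Here the German-French aligner missed the link between ("de", 0) and ("fr", 0), but the shared English neighbor makes it a candidate once all bilingual alignments live in the same graph.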

Community Detection, Machine Translation (+2 more)

Don't Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations • 31 May 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold.
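
The two lexicon sources can be combined as in the sketch below. It assumes romanization is supplied by the caller; the toy Cyrillic table, the function names, and the default threshold of one edit are hypothetical choices for illustration, not the paper's settings.

```python
from typing import Callable, Iterable, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def build_seed_lexicon(
    src_vocab: Iterable[str],
    tgt_vocab: Iterable[str],
    romanize: Callable[[str], str] = lambda w: w,
    max_dist: int = 1,
) -> Set[Tuple[str, str]]:
    """Seed lexicon = identical words plus pairs whose romanized forms are close."""
    src, tgt = set(src_vocab), set(tgt_vocab)
    # 1) Identical strings that occur in both vocabularies.
    lexicon = {(w, w) for w in src & tgt}
    # 2) Pairs whose romanized forms are within max_dist edits (brute force, O(|src| * |tgt|)).
    tgt_rom = {t: romanize(t) for t in tgt}
    for s in src:
        rs = romanize(s)
        for t, rt in tgt_rom.items():
            if edit_distance(rs, rt) <= max_dist:
                lexicon.add((s, t))
    return lexicon


# Toy Cyrillic-to-Latin table standing in for a real transliterator (hypothetical).
CYR_TO_LAT = str.maketrans({"р": "r", "а": "a", "д": "d", "и": "i", "о": "o", "м": "m"})
pairs = build_seed_lexicon(
    {"taxi", "radio", "house"},
    {"taxi", "радио", "дом"},
    romanize=lambda w: w.translate(CYR_TO_LAT),
)
print(sorted(pairs))  # [('radio', 'радио'), ('taxi', 'taxi')]
```

Step 1 is kept separate for readability even though step 2 would also catch identical strings (distance 0); "radio"/"радио" is found only through romanization.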

Cross-Lingual Transfer, Word Embeddings

Don’t Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations • LREC (BUCC) 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

Cross-Lingual Transfer, Word Embeddings

LICD: A Language-Independent Approach for Aspect Category Detection

no code implementations • ECIR 2019 • Erfan Ghadery, Sajad Movahedi, Masoud Jalili Sabet, Heshaam Faili, Azadeh Shakery

For a given sentence, our proposed method performs ACD based on two hypotheses: First, a category should be assigned to a sentence if there is a high semantic similarity between the sentence and a set of representative words of that category.
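
The first hypothesis can be made concrete with a sketch that compares a sentence with each category's representative words. The centroid-plus-cosine scoring, the 0.5 threshold, the function names, and the toy vectors are all assumptions for illustration; the snippet does not say how LICD aggregates similarities, and the second hypothesis is not covered here.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_categories(
    sentence: List[Vector],
    representatives: Dict[str, List[Vector]],
    threshold: float = 0.5,
) -> List[str]:
    """Assign every category whose representative-word centroid is similar enough to the sentence."""
    sent_vec = centroid(sentence)
    return sorted(
        cat for cat, words in representatives.items()
        if cosine(sent_vec, centroid(words)) >= threshold
    )


# Toy 2-d word vectors; dimension 0 ~ "food", dimension 1 ~ "service".
sentence = [[1.0, 0.0], [0.8, 0.2]]
reps = {"food": [[1.0, 0.0]], "service": [[0.0, 1.0]]}
print(detect_categories(sentence, reps))  # ['food']
```

Because categories are scored independently against a threshold, a sentence can receive several categories or none, which matches the multi-label nature of aspect category detection.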

Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) (+6 more)
