Search Results for author: Philipp Dufter

Found 23 papers, 10 papers with code

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

3 code implementations • Findings of the Association for Computational Linguistics 2020 • Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze

We find that alignments created from embeddings are superior for four language pairs and comparable for two, compared to those produced by traditional statistical aligners even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner trained on 100k parallel sentences.
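
A minimal sketch of the underlying idea, assuming per-word (e.g., contextualized) embeddings for the two sentences are already available; the mutual-argmax rule mirrors the spirit of the paper's simplest alignment method, and the random vectors below stand in for real embeddings:

```python
import numpy as np

def argmax_align(src_emb, tgt_emb):
    """Align word pairs whose embeddings are mutual nearest neighbours.

    src_emb: (m, d) array of source-word vectors
    tgt_emb: (n, d) array of target-word vectors
    Returns a set of (src_idx, tgt_idx) alignment edges.
    """
    # Cosine similarity matrix over all source/target word pairs.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (m, n)

    # Keep (i, j) only if j is i's best match AND i is j's best match.
    best_tgt = sim.argmax(axis=1)  # best target for each source word
    best_src = sim.argmax(axis=0)  # best source for each target word
    return {(i, int(j)) for i, j in enumerate(best_tgt) if best_src[j] == i}

# Toy usage: 3 source words, 2 target words, 4-dimensional embeddings.
rng = np.random.default_rng(0)
print(argmax_align(rng.normal(size=(3, 4)), rng.normal(size=(2, 4))))
```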

Machine Translation • Multilingual Word Embeddings • +3

Analytical Methods for Interpretable Ultradense Word Embeddings

1 code implementation • IJCNLP 2019 • Philipp Dufter, Hinrich Schütze

In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs, and DensRay, a new method we propose.
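
Of the three, the linear-SVM variant is the easiest to sketch: fit an SVM on seed words with binary labels (e.g., positive vs. negative sentiment), take its normal vector as the first axis, and complete it to an orthogonal basis. The helper below is hypothetical (not from the paper's code) and uses random data as a stand-in for real embeddings and a real seed lexicon:

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rotation(embeddings, labels):
    """Orthogonal basis whose first axis separates two classes of seed words.

    embeddings: (n, d) vectors of seed words; labels: binary array (n,)
    Projecting any word onto the first column yields its position along
    the (hopefully interpretable) 'ultradense' dimension.
    """
    w = LinearSVC().fit(embeddings, labels).coef_[0]
    w = w / np.linalg.norm(w)
    # Complete w to an orthonormal basis via QR decomposition.
    d = embeddings.shape[1]
    rand = np.random.default_rng(0).normal(size=(d, d - 1))
    q, _ = np.linalg.qr(np.column_stack([w, rand]))
    return q if q[:, 0] @ w > 0 else -q  # fix the sign QR may flip

# Toy usage: 20 seed words, 10-dim embeddings, binary labels.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(20, 10)), rng.integers(0, 2, size=20)
scores = X @ svm_rotation(X, y)[:, 0]  # one interpretable score per word
```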

Word Embeddings

Identifying Necessary Elements for BERT's Multilinguality

1 code implementation • 1 May 2020 • Philipp Dufter, Hinrich Schütze

We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual.

Static Embeddings as Efficient Knowledge Bases?

1 code implementation • NAACL 2021 • Philipp Dufter, Nora Kassner, Hinrich Schütze

Recent research investigates factual knowledge stored in large pretrained language models (PLMs).
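
Such probing is typically done with cloze-style templates; a minimal sketch using the Hugging Face fill-mask pipeline (the model choice and query are illustrative, not the paper's exact setup):

```python
from transformers import pipeline

# Ask a pretrained masked language model a factual cloze query.
fill = pipeline("fill-mask", model="bert-base-cased")
for pred in fill("The capital of France is [MASK].", top_k=3):
    print(f"{pred['token_str']:>10}  p={pred['score']:.3f}")
```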

Graph Algorithms for Multiparallel Word Alignment

1 code implementation • EMNLP 2021 • Ayyoob Imani, Masoud Jalili Sabet, Lütfi Kerem Şenel, Philipp Dufter, François Yvon, Hinrich Schütze

With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; recently, however, they have become a focus of research again.

Link Prediction • Machine Translation • +3

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations

1 code implementation • COLING 2020 • Sheng Liang, Philipp Dufter, Hinrich Schütze

Pretrained language models (PLMs) learn stereotypes held by humans and reflected in text from their training corpora, including gender bias.
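
One simplified way to surface such bias (an illustration, not the paper's evaluation protocol) is to compare the probabilities a masked language model assigns to gendered pronouns in occupation templates:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for occupation in ["nurse", "engineer"]:
    # Restrict predictions to the two pronouns and compare their scores.
    preds = fill(f"[MASK] works as a {occupation}.", targets=["he", "she"])
    print(occupation, {p["token_str"]: round(p["score"], 3) for p in preds})
```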

Language Modelling • Sentence

Quantifying the Contextualization of Word Representations with Semantic Class Probing

no code implementations • Findings of the Association for Computational Linguistics 2020 • Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze

Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well.
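
A standard tool for questions like this is a diagnostic (probing) classifier: fit a simple linear model on frozen representations and read its held-out accuracy as a measure of how much of the probed property (here, a word's semantic class) a layer encodes. A sketch, with random arrays standing in for extracted token representations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(features, classes):
    """Held-out accuracy of a linear probe on frozen token representations."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, classes, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Stand-in data: 200 token vectors from some layer, 5 semantic classes.
rng = np.random.default_rng(0)
print(probe_accuracy(rng.normal(size=(200, 64)), rng.integers(0, 5, size=200)))
```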

Identifying Elements Essential for BERT's Multilinguality

no code implementations • EMNLP 2020 • Philipp Dufter, Hinrich Schütze

We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual.

Subword Sampling for Low Resource Word Alignment

no code implementations • 21 Dec 2020 • Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze

The method's underlying hypothesis is that aggregating different granularities of text for certain language pairs can help word-level alignment.
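
A toy illustration of what "different granularities" can mean, using fixed-width character chunks as a crude stand-in for subword vocabularies of different sizes (the paper samples proper subword segmentations; this helper is purely illustrative):

```python
def segmentations(word, sizes=(1, 2, 3)):
    """One segmentation per granularity: non-overlapping n-character chunks."""
    return {n: [word[i:i + n] for i in range(0, len(word), n)] for n in sizes}

print(segmentations("alignment"))
# {1: ['a', 'l', 'i', ...], 2: ['al', 'ig', 'nm', 'en', 't'], 3: ['ali', 'gnm', 'ent']}
```

Alignments obtained at these different granularities can then be aggregated at the word level, which is where the hypothesised robustness gain comes from.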

Bayesian Optimization • Machine Translation • +1

ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus

no code implementations • ACL 2021 • Ayyoob Imani, Masoud Jalili Sabet, Philipp Dufter, Michael Cysouw, Hinrich Schütze

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential from both an academic and a commercial perspective.

Multilingual NLP • Transfer Learning

Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages

no code implementations • 13 Sep 2021 • Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze

The size of the vocabulary is a central design choice in large pretrained language models, with respect to both performance and memory requirements.
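
The title's example (English "wine" stays a single token while French "v i n" shatters into characters) suggests a quick diagnostic, sketched here with a multilingual tokenizer; the paper's actual compatibility measures are more involved:

```python
from transformers import AutoTokenizer

# Compare how many subwords a shared tokenizer spends on translation pairs.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
for en, fr in [("wine", "vin"), ("house", "maison")]:
    print(f"{en}: {tok.tokenize(en)}  |  {fr}: {tok.tokenize(fr)}")
```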

Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention

1 code implementation • COLING 2020 • Philipp Dufter, Martin Schmitt, Hinrich Schütze

Self-Attention Networks (SANs) are an integral part of successful neural architectures such as Transformer (Vaswani et al., 2017), and thus of pretrained language models such as BERT (Devlin et al., 2019) or GPT-3 (Brown et al., 2020).
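
Of the three proposed changes, the learnable temperature is the simplest to illustrate: the fixed 1/sqrt(d) scaling of dot-product attention becomes a trainable parameter. The module below is a sketch reconstructed from that description, not the paper's implementation (single head, no masking):

```python
import math
import torch
import torch.nn as nn

class TemperatureAttention(nn.Module):
    """Single-head self-attention with a learnable softmax temperature."""

    def __init__(self, d_model):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Initialise at the standard Transformer value 1/sqrt(d_model).
        self.inv_temp = nn.Parameter(torch.tensor(1.0 / math.sqrt(d_model)))

    def forward(self, x):  # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.inv_temp, dim=-1)
        return attn @ v

out = TemperatureAttention(64)(torch.randn(2, 10, 64))  # -> (2, 10, 64)
```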

Language Modelling • Part-Of-Speech Tagging • +1

Locating Language-Specific Information in Contextualized Embeddings

1 code implementation • 16 Sep 2021 • Sheng Liang, Philipp Dufter, Hinrich Schütze

Multilingual pretrained language models (MPLMs) exhibit multilinguality and are well suited for transfer across languages.
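
A simple technique from related work for isolating the language-specific component of such embeddings is to subtract each language's mean vector; the sketch below illustrates that general idea and is not necessarily this paper's own method:

```python
import numpy as np

def remove_language_centroid(emb_by_lang):
    """Subtract the per-language mean, stripping a first-order
    language-identity component from multilingual embeddings.

    emb_by_lang: dict mapping language code -> (n_i, d) array
    """
    return {lang: e - e.mean(axis=0, keepdims=True)
            for lang, e in emb_by_lang.items()}

rng = np.random.default_rng(0)
centered = remove_language_centroid({"en": rng.normal(size=(5, 8)),
                                     "de": rng.normal(size=(5, 8))})
```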

BERT Cannot Align Characters

no code implementations • EMNLP (insights) 2021 • Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze

We show that the closer two languages are, the better BERT can align them on the character level.

Wine is not v i n. On the Compatibility of Tokenizations across Languages

no code implementations • Findings (EMNLP) 2021 • Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze

The size of the vocabulary is a central design choice in large pretrained language models, with respect to both performance and memory requirements.

Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages

no code implementations • LREC 2022 • Silvia Severini, Ayyoob Imani, Philipp Dufter, Hinrich Schütze

Prior work on extracting multilingual named entity (MNE) datasets from parallel corpora required resources such as large monolingual corpora or word aligners that are unavailable or perform poorly for underresourced languages.

Bilingual Lexicon Induction • Transliteration
