Search Results for author: Holger Schwenk

Found 43 papers, 21 papers with code

SONAR: Sentence-Level Multimodal and Language-Agnostic Representations

1 code implementation · 22 Aug 2023 · Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot

Our single text encoder, covering 200 languages, substantially outperforms existing sentence embeddings such as LASER3 and LaBSE on the xSIM and xSIM++ multilingual similarity search tasks.

Machine Translation · Sentence Embedding · +3

xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages

1 code implementation · 22 Jun 2023 · Mingda Chen, Kevin Heffernan, Onur Çelebi, Alex Mourachko, Holger Schwenk

In comparison to xSIM, we show that xSIM++ is better correlated with the downstream BLEU scores of translation systems trained on mined bitexts, providing a reliable proxy of bitext mining performance without needing to run expensive bitext mining pipelines.
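xSIM itself is simple to compute: it is the error rate of multilingual similarity search, i.e. the fraction of source sentences whose nearest target-side neighbour is not the gold-aligned translation. A minimal numpy sketch, assuming index-aligned gold pairs and cosine similarity (function and variable names are illustrative, not the paper's code):

```python
import numpy as np

def xsim_error_rate(src_emb, tgt_emb):
    """Fraction of source rows whose nearest target row (by cosine
    similarity) is not the gold-aligned row at the same index."""
    # L2-normalise so dot products equal cosine similarities
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                 # (n_src, n_tgt) similarity matrix
    nearest = sim.argmax(axis=1)      # best target index per source sentence
    gold = np.arange(len(src))        # row i is aligned with row i
    return float((nearest != gold).mean())

# Toy check: identical embeddings give a perfect (0.0) error rate
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))
print(xsim_error_rate(emb, emb))  # → 0.0
```

xSIM++ keeps this framework but augments the target side with hard negatives, which is what makes it correlate better with downstream BLEU.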


DiffEdit: Diffusion-based semantic image editing with mask guidance

3 code implementations · 20 Oct 2022 · Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

Semantic image editing is an extension of image generation, with the additional constraint that the generated image should be as similar as possible to a given input image.

Image Generation
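The mask-guidance idea in DiffEdit is to contrast the denoiser's noise estimates conditioned on the reference caption versus the query caption, and threshold their (rescaled) difference into an edit mask. A schematic numpy sketch, with random arrays standing in for the diffusion model's noise estimates (`eps_ref`, `eps_query`, and the 0.5 threshold are illustrative placeholders):

```python
import numpy as np

def edit_mask(eps_ref, eps_query, threshold=0.5):
    """Derive a binary edit mask from two noise estimates: regions where
    the two conditionings disagree most are the ones the edit changes."""
    diff = np.abs(eps_ref - eps_query).mean(axis=-1)       # per-pixel disagreement
    diff = (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)  # rescale to [0, 1]
    return diff > threshold

rng = np.random.default_rng(0)
eps_ref = rng.normal(size=(8, 8, 3))
eps_query = eps_ref.copy()
eps_query[2:5, 2:5] += 2.0         # the captions "disagree" only in this patch
mask = edit_mask(eps_ref, eps_query)
print(mask[3, 3], mask[0, 0])      # → True False
```

In the actual method the difference is additionally averaged over several noise samples before thresholding, to stabilise the mask.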

Multilingual Representation Distillation with Contrastive Learning

no code implementations · 10 Oct 2022 · Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn

Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks.

Contrastive Learning · Cross-Lingual Information Retrieval · +1
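A common form of contrastive distillation treats the teacher embedding of each sentence as the positive for the corresponding student embedding, with the other sentences in the batch as in-batch negatives. A toy InfoNCE sketch under those assumptions (temperature and sizes are illustrative, not the paper's settings):

```python
import numpy as np

def info_nce(student, teacher, temperature=0.05):
    """InfoNCE loss: row i of `student` should be most similar to row i
    of `teacher`; the other rows of the batch act as negatives."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature                # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numeric stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())

rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 8))
aligned = info_nce(teacher.copy(), teacher)    # student already matches teacher
misaligned = info_nce(teacher[::-1], teacher)  # every pair deliberately wrong
print(aligned < misaligned)  # → True
```

Minimising this loss pulls each student embedding toward its teacher counterpart while pushing it away from the rest of the batch, which is what distils the teacher's cross-lingual geometry into the student.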

Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages

1 code implementation · 25 May 2022 · Kevin Heffernan, Onur Çelebi, Holger Schwenk

To achieve this, we focus on teacher-student training, allowing all encoders to be mutually compatible for bitext mining, and enabling fast learning of new languages.

Cross-Lingual Transfer · NMT · +1
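The teacher-student idea above can be sketched in a few lines: a student encoder for a new language is fitted so its embeddings match a frozen teacher's embeddings of parallel sentences, which keeps every student mutually compatible in the teacher's space. A toy numpy version with a linear student trained by gradient descent (all shapes, the learning rate, and the step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_emb = 200, 16, 8

X = rng.normal(size=(n, d_in))        # toy "sentences" in a new language
T = rng.normal(size=(d_in, d_emb))    # frozen teacher encoder (never updated)
targets = X @ T                       # teacher embeddings the student imitates

W = np.zeros((d_in, d_emb))           # student encoder, trained from scratch
for _ in range(200):                  # plain gradient descent on the MSE
    err = X @ W - targets
    W -= 0.5 * (X.T @ err) / n

final_mse = float(np.mean((X @ W - targets) ** 2))
print(final_mse < 1e-4)   # student embeddings now match the teacher's → True
```

Because every student is trained against the same fixed teacher space, any two students' outputs can be compared directly for bitext mining, without retraining existing encoders when a new language is added.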

FlexIT: Towards Flexible Semantic Image Translation

1 code implementation · CVPR 2022 · Guillaume Couairon, Asya Grechka, Jakob Verbeek, Holger Schwenk, Matthieu Cord

Via the latent space of an auto-encoder, we iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.

Image Generation · Translation

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

1 code implementation · NeurIPS 2021 · Paul-Ambroise Duquenne, Hongyu Gong, Holger Schwenk

Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl.

Speech-to-Speech Translation · Translation
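Mining in a shared embedding space reduces to thresholded nearest-neighbour search: for each audio embedding, keep the most similar text embedding if its score clears a threshold. A toy numpy sketch (at the paper's scale of billions of sentences this is done with approximate-nearest-neighbour indexes such as FAISS; the threshold here is an illustrative placeholder):

```python
import numpy as np

def mine_pairs(audio_emb, text_emb, threshold=0.8):
    """Return (audio_idx, text_idx, score) for each audio row whose best
    cosine match among the text rows exceeds `threshold`."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = a @ t.T                        # (n_audio, n_text) cosine similarities
    best = sim.argmax(axis=1)            # best text candidate per audio clip
    return [(i, int(j), float(sim[i, j]))
            for i, j in enumerate(best) if sim[i, j] > threshold]

rng = np.random.default_rng(0)
text = rng.normal(size=(100, 32))
audio = text[:3] + 0.05 * rng.normal(size=(3, 32))  # near-duplicates of rows 0-2
print([(i, j) for i, j, _ in mine_pairs(audio, text)])  # → [(0, 0), (1, 1), (2, 2)]
```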

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

no code implementations · ACL (IWSLT) 2021 · Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal

In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.

Transfer Learning · Translation

Beyond English-Centric Multilingual Machine Translation

7 code implementations · 21 Oct 2020 · Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation has demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation · Translation

CCMatrix: Mining Billions of High-Quality Parallel Sentences on the Web

3 code implementations · ACL 2021 · Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin

To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.

NMT · Translation · +1

MLQA: Evaluating Cross-lingual Extractive Question Answering

4 code implementations · ACL 2020 · Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk

An alternative to building large monolingual training datasets is to develop cross-lingual systems which can transfer to a target language without requiring training data in that language.

Extractive Question-Answering · Machine Translation · +1

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

6 code implementations · EACL 2021 · Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

Sentence Embeddings

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

13 code implementations · TACL 2019 · Mikel Artetxe, Holger Schwenk

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Cross-Lingual Bitext Mining · Cross-Lingual Document Classification · +5

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

9 code implementations · ACL 2019 · Mikel Artetxe, Holger Schwenk

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.

Cross-Lingual Bitext Mining · Machine Translation · +4
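The margin criterion of this paper scores a candidate pair not by raw cosine similarity but by how much that similarity stands out from each side's neighbourhood: the ratio of cos(x, y) to the average of the mean similarities to each side's k nearest neighbours. A brute-force numpy sketch under the simplifying assumption that all pairwise similarities fit in memory (the paper uses approximate k-NN search at scale):

```python
import numpy as np

def margin_scores(src, tgt, k=4):
    """Ratio-margin scoring: cos(x, y) divided by the average of the mean
    cosine similarities to each side's k nearest neighbours.  High scores
    mark pairs that stand out from their neighbourhoods, which filters
    out "hub" sentences that are similar to everything."""
    s = src / np.linalg.norm(src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = s @ t.T                                        # (n_src, n_tgt)
    # mean similarity of each source row to its k nearest targets, and vice versa
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # (n_src,)
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # (n_tgt,)
    return sim / ((knn_src[:, None] + knn_tgt[None, :]) / 2)

rng = np.random.default_rng(0)
tgt = rng.normal(size=(50, 16))
src = tgt + 0.1 * rng.normal(size=(50, 16))  # noisy "translations", aligned by index
scores = margin_scores(src, tgt)
print((scores.argmax(axis=1) == np.arange(50)).mean())  # → 1.0
```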

Filtering and Mining Parallel Data in a Joint Multilingual Space

no code implementations · ACL 2018 · Holger Schwenk

The same approach is used to mine additional bitexts for the WMT'14 system and to obtain competitive results on the BUCC shared task to identify parallel sentences in comparable corpora.

Machine Translation · Sentence Embedding · +2

Learning Joint Multilingual Sentence Representations with Neural Machine Translation

1 code implementation · WS 2017 · Holger Schwenk, Matthijs Douze

In this paper, we use the framework of neural machine translation to learn joint sentence representations across six very different languages.

Joint Multilingual Sentence Representations · Machine Translation · +1

Very Deep Convolutional Networks for Text Classification

24 code implementations · EACL 2017 · Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun

The dominant approaches for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks.

General Classification · Text Classification

On Using Monolingual Corpora in Neural Machine Translation

no code implementations · 11 Mar 2015 · Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.

Machine Translation · Translation

Automatic Translation of Scientific Documents in the HAL Archive

no code implementations · LREC 2012 · Patrik Lambert, Holger Schwenk, Frédéric Blain

This paper describes the development of a statistical machine translation system between French and English for scientific papers.

Domain Adaptation · Machine Translation · +1
