Search Results for author: Kevin Heffernan

Found 11 papers, 6 papers with code

Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect

no code implementations LREC 2020 Yo Sato, Kevin Heffernan

In this work, we present a universal, character-based method for representing sentences, from which one can compute the distance between any two sentences.

Clustering Sentence
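
No code is linked for this paper, so as a rough illustration of the idea described above (a character-level sentence representation from which a distance between any two sentences can be computed), here is a minimal sketch; the character n-gram range and cosine distance are assumptions chosen for illustration, not the authors' actual method.

# Hedged sketch: character n-gram vectors + cosine distance as a stand-in for
# a "universal, character-based" sentence representation. The n-gram range
# (2-4) and the cosine metric are illustrative assumptions only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances

def char_ngram_distance(sent_a: str, sent_b: str, ngram_range=(2, 4)) -> float:
    """Distance between two sentences based on shared character n-grams."""
    vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=ngram_range)
    vectors = vectorizer.fit_transform([sent_a, sent_b])
    return float(cosine_distances(vectors[0], vectors[1])[0, 0])

print(char_ngram_distance("colour of the sky", "color of the sky"))    # small distance
print(char_ngram_distance("colour of the sky", "price of the stock"))  # larger distance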

Homonym normalisation by word sense clustering: a case in Japanese

no code implementations COLING 2020 Yo Sato, Kevin Heffernan

This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking as an example Japanese, where orthographical variation causes problems for language processing.

Clustering Language Modelling +1
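
A minimal sketch of the general word sense clustering idea: occurrences of one surface form are clustered by their contexts so that distinct senses fall into distinct clusters. The bag-of-context-word features, KMeans, and the toy English example are assumptions for illustration; the paper's actual features and Japanese-specific handling are not reproduced here.

# Hedged sketch of word sense clustering over occurrence contexts. The
# features (TF-IDF over context words), the clustering algorithm (KMeans),
# and the number of clusters are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy occurrences of English "bank" standing in for a homonymous word form.
contexts = [
    "deposited money at the bank this morning",
    "the bank approved the loan application",
    "we sat on the river bank and fished",
    "the boat drifted toward the grassy bank",
]

features = TfidfVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for text, label in zip(contexts, labels):
    print(label, text)  # occurrences grouped by induced sense cluster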

Unsupervised Topic Segmentation of Meetings with BERT Embeddings

2 code implementations 24 Jun 2021 Alessandro Solbiati, Kevin Heffernan, Georgios Damaskinos, Shivani Poddar, Shubham Modi, Jacques Cali

Topic segmentation of meetings is the task of dividing multi-person meeting transcripts into topic blocks.

Segmentation
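
A minimal, TextTiling-style sketch of unsupervised segmentation over sentence embeddings, in the spirit of the paper above: embed each utterance, measure similarity between adjacent windows, and place a boundary where similarity dips. The embedding model, window size, and threshold below are illustrative assumptions, not the paper's exact configuration.

# Hedged sketch of embedding-based topic segmentation for meeting transcripts.
import numpy as np
from sentence_transformers import SentenceTransformer

def segment(utterances, window=2, threshold=0.5):
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    emb = model.encode(utterances, normalize_embeddings=True)
    boundaries = []
    for i in range(window, len(utterances) - window):
        left = emb[i - window:i].mean(axis=0)
        right = emb[i:i + window].mean(axis=0)
        sim = float(np.dot(left, right) / (np.linalg.norm(left) * np.linalg.norm(right)))
        if sim < threshold:
            boundaries.append(i)  # candidate topic boundary before utterance i
    return boundaries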

Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages

1 code implementation 25 May 2022 Kevin Heffernan, Onur Çelebi, Holger Schwenk

To achieve this, we focus on teacher-student training, allowing all encoders to be mutually compatible for bitext mining, and enabling fast learning of new languages.

Cross-Lingual Transfer NMT +2
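
A minimal sketch of the teacher-student idea mentioned above: a student encoder for a new language is trained so that its sentence embeddings land in the frozen teacher's space, keeping the encoders mutually compatible for bitext mining. The encoder interfaces and the MSE objective are assumptions for illustration, not the exact training recipe.

# Hedged sketch of one teacher-student distillation step over parallel text.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, english_batch, new_lang_batch):
    with torch.no_grad():
        target = teacher(english_batch)       # frozen teacher embeddings
    prediction = student(new_lang_batch)      # student embeds the parallel text
    loss = F.mse_loss(prediction, target)     # pull student into the teacher's space
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()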

Multilingual Representation Distillation with Contrastive Learning

no code implementations 10 Oct 2022 Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn

Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks.

Contrastive Learning Cross-Lingual Information Retrieval +2
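
A minimal sketch of a contrastive (InfoNCE) objective over parallel sentence pairs with in-batch negatives, the kind of loss the title refers to; the temperature and the negative-sampling scheme are assumptions, not necessarily the paper's exact formulation.

# Hedged sketch: translation pairs are pulled together, other sentences in the
# batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor, temperature: float = 0.05):
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature                     # pairwise cosine similarities
    labels = torch.arange(src.size(0), device=src.device)  # i-th source aligns with i-th target
    return F.cross_entropy(logits, labels)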

xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages

1 code implementation 22 Jun 2023 Mingda Chen, Kevin Heffernan, Onur Çelebi, Alex Mourachko, Holger Schwenk

In comparison to xSIM, we show that xSIM++ is better correlated with the downstream BLEU scores of translation systems trained on mined bitexts, providing a reliable proxy of bitext mining performance without needing to run expensive bitext mining pipelines.

NMT
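
A minimal sketch of an xSIM-style proxy: align gold-parallel sentences by embedding similarity and report the fraction of misalignments. Plain cosine nearest-neighbour search stands in here, as an assumption, for the margin-based scoring typically used in practice.

# Hedged sketch of an alignment-error-rate proxy over sentence embeddings.
import numpy as np

def alignment_error_rate(src_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    nearest = (src @ tgt.T).argmax(axis=1)    # best-matching target per source
    gold = np.arange(len(src))                # row i is parallel with row i
    return float((nearest != gold).mean())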

Problem-solving Recognition in Scientific Text

no code implementations LREC 2022 Kevin Heffernan, Simone Teufel

As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method.
