Search Results for author: Kevin Heffernan

Found 11 papers, 6 papers with code

Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect

no code implementations LREC 2020 Yo Sato, Kevin Heffernan

In this work, we present a universal, character-based method for representing sentences, from which one can compute the distance between any two sentences.

Clustering Sentence
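
No code is linked for this paper, so as a rough illustration of the idea described above (a character-level sentence representation from which a distance between any two sentences can be computed), here is a minimal sketch; the character n-gram range and cosine distance are assumptions chosen for illustration, not the authors' actual method.

# Hedged sketch: character n-gram vectors + cosine distance as a stand-in for
# a "universal, character-based" sentence representation. The n-gram range
# (2-4) and the cosine metric are illustrative assumptions only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances

def char_ngram_distance(sent_a: str, sent_b: str, ngram_range=(2, 4)) -> float:
    """Distance between two sentences based on shared character n-grams."""
    vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=ngram_range)
    vectors = vectorizer.fit_transform([sent_a, sent_b])
    return float(cosine_distances(vectors[0], vectors[1])[0, 0])

print(char_ngram_distance("colour of the sky", "color of the sky"))    # small distance
print(char_ngram_distance("colour of the sky", "price of the stock"))  # larger distance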

Homonym normalisation by word sense clustering: a case in Japanese

no code implementations COLING 2020 Yo Sato, Kevin Heffernan

This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking as an example Japanese, where orthographical variation causes problems for language processing.

Clustering Language Modelling +1
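
A minimal sketch of the general word sense clustering idea: occurrences of one surface form are clustered by their contexts so that distinct senses fall into distinct clusters. The bag-of-context-word features, KMeans, and the toy English example are assumptions for illustration; the paper's actual features and Japanese-specific handling are not reproduced here.

# Hedged sketch of word sense clustering over occurrence contexts. The
# features (TF-IDF over context words), the clustering algorithm (KMeans),
# and the number of clusters are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy occurrences of English "bank" standing in for a homonymous word form.
contexts = [
    "deposited money at the bank this morning",
    "the bank approved the loan application",
    "we sat on the river bank and fished",
    "the boat drifted toward the grassy bank",
]

features = TfidfVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for text, label in zip(contexts, labels):
    print(label, text)  # occurrences grouped by induced sense cluster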

Unsupervised Topic Segmentation of Meetings with BERT Embeddings

2 code implementations 24 Jun 2021 Alessandro Solbiati, Kevin Heffernan, Georgios Damaskinos, Shivani Poddar, Shubham Modi, Jacques Cali

Topic segmentation of meetings is the task of dividing multi-person meeting transcripts into topic blocks.

Segmentation
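
A minimal, TextTiling-style sketch of unsupervised segmentation over sentence embeddings, in the spirit of the paper above: embed each utterance, measure similarity between adjacent windows, and place a boundary where similarity dips. The embedding model, window size, and threshold below are illustrative assumptions, not the paper's exact configuration.

# Hedged sketch of embedding-based topic segmentation for meeting transcripts.
import numpy as np
from sentence_transformers import SentenceTransformer

def segment(utterances, window=2, threshold=0.5):
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    emb = model.encode(utterances, normalize_embeddings=True)
    boundaries = []
    for i in range(window, len(utterances) - window):
        left = emb[i - window:i].mean(axis=0)
        right = emb[i:i + window].mean(axis=0)
        sim = float(np.dot(left, right) / (np.linalg.norm(left) * np.linalg.norm(right)))
        if sim < threshold:
            boundaries.append(i)  # candidate topic boundary before utterance i
    return boundaries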

Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages

1 code implementation 25 May 2022 Kevin Heffernan, Onur Çelebi, Holger Schwenk

To achieve this, we focus on teacher-student training, allowing all encoders to be mutually compatible for bitext mining, and enabling fast learning of new languages.

Cross-Lingual Transfer NMT +2
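
A minimal sketch of the teacher-student idea mentioned above: a student encoder for a new language is trained so that its sentence embeddings land in the frozen teacher's space, keeping the encoders mutually compatible for bitext mining. The encoder interfaces and the MSE objective are assumptions for illustration, not the exact training recipe.

# Hedged sketch of one teacher-student distillation step over parallel text.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, english_batch, new_lang_batch):
    with torch.no_grad():
        target = teacher(english_batch)       # frozen teacher embeddings
    prediction = student(new_lang_batch)      # student embeds the parallel text
    loss = F.mse_loss(prediction, target)     # pull student into the teacher's space
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()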

Multilingual Representation Distillation with Contrastive Learning

no code implementations 10 Oct 2022 Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn

Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks.

Contrastive Learning Cross-Lingual Information Retrieval +2
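
A minimal sketch of a contrastive (InfoNCE) objective over parallel sentence pairs with in-batch negatives, the kind of loss the title refers to; the temperature and the negative-sampling scheme are assumptions, not necessarily the paper's exact formulation.

# Hedged sketch: translation pairs are pulled together, other sentences in the
# batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor, temperature: float = 0.05):
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature                     # pairwise cosine similarities
    labels = torch.arange(src.size(0), device=src.device)  # i-th source aligns with i-th target
    return F.cross_entropy(logits, labels)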

xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages

1 code implementation 22 Jun 2023 Mingda Chen, Kevin Heffernan, Onur Çelebi, Alex Mourachko, Holger Schwenk

In comparison to xSIM, we show that xSIM++ is better correlated with the downstream BLEU scores of translation systems trained on mined bitexts, providing a reliable proxy of bitext mining performance without needing to run expensive bitext mining pipelines.

NMT
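
A minimal sketch of an xSIM-style proxy: align gold-parallel sentences by embedding similarity and report the fraction of misalignments. Plain cosine nearest-neighbour search stands in here, as an assumption, for the margin-based scoring typically used in practice.

# Hedged sketch of an alignment-error-rate proxy over sentence embeddings.
import numpy as np

def alignment_error_rate(src_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    nearest = (src @ tgt.T).argmax(axis=1)    # best-matching target per source
    gold = np.arange(len(src))                # row i is parallel with row i
    return float((nearest != gold).mean())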

Problem-solving Recognition in Scientific Text

no code implementations LREC 2022 Kevin Heffernan, Simone Teufel

As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method.
