1 code implementation • 5 Jun 2023 • Félix Gaschi, Patricio Cerda, Parisa Rastin, Yannick Toussaint
Namely, we find that realignment works better on tasks for which alignment is correlated with cross-lingual transfer, when generalizing to a distant language, and with smaller models, as well as when using a bilingual dictionary rather than FastAlign to extract realignment pairs.
1 code implementation • 3 Jul 2019 • Patricio Cerda, Gaël Varoquaux
We introduce two encoding approaches for string categories: a Gamma-Poisson matrix factorization on substring counts, and the min-hash encoder, for fast approximation of string similarities.
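As a rough illustration of the min-hash idea mentioned above, the sketch below encodes a string as the minimum hash value of its character n-grams under several seeded hash functions. The fraction of matching components between two encodings then approximates the Jaccard similarity of their n-gram sets. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`char_ngrams`, `minhash_encode`, `estimated_jaccard`), the use of 3-grams with space padding, the 64 components, and the choice of BLAKE2b as the hash are all illustrative.

```python
import hashlib


def char_ngrams(s, n=3):
    """Return the set of character n-grams of s (space-padded; an assumption)."""
    padded = f" {s} "
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def _seeded_hash(gram, seed):
    """Deterministic 64-bit hash of an n-gram, salted with an integer seed."""
    digest = hashlib.blake2b(
        gram.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_encode(s, n_components=64, n=3):
    """Encode s as one minimum hash value per seeded hash function."""
    grams = char_ngrams(s, n)
    return [
        min(_seeded_hash(g, seed) for g in grams)
        for seed in range(n_components)
    ]


def estimated_jaccard(a, b):
    """Fraction of matching components; estimates n-gram Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


london = minhash_encode("london")
londres = minhash_encode("londres")
paris = minhash_encode("paris")

# Strings that share n-grams should score higher than strings that share none.
print(estimated_jaccard(london, londres))
print(estimated_jaccard(london, paris))
```

Each encoding has a fixed length regardless of how many distinct categories appear in the data, which is what makes this kind of encoder attractive for high-cardinality string columns.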
2 code implementations • 4 Jun 2018 • Patricio Cerda, Gaël Varoquaux, Balázs Kégl
We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains.