no code implementations • LREC 2022 • Kenneth Heafield, Elaine Farrow, Jelmer Van der Linde, Gema Ramírez-Sánchez, Dion Wiggins
We present the EuroPat corpus of patent-specific parallel data for 6 official European languages paired with English: German, Spanish, French, Croatian, Norwegian, and Polish.
2 code implementations • ACL 2020 • Marta Bañón, Pin-zhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Esplà-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz Rojas, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Elsa Sarrías, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, Jaume Zaragoza
We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software.