The Amharic-English Parallel Corpus for Machine Translation contains 33,955 sentence pairs extracted from news platforms such as Ethiopian Press Agency, Fana Broadcasting Corporate, and Walta Information Center. Because the data comes from different sources, it covers various domains, including religion (Bible and Quran), politics, economics, sports, and news, among others.
1 PAPER • NO BENCHMARKS YET
WMT 2021 Ge'ez-Amharic is a Ge'ez-Amharic dataset prepared for the NMT tasks of the 6th Workshop on NLP at Debre Berhan University, Ethiopia. The corpus has been collected from:
0 PAPERS • NO BENCHMARKS YET