Itihasa is a large-scale corpus for Sanskrit to English translation containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics viz., The Ramayana and The Mahabharata.
2 PAPERS • 1 BENCHMARK
The GATITOS (Google's Additional Translations Into Tail-languages: Often Short) dataset is a high-quality, multi-way parallel dataset of tokens and short phrases, intended for training and improving machine translation models. This dataset consists in 4,000 English segments (4,500 tokens) that have been translated into each of 26 low-resource languages, as well as three higher-resource pivot languages (es, fr, hi). All translations were made directly from English, with the exception of Aymara, which was translated from the Spanish.
1 PAPER • NO BENCHMARKS YET