Dataset Card for "tamil-alpaca"
1 PAPER • NO BENCHMARKS YET
This repository includes Tamil-translated versions of the Alpaca dataset and of a subset of the OpenOrca dataset.
We provide a new dataset, XWikiRef, for the task of cross-lingual multi-document summarization. This task aims at generating Wikipedia-style text in low-resource languages by taking reference text as input. Overall, the dataset covers 8 languages: Bengali (bn), English (en), Hindi (hi), Marathi (mr), Malayalam (ml), Odia (or), Punjabi (pa) and Tamil (ta). It also covers 5 domains: books, films, politicians, sportsman and writers.
1 PAPER • 1 BENCHMARK