HaVG (Hausa Visual Genome Dataset)

Introduced by Abdulmumin et al. in Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation

HaVG is a dataset that pairs the description of an image, or of a region within the image, in Hausa with its equivalent in English. Hausa is a Chadic language and a member of the Afro-Asiatic language family. An estimated 100 to 150 million people speak it, including more than 80 million native speakers. The dataset comprises 32,923 images and their descriptions, divided into training, development, test, and challenge test sets. The Hausa Visual Genome is the first dataset of its kind and can be used for Hausa-English machine translation, multi-modal research, and image description, among other natural language processing and generation tasks.


