WebLI (Web Language Image)

WebLI (Web Language Image) is a web-scale multilingual image-text dataset designed to support Google’s vision-language research, including large-scale pre-training for image understanding, image captioning, visual question answering, and object detection.

The dataset is built from the public web. It includes image bytes and image-associated texts (alt-text, OCR, page title), along with many other features, and its texts span 109 languages. The dataset is deduplicated against 68 common vision and vision-language tasks, and it contains no user or personally identifiable data, reflecting careful Responsible AI (RAI) considerations.
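
To make the description above concrete, here is a minimal sketch of how one image-text example with these features might be represented in Python. The class name `WebImageTextRecord`, its field names, and the helper `count_by_language` are all hypothetical and chosen for illustration only; they are not WebLI's actual schema or any official loader API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class WebImageTextRecord:
    """Hypothetical record layout; NOT the actual WebLI schema."""

    image_bytes: bytes                 # raw encoded image
    alt_text: Optional[str] = None     # alt-text attached to the image
    ocr_text: Optional[str] = None     # text recognized inside the image
    page_title: Optional[str] = None   # title of the page hosting the image
    language: Optional[str] = None     # language tag such as "en" (assumed format)

    def texts(self) -> Dict[str, str]:
        """Return the non-empty image-associated texts, keyed by source."""
        candidates = {
            "alt_text": self.alt_text,
            "ocr": self.ocr_text,
            "page_title": self.page_title,
        }
        return {name: text for name, text in candidates.items() if text}


def count_by_language(records: Iterable[WebImageTextRecord]) -> Dict[str, int]:
    """Count records per language tag; untagged records fall under 'unknown'."""
    counts: Dict[str, int] = {}
    for record in records:
        key = record.language or "unknown"
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Keeping each text source in its own optional field, rather than merging them into one string, lets downstream code choose which signal to use (for example, alt-text for captioning or OCR for text-heavy images).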

Source: PaLI: A Jointly-Scaled Multilingual Language-Image Model


