The CASIA-WebFace dataset is used for face verification and face identification tasks. The dataset contains 494,414 face images of 10,575 real identities collected from the web.
381 PAPERS • 2 BENCHMARKS
Despite recent advances in vision-and-language tasks, most progress is still focused on resource-rich languages such as English. Furthermore, widely used vision-and-language datasets directly adopt images representative of American or European cultures, which introduces cultural bias. Hence we introduce ParsVQA-Caps, the first benchmark in Persian for Visual Question Answering (VQA) and Image Captioning tasks. We use two collection methods for each task: human-based and template-based for VQA, and human-based and web-based for image captioning. The image captioning dataset consists of over 7.5k images and about 9k captions. The VQA dataset consists of almost 11k images and 28.5k question-answer pairs with short and long answers, usable for both classification-based and generation-based VQA.
1 PAPER • NO BENCHMARKS YET
Persian Font Recognition (PFR)
1 PAPER • 1 BENCHMARK
Persian Text Image Segmentation (PTI SEG)