…localized narratives (synchronized voice, mouse trace, and text caption)
66,391,027 point-level annotations on 5,827 classes
61,404,966 image-level labels on 20,638 classes

Images are under a CC BY 2.0 license, and annotations are under a CC BY 4.0 license.