ActivityNet-Entities augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase. This makes it possible to train video description models on this data and, importantly, to evaluate how grounded or "true" such models are to the video they describe.
Source: https://github.com/facebookresearch/ActivityNet-Entities