A large-scale Japanese video caption dataset consisting of 79,822 videos and 399,233 captions. Each caption in the dataset describes a video in the form of "who does what and where."
Source: Video Caption Dataset for Describing Human Actions in Japanese