HACS is a dataset for human action recognition. It uses a taxonomy of 200 action classes, which is identical to that of the ActivityNet-v1.3 dataset. It has 504K videos retrieved from YouTube. Each one is strictly shorter than 4 minutes, and the average length is 2.6 minutes. A total of 1.5M clips of 2-second duration are sparsely sampled by methods based on both uniform randomness and consensus/disagreement of image classifiers. 0.6M and 0.9M clips are annotated as positive and negative samples, respectively.
Authors split the collection into training, validation and testing sets of size 1.4M, 50K and 50K clips, which are sampled from 492K, 6K and 6K videos, respectively.