Temporal action detection (TAD) aims to determine the semantic label and the temporal boundaries of every action instance in an untrimmed video.
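Detected instances are typically matched to ground truth by temporal intersection-over-union (tIoU) between predicted and annotated segments. A minimal sketch of that overlap measure, assuming segments are given as `(start, end)` pairs in seconds:

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two 1-D segments given as (start, end) pairs."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction covering [0, 10] against a ground-truth instance [5, 15]
# overlaps for 5 s out of a 15 s union, i.e. tIoU = 1/3.
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

A prediction usually counts as correct when its tIoU with a ground-truth instance of the same class exceeds a threshold (commonly 0.5).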
Our localization method combines neural-network-based segmentation with classical techniques, consistently localizing the needle with 0.73 mm RMS error in clean environments and 2.72 mm RMS error in challenging environments with blood and occlusion.
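The RMS figures above are root-mean-square Euclidean distances between predicted and true tip positions. A short sketch of that metric, assuming positions are 3-D points in millimetres (the sample coordinates are illustrative, not from the paper):

```python
import math

def rms_error(pred_points, true_points):
    """Root-mean-square Euclidean distance between paired 3-D points (mm)."""
    assert len(pred_points) == len(true_points)
    sq_dists = [
        sum((p - t) ** 2 for p, t in zip(pred, true))
        for pred, true in zip(pred_points, true_points)
    ]
    return math.sqrt(sum(sq_dists) / len(sq_dists))

# Two frames: one exact prediction, one off by a 3-4-0 mm offset (5 mm error),
# giving RMS = sqrt((0 + 25) / 2) ≈ 3.54 mm.
print(rms_error([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)]))
```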
Current developments in temporal event or action localization usually target actions captured by a single camera.
Based on the insight that pixels belonging to the same instance share one or more common attributes, we propose a one-stage instance segmentation network named Common Attribute Support Network (CASNet), which performs instance segmentation by predicting and clustering common attributes.
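The clustering step can be pictured as grouping per-pixel attribute vectors by proximity. A toy stand-in for that idea, assuming each pixel carries a predicted attribute embedding and using a simple greedy centroid scheme with an assumed distance threshold (not CASNet's actual procedure):

```python
import math

def cluster_attributes(embeddings, threshold=0.5):
    """Greedily assign each pixel's attribute vector to the nearest existing
    cluster centroid within `threshold`, or start a new cluster otherwise.
    Returns one instance label per pixel."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_dist = -1, threshold
        for i, centroid in enumerate(centroids):
            d = math.dist(emb, centroid)
            if d < best_dist:
                best, best_dist = i, d
        if best == -1:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the matched cluster's centroid.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], emb)]
            counts[best] += 1
            labels.append(best)
    return labels

# Four pixels: two near (0, 0) and two near (5, 5) form two instances.
print(cluster_attributes([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]))
```

In practice such clustering is driven by a learned embedding loss that pulls same-instance pixels together and pushes different instances apart; the greedy scheme here only illustrates the grouping principle.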