Second, each support sample is used as a class code to leverage the information by comparing similarities between each support feature and query features.
Meanwhile, with the advent of Generative Adversarial Networks (GANs), there has been great progress in reconstructing realistic 2D images.
However, it has an expensive computational cost and requires two-stream (RGB and optical flow) framework.
Ranked #4 on Action Recognition on Jester
Through quantitative and qualitative evaluation, we show that our method is effective for retrieval of video segments using natural language queries.