Self-supervised Video Retrieval
7 papers with code • 2 benchmarks • 2 datasets
With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.
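The core idea can be illustrated with a minimal sketch of an InfoNCE-style loss that contrasts an anchor clip against both inter-negatives (other clips) and intra-negatives (temporally broken versions of the same clip). All names, shapes, and the temperature value here are illustrative assumptions, not the IIC authors' implementation.

```python
import numpy as np

def l2norm(x):
    # Normalize embeddings to unit length along the last axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def inter_intra_nce(anchor, positive, inter_negs, intra_negs, tau=0.1):
    """Hedged sketch of an inter-intra contrastive (InfoNCE-style) loss.

    anchor, positive: (d,) embeddings of two views of the same clip.
    inter_negs: (n, d) embeddings of other clips (inter-negatives).
    intra_negs: (m, d) embeddings of the same clip with its temporal
        structure broken (intra-negatives) -- an assumed simplification
        of the IIC idea, not the paper's exact formulation.
    """
    a = l2norm(anchor)
    p = l2norm(positive)
    negs = l2norm(np.vstack([inter_negs, intra_negs]))
    pos = np.exp(a @ p / tau)                # similarity to the positive
    neg = np.exp(negs @ a / tau).sum()       # similarities to all negatives
    return -np.log(pos / (pos + neg))
```

In this sketch, the loss is minimized when the anchor aligns with its positive view and is dissimilar to both kinds of negatives; adding intra-negatives is what pushes the network to attend to temporal order rather than static appearance alone.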
As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning.
The generative perception model acts as a feature decoder that, through a motion-attention mechanism, focuses on high-temporal-resolution, short-term representations.
PCL can conveniently be adopted as a standard training strategy and applied to many other self-supervised video feature learning methods.
However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.
Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning.
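The instance-level setup described above can be sketched as an NT-Xent (normalized temperature-scaled cross-entropy) loss over a batch of paired augmented views, in the style popularized by SimCLR. The function name, temperature, and batch layout below are assumptions for illustration, not any specific paper's code.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Minimal NT-Xent sketch for instance-level contrastive learning.

    z1, z2: (n, d) embeddings of two augmentations of the same n instances.
    Row i of z1 treats row i of z2 as its positive; the other 2n - 2
    embeddings in the batch serve as negatives.
    """
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    n = z1.shape[0]
    sim = z @ z.T / tau                                # pairwise cosine / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # Positive index for each row: i in the first half pairs with i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss drops as each instance's two augmented views agree while staying distinct from every other instance in the batch, which is exactly the instance-discrimination objective these techniques rely on.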