We introduce a novel ranking network that utilizes the Co-Attention between movies and trailers as guidance to generate the training pairs, where the moments highly corrected with trailers are expected to be scored higher than the uncorrelated moments.
Unsupervised domain adaptation (UDA) aims at inferring class labels for unlabeled target domain given a related labeled source dataset.
Recent developments in gradient-based attention modeling have seen attention maps emerge as a powerful tool for interpreting convolutional neural networks.
We propose an end-to-end network for the visual illustration of a sequence of sentences forming a story.
It remains open to explore duality theory and algorithms in such a non-convex and NP-hard problem setting.