We propose a novel method for unsupervised domain adaptation in semantic segmentation that maximizes the cosine similarity between source-domain and target-domain representations at the feature level.
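To make the alignment objective concrete, the following is a minimal sketch of a cosine-similarity loss between paired source and target feature vectors. The function name `cosine_alignment_loss` and the exact loss form (negative mean cosine similarity over a batch) are illustrative assumptions, not the paper's verbatim formulation.

```python
import numpy as np

def cosine_alignment_loss(src_feats, tgt_feats, eps=1e-8):
    """Negative mean cosine similarity between paired feature vectors.

    src_feats, tgt_feats: arrays of shape (batch, dim).
    Minimizing this loss pushes the two domains' features to align
    in direction, which is the intuition behind the proposed objective.
    (Illustrative sketch; the actual method may pair features differently.)
    """
    # L2-normalize each feature vector; eps avoids division by zero.
    src = src_feats / (np.linalg.norm(src_feats, axis=1, keepdims=True) + eps)
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + eps)
    # Row-wise dot products of unit vectors are cosine similarities.
    return -np.mean(np.sum(src * tgt, axis=1))
```

When the two feature sets are identical, the loss reaches its minimum of -1; as the domains' features diverge in direction, the loss increases toward +1.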
In the source domain, we train both an object detector and the RRPN with full HOI supervision.
Moreover, ablation studies confirm that both the f-GCN module, which extracts knowledge from multi-modal contexts, and our newly proposed self-supervised learning process are effective for TQA.
Using a subnetwork based on prior work on image completion, our model generates the shape of an object.
In this work, we introduce a new algorithm for analyzing diagrams, which contain visual and textual information in an abstract, integrated form.
Through quantitative and qualitative evaluation, we show that our method is effective for retrieving video segments with natural language queries.