We present a new paradigm named explore-and-match for video grounding, which aims to seamlessly unify two streams of video grounding methods: proposal-based and proposal-free.
In this work, we propose a novel framework that generates class representations by extracting features from class-relevant regions of the images.
Based on the transformation consistency, our method measures the difference between the transformed prototypes and a modified prototype set.
To this end, we combine learnable batch-instance normalization layers with meta-learning and investigate the challenging cases caused by both batch and instance normalization layers.
To learn target discriminative representations, using pseudo-labels is a simple yet effective approach for unsupervised domain adaptation.
We construct a structured domain adaptation framework for our learning paradigm and introduce a practical way of DD for implementation.
1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas
We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database.