Finally, we show that the proposed model can generalize to out-of-domain data without significant loss in performance without any finetuning for both the recognition and localization tasks.
This work introduces a novel solution to measure economic activity through remote sensing for a wide range of spatial areas.
We present a self-supervised perceptual prediction framework capable of temporal event segmentation by building stable representations of objects over time and demonstrate it on long videos, spanning several days.
We also show results on real examples of different sites before and after the COVID-19 outbreak to illustrate different measurable indicators.
Complex analyses involving multiple, dependent random quantities often lead to graphical models - a set of nodes denoting variables of interest, and corresponding edges denoting statistical interactions between nodes.
We find that large amounts of training data are necessary, both for pre-training as well as fine-tuning to a task, for the models to perform well on the designated task.
no code implementations • 11 Mar 2019 • Žiga Emeršič, Aruna Kumar S. V., B. S. Harish, Weronika Gutfeter, Jalil Nourmohammadi Khiarak, Andrzej Pacut, Earnest Hansley, Mauricio Pamplona Segundo, Sudeep Sarkar, Hyeonjung Park, Gi Pyo Nam, Ig-Jae Kim, Sagar G. Sangodkar, Ümit Kaçar, Murvet Kirci, Li Yuan, Jishou Yuan, Haonan Zhao, Fei Lu, Junying Mao, Xiaoshuang Zhang, Dogucan Yaman, Fevziye Irem Eyiokur, Kadir Bulut Özler, Hazim Kemal Ekenel, Debbrota Paul Chowdhury, Sambit Bakshi, Pankaj K. Sa, Banshidhar Majhi, Peter Peer, Vitomir Štruc
The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze performance of the technology from various viewpoints, such as generalization abilities to unseen data characteristics, sensitivity to rotations, occlusions and image resolution and performance bias on sub-groups of subjects, selected based on demographic criteria, i. e. gender and ethnicity.
We also show that the proposed approach is able to learn highly discriminative features that help improve action recognition when used in a representation learning paradigm.
We describe a strategy for detection and classification of man-made objects in large high-resolution satellite photos under computational resource constraints.
We address the problem of suppressing facial expressions in videos because expressions can hinder the retrieval of important information in applications such as face recognition.
With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble.
We used the results generated to perform a geometric image normalization that boosted the performance of all evaluated descriptors.
Through extensive experiments, we show that the use of commonsense knowledge from ConceptNet allows the proposed approach to handle various challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes.
We apply this framework to the problem of speech recognition using both audio and visual components.