Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partly due to recent large-scale scene datasets such as Places and Places2.
In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach, where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.
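To make the described two-branch design concrete, below is a minimal sketch of such a late fusion, assuming PyTorch, librosa, and scikit-learn. The network shape, the hand-crafted feature set (MFCC statistics and zero-crossing rate), and the equal-weight probability averaging are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of log-mel CNN + hand-crafted-feature GBM late fusion.
# All hyperparameters and the feature set are illustrative assumptions.
import numpy as np
import librosa
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier  # the assumed GBM branch

N_CLASSES = 10  # assumed number of acoustic scene classes

def log_mel(y, sr, n_mels=64):
    """Log-scaled mel-spectrogram, the CNN branch's input representation."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

class SceneCNN(nn.Module):
    """Small CNN over (1, n_mels, n_frames) log-mel inputs (illustrative)."""
    def __init__(self, n_classes=N_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def handcrafted_features(y, sr):
    """Example hand-crafted descriptors for the GBM branch (assumed set)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1), zcr.mean()])

def fuse(cnn, gbm, y, sr):
    """Late fusion: average class probabilities from both branches.

    `gbm` is a fitted GradientBoostingClassifier trained on the same
    hand-crafted feature vector; equal fusion weights are assumed.
    """
    spec = torch.tensor(log_mel(y, sr), dtype=torch.float32)[None, None]
    with torch.no_grad():
        p_cnn = torch.softmax(cnn(spec), dim=1).numpy()[0]
    p_gbm = gbm.predict_proba(handcrafted_features(y, sr)[None, :])[0]
    return (p_cnn + p_gbm) / 2.0
```

The design choice illustrated here is late fusion: each branch produces its own class-probability estimate, and the system combines them only at the output, so either branch can be retrained or replaced independently.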
We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.
Finally, a comprehensive review is presented on the proposed dataset to fully advance the task of remote sensing image captioning.
A general problem in the acoustic scene classification task is the mismatch between training and testing conditions, which significantly degrades the classification accuracy of the developed methods.
While the successful estimation of a photo's geolocation enables a number of interesting applications, it is also a very challenging task.
Several approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views.
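A minimal sketch of this multi-view scheme follows, assuming PyTorch and torchvision. The ResNet-18 backbone and the element-wise max pool over views are illustrative choices in the spirit of MVCNN-style pipelines, not any specific paper's code; the point is that a single symmetric pooling step makes the output independent of view order.

```python
# Hedged sketch: shared pre-trained backbone per view + one pooling round.
# Backbone choice and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiViewNet(nn.Module):
    """Applies a shared image backbone to each view, then pools over views.

    Because max pooling is symmetric in its arguments, the pooled descriptor
    is invariant to any permutation of the input views.
    """
    def __init__(self, n_classes=40):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")  # pre-trained on natural images
        backbone.fc = nn.Identity()                   # keep 512-d view descriptors
        self.backbone = backbone
        self.classifier = nn.Linear(512, n_classes)

    def forward(self, views):                         # views: (batch, n_views, 3, H, W)
        b, v = views.shape[:2]
        feats = self.backbone(views.flatten(0, 1))    # each view processed independently
        feats = feats.view(b, v, -1)
        pooled, _ = feats.max(dim=1)                  # single pooling round over all views
        return self.classifier(pooled)

# Usage: 12 rendered views per shape; reordering the views leaves logits unchanged.
model = MultiViewNet()
x = torch.randn(2, 12, 3, 224, 224)
logits = model(x)                                     # shape (2, 40)
```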
The goal of AID is to advance the state of the art in scene classification of remote sensing images.