Scene Recognition
64 papers with code • 8 benchmarks • 15 datasets
Benchmarks
These leaderboards are used to track progress in Scene Recognition.
Most implemented papers
Omnivore: A Single Model for Many Visual Modalities
Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.
An Empirical Study of Remote Sensing Pretraining
We train different networks from scratch on MillionAID, the largest remote sensing (RS) scene recognition dataset to date, to obtain a series of RS-pretrained backbones. These include both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks.
Object Detectors Emerge in Deep Scene CNNs
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNNs), and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.
Learning image representations tied to ego-motion
Understanding how images of objects and scenes behave in response to specific ego-motions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images.
Deep Filter Banks for Texture Recognition and Segmentation
Research in texture recognition often concentrates on the problem of material recognition in uncluttered conditions, an assumption rarely met by applications.
Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling
We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest.
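Variable-length clips are usually zero-padded to a common length so they can be batched, and masked global pooling keeps that padding out of the pooled summary. The following is a minimal sketch. It assumes features laid out as [batch][time][channels] together with a per-clip count of valid frames. The function name `masked_global_pool` and the choice between mean and max pooling are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def masked_global_pool(
    features: Sequence[Sequence[Sequence[float]]],  # [batch][time][channels], zero-padded
    lengths: Sequence[int],                          # number of valid frames per clip
    mode: str = "mean",                              # assumption: "mean" or "max"
) -> List[List[float]]:
    """Pool each clip over its valid frames only, ignoring the padded tail."""
    pooled = []
    for seq, n in zip(features, lengths):
        if n <= 0:
            raise ValueError("each clip needs at least one valid frame")
        valid = seq[:n]              # drop padded frames beyond the true length
        channels = list(zip(*valid))  # transpose to [channels][time]
        if mode == "mean":
            pooled.append([sum(c) / n for c in channels])
        elif mode == "max":
            pooled.append([max(c) for c in channels])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return pooled


# Two clips: the second has only 2 real frames and one padded frame.
batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
         [[2.0, 0.0], [4.0, 8.0], [0.0, 0.0]]]
print(masked_global_pool(batch, [3, 2]))  # [[3.0, 4.0], [3.0, 4.0]]
```

Without the mask, the padded zero frame would pull the second clip's mean down to roughly [2.0, 2.67]. Under max pooling, it would also dominate any channel whose real activations are all negative.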
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
In this paper, we propose a hybrid representation that leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schemes for image recognition, with a focus on scene recognition.
Word Recognition with Deep Conditional Random Fields
Word recognition is a sequential problem in which the correlation between characters must be modeled.
AGA: Attribute Guided Augmentation
We implement our approach as a deep encoder-decoder architecture that learns the synthesis function in an end-to-end manner.
Scene Recognition by Combining Local and Global Image Descriptors
In this work, we construct an end-to-end scene recognition pipeline consisting of feature extraction, encoding, pooling and classification.
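The four stages named above (feature extraction, encoding, pooling, classification) can be sketched as a chain of small functions. Every stage here is a hypothetical stand-in chosen for brevity, not the paper's method:

- Extraction returns a toy [mean, max] descriptor per non-overlapping 2×2 patch, standing in for CNN or handcrafted local descriptors.
- Encoding uses hard-assignment bag-of-visual-words against a given codebook.
- Pooling averages the one-hot codes into a histogram.
- Classification picks the nearest class mean.

The codebook and class means are assumed to have been learned elsewhere.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def extract(image: Sequence[Sequence[float]], patch: int = 2) -> List[Vector]:
    """Hypothetical extractor: one [mean, max] descriptor per non-overlapping patch."""
    descs = []
    for r in range(0, len(image) - patch + 1, patch):
        for c in range(0, len(image[0]) - patch + 1, patch):
            vals = [image[i][j] for i in range(r, r + patch) for j in range(c, c + patch)]
            descs.append([sum(vals) / len(vals), max(vals)])
    return descs


def encode(descriptors: List[Vector], codebook: List[Vector]) -> List[Vector]:
    """Hard assignment: each descriptor becomes a one-hot vector over codewords."""
    codes = []
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(d, codebook[k]))
        one_hot = [0.0] * len(codebook)
        one_hot[nearest] = 1.0
        codes.append(one_hot)
    return codes


def pool(codes: List[Vector]) -> Vector:
    """Average pooling: a normalized histogram of codeword assignments."""
    return [sum(col) / len(codes) for col in zip(*codes)]


def classify(image_vector: Vector, class_means: Dict[str, Vector]) -> str:
    """Nearest-class-mean classifier (a stand-in for an SVM or softmax layer)."""
    return min(class_means, key=lambda label: sq_dist(image_vector, class_means[label]))


def recognize(image, codebook, class_means) -> str:
    return classify(pool(encode(extract(image), codebook)), class_means)


img = [[0.0, 0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0, 1.0],
       [1.0, 1.0, 1.0, 1.0],
       [1.0, 1.0, 1.0, 1.0]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
means = {"kitchen": [0.8, 0.2], "coast": [0.2, 0.8]}  # hypothetical labels
print(recognize(img, codebook, means))  # coast
```

Keeping each stage a separate function mirrors the modular pipeline: any one stage can be swapped (for example, a CNN extractor or a soft-assignment encoder) without touching the others.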