With this loss function, the samples from the same audio scene are clustered independently of the environment, and thus we can get the classifier with better generalization ability in an unseen environment.
Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns.
One side effect of restricting the RF of CNNs is that more frequency information is lost.
Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning.
To this end, we analyse the receptive field (RF) of these CNNs and demonstrate the importance of the RF to the generalization capability of the models.
In the presented experiments the best audio representation is the Log-Mel spectrogram of the harmonic andpercussive sources plus the Log-Mel spectrogram of the difference between left and right stereo-channels.
Acoustic scene classification is the task of identifying the scene from which the audio signal is recorded.
Spectrograms have been widely used in Convolutional Neural Networks based schemes for acoustic scene classification, such as the STFT spectrogram and the MFCC spectrogram, etc.