Spectrogram-frame linear network and continuous frame sequence for bird sound classification
Motivated by the fact that bird sounds have varied frequency distributions and continuously time-varying properties, a novel method is proposed for bird sound classification based on a continuous frame sequence and a spectrogram-frame linear network (SFLN). To form the continuous frame sequence that serves as the standard input for SFLN, a sliding-window algorithm with a short frame length is suitable for segmenting the Mel-spectrogram of the bird sound. The vertical 3D filter in the linear layer moves linearly along the continuous frames and covers their full frequency band. The weights are initialized to a Gaussian distribution to attenuate high- and low-frequency noise, thereby extracting the long- and short-term features of the continuous frames of the bird sound. Finally, a GRU network is attached as a classifier that directly outputs the prediction results. Four kinds of bird sound from the xeno-canto website are used to evaluate how different sliding-window parameters affect SFLN-based classification. In the comparison experiment, the mean average precision (MAP) reaches its highest value of 0.97.
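The framing step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `sliding_frames` and `gaussian_band_weights`, the parameters `frame_len`, `hop`, and `sigma_frac`, and the random array standing in for a Mel-spectrogram are all assumptions made for illustration. In particular, reading the Gaussian initialization as a Gaussian-shaped weight profile across the frequency axis (small at the band edges) is one possible interpretation of the abstract, not a detail it confirms.

```python
import numpy as np

def sliding_frames(mel, frame_len, hop):
    """Split a (n_mels, n_steps) spectrogram into overlapping short frames.

    Returns an array of shape (n_frames, n_mels, frame_len).
    """
    n_mels, n_steps = mel.shape
    if n_steps < frame_len:
        raise ValueError("spectrogram is shorter than one frame")
    starts = range(0, n_steps - frame_len + 1, hop)
    return np.stack([mel[:, s:s + frame_len] for s in starts])

def gaussian_band_weights(n_mels, sigma_frac=0.25):
    """Gaussian-shaped weights over frequency bins (assumed interpretation).

    The profile peaks at the centre of the band and decays towards the
    lowest and highest bins, so both band edges are attenuated.
    """
    centre = (n_mels - 1) / 2.0
    sigma = sigma_frac * n_mels
    idx = np.arange(n_mels)
    return np.exp(-0.5 * ((idx - centre) / sigma) ** 2)

# Random stand-in for a real Mel-spectrogram: 64 Mel bins, 100 time steps.
rng = np.random.default_rng(0)
mel = rng.random((64, 100))

frames = sliding_frames(mel, frame_len=10, hop=5)   # continuous frame sequence
weights = gaussian_band_weights(64)                 # one weight per Mel bin
weighted = frames * weights[None, :, None]          # broadcast over frames and time
```

With these hypothetical settings each frame spans the full frequency band, and consecutive frames overlap by half their length. The resulting sequence is the kind of input a recurrent classifier such as a GRU could consume frame by frame.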