Spectrogram-frame linear network and continuous frame sequence for bird sound classification

Motivated by the fact that bird sounds have varied frequency distributions and continuous time-varying properties, a novel method is proposed for bird sound classification based on a continuous frame sequence and a spectrogram-frame linear network (SFLN). To form a continuous frame sequence as the standard input for SFLN, a sliding-window algorithm with a short frame length is used to divide the Mel-spectrogram of the bird sound. The vertical 3D filter in the linear layer moves linearly along the continuous frame and covers its full frequency band. The weights are initialized to a Gaussian distribution to attenuate high- and low-frequency noise, thereby extracting the long- and short-term features of the continuous frame of the bird sound. Finally, a GRU network is connected and used as a classifier to directly output the prediction results. Four kinds of bird sound from the xeno-canto website are tested to evaluate the influence of different sliding-window parameters on SFLN-based classification performance. In the comparison experiment, the mean average precision (MAP) reaches its highest value of 0.97.
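
As a rough illustration of the sliding-window step described in the abstract, the sketch below cuts a Mel-spectrogram into overlapping short windows to form a continuous frame sequence. It is not the authors' implementation: the function names, the window length, the hop size, and the random stand-in spectrogram are all hypothetical. The Gaussian weighting over frequency bins is only one possible reading of "initialized to a Gaussian distribution" and is an assumption, not something the abstract confirms.

```python
import numpy as np


def frame_sequence(mel, win_len, hop):
    """Split a (n_mels, n_frames) spectrogram into overlapping windows.

    Returns an array of shape (n_windows, n_mels, win_len).
    """
    n_mels, n_frames = mel.shape
    if n_frames < win_len:
        raise ValueError("spectrogram is shorter than one window")
    starts = range(0, n_frames - win_len + 1, hop)
    return np.stack([mel[:, s:s + win_len] for s in starts])


def gaussian_band_weight(n_mels, sigma_frac=0.25):
    """Gaussian-shaped profile over mel bins, peaked at the centre band.

    Assumption: this shape is a guess at how Gaussian weights could
    down-weight the lowest and highest bands; sigma_frac is hypothetical.
    """
    bins = np.arange(n_mels)
    centre = (n_mels - 1) / 2.0
    sigma = sigma_frac * n_mels
    return np.exp(-0.5 * ((bins - centre) / sigma) ** 2)


# Random stand-in for a real Mel-spectrogram: 64 mel bands, 100 time frames.
rng = np.random.default_rng(0)
mel = rng.random((64, 100))

frames = frame_sequence(mel, win_len=10, hop=5)
weights = gaussian_band_weight(64)

# Broadcast the per-band weight across every window and time step.
weighted = frames * weights[None, :, None]

print(frames.shape)  # (19, 64, 10)
```

Each window spans the full frequency band, so a layer applied per window sees every mel bin at once. In a full pipeline the resulting sequence would be passed to a recurrent classifier such as a GRU, which this sketch leaves out.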
