As deep neural networks become widely adopted for solving most problems in computer vision and audio-understanding, there are rising concerns about their potential vulnerability.
This layer aims at being the input layer of convolutional neural networks for audio applications.
Formal verification techniques that compute provable guarantees on properties of machine learning models, like robustness to norm-bounded adversarial perturbations, have yielded impressive results.
Saving rainforests is a key to halting adverse climate changes.
Detecting sleepiness from spoken language is an ambitious task, which is addressed by the Interspeech 2019 Computational Paralinguistics Challenge (ComParE).
The purpose of this paper is to analyze and discuss the performance of several residual block implementations within a state-of-the-art CNN-based architecture for end-to-end audio classification using raw audio waveforms.
We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings.
In this paper we first review some strong adversarial attacks that may affect both audio signals and their 2D representations and evaluate the resiliency of the most common machine learning model, namely deep learning models and support vector machines (SVM) trained on 2D audio representations such as short time Fourier transform (STFT), discrete wavelet transform (DWT) and cross recurrent plot (CRP) against several state-of-the-art adversarial attacks.