Search Results for author: Jon Vadillo

Found 6 papers, 2 papers with code

Uncertainty-Aware Explanations Through Probabilistic Self-Explainable Neural Networks

no code implementations • 20 Mar 2024 • Jon Vadillo, Roberto Santana, Jose A. Lozano, Marta Kwiatkowska

The lack of transparency of Deep Neural Networks continues to be a limitation that severely undermines their reliability and usage in high-stakes applications.


When and How to Fool Explainable Models (and Humans) with Adversarial Examples

1 code implementation • 5 Jul 2021 • Jon Vadillo, Roberto Santana, Jose A. Lozano

Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations.

BIG-bench Machine Learning • Explainable Models

Analysis of Dominant Classes in Universal Adversarial Perturbations

no code implementations • 28 Dec 2020 • Jon Vadillo, Roberto Santana, Jose A. Lozano

The reasons why Deep Neural Networks are susceptible to being fooled by adversarial examples remain an open discussion.

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions

1 code implementation • 14 Apr 2020 • Jon Vadillo, Roberto Santana, Jose A. Lozano

Despite the remarkable performance and generalization levels of deep learning models in a wide range of artificial intelligence tasks, it has been demonstrated that these models can be easily fooled by the addition of imperceptible yet malicious perturbations to natural inputs.

Adversarial Attack • Emotion Classification
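
As a rough illustration of the kind of attack the abstract above alludes to, here is a minimal sketch, assuming NumPy and a made-up softmax-linear classifier, of a one-step FGSM-style perturbation. The weights, input and epsilon budget are invented for illustration; this is not the method proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier: logits = W x + b, with made-up weights.
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_proba(x):
    return softmax(W @ x + b)

x = rng.normal(size=5)                 # a "natural" input
y = int(np.argmax(predict_proba(x)))   # class predicted on the clean input

# For softmax cross-entropy with label y, the gradient of the loss w.r.t. x
# is W^T (p - onehot(y)).
p = predict_proba(x)
grad_x = W.T @ (p - np.eye(3)[y])

eps = 0.1                              # L-infinity perturbation budget
x_adv = x + eps * np.sign(grad_x)      # one step that increases the loss

print("clean prediction:    ", y)
print("perturbed prediction:", int(np.argmax(predict_proba(x_adv))))
```

With a large enough eps the predicted class typically changes even though each coordinate of the input moves only slightly.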

On the human evaluation of audio adversarial examples

no code implementations • 23 Jan 2020 • Jon Vadillo, Roberto Santana

In this paper we investigate to what extent the distortion metrics proposed in the literature for audio adversarial examples, which are commonly applied to evaluate the effectiveness of methods for generating these attacks, are a reliable measure of the human perception of the perturbations.
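
For context on what such distortion metrics look like, below is a minimal sketch, assuming NumPy and made-up signals, of two measures commonly reported for audio adversarial examples: the L-infinity norm of the perturbation and its peak loudness relative to the clean audio in decibels. It is not the evaluation protocol used in the paper.

```python
import numpy as np

def peak_db(signal):
    # Peak amplitude of a waveform expressed in decibels.
    return 20.0 * np.log10(np.max(np.abs(signal)))

def distortion_metrics(x, x_adv):
    delta = x_adv - x
    return {
        "linf": float(np.max(np.abs(delta))),        # largest per-sample change
        "db_x": float(peak_db(delta) - peak_db(x)),  # loudness relative to the clean audio
    }

# Toy example: a clean 440 Hz tone plus a small random perturbation.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
x = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
x_adv = x + 0.005 * np.random.default_rng(0).standard_normal(x.shape)

print(distortion_metrics(x, x_adv))
```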

Universal adversarial examples in speech command classification

no code implementations • 22 Nov 2019 • Jon Vadillo, Roberto Santana

Adversarial examples are inputs intentionally perturbed with the aim of forcing a machine learning model to produce a wrong prediction, while the changes are not easily detectable by a human.

Classification • Domain Classification +1
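
To make the "universal" part of the title above concrete, the sketch below adds one fixed, input-agnostic perturbation to every input of a toy linear classifier and reports the fooling rate (the fraction of inputs whose prediction changes). The classifier, data and perturbation are all invented for illustration and do not correspond to the method from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.normal(size=(10, 64))     # toy linear "command" classifier (10 classes)
X = rng.normal(size=(200, 64))    # a batch of 200 inputs

def predict(batch):
    return np.argmax(batch @ W.T, axis=1)

# One fixed perturbation, shared by all inputs (hence "universal").
v = 0.5 * np.sign(rng.normal(size=64))

clean_pred = predict(X)
adv_pred = predict(X + v)         # the same v is added to every input

fooling_rate = float(np.mean(clean_pred != adv_pred))
print(f"fooling rate of the universal perturbation: {fooling_rate:.2%}")
```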
