Recent works have focused on compressing pre-trained language models (PLMs) such as BERT, with the primary goal of improving the compressed model's performance on downstream tasks.
This process not only requires a large amount of instance-level annotations for sensitive attributes, but also does not guarantee that all fairness-sensitive information has been removed from the encoder.
However, the attribution problem has not been well defined, leaving the contribution-assignment process without a unified guideline.
Motivated by the success of disentangled representation learning in computer vision, we study the possibility of learning semantic-rich time-series representations, which remains unexplored due to three main challenges: 1) the sequential data structure introduces complex temporal correlations and makes the latent representations hard to interpret; 2) sequential models suffer from the KL vanishing problem; and 3) interpretable semantic concepts for time series often rely on multiple factors rather than on individual ones.
The basic idea is to learn a source signal via back-propagation such that the mutual information between the input and the output is preserved, as much as possible, in the mutual information between the input and the source signal.
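As a hedged illustration of this idea, the sketch below (hypothetical names throughout) learns a soft mask over the input via back-propagation, using the KL divergence between the model's outputs on the masked and original inputs as a tractable stand-in for directly estimating mutual information, which is generally intractable:

```python
import torch
import torch.nn.functional as F

def learn_source_signal(model, x, steps=200, lr=0.05, sparsity=0.01):
    """Hypothetical sketch: optimize a soft mask m so that the source
    signal s = m * x retains the information the model actually uses,
    by matching the model's output distribution on s to its output on x
    (a proxy for preserving mutual information), while keeping m sparse."""
    model.eval()
    with torch.no_grad():
        target = F.softmax(model(x), dim=-1)           # reference output on x
    logit_m = torch.zeros_like(x, requires_grad=True)  # mask parameters
    opt = torch.optim.Adam([logit_m], lr=lr)
    for _ in range(steps):
        m = torch.sigmoid(logit_m)                     # soft mask in (0, 1)
        s = m * x                                      # candidate source signal
        pred = F.log_softmax(model(s), dim=-1)
        loss = F.kl_div(pred, target, reduction="batchmean") \
               + sparsity * m.abs().mean()             # prefer compact signals
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (torch.sigmoid(logit_m) * x).detach()
```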
These two observations are further employed to formulate a measurement that quantifies the shortcut degree of each training sample.
With the wide use of deep neural networks (DNNs), model interpretability has become a critical concern, since explainable decisions are preferred in high-stakes scenarios.
In this paper, we introduce DSN (Deep Serial Number), a new watermarking approach that can prevent a stolen model from being deployed by unauthorized parties.
Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features.
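For concreteness, one of the simplest such attribution methods is gradient × input; a minimal PyTorch sketch, assuming a standard classifier `model` and a single batched input:

```python
import torch

def gradient_x_input(model, x, target_class):
    """Minimal gradient-x-input attribution: the importance score of each
    input feature is its value times the gradient of the target logit
    with respect to that feature."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    logits[0, target_class].backward()   # d(target logit) / d(input)
    return (x.grad * x).detach()         # per-feature importance scores
```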
Combating fake news and misinformation propagation is a challenging task in the post-truth era.
Image captioning has made substantial progress with huge supporting image collections sourced from the web.
In this paper, we investigate a specific security problem called trojan attack, which aims to attack deployed DNN systems by exploiting hidden trigger patterns inserted by malicious hackers.
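To illustrate the mechanism (not any specific attack from the paper), the sketch below stamps a small trigger patch onto a batch of images; a trojaned model is trained so that inputs carrying the patch are mapped to an attacker-chosen label while clean inputs behave normally:

```python
import numpy as np

def stamp_trigger(images, patch, corner=(0, 0)):
    """Overlay a small trigger patch on a batch of images of shape
    (N, H, W) or (N, H, W, C). During poisoning, the attacker stamps a
    fraction of training images and relabels them to a target class."""
    r, c = corner
    ph, pw = patch.shape[0], patch.shape[1]
    poisoned = images.copy()
    poisoned[:, r:r + ph, c:c + pw] = patch   # hidden trigger pattern
    return poisoned
```

The resulting model remains accurate on clean data, which is what makes the trigger hard to detect after deployment.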
In this paper, we review recent work on adversarial attacks and defenses, particularly from the perspective of machine learning interpretation.
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why a network makes specific decisions.
Neural architecture search (NAS) has attracted increasing attention in recent years due to its flexibility and its remarkable capability to reduce the burden of neural network design.
Existing local explanation methods provide an explanation for each decision of a black-box classifier, in the form of feature relevance scores that reflect each feature's contribution to the decision.
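As a concrete instance of this paradigm, the following LIME-style sketch (hypothetical `predict_fn` and hyperparameters) fits a locally weighted linear surrogate around one instance and reads relevance scores off its coefficients:

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_relevance_scores(predict_fn, x, n_samples=500, sigma=0.25):
    """LIME-style sketch: sample perturbations around instance x, query the
    black-box classifier, and fit a locally weighted linear surrogate whose
    coefficients act as per-feature relevance scores for this decision."""
    cls = int(predict_fn(x[None, :]).argmax())          # class being explained
    perturbed = x + sigma * np.random.randn(n_samples, x.shape[0])
    probs = predict_fn(perturbed)[:, cls]               # black-box responses
    dists = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * sigma ** 2))  # locality kernel
    surrogate = Ridge(alpha=1.0).fit(perturbed, probs, sample_weight=weights)
    return surrogate.coef_                              # relevance scores
```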
The analysis further shows that LAE outperforms the state of the art by 6.52%, 12.03%, and 3.08%, respectively, on three deepfake detection tasks in terms of generalization accuracy on previously unseen manipulations.
Recent explainability-related studies have shown that state-of-the-art DNNs do not always adopt correct evidence to make decisions.
To this end, we propose a novel deep structured anomaly detection framework to identify the cross-modal anomalies embedded in the data.
SpecAE leverages Laplacian sharpening to amplify the distances between the representations of anomalies and those of the majority.
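A minimal NumPy sketch of the sharpening operator in its common form H' = (2I − D⁻¹A)H, over adjacency A and node representations H; this illustrates the general operation rather than SpecAE's exact layer:

```python
import numpy as np

def laplacian_sharpening(H, A):
    """Laplacian sharpening on node representations H (n, d) given
    adjacency A (n, n): each node's embedding is pushed away from the
    average of its neighbors, amplifying deviations of anomalous nodes
    from their local context."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)   # node degrees
    neighbor_mean = (A @ H) / deg                    # D^{-1} A H
    return 2.0 * H - neighbor_mean                   # sharpened embeddings
```

This is the opposite of the Laplacian smoothing performed by standard graph convolutions, which pulls each node toward its neighborhood average.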
Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous cars and medical diagnosis, where explanations help people better understand how machine learning systems work and enhance their trust in these systems.
In this demo paper, we present the XFake system, an explainable fake news detector that assists end-users in assessing news credibility.
REAT decomposes the final prediction of an RNN into additive contributions of the individual words in the input text.
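The sketch below illustrates this style of decomposition for a GRU with update h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h̃_t and h_0 = 0; it is one instance of the general idea, not REAT's exact formulation, and all names are hypothetical:

```python
import numpy as np

def additive_word_contributions(z, h_tilde, w_class):
    """Given update gates z and candidate states h_tilde (each (T, d))
    and a class weight vector w_class (d,), the final hidden state
    decomposes as h_T = sum_t (prod_{k>t} z[k]) * (1 - z[t]) * h_tilde[t]
    when h_0 = 0, so each word t receives an additive contribution to
    the class logit."""
    T, d = z.shape
    contribs = np.zeros(T)
    carry = np.ones(d)                  # prod_{k>t} z[k], built backwards
    for t in range(T - 1, -1, -1):
        piece = carry * (1.0 - z[t]) * h_tilde[t]
        contribs[t] = w_class @ piece   # word t's share of the logit
        carry = carry * z[t]
    return contribs                     # sums exactly to w_class @ h_T
```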
Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision.
While deep neural networks (DNNs) have become an effective computational tool, their prediction results are often criticized for a lack of interpretability, which is essential in many real-world applications such as health informatics.