[Re] Replication Study of 'Generative causal explanations of black-box classifiers'

RC 2020 · Ivo Verhoeven, Xinyi Chen, Qingzhi Hu, Mario Holubar

Scope of Reproducibility
We verify the outcomes of the methodology proposed in the article, which attempts to provide post-hoc causal explanations for black-box classifiers through causal inference. We do so by re-implementing the code step by step, following the descriptions in the paper. All claims made in the paper have been examined, and we provide additional metrics to evaluate the portability, expressive power, algorithmic complexity and data fidelity of the framework. We further extended the analyses to cover all benchmark datasets used in the paper, confirming the reported results.

Methodology
We use the same architecture and (hyper)parameters for replication. However, our code has a different structure, and we provide a more efficient implementation of the information-flow measure. In addition, Algorithm 1 in the original paper is not implemented in the authors' repository, so we implemented it ourselves and further extended the framework to another domain (text data), although unsuccessfully. Furthermore, we include a detailed table showing the time required to produce the results of each reproduced experiment. All models were trained on Nvidia GeForce GTX 1080 GPUs provided by SURFsara's Lisa cluster computing service at the University of Amsterdam.
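To make the information-flow measurement concrete, the sketch below illustrates one way such a quantity can be estimated by Monte Carlo sampling. It is our own minimal illustration, not the authors' implementation: it assumes independent standard-normal latents, a hypothetical `decoder` mapping latent vectors to inputs, and a hypothetical `classifier` returning class probabilities.

```python
import torch

def estimate_information_flow(decoder, classifier, k, latent_dim,
                              n_alpha=200, n_beta=100):
    """Monte Carlo sketch of the information flow I(alpha -> Y): the mutual
    information between the first k ("causal") latent factors and the class
    label Y predicted for the decoded sample. Independent standard-normal
    latents are assumed, so intervening on alpha coincides with conditioning.
    `decoder` and `classifier` are hypothetical callables."""
    with torch.no_grad():
        alpha = torch.randn(n_alpha, k)          # samples of the causal factors
        conditionals = []
        for a in alpha:
            # Marginalise the non-causal factors beta for this fixed alpha:
            # p(Y | do(alpha)) ~= mean_j classifier(decoder([alpha, beta_j]))
            beta = torch.randn(n_beta, latent_dim - k)
            z = torch.cat([a.expand(n_beta, k), beta], dim=1)
            conditionals.append(classifier(decoder(z)).mean(dim=0))
        p_y_given_alpha = torch.stack(conditionals)        # (n_alpha, n_classes)
        p_y = p_y_given_alpha.mean(dim=0, keepdim=True)    # marginal over alpha
        # I(alpha; Y) = E_alpha[ KL( p(Y | do(alpha)) || p(Y) ) ]
        eps = 1e-12
        kl = (p_y_given_alpha *
              (torch.log(p_y_given_alpha + eps) - torch.log(p_y + eps))).sum(dim=1)
        return kl.mean()
```

The nested sampling over alpha and beta is the expensive part of such an estimate; batching the inner loop into a single decoder/classifier forward pass is the natural way to speed it up.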

Results
We reproduced the framework from the original paper and verified the main claims made by the authors. However, in our extension study, the GCE model did not manage to separate causal from non-causal factors for a text classifier, owing to the complexity of fine-tuning the model.

What was easy
The original paper comes with extensive appendices, many of which contain crucial details for implementing and understanding the intended functionality. The authors provide code for most of the experiments presented in the paper. Although their code repository was initially not functional, we used it as a reference for our re-implementation. The authors also updated their code two weeks after we started our own implementation, which made it easy for us to verify the correctness of our re-implementation.

What was difficult
The codebase the authors provided was initially unusable, with missing or renamed imports, hardcoded file paths and an all-around convoluted structure. Additionally, the description of Algorithm 1 is quite vague and no implementation of it was given. Beyond this, computational expense was a serious issue, given the inefficient training steps involved and the need to repeat training several times for hyperparameter search.

Communication with original authors
This reproducibility study is part of a course on fairness, accountability, confidentiality and transparency in AI. Since it is a course project in which we interacted with other groups on the course forum, and another group working on the same paper had already reached out to the authors about problems with the initial repository, we did not find it necessary to contact them again.
