[Re] Reproducing 'Identifying through flows for recovering latent representations'
Scope of Reproducibility
The authors introduce iFlow, a model for recovering the latent representation of observed data, and claim that it outperforms the state-of-the-art method for this task, the iVAE model. Specifically, they claim that iFlow outperforms iVAE both in preserving the original geometry of the source manifold and in correlation per dimension of the latent space.
To reproduce the results of the paper, we reran the main experiments and recreated the figures. We largely worked with code from the repository belonging to the original paper, to which we added plotting functionality as well as various bug fixes and optimisations. Additionally, we attempted to improve the iVAE baseline by making it more complex and fixing a mistake in its implementation. We also tried to investigate a possible correlation between the structure of the dataset and the performance. All code used is publicly available at https://github.com/HiddeLekanne/Reproducibility-Challenge-iFlow.
The obtained mean and standard deviation of the mean correlation coefficient (MCC) over 100 seeds are within 1 percent of the results reported in the paper. The iFlow model obtained a mean MCC score of 0.718 (0.067). Efforts to improve and correct the iVAE baseline increased its mean MCC score from 0.483 (0.059) to 0.556 (0.061). The performance, however, remains worse than that of iFlow, further supporting the authors' claim that the iFlow implementation is correct and more effective than iVAE.
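To make the reported metric concrete, the following is a minimal sketch of one common way to compute the MCC (mean correlation coefficient) between true and recovered latent sources, and to aggregate it over seeds. It assumes Pearson correlation and an optimal one-to-one matching between true and estimated dimensions. The function name and the synthetic data are hypothetical and are not taken from the authors' repository, whose implementation may differ in detail.

```python
import itertools

import numpy as np


def mean_correlation_coefficient(z_true, z_est):
    """Illustrative MCC: mean absolute Pearson correlation under the best
    one-to-one matching of estimated to true latent dimensions.

    Both inputs are arrays of shape (n_samples, n_dims). The matching is
    found by brute force over permutations, which is only practical for
    a small number of dimensions.
    """
    z_true = np.asarray(z_true, dtype=float)
    z_est = np.asarray(z_est, dtype=float)
    d = z_true.shape[1]
    # Columns are variables; the top-right block of the (2d x 2d) matrix
    # holds the correlations between true and estimated dimensions.
    corr = np.corrcoef(z_true, z_est, rowvar=False)[:d, d:]
    abs_corr = np.abs(corr)
    rows = np.arange(d)
    best_total = max(
        abs_corr[rows, list(perm)].sum()
        for perm in itertools.permutations(range(d))
    )
    return best_total / d


# Synthetic illustration only: noisy, permuted, sign-flipped copies of the
# sources stand in for a model's recovered latents across several seeds.
scores = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(1000, 3))
    z_hat = -2.0 * z[:, [2, 0, 1]] + rng.normal(scale=0.5, size=(1000, 3))
    scores.append(mean_correlation_coefficient(z, z_hat))

print(f"MCC over seeds: mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```

Because the score uses absolute correlations and an optimal matching, it is invariant to permutation, sign flips and rescaling of the recovered dimensions. For higher-dimensional latent spaces, the brute-force search could be replaced by a linear assignment solver such as `scipy.optimize.linear_sum_assignment`.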
What was easy
The GitHub repository associated with the paper provided most of the necessary code and ran with only minor changes. The code included all model implementations and data generation. The script that was used to obtain the results was provided, which allowed us to determine exactly which hyperparameters were used in the experiments on the iFlow models. Overall, the code was well organised and the structure was easy to follow.
What was difficult
The specific versions of the Python libraries used were unknown, which made it infeasible to achieve the exact results from the paper when running on the same seeds. The code used to create Figures 1-3 in the original paper was missing and had to be recreated. Furthermore, the long training time of the models made experimentation with, for example, different hyperparameters challenging. Finally, the code was largely undocumented.
Communication with original authors
Communication with the authors was attempted but could not be established.