# Towards Robust Interpretability with Self-Explaining Neural Networks

David Alvarez-Melis, Tommi S. Jaakkola

Most recent work on interpretability of complex machine learning models has focused on estimating *a posteriori* explanations for previously trained models around specific predictions. *Self-explaining* models, where interpretability plays a key role already during learning, have received much less attention...
