Local Interpretations for Explainable Natural Language Processing: A Survey

20 Mar 2021  ·  Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon

As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of black-box models have increased, resulting in a greater focus on transparency in deep learning models. This work investigates various methods for improving the interpretability of deep neural networks for natural language processing (NLP) tasks, including machine translation and sentiment analysis. We begin with a comprehensive discussion of the term "interpretability" and its various aspects. The methods collected and summarised in this survey concern local interpretation only and are divided into three categories: 1) explaining the model's predictions through related input features; 2) explaining through natural language explanations; 3) probing the hidden states of models and word representations.
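As an illustration of the first category (explaining a prediction through related input features), the sketch below computes a gradient-based saliency score for each input token of a toy sentiment classifier. This example is not taken from the paper; the model, vocabulary, and sentence are illustrative assumptions.

```python
# Minimal sketch of gradient-based saliency (one common "input feature" explanation).
# The toy vocabulary, embedding size, and classifier below are hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab = {"the": 0, "movie": 1, "was": 2, "wonderful": 3, "boring": 4}
embed = nn.Embedding(len(vocab), 16)
classifier = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

tokens = ["the", "movie", "was", "wonderful"]
ids = torch.tensor([[vocab[t] for t in tokens]])   # shape: (1, seq_len)

emb = embed(ids)                                   # (1, seq_len, 16)
emb.retain_grad()                                  # keep gradients on this non-leaf tensor
logits = classifier(emb.mean(dim=1))               # mean-pool tokens, then classify
pred = logits.argmax(dim=-1).item()

# Backpropagate the predicted class score to the input embeddings.
logits[0, pred].backward()

# Saliency of each token = L2 norm of the gradient w.r.t. its embedding.
saliency = emb.grad.norm(dim=-1).squeeze(0)
for tok, score in zip(tokens, saliency.tolist()):
    print(f"{tok:>10s}  {score:.4f}")
```

Tokens with larger gradient norms are the ones whose embeddings most strongly influence the predicted class score, which is the intuition behind saliency-style local explanations.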

