Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

26 Apr 2020 · Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, Stuart Shieber

Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior...




