Machine learning models often rely on spurious patterns, such as detecting a tennis racket based on the presence of a person, which do not generalize.
While counterfactual examples are useful for analysis and training of NLP models, current generation methods either rely on manual labor, producing only a handful of counterfactuals, or instantiate only limited types of perturbations such as paraphrases or word substitutions.
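To make the "word substitution" category concrete, here is a minimal Python sketch; it is an illustration, not any of the generation methods referred to above, and the `ANTONYMS` table and example sentence are hypothetical.

```python
# Minimal sketch (illustrative assumption): counterfactuals via single word substitutions,
# the kind of limited perturbation referred to above.

ANTONYMS = {"great": "terrible", "terrible": "great", "good": "bad", "bad": "good"}

def word_substitution_counterfactuals(sentence: str) -> list[str]:
    """Return copies of `sentence` with one sentiment-bearing word swapped at a time."""
    tokens = sentence.split()
    counterfactuals = []
    for i, tok in enumerate(tokens):
        swap = ANTONYMS.get(tok.lower())
        if swap is not None:
            perturbed = tokens[:i] + [swap] + tokens[i + 1:]
            counterfactuals.append(" ".join(perturbed))
    return counterfactuals

print(word_substitution_counterfactuals("the movie was great but the ending was bad"))
# -> ['the movie was terrible but the ending was bad',
#     'the movie was great but the ending was good']
```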
However, prior studies observed improvements from explanations only when the AI, alone, outperformed both the human and the best team.
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models focus either on individual tasks or on specific behaviors.
Although current evaluation of question-answering systems treats predictions in isolation, we need to consider the relationship between predictions to measure true understanding.
Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions.
Complex machine learning models for NLP are often brittle, making different predictions for input instances that are extremely similar semantically.
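As a concrete illustration of this brittleness, the sketch below checks whether a classifier, exposed only through a generic `predict` interface, gives the same label to near-paraphrases; the toy model and the sentence pairs are hypothetical.

```python
# Minimal sketch (assumption): probing brittleness by checking prediction consistency
# across semantically equivalent inputs.

def predict(text: str) -> str:
    """Stand-in for an arbitrary black-box sentiment classifier."""
    return "positive" if "love" in text.lower() else "negative"

PARAPHRASE_PAIRS = [
    ("I love this film.", "This film, I love."),
    ("I love the soundtrack.", "The soundtrack is something I adore."),
]

for original, paraphrase in PARAPHRASE_PAIRS:
    a, b = predict(original), predict(paraphrase)
    status = "consistent" if a == b else "INCONSISTENT"
    print(f"{status}: {original!r} -> {a}, {paraphrase!r} -> {b}")
```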
Recent work in model-agnostic explanations of black-box machine learning has demonstrated that interpretability of complex models does not have to come at the cost of accuracy or model flexibility.
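One way to make the idea of a model-agnostic explanation concrete is a local surrogate: perturb the input, query the black box on each perturbation, and fit a weighted linear model whose coefficients score each feature's local contribution. The sketch below is a simplified illustration under those assumptions, not the exact method of any particular paper; `predict_proba` is a hypothetical stand-in for the black box.

```python
# Minimal sketch (assumption): a local-surrogate explanation in the spirit of
# model-agnostic methods. Words are dropped at random, the black box is queried on
# each perturbed sentence, and a weighted linear fit scores each word's local effect.

import numpy as np

rng = np.random.default_rng(0)

def predict_proba(text: str) -> float:
    """Toy black box: probability of 'positive' sentiment."""
    return 0.9 if "love" in text and "boring" not in text else 0.2

def explain(sentence: str, n_samples: int = 500):
    words = sentence.split()
    # Binary masks: which words are kept in each perturbed sample.
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    masks[0] = 1  # include the unperturbed instance
    ys, weights = [], []
    for mask in masks:
        text = " ".join(w for w, keep in zip(words, mask) if keep)
        ys.append(predict_proba(text))
        # Weight samples by similarity to the original (fraction of words kept).
        weights.append(mask.mean())
    # Weighted least squares: coefficients approximate each word's local contribution.
    W = np.sqrt(np.array(weights))[:, None]
    X = np.hstack([masks, np.ones((n_samples, 1))])  # add an intercept column
    coefs, *_ = np.linalg.lstsq(W * X, (W[:, 0] * np.array(ys))[:, None], rcond=None)
    return sorted(zip(words, coefs[:-1, 0]), key=lambda p: -abs(p[1]))

print(explain("I love the plot but the pacing is boring"))
```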
At the core of interpretable machine learning is the question of whether humans are able to make accurate predictions about a model's behavior.
Understanding why machine learning models behave the way they do empowers both system designers and end users in many ways: in model selection, in feature engineering, in deciding whether to trust and act upon predictions, and in designing more intuitive user interfaces.
Despite widespread adoption, machine learning models remain mostly black boxes.