Zero-shot transfer learning for document understanding is a crucial yet under-investigated setting for reducing the high cost of annotating document entities.
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting.
Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks.
The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge.
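The forgetting phenomenon can be seen in a minimal sketch (illustrative only, not taken from any of the works above): a single-parameter logistic model is trained on task A, then naively fine-tuned on a task B whose labels conflict with A's, after which its task-A accuracy collapses.

```python
import math

def train(w, xs, ys, lr=0.1, epochs=200):
    # Plain logistic-regression gradient descent on one task's data.
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-w * x))  # sigmoid prediction
            w += lr * (y - p) * x               # gradient step
    return w

def accuracy(w, xs, ys):
    preds = [1 if w * x > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

xs = [-2.0, -1.0, 1.0, 2.0]
ys_a = [0, 0, 1, 1]   # task A: positive x maps to class 1
ys_b = [1, 1, 0, 0]   # task B: the labels are flipped

w = train(0.0, xs, ys_a)
acc_a_before = accuracy(w, xs, ys_a)  # the model masters task A

w = train(w, xs, ys_b)                # naive sequential training on task B
acc_a_after = accuracy(w, xs, ys_a)   # task A performance after task B
```

Because nothing constrains the parameter while it adapts to task B's distribution, the update overwrites the task-A solution entirely; continual-learning methods add exactly such a constraint (e.g., a penalty anchoring parameters important to earlier tasks).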
Training data for text classification is often limited in practice, especially for applications with many output classes or many related classification problems.
In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute referenced in the example were different?
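The counterfactual question can be operationalized in a small sketch (the word list and the toy scorer are hypothetical stand-ins, not the paper's method): perturb the sensitive-attribute tokens in an example and compare the classifier's predictions before and after.

```python
# Hypothetical sensitive-attribute substitution table.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}

def counterfactual(text):
    # Replace each sensitive-attribute token with its counterpart,
    # leaving the rest of the example unchanged.
    return " ".join(SWAPS.get(tok, tok) for tok in text.split())

def toxicity_score(text):
    # Stand-in for a real classifier: this toy scorer deliberately
    # reacts to the token "she" so that the fairness gap is visible.
    return 0.9 if "she" in text.split() else 0.1

def counterfactual_gap(text):
    # A nonzero gap means the prediction depends on the sensitive
    # attribute referenced in the example.
    return abs(toxicity_score(text) - toxicity_score(counterfactual(text)))
```

For example, `counterfactual("he is a doctor")` yields `"she is a doctor"`, and the toy scorer's large gap on this pair flags a counterfactual-fairness violation that a fair model should not exhibit.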