TLDR: Text Based Last-layer Retraining for Debiasing Image Classifiers
An image classifier may come to depend on incidental features because of a strong correlation between those features and the classification target in the training dataset. Recently, Last Layer Retraining (LLR) with group-balanced datasets has been shown to be effective in mitigating such spurious correlations. However, acquiring group-balanced image datasets is costly, which limits the general applicability of LLR. In this work, we propose to perform LLR using text datasets built with large language models to debias a general image classifier. To that end, we demonstrate that text can serve as a proxy for its corresponding image beyond the image-text joint embedding space; this is achieved with a linear projector that enforces orthogonality between its weight and the modality gap of the joint embedding space. In addition, we propose a systematic validation procedure that checks whether the generated words are compatible with the embedding spaces of CLIP and the image classifier, which is shown to improve debiasing performance. We dub these procedures TLDR (Text-based Last-layer Retraining for Debiasing image classifieRs) and show that our method is competitive with LLR methods that require a group-balanced image dataset for retraining. Furthermore, TLDR outperforms other baselines that retrain the last layer without any group-annotated dataset. Code: https://github.com/beotborry/TLDR
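To make the projector idea concrete, the following is a minimal toy sketch (not the paper's actual training procedure) of enforcing orthogonality between a linear projector's weight and the modality gap. The gap is taken, as is common, to be the difference between the centroids of the image and text embeddings in the joint space; the embeddings, dimension, and initial weight here are all synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension (CLIP uses e.g. 512)

# Synthetic stand-ins for L2-normalized CLIP image/text embeddings.
img = rng.normal(size=(100, d))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(100, d))
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

# Modality gap: difference of the two modality centroids (unit vector).
g = img.mean(axis=0) - txt.mean(axis=0)
g /= np.linalg.norm(g)

# Take an arbitrary linear projector W, then subtract from each row its
# component along g, so that W @ g = 0 (weight orthogonal to the gap).
W = rng.normal(size=(d, d))
W = W - np.outer(W @ g, g)

print(np.allclose(W @ g, 0))  # the projector now ignores the gap direction
```

With the gap direction nulled out, projected text embeddings are no longer separated from image embeddings along the modality-gap axis, which is what lets text act as a proxy for images when retraining the last layer.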