Provably Robust Transfer
Knowledge transfer is an effective tool for learning, especially when labeled data is scarce or when training from scratch is prohibitively costly. The overwhelming majority of the transfer learning literature focuses on obtaining accurate models, neglecting the issue of adversarial robustness. Yet robustness is essential, particularly when transferring to safety-critical domains. We analyze and improve the robustness of a popular transfer learning framework consisting of two parts: a feature extractor and a classifier that is re-trained on the target domain. Our experiments show how adversarial training on the source domain affects robustness on the source and target domains, and we propose the first provably robust transfer learning models. We obtain strong robustness guarantees by bounding the worst-case change in the extracted features while controlling the Lipschitz constant of the classifier. Our models maintain high accuracy while significantly improving provable robustness.
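The certification idea described above composes two bounds: if the extracted features can move by at most δ under the allowed input perturbation, and the classifier is L-Lipschitz, then every logit moves by at most L·δ, so a prediction whose margin exceeds 2·L·δ cannot flip. A minimal sketch of this margin check (the function name and the conservative 2·L·δ rule are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def is_certified(logits, lipschitz_const, feature_change_bound):
    """Conservative robustness certificate for a composed model.

    Assumes the feature extractor's output moves by at most
    `feature_change_bound` (L2 norm) under the allowed input perturbation,
    and the classifier is `lipschitz_const`-Lipschitz, so each logit can
    shift by at most L * delta. The prediction is certified when the top
    logit leads the runner-up by more than twice that worst-case shift.
    """
    worst_case_shift = lipschitz_const * feature_change_bound
    top_two = np.sort(np.asarray(logits))[::-1][:2]
    margin = top_two[0] - top_two[1]
    return bool(margin > 2 * worst_case_shift)

# A margin of 4.0 with L = 1 and delta = 1 clears the 2*L*delta = 2 threshold.
print(is_certified([5.0, 1.0, 0.0], lipschitz_const=1.0,
                   feature_change_bound=1.0))  # True
```

This check is conservative: a tighter certificate could use per-class weight differences instead of a single global Lipschitz constant, at the cost of more computation.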