5 Oct 2022 • Dhrupad Bhardwaj, Julia Kempe, Artem Vysogorets, Angela M. Teng, Evaristus C. Ezekwem
Starting from existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks ("impressions") on a randomly initialized backbone network suffices both to retain accuracy on previously learned tasks and to achieve high accuracy on new, unseen tasks.
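A minimal NumPy sketch of the core idea, not the authors' implementation: a frozen, randomly initialized weight matrix is modulated by a learned linear combination of binary supermasks, and only the mixing coefficients are updated for a new task. All names (`masks`, `alpha`, the toy regression target) and the finite-difference update are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, randomly initialized backbone weights (a single linear layer here).
W = rng.standard_normal((8, 4))

# K task-specific binary supermasks ("impressions"), assumed already learned
# on earlier tasks; they also stay frozen.
K = 3
masks = rng.integers(0, 2, size=(K, 8, 4)).astype(float)

# For a new task, only the K mixing coefficients alpha are trained.
alpha = np.ones(K) / K


def effective_weights(alpha):
    # Linear combination of supermasks, applied elementwise to the backbone.
    combined_mask = np.tensordot(alpha, masks, axes=1)  # shape (8, 4)
    return W * combined_mask


def forward(x, alpha):
    return x @ effective_weights(alpha).T


# One illustrative gradient step on alpha for a toy regression target.
x = rng.standard_normal((16, 4))
y = rng.standard_normal((16, 8))


def loss(alpha):
    err = forward(x, alpha) - y
    return (err ** 2).mean()


# Finite-difference gradient on alpha keeps the sketch dependency-free;
# in practice this would be ordinary backprop through the coefficients.
eps, lr = 1e-5, 0.01
grad = np.array([
    (loss(alpha + eps * np.eye(K)[k]) - loss(alpha - eps * np.eye(K)[k])) / (2 * eps)
    for k in range(K)
])
alpha_new = alpha - lr * grad
```

Because the backbone and masks never change, each new task adds only K scalar coefficients, which is what lets previously learned tasks keep their accuracy.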