16 May 2024 • Albert Yu, Adeline Foote, Raymond Mooney, Roberto Martín-Martín
We demonstrate that training the image encoder to predict the language description of a simulated or real image, or the distance between the descriptions of two such images, serves as a useful, data-efficient pretraining step that helps learn a domain-invariant image representation.
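As a rough illustration of the distance-prediction idea in the abstract, the sketch below scores how well distances between image embeddings match distances between the embeddings of their language descriptions. It is not the authors' implementation: the function names `euclidean` and `language_distance_loss`, the use of Euclidean distance, and the squared-error form of the loss are all assumptions made for this example, and the embeddings are plain lists of floats standing in for encoder outputs.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def language_distance_loss(image_embs, text_embs):
    """Mean squared gap between image-pair and description-pair distances.

    Hypothetical objective (an assumption, not taken from the paper):
    for every pair of examples, the distance between their image
    embeddings should match the distance between the embeddings of
    their language descriptions.
    """
    if len(image_embs) != len(text_embs):
        raise ValueError("each image needs exactly one description embedding")
    pairs = list(combinations(range(len(image_embs)), 2))
    if not pairs:
        raise ValueError("need at least two (image, description) examples")
    total = 0.0
    for i, j in pairs:
        d_img = euclidean(image_embs[i], image_embs[j])
        d_txt = euclidean(text_embs[i], text_embs[j])
        total += (d_img - d_txt) ** 2
    return total / len(pairs)


# Toy data: examples 0 (sim) and 1 (real) share one description,
# example 2 has a different description.
text_embs = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]

# Image embeddings that ignore the sim/real gap and follow the language.
aligned = [[1.0, 1.0], [1.0, 1.0], [4.0, 5.0]]

# Image embeddings that separate sim from real despite identical descriptions.
misaligned = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]

loss_aligned = language_distance_loss(aligned, text_embs)
loss_misaligned = language_distance_loss(misaligned, text_embs)
```

Under these assumptions, `loss_aligned` is lower than `loss_misaligned`, because the aligned embeddings place the sim and real images with the same description at the same point. In an actual training setup the embeddings would come from a learned image encoder and a language model, and the loss would be minimized by gradient descent rather than evaluated on fixed vectors.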