no code implementations • 26 Feb 2024 • Jerry Ngo, Yoon Kim
This probe is trained with a contrastive loss that pulls the language representation and the sound representation of the same object close to one another.
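As a rough illustration of the kind of objective described above, here is a minimal sketch of a contrastive loss over paired language and sound embeddings. The excerpt does not specify the exact form of the loss, so the symmetric InfoNCE formulation, the in-batch negatives, the `temperature` value, and the function names `log_softmax` and `contrastive_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_loss(lang, sound, temperature=0.07):
    """Symmetric InfoNCE-style loss (assumed form, not taken from the paper).

    lang, sound: arrays of shape (batch, dim); row i of each is assumed to
    describe the same object. Other rows in the batch act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    lang = lang / np.linalg.norm(lang, axis=1, keepdims=True)
    sound = sound / np.linalg.norm(sound, axis=1, keepdims=True)

    # Pairwise similarities; matching pairs sit on the diagonal.
    logits = lang @ sound.T / temperature
    idx = np.arange(len(lang))

    # Language-to-sound and sound-to-language cross-entropy terms.
    loss_l2s = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_s2l = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_l2s + loss_s2l)
```

Under these assumptions, the loss is small when each language embedding is most similar to the sound embedding of its own object, and grows when the pairing is scrambled.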