26 Mar 2023 • Tenglong Ao, Zeyi Zhang, Libin Liu
We leverage the large-scale Contrastive Language-Image Pre-training (CLIP) model and present a novel CLIP-guided mechanism that extracts efficient style representations from multiple input modalities, such as a piece of text, an example motion clip, or a video.
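The core idea is that CLIP-style encoders map different modalities into one shared embedding space, so a "style" can be represented as a unit-norm vector regardless of whether it came from text, motion, or video. The toy sketch below illustrates only that embedding-space mechanic with a random-projection stand-in encoder; it is not the paper's model, and all names, dimensions, and the mean-pool design are assumptions made so the snippet runs without pretrained weights.

```python
import numpy as np

# Toy stand-in for a CLIP-style encoder. A real CLIP model uses pretrained
# transformer encoders; here a fixed random projection plays that role so the
# sketch is self-contained (DIM, VOCAB, and W_text are illustrative choices).
rng = np.random.default_rng(0)
DIM = 8     # shared embedding dimension (assumption)
VOCAB = 32  # toy token vocabulary size (assumption)

W_text = rng.normal(size=(VOCAB, DIM))

def encode_style(tokens):
    """Mean-pool token embeddings and L2-normalize, mimicking how a
    CLIP-like encoder produces a unit-norm style embedding."""
    emb = W_text[np.asarray(tokens)].mean(axis=0)
    return emb / np.linalg.norm(emb)

def style_similarity(a, b):
    """Cosine similarity between two unit-norm style embeddings."""
    return float(a @ b)

# Two nearby "style prompts" (token id sequences) land close together
# in the shared space; similarity with itself is exactly 1.
style_a = encode_style([3, 7, 11])
style_b = encode_style([3, 7, 12])
print(style_similarity(style_a, style_a))
print(-1.0 <= style_similarity(style_a, style_b) <= 1.0)
```

In the actual system, the same interface would be served by CLIP's pretrained text encoder for prompts and analogous encoders for motion clips or video, which is what lets one guidance mechanism accept any of the three modalities.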