no code implementations • 25 May 2023 • Sitian Shen, Zilin Zhu, Linqian Fan, Harry Zhang, Xinxiao wu
Large pre-trained models have had a significant impact on computer vision by enabling multi-modal learning, where the CLIP model has achieved impressive results in image classification, object detection, and semantic segmentation.