1 code implementation • 4 Aug 2023 • Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia
Large-scale pre-trained vision-language models such as CLIP have demonstrated outstanding performance in zero-shot classification, e.g., achieving 76.3% top-1 accuracy on ImageNet without seeing any example, which leads to potential benefits for many tasks that have no labeled data.