1 code implementation • Findings (ACL) 2021 • Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti
Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering both image-language and video-language tasks, but is also labeled in multiple languages.
no code implementations • 22 Jan 2020 • Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti
In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding.
Ranked #15 on Zero-Shot Cross-Modal Retrieval on COCO 2014
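The entry above describes image-text joint embedding, where images and captions are mapped into a shared vector space so that matching pairs score highly and retrieval reduces to nearest-neighbor search. The toy sketch below illustrates that retrieval step with random vectors standing in for pre-computed embeddings; it is not ImageBERT's actual architecture (which is a joint transformer trained with masked-modeling and image-text matching objectives), just the downstream similarity ranking such embeddings enable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed joint embeddings: random vectors used purely
# for illustration, standing in for the outputs of a vision-language model.
image_embs = rng.normal(size=(5, 128))  # 5 images, 128-d embeddings
text_embs = rng.normal(size=(5, 128))   # 5 captions, same shared space

def normalize(x):
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine-similarity matrix: entry (i, j) scores caption i against image j.
sims = normalize(text_embs) @ normalize(image_embs).T

# Text-to-image retrieval: for each caption, rank all images by similarity.
ranking = np.argsort(-sims, axis=1)
print(ranking[0])  # image indices for caption 0, best match first
```

Cross-modal retrieval benchmarks like the COCO leaderboard referenced above evaluate exactly this ranking, typically with Recall@K over the ranked lists.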
no code implementations • 14 Feb 2018 • Houdong Hu, Yan Wang, Linjun Yang, Pavel Komlev, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Meenaz Merchant, Arun Sacheti
In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing.