1 code implementation • 29 Jan 2024 • Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang
Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.
Ranked #1 on Zero-shot Image Retrieval on Flickr30k-CN (using extra training data)
Zero-Shot Cross-Modal Retrieval • Zero-shot Image Retrieval
no code implementations • 3 Aug 2023 • Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Wang
Our BGA-MNER consists of image2text and text2image generation with respect to entity-salient content in two modalities.
no code implementations • CVPR 2022 • Weixiang Hong, Jiangwei Lao, Wang Ren, Jian Wang, Jingdong Chen, Wei Chu
Instead of proposing a specific vision-transformer-based detector, our goal in this work is to offer insights into training vision-transformer-based detectors from scratch.