1 code implementation • 2 Apr 2024 • Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei
For task completion, the agent needs to align and integrate various navigation modalities, including instruction, observation and navigation history.
1 code implementation • 4 Oct 2023 • Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs).
no code implementations • 11 Jun 2022 • Zhihao Fan, Zhongyu Wei, Jingjing Chen, Siyuan Wang, Zejun Li, Jiarong Xu, Xuanjing Huang
These two steps are iteratively performed in our framework for continuous learning.
1 code implementation • 29 Jan 2022 • Zejun Li, Zhihao Fan, Huaixiao Tou, Jingjing Chen, Zhongyu Wei, Xuanjing Huang
In MVPTR, we follow the nested structure of both modalities to introduce concepts as high-level semantics.
1 code implementation • Findings (NAACL) 2022 • Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Jianqing Fan
We propose our TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC) to generate synthetic sentences automatically as negative samples.
no code implementations • 12 Sep 2021 • Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Haijun Shan, Xuanjing Huang, Jianqing Fan
Existing research for image text retrieval mainly relies on sentence-level supervision to distinguish matched and mismatched sentences for a query image.
no code implementations • 21 Jun 2021 • Zhihao Fan, Zhongyu Wei, Siyuan Wang, Ruize Wang, Zejun Li, Haijun Shan, Xuanjing Huang
Considering that theme concepts can be learned from both images and captions, we propose two settings for their representations learning based on TTN.
no code implementations • 21 Mar 2021 • Zejun Li, Zhongyu Wei, Zhihao Fan, Haijun Shan, Xuanjing Huang
In this paper, we focus on the problem of unsupervised image-sentence matching.
no code implementations • 10 Oct 2017 • Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao
Recognizing text in the wild is a really challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks).