1 code implementation • 12 Jan 2025 • Ming Dai, Jian Li, Jiedong Zhuang, Xian Zhang, Wankou Yang
Furthermore, to address the challenge of insufficient multimodal understanding, we leverage pre-trained models based on visual-linguistic fusion representations.
no code implementations • 28 Dec 2024 • Jiedong Zhuang, Lu Lu, Ming Dai, Rui Hu, Jian Chen, Qiang Liu, Haoji Hu
In this paper, we conduct a comprehensive investigation of MLLM attention mechanisms with LLaVA.
1 code implementation • 26 Sep 2024 • Ming Dai, Lingfeng Yang, Yihao Xu, ZhenHua Feng, Wankou Yang
Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image.
Descriptive • Generalized Referring Expression Comprehension
1 code implementation • 16 Aug 2024 • Jian Li, Weiheng Lu, Hao Fei, Meng Luo, Ming Dai, Min Xia, Yizhang Jin, Zhenye Gan, Ding Qi, Chaoyou Fu, Ying Tai, Wankou Yang, Yabiao Wang, Chengjie Wang
Multimodal Large Language Models (MLLMs) are gaining increasing popularity in both academia and industry due to their remarkable performance in various applications such as visual question answering, visual perception, understanding, and reasoning.
no code implementations • 10 Mar 2024 • Jiahao Chen, Enhui Zheng, Ming Dai, Yifu Chen, Yusheng Lu
Furthermore, its performance in meter-level localization accuracy is impressive, with a 182.62% improvement in 3-meter accuracy, a 164.17% improvement in 5-meter accuracy, and a 137.43% improvement in 10-meter accuracy.
1 code implementation • 13 Aug 2022 • Ming Dai, Enhui Zheng, Jiahao Chen, Lei Qi, ZhenHua Feng, Wankou Yang
However, IR-based methods face several challenges: 1) Pre- and post-processing incur significant computational and storage overhead; 2) The lack of interaction between dual-source features impairs precise spatial perception.
1 code implementation • 23 Jan 2022 • Ming Dai, Jianhong Hu, Jiedong Zhuang, Enhui Zheng
However, it still has some limitations: e.g., it can only extract part of the information in the neighborhood, and some scale-reduction operations cause fine-grained information to be lost.
Ranked #1 on Drone navigation on University-1652
1 code implementation • 23 Jan 2022 • Ming Dai, Enhui Zheng, ZhenHua Feng, Jiedong Zhuang, Wankou Yang
Last, we enhance the Recall@K metric and introduce a new measurement, SDM@K, to evaluate the performance of a trained model from both the retrieval and localization perspectives simultaneously.