1 code implementation • 31 Dec 2024 • Ling Fu, Biao Yang, Zhebin Kuang, Jiajun Song, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Mingxin Huang, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has attracted growing interest recently.
1 code implementation • 23 Jul 2024 • Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie
Our work demonstrates the viability of an integrated approach to multimodal generation within the visual text domain, laying a foundation for subsequent research.
1 code implementation • 2 Jul 2024 • Jinghui Lu, Haiyang Yu, Yanjie Wang, YongJie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang
Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks.
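As a rough illustration of this paradigm (not this paper's exact serialization), OCR tokens and their bounding boxes can be flattened into a plain-text prompt for a text-only LLM. The function name and `<box>` tag format below are hypothetical, minimal sketch only:

```python
def build_layout_prompt(ocr_results, question):
    """Serialize OCR words with their boxes so a text-only LLM can reason over layout.

    ocr_results: list of (text, (x0, y0, x1, y1)) tuples from any OCR engine.
    The <box> tag format is illustrative, not the scheme used in the paper.
    """
    lines = [
        f"{text} <box>{x0},{y0},{x1},{y1}</box>"
        for text, (x0, y0, x1, y1) in ocr_results
    ]
    return "Document:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"
```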
1 code implementation • 3 Jun 2024 • Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, YongJie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang
In this mechanism, all of the diverse visual table understanding (VTU) tasks involved, together with multi-source visual embeddings, are abstracted as concepts.
no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.
1 code implementation • CVPR 2024 • Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie
A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios.
no code implementations • 31 May 2022 • Wenshuo Zhou, Dalu Yang, Binghong Wu, Yehui Yang, Junde Wu, Xiaorong Wang, Lei Wang, Haifeng Huang, Yanwu Xu
Deep learning-based medical imaging classification models usually suffer from the domain shift problem, where classification performance drops when training data and real-world data differ in imaging equipment manufacturer, image acquisition protocol, patient populations, etc.
1 code implementation • 15 Sep 2021 • Binghong Wu, Yehui Yang, Dalu Yang, Junde Wu, Xiaorong Wang, Haifeng Huang, Lei Wang, Yanwu Xu
Based on focal loss with ATSS-R50, our approach achieves 40.5 AP, surpassing the state-of-the-art QFL (Quality Focal Loss, 39.9 AP) and VFL (Varifocal Loss, 40.1 AP).
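For reference, the comparison above builds on the standard sigmoid focal loss (Lin et al.); the sketch below shows that generic formulation, not the variant proposed in this paper:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Generic sigmoid focal loss: down-weights easy examples by (1 - p_t)^gamma.
    # logits and targets are float tensors of the same shape (targets in {0, 1}).
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```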
1 code implementation • 3 Aug 2020 • Yehui Yang, Fangxin Shang, Binghong Wu, Dalu Yang, Lei Wang, Yanwu Xu, Wensheng Zhang, Tianzhu Zhang
As a result, it exploits more discriminative features for DR grading.
no code implementations • 31 Jul 2020 • Dalu Yang, Yehui Yang, Tiantian Huang, Binghong Wu, Lei Wang, Yanwu Xu
How can we train a classification model on labeled fundus images acquired from only one camera brand, yet still achieve good performance on images taken by other brands of cameras?