no code implementations • 22 Apr 2024 • Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Man Zhang, Qing Li, XuCheng Yin, Zhaoxiang Zhang, Junran Peng
Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance.
1 code implementation • 13 May 2023 • Yuliang Liu, Zhang Li, Biao Yang, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai
In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).
no code implementations • 5 Sep 2022 • Lei Chen, Haibo Qin, Shi-Xue Zhang, Chun Yang, XuCheng Yin
In this paper, we propose an efficient attention-free Single-Point Decoding Network (dubbed SPDN) for scene text recognition, which can replace the traditional attention-based decoding network.