TextVQA

25 papers with code • 0 benchmarks • 0 datasets

Most implemented papers

Towards VQA Models That Can Read

facebookresearch/pythia CVPR 2019

We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.

CogVLM: Visual Expert for Pretrained Language Models

thudm/cogvlm 6 Nov 2023

We introduce CogVLM, a powerful open-source visual language foundation model.

CogVLM2: Visual Language Models for Image and Video Understanding

thudm/glm-4 29 Aug 2024

Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications.

Structured Multimodal Attentions for TextVQA

chenyugao-cs/sma 1 Jun 2020

In this paper, we propose an end-to-end structured multimodal attention (SMA) neural network, mainly to solve the first two of the issues we identify.

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

adlnlp/attention_vl CVPR 2020

Recent work has explored the TextVQA task that requires reading and understanding text in images to answer a question.
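The title names the answer mechanism. At each decoding step the model either picks a word from a fixed vocabulary or points to (copies) one of the OCR tokens read from the image, and that choice is fed back to predict the next word. Below is a minimal Python sketch of such a loop under stated assumptions. Scores are plain dot products rather than the learned layers a real model would use. `next_state` is a hypothetical callback standing in for the multimodal transformer decoder. All function names and toy values are illustrative and not taken from the paper's code.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pointer_augmented_step(
    state: Sequence[float],
    vocab: Sequence[str],
    vocab_weights: Sequence[Sequence[float]],
    ocr_tokens: Sequence[str],
    ocr_embeddings: Sequence[Sequence[float]],
) -> Tuple[str, int]:
    """Pick one token from the fixed vocabulary or, via the pointer, from the OCR tokens.

    Returns the chosen token and its index in the concatenated [vocab + OCR] list.
    """
    vocab_scores = [dot(state, w) for w in vocab_weights]  # classifier over fixed words
    pointer_scores = [dot(state, e) for e in ocr_embeddings]  # copy scores for OCR tokens
    scores = vocab_scores + pointer_scores
    best = max(range(len(scores)), key=lambda i: scores[i])
    return (list(vocab) + list(ocr_tokens))[best], best


def iterative_decode(
    initial_state: Vector,
    next_state: Callable[[Vector, int], Vector],
    vocab: Sequence[str],
    vocab_weights: Sequence[Sequence[float]],
    ocr_tokens: Sequence[str],
    ocr_embeddings: Sequence[Sequence[float]],
    end_token: str = "<end>",
    max_steps: int = 10,
) -> List[str]:
    """Greedy multi-step decoding: each prediction is fed back to produce the next state."""
    state: Vector = initial_state
    answer: List[str] = []
    for _ in range(max_steps):
        token, index = pointer_augmented_step(state, vocab, vocab_weights, ocr_tokens, ocr_embeddings)
        if token == end_token:
            break
        answer.append(token)
        state = next_state(state, index)  # hypothetical stand-in for the decoder
    return answer


# Toy example: the answer "coca cola" can only be produced by copying OCR tokens.
vocab = ["<end>", "yes", "no"]
vocab_weights = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
ocr_tokens = ["coca", "cola"]
ocr_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def toy_next_state(state: Vector, index: int) -> Vector:
    # Hypothetical transition: after copying "coca" look for "cola", then stop.
    return {3: [0.0, 1.0, 0.0], 4: [0.0, 0.0, 1.0]}.get(index, [0.0, 0.0, 1.0])


answer = iterative_decode([1.0, 0.0, 0.0], toy_next_state, vocab, vocab_weights, ocr_tokens, ocr_embeddings)
print(" ".join(answer))  # coca cola
```

Concatenating vocabulary scores and pointer scores into one list lets a single argmax decide between generating a known word and copying text that never appeared in the training vocabulary.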

Spatially Aware Multimodal Transformers for TextVQA

yashkant/sam-textvqa ECCV 2020

Further, each head in our multi-head self-attention layer focuses on a different subset of relations.
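The idea of heads restricted to different subsets of relations can be sketched as masked self-attention. Below is a minimal Python version, assuming each pair of tokens (objects or OCR boxes) carries one hypothetical relation label such as `left` or `overlap`, and each head is given a hypothetical set of labels it may attend across. The label names, the identity query/key/value projections, and the function names are illustrative assumptions, not the SAM authors' code.

```python
import math
from typing import List, Sequence, Set

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def masked_softmax(scores: Sequence[float], allowed: Sequence[bool]) -> List[float]:
    """Softmax restricted to allowed positions; masked positions get weight 0."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    if not kept:
        return [0.0] * len(scores)
    m = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def relation_masked_attention(
    x: List[Vector],
    rel: List[List[str]],
    head_relations: List[Set[str]],
) -> List[List[Vector]]:
    """Per-head self-attention where head h lets token i attend to token j
    only if rel[i][j] is in head_relations[h]. Q/K/V projections are the
    identity here to keep the sketch short."""
    d = len(x[0])
    outputs = []
    for allowed_rels in head_relations:
        head_out = []
        for i, q in enumerate(x):
            scores = [dot(q, k) / math.sqrt(d) for k in x]  # scaled dot-product scores
            allowed = [rel[i][j] in allowed_rels for j in range(len(x))]
            weights = masked_softmax(scores, allowed)
            head_out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
        outputs.append(head_out)
    return outputs


# Three tokens with hypothetical pairwise relation labels (rel[i][j]: relation of i to j).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel = [
    ["self", "left", "above"],
    ["right", "self", "overlap"],
    ["below", "overlap", "self"],
]
heads = [{"self", "left", "right"}, {"self", "overlap"}]
out = relation_masked_attention(tokens, rel, heads)
print(out[0][2])  # [1.0, 1.0]: in head 0, token 2 may only attend to itself
print(out[1][0])  # [1.0, 0.0]: in head 1, token 0 may only attend to itself
```

Including `self` in every head's set guarantees each token always has at least one position to attend to, so no row of the mask is empty.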

RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering

xiaojino/RUArt 24 Oct 2020

Text-based visual question answering (VQA) requires reading and understanding the text in an image to correctly answer a given question.

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

microsoft/TAP CVPR 2021

Due to this aligned representation learning, even when pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4% compared with a non-TAP baseline.

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

ZephyrZhuQi/ssbaseline 9 Dec 2020

Text appearing in everyday scenes that can be recognized by OCR (Optical Character Recognition) tools contains significant information, such as street names, product brands, and prices.