Paper tables with annotated results for Learning Rich Image Region Representation for Visual Question Answering

Paper

Learning Rich Image Region Representation for Visual Question Answering

We propose to boost VQA performance with more powerful feature extractors, improving the representation ability of both visual and text features, and with an ensemble of models. For visual features, we apply several detection techniques to improve the object detector. For text features, we adopt BERT as the language model and find that it significantly improves VQA performance. Our solution won second place in the VQA Challenge 2019.
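The abstract mentions an ensemble of models but does not say how their predictions are combined. A common generic scheme for VQA answer classifiers is to average each model's per-answer scores and pick the highest-scoring answer. The sketch below illustrates that scheme only; it is an assumption, not the authors' documented method, and the function name `ensemble_answers` and the dictionary input format are hypothetical.

```python
from typing import Dict, List


def ensemble_answers(model_scores: List[Dict[str, float]]) -> str:
    """Hypothetical score-averaging ensemble over candidate answers.

    Each element of model_scores maps an answer string to one model's
    score (e.g. a softmax probability). An answer missing from a model's
    dict contributes a score of 0 for that model.
    """
    if not model_scores:
        raise ValueError("need at least one model's scores")
    totals: Dict[str, float] = {}
    for scores in model_scores:
        for answer, score in scores.items():
            totals[answer] = totals.get(answer, 0.0) + score
    n = len(model_scores)
    # Average over all models, then return the highest-scoring answer.
    averaged = {answer: total / n for answer, total in totals.items()}
    return max(averaged, key=averaged.get)


if __name__ == "__main__":
    scores = [
        {"yes": 0.6, "no": 0.4},
        {"yes": 0.3, "no": 0.7},
        {"yes": 0.5, "no": 0.5},
    ]
    print(ensemble_answers(scores))
```

In this example "no" wins (average about 0.53 versus about 0.47), even though the first model alone would have answered "yes". This is why averaging can smooth out individual models' errors.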


