Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering

This paper proposes a method to gain extra supervision via multi-task learning for multi-modal video question answering. Multi-modal video question answering is an important task that aims at the joint understanding of vision and language... (read more)

Results in Papers With Code
(↓ scroll down to see all results)