no code implementations • 2 May 2022 • AJ Piergiovanni, Wei Li, Weicheng Kuo, Mohammad Saffar, Fred Bertsch, Anelia Angelova
We present Answer-Me, a task-aware multi-task framework which unifies a variety of question answering tasks, such as, visual question answering, visual entailment, visual reasoning.
no code implementations • 31 Mar 2022 • Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, Anelia Angelova
We propose FindIt, a simple and versatile framework that unifies a variety of visual grounding and localization tasks including referring expression comprehension, text-based localization, and object detection.
2 code implementations • 12 Mar 2020 • Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
This work builds upon two lines of research: it combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains from approaches based on local, temporal sparse attention.
Ranked #5 on Image Generation on ImageNet 64x64 (Bits per dim metric)