no code implementations • 29 Jul 2021 • Amir Mazaheri, Mubarak Shah
To the best of our knowledge, this is the first work on text-to-video generation from free-form sentences on more realistic video datasets such as the Actor and Action Dataset (A2D) or UCF101.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.
no code implementations • 3 Aug 2020 • Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah
In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels.
no code implementations • 26 Sep 2018 • Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni
In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA).
1 code implementation • ECCV 2018 • Amir Mazaheri, Mubarak Shah
A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description.
1 code implementation • ICCV 2017 • Amir Mazaheri, Dong Zhang, Mubarak Shah
Since the source sentence is broken into two fragments, the left fragment (before the blank) and the right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately, because the missing word can vary widely in both its location and its type within the source sentence.
no code implementations • 13 Oct 2016 • Amir Mazaheri, Dong Zhang, Mubarak Shah
In the experiments, we demonstrate the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.