Search Results for author: Amir Mazaheri

Found 7 papers, 3 papers with code

Video Generation from Text Employing Latent Path Construction for Temporal Modeling

no code implementations · 29 Jul 2021 · Amir Mazaheri, Mubarak Shah

To the best of our knowledge, this is the first work on text-to-video generation from free-form sentences on more realistic video datasets such as the Actor and Action Dataset (A2D) and UCF101.

Text-to-Video Generation · Video Generation

MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

1 code implementation · Findings of the Association for Computational Linguistics 2020 · Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah

We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring both individual and combined processing of multiple input modalities.

Question Answering · Visual Question Answering
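
For intuition, here is a minimal sketch of the fusion idea described in the abstract: each modality gets its own transformer encoder (individual processing), and a joint transformer fuses the resulting encodings (combined processing). Module names, layer counts, and dimensions are placeholders, not the paper's implementation, which builds on BERT encoders.

    # Minimal sketch: per-modality encoders followed by a fusion
    # transformer. All sizes and names are illustrative assumptions.
    import torch
    import torch.nn as nn

    class FusionSketch(nn.Module):
        def __init__(self, dim=768, heads=8):
            super().__init__()
            # One encoder per modality (stand-ins for BERT-style encoders).
            self.text_enc = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, heads, batch_first=True), 2)
            self.video_enc = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, heads, batch_first=True), 2)
            # Fusion transformer over the concatenated modality encodings.
            self.fusion = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, heads, batch_first=True), 2)
            self.classifier = nn.Linear(dim, 5)  # e.g. 5 answer choices

        def forward(self, text_tokens, video_tokens):
            t = self.text_enc(text_tokens)     # individual processing
            v = self.video_enc(video_tokens)
            joint = self.fusion(torch.cat([t, v], dim=1))  # combined processing
            return self.classifier(joint.mean(dim=1))

    logits = FusionSketch()(torch.randn(2, 16, 768), torch.randn(2, 20, 768))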

Deep Photo Cropper and Enhancer

no code implementations · 3 Aug 2020 · Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah

In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and to reduce the effects of pixel stretching and distortion.

Image Enhancement · Super-Resolution
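
As a rough illustration of the enhancement step, the sketch below upscales a cropped region so the embedded image regains pixels lost to cropping. Bicubic interpolation stands in for the paper's learned super-resolution model; the function name and shapes are assumptions.

    # Sketch of the enhancement step: upscale a cropped region.
    # Bicubic interpolation is a stand-in for a learned SR network.
    import torch
    import torch.nn.functional as F

    def enhance_crop(crop: torch.Tensor, scale: int = 4) -> torch.Tensor:
        """crop: (N, C, H, W) tensor; returns (N, C, scale*H, scale*W)."""
        return F.interpolate(crop, scale_factor=scale,
                             mode="bicubic", align_corners=False)

    upscaled = enhance_crop(torch.rand(1, 3, 64, 64))  # -> (1, 3, 256, 256)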

Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention

no code implementations · 26 Sep 2018 · Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni

In this paper, we propose an approach for augmenting a deep visuomotor policy, trained through demonstrations, with Task Focused visual Attention (TFA).
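
One way to picture such an augmentation: an attention map gates the visual features before they reach the policy head, suppressing task-irrelevant regions. Everything in this sketch (shapes, modules, the pooling) is an illustrative assumption, not the authors' architecture.

    # Sketch of gating visual features with a task-focused attention
    # map before the policy head; shapes and modules are assumptions.
    import torch
    import torch.nn as nn

    class AttentionGatedPolicy(nn.Module):
        def __init__(self, channels=64, actions=7):
            super().__init__()
            # Predicts a per-pixel attention map from the features.
            self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
            self.policy = nn.Linear(channels, actions)

        def forward(self, feats):          # feats: (N, C, H, W)
            mask = self.attn(feats)        # (N, 1, H, W), values in [0, 1]
            gated = feats * mask           # suppress task-irrelevant regions
            pooled = gated.mean(dim=(2, 3))
            return self.policy(pooled)     # action prediction

    actions = AttentionGatedPolicy()(torch.randn(2, 64, 32, 32))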

Visual Text Correction

1 code implementation · ECCV 2018 · Amir Mazaheri, Mubarak Shah

A semantic inconsistency between a sentence and its video, or among the words of a sentence, can result in an inaccurate description.

Grammatical Error Correction · Sentence

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

1 code implementation · ICCV 2017 · Amir Mazaheri, Dong Zhang, Mubarak Shah

Since the source sentence is broken into two fragments, the left fragment (before the blank) and the right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately, given the many possible variations in the location and type of the missing word within the source sentence.

Sentence
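
A minimal sketch of the LR/RL encoding idea: one LSTM reads the left fragment left-to-right, another reads the right fragment right-to-left, and their final states are merged to score candidates for the blank. The vocabulary size, dimensions, and concatenation merge are placeholders, not the paper's exact model (which also adds spatial-temporal attention over the video).

    # Sketch of encoding the two sentence fragments around a blank:
    # a left-to-right LSTM for the left fragment, a right-to-left LSTM
    # for the right fragment, merged to score candidate words.
    import torch
    import torch.nn as nn

    class FragmentEncoder(nn.Module):
        def __init__(self, vocab=10000, dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.lstm_lr = nn.LSTM(dim, dim, batch_first=True)  # left fragment
            self.lstm_rl = nn.LSTM(dim, dim, batch_first=True)  # right fragment
            self.out = nn.Linear(2 * dim, vocab)  # score the missing word

        def forward(self, left_ids, right_ids):
            _, (h_lr, _) = self.lstm_lr(self.embed(left_ids))
            # Reverse the right fragment so it is read right-to-left.
            _, (h_rl, _) = self.lstm_rl(self.embed(right_ids.flip(1)))
            merged = torch.cat([h_lr[-1], h_rl[-1]], dim=-1)
            return self.out(merged)

    scores = FragmentEncoder()(torch.randint(0, 10000, (2, 5)),
                               torch.randint(0, 10000, (2, 6)))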

Video Fill in the Blank with Merging LSTMs

no code implementations13 Oct 2016 Amir Mazaheri, Dong Zhang, Mubarak Shah

In the experiments, we demonstrate the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.
