Are you doing what I say? On modalities alignment in ALFRED

12 Oct 2021  ·  Ting-Rui Chiang, Yi-Ting Yeh, Ta-Chung Chi, Yau-Shian Wang ·

ALFRED is a recently proposed benchmark that requires a model to complete tasks in simulated house environments specified by instructions in natural language. We hypothesize that key to success is accurately aligning the text modality with visual inputs. Motivated by this, we inspect how well existing models can align these modalities using our proposed intrinsic metric, boundary adherence score (BAS). The results show the previous models are indeed failing to perform proper alignment. To address this issue, we introduce approaches aimed at improving model alignment and demonstrate how improved alignment, improves end task performance.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here