ARVSU (Addressee Recognition in Visual Scenes with Utterances)

Introduced by Le et al. in Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

ARVSU contains a vast body of image variations in visual scenes with an annotated utterance and a corresponding addressee for each scenario.

Source: Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

Homepage

No benchmarks yet. Start a new benchmark or link an existing one.

Paper	Code	Results	Date	Stars

No data loaders found. You can submit your data loader here.

GazeFollow