A general multimodal attention unit for any number of modalities. Inspired by graphical models, it infers several attention beliefs by aggregating interaction messages.
Source: Factor Graph Attention
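
The message-passing idea can be illustrated with a minimal pure-Python sketch. It is a simplified illustration and not the authors' implementation. It makes several assumptions:

- Every modality's entities (for example, image regions or question words) are already projected to a shared dimension.
- The local potential is a hypothetical linear score (`local_weights`).
- The pairwise potential is cosine similarity.
- Each message from modality j to modality i is the mean pairwise potential over j's entities.

The paper's learned potentials and aggregation may differ. The names `attention_beliefs` and `attend` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Pairwise potential (assumption): cosine similarity of two entity vectors.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_beliefs(
    entities: Dict[str, List[Vector]],
    local_weights: Dict[str, Vector],
) -> Dict[str, List[float]]:
    """Infer one attention belief (a distribution over entities) per modality."""
    beliefs: Dict[str, List[float]] = {}
    for i, ents_i in entities.items():
        scores = []
        for x in ents_i:
            # Local (unary) potential: hypothetical linear score of the entity.
            score = dot(x, local_weights[i])
            # Add one interaction message from every other modality j,
            # here the mean pairwise potential over j's entities (assumption).
            for j, ents_j in entities.items():
                if j != i:
                    score += sum(cosine(x, y) for y in ents_j) / len(ents_j)
            scores.append(score)
        beliefs[i] = softmax(scores)
    return beliefs


def attend(
    entities: Dict[str, List[Vector]],
    beliefs: Dict[str, List[float]],
) -> Dict[str, Vector]:
    """Belief-weighted sum of each modality's entity vectors."""
    summaries: Dict[str, Vector] = {}
    for name, ents in entities.items():
        dim = len(ents[0])
        summaries[name] = [
            sum(b * x[d] for b, x in zip(beliefs[name], ents)) for d in range(dim)
        ]
    return summaries


# Toy usage: two image regions and one question word in a shared 2-D space.
entities = {
    "image": [[1.0, 0.0], [0.0, 1.0]],
    "question": [[1.0, 0.0]],
}
local_weights = {"image": [0.0, 0.0], "question": [0.0, 0.0]}
beliefs = attention_beliefs(entities, local_weights)
summaries = attend(entities, beliefs)
```

In this toy example the local weights are zero, so only the interaction message distinguishes the two image regions. The region aligned with the question word receives the larger belief (about 0.73 versus 0.27). Messages are summed over all other modalities, so adding a third modality (for example, dialog history) only requires another key in `entities`.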
Task | Papers | Share |
---|---|---|
Question Answering | 2 | 14.29% |
Visual Question Answering | 2 | 14.29% |
Visual Question Answering (VQA) | 2 | 14.29% |
Multimodal Interaction | 1 | 7.14% |
Crowd Counting | 1 | 7.14% |
Facial Expression Recognition (FER) | 1 | 7.14% |
Dialogue State Tracking | 1 | 7.14% |
Prediction | 1 | 7.14% |
Scene-Aware Dialogue | 1 | 7.14% |