Multimodal Interaction

33 papers with code • 0 benchmarks • 0 datasets


Most implemented papers

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

dandelin/vilt 5 Feb 2021

Vision-and-Language Pre-training (VLP) has improved performance on various joint vision-and-language downstream tasks.

MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks

declare-lab/mm-bigbench 13 Oct 2023

Consequently, our work complements research on the performance of MLLMs in multimodal comprehension tasks, achieving a more comprehensive and holistic evaluation of MLLMs.

Recurrent Multimodal Interaction for Referring Image Segmentation

chenxi116/TF-phrasecut-public ICCV 2017

In this paper, we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions.

Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer

jefferyyu/umt ACL 2020

To tackle the first issue, we propose a multimodal interaction module to obtain both image-aware word representations and word-aware visual representations.

Dynamic Modality Interaction Modeling for Image-Text Retrieval

LgQu/DIME ACM Special Interest Group on Information Retrieval 2021

To address these issues, we develop a novel modality interaction modeling network based upon the routing mechanism, which is the first unified and dynamic multimodal interaction framework towards image-text retrieval.

Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering

trunpm/tpt-for-videoqa 10 Sep 2021

Targeting these issues, this paper proposes a novel Temporal Pyramid Transformer (TPT) model with multimodal interaction for VideoQA.

ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle

amr-gomaa/ML-PersRef 3 Nov 2021

This allows for novel approaches to interaction with the vehicle that go beyond traditional touch-based and voice command approaches, such as emotion recognition, head rotation, eye gaze, and pointing gestures.

Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

Mvrjustid/MHN-IJCAI22 9 May 2022

With multiscale sampling, the Recurrent Multimodal Interaction (RMI) module iterates the interaction between the appearance-motion information at each scale and the question embeddings to build multilevel question-guided visual representations.

Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions

nguyentthong/adaptive_contrastive_mrhp 7 Nov 2022

To overcome the aforementioned issues, we propose Multimodal Contrastive Learning for the Multimodal Review Helpfulness Prediction (MRHP) problem, concentrating on mutual information between input modalities to explicitly elaborate cross-modal relations.

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

YihanCao123/awesome-aigc 7 Mar 2023

The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.