Multimodal Text Prediction

1 paper with code • 1 benchmark • 2 datasets

Multimodal text prediction is a natural language processing task in which the next word or sequence of words in a sentence is predicted from multiple modalities, or types, of input. In traditional text prediction, the prediction is based solely on textual context, such as the words that precede the target word. In multimodal text prediction, additional modalities, such as images, audio, or user behavior, also inform the prediction.

For example, a multimodal text prediction system for image captioning may use both the content of the image and the words typed so far to generate the next word in the caption: the image supplies context about what the caption should describe, while the typed words indicate its style and tone.

Multimodal text prediction can be achieved with a variety of techniques, from statistical models to deep learning models. These models are trained on large datasets of paired text and multimodal inputs to learn the relationships between the different types of data and improve prediction accuracy.
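
As a rough illustration of the deep learning approach, the sketch below fuses a partial caption with an image to score candidate next words. All names, layer sizes, and the assumption that the image has already been encoded into a fixed-size feature vector (e.g. by a CNN) are hypothetical, not taken from any particular paper:

```python
import torch
import torch.nn as nn

class MultimodalNextWordPredictor(nn.Module):
    """Hypothetical model: predicts the next token from typed text
    plus a pre-extracted visual feature vector."""

    def __init__(self, vocab_size=10_000, embed_dim=256,
                 image_feat_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project image features into the same space as the text state.
        self.image_proj = nn.Linear(image_feat_dim, hidden_dim)
        # Encode the words typed so far.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Classify over the vocabulary from the fused representation.
        self.classifier = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, token_ids, image_feats):
        # token_ids: (batch, seq_len) indices of the words typed so far
        # image_feats: (batch, image_feat_dim) visual features
        text_states, _ = self.lstm(self.embed(token_ids))
        last_state = text_states[:, -1, :]            # final text state
        img_state = torch.relu(self.image_proj(image_feats))
        fused = torch.cat([last_state, img_state], dim=-1)
        return self.classifier(fused)                  # next-word logits


# Usage: score the next word given a partial caption and an image.
model = MultimodalNextWordPredictor()
tokens = torch.randint(0, 10_000, (1, 5))  # five typed words (dummy ids)
image = torch.randn(1, 512)                # dummy image feature vector
logits = model(tokens, image)
next_word_id = logits.argmax(dim=-1)       # most likely next token
```

Concatenating the two modalities just before the classifier is a simple late-fusion choice; attention-based fusion earlier in the network is a common alternative.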

Multimodal text prediction has many applications, including chatbots, virtual assistants, and predictive text input for mobile devices. By incorporating additional modalities into the prediction process, multimodal text prediction systems can provide more accurate and useful predictions, improving the overall user experience.

Most implemented papers

MultiSubs: A Large-scale Multimodal and Multilingual Dataset

josiahwang/multisubs-eval LREC 2022

The dataset will benefit research on visual grounding of words, especially in the context of free-form sentences, and can be obtained from https://doi.org/10.5281/zenodo.5034604 under a Creative Commons licence.