Multimodal Machine Learning: Integrating Language, Vision and Speech

ACL 2017 · Louis-Philippe Morency, Tadas Baltrušaitis

Multimodal machine learning is a vibrant multi-disciplinary research field which addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic and visual messages. From the initial research on audio-visual speech recognition to the more recent projects on image and video captioning, this research field brings unique challenges for multimodal researchers given the heterogeneity of the data and the contingency often found between modalities. This tutorial builds upon a recent course taught at Carnegie Mellon University during the Spring 2016 semester (CMU course 11-777) and two tutorials presented at CVPR 2016 and ICMI 2016.
