no code implementations • 19 Sep 2023 • Laura Hanu, Anita L. Verő, James Thewlis
Despite an exciting new wave of multimodal machine learning models, current approaches still struggle to interpret the complex contextual relationships between the different modalities present in videos.
1 code implementation • 19 Oct 2022 • Laura Hanu, James Thewlis, Yuki M. Asano, Christian Rupprecht
In this paper, we a) introduce a new dataset of videos, titles and comments; b) present an attention-based mechanism that allows the model to learn from sometimes irrelevant data such as comments; c) show that by using comments, our method is able to learn better, more contextualised representations for images, videos and audio.
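As a rough, hypothetical sketch of what such an attention-based aggregation over comments could look like (this is not the paper's actual architecture; the module name, dimensions and the use of multi-head cross-attention are assumptions for illustration), the snippet below pools a variable number of comment embeddings with a video/title query, so that irrelevant comments can receive low attention weight:

```python
# Illustrative sketch only: names, dimensions and the cross-attention design
# are assumptions, not the method described in the paper.
import torch
import torch.nn as nn


class CommentAttentionPooling(nn.Module):
    """Aggregate a variable number of comment embeddings into one vector,
    letting the query (e.g. a video/title embedding) downweight
    comments that are irrelevant to the visual content."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query, comments, comment_mask=None):
        # query:        (batch, 1, dim)          video/title representation
        # comments:     (batch, n_comments, dim) one embedding per comment
        # comment_mask: (batch, n_comments)      True where a comment is padding
        pooled, _ = self.attn(query, comments, comments,
                              key_padding_mask=comment_mask)
        return pooled.squeeze(1)  # (batch, dim) comment summary conditioned on the query


if __name__ == "__main__":
    pool = CommentAttentionPooling(dim=256)
    video = torch.randn(2, 1, 256)     # two video/title embeddings
    comments = torch.randn(2, 8, 256)  # eight comment embeddings each
    print(pool(video, comments).shape)  # torch.Size([2, 256])
```

Because the attention weights are computed against the video/title query, off-topic comments contribute little to the pooled summary, which is one simple way to "learn from sometimes irrelevant data" of this kind.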
no code implementations • 29 Sep 2021 • Laura Hanu, Yuki M. Asano, James Thewlis, Christian Rupprecht
Learning strong representations for multi-modal retrieval is an important problem for many applications, such as recommendation and search.
no code implementations • 29 Jan 2020 • Isaac Dunn, Laura Hanu, Hadrien Pouget, Daniel Kroening, Tom Melham
We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment.