Search Results for author: Alexander Kunitsyn

Found 2 papers, 0 papers with code

VLRM: Vision-Language Models act as Reward Models for Image Captioning

no code implementations • 2 Apr 2024 • Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta

In this work, we present an unsupervised method for enhancing an image captioning model (in our case, BLIP2) using reinforcement learning and vision-language models like CLIP and BLIP2-ITM as reward models.

Image Captioning · reinforcement-learning
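The abstract above describes using image-text similarity from a vision-language model (CLIP or BLIP2-ITM) as the reward signal for reinforcement-learning fine-tuning of a captioner. A minimal sketch of that reward idea follows; the toy `encode_image` / `encode_text` functions are placeholders standing in for the real CLIP encoders, not the paper's implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for CLIP's encoders (assumptions for illustration only;
# the paper uses actual CLIP / BLIP2-ITM models as reward models).
def encode_image(image_path, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    return rng.normal(size=dim)

def encode_text(caption, dim=8):
    rng = np.random.default_rng(abs(hash(caption)) % (2**32))
    return rng.normal(size=dim)

def clip_style_reward(image_path, caption):
    """Image-caption similarity, used as the per-sample RL reward."""
    return cosine_similarity(encode_image(image_path), encode_text(caption))

# A policy-gradient loop would generate captions, score each with this
# reward, and update the captioning model to increase expected reward.
r = clip_style_reward("photo.jpg", "a dog playing in the park")
assert -1.0 <= r <= 1.0
```

Because the reward comes from a pretrained model rather than human labels, the fine-tuning stage needs no caption annotations, which is what makes the method unsupervised.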

MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization

no code implementations • 14 Mar 2022 • Alexander Kunitsyn, Maksim Kalashnikov, Maksim Dzabraev, Andrei Ivaniuta

In this work, we present a new state of the art on the text-to-video retrieval task on MSR-VTT, LSMDC, MSVD, YouCook2, and TGIF, obtained by a single model.

 Ranked #1 on Video Retrieval on TGIF (using extra training data)

Retrieval · Text to Video Retrieval +1
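Text-to-video retrieval, the task this paper targets, amounts to embedding a text query and a bank of videos in a shared space and ranking videos by similarity. A minimal sketch under that assumption (random toy embeddings in place of a real multimodal transformer):

```python
import numpy as np

rng = np.random.default_rng(0)
video_embs = rng.normal(size=(5, 8))   # 5 candidate videos, dim-8 embeddings
query_emb = video_embs[3].copy()       # query identical to video 3's embedding

def rank_videos(query_emb, video_embs):
    """Rank candidate videos by cosine similarity to the text query."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    return np.argsort(-(v @ q))        # indices, best match first

ranking = rank_videos(query_emb, video_embs)
assert ranking[0] == 3                 # the matching video is retrieved first
```

Retrieval benchmarks such as MSR-VTT report recall@k over exactly this kind of ranked list; the paper's contribution is a single multidomain model whose embeddings rank well across all five datasets.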
