Search Results for author: Alexander Kunitsyn

Found 2 papers, 0 papers with code

VLRM: Vision-Language Models act as Reward Models for Image Captioning

no code implementations • 2 Apr 2024 • Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta

In this work, we present an unsupervised method for enhancing an image captioning model (in our case, BLIP2) using reinforcement learning and vision-language models like CLIP and BLIP2-ITM as reward models.

Image Captioning • reinforcement-learning


MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization

no code implementations • 14 Mar 2022 • Alexander Kunitsyn, Maksim Kalashnikov, Maksim Dzabraev, Andrei Ivaniuta

In this work, we present a new state-of-the-art result on the text-to-video retrieval task on MSR-VTT, LSMDC, MSVD, YouCook2, and TGIF, obtained by a single model.

Ranked #1 on Video Retrieval on TGIF (using extra training data)

Retrieval • Text to Video Retrieval • +1

