no code implementations • 31 May 2023 • Quoc-Huy Tran, Muhammad Ahmed, Murad Popattia, M. Hassan Ahmed, Andrey Konin, M. Zeeshan Zia
This paper presents a self-supervised temporal video alignment framework that is useful for several fine-grained human activity understanding applications.
no code implementations • 15 Apr 2022 • Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz
This embedding space is trained with a pairwise ranking objective, which lets similar images, topics, and captions in the shared semantic space maintain a partial order in the visual-semantic hierarchy and hence helps the model produce more visually accurate captions.
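
As a rough illustration of what a pairwise ranking objective over a partially ordered embedding space can look like, the sketch below uses an order-violation penalty with a hinge margin. This is an assumption made for illustration only: the abstract does not give the loss, and the function names (`order_violation`, `pairwise_ranking_loss`), the margin value, and the coordinate-wise ordering convention are all hypothetical rather than taken from the paper.

```python
def order_violation(general, specific):
    """Penalty for breaking the assumed partial order.

    Assumption: `general` should be <= `specific` in every coordinate.
    Returns the squared norm of max(0, general - specific), which is 0
    when the ordering holds.
    """
    return sum(max(0.0, g - s) ** 2 for g, s in zip(general, specific))


def pairwise_ranking_loss(anchor, positive, negatives, margin=1.0):
    """Hinge-style pairwise ranking loss (illustrative, not the paper's code).

    The matching pair (anchor, positive) should violate the order by at
    least `margin` less than every mismatched pair (anchor, negative).
    """
    positive_penalty = order_violation(anchor, positive)
    return sum(
        max(0.0, margin + positive_penalty - order_violation(anchor, negative))
        for negative in negatives
    )


if __name__ == "__main__":
    # Toy 2-D embeddings; the values are made up for the example.
    topic = [1.0, 2.0]
    matching_caption = [2.0, 3.0]
    mismatched_captions = [[0.0, 0.0], [1.0, 1.5]]
    print(pairwise_ranking_loss(topic, matching_caption, mismatched_captions))
```

In this sketch, a mismatched pair contributes to the loss only when its order violation is not at least `margin` larger than that of the matching pair, so training would push mismatched items out of order while keeping matching items ordered.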