15 Jan 2024 • Darshan Singh S, Zeeshan Khan, Makarand Tapaswi
We use the semantic role labels (SRL) and verb information to create rule-based detailed captions, ensuring that they capture most of the visual concepts.
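To illustrate what a rule-based caption built from an SRL frame can look like, here is a minimal sketch. It is not the authors' implementation. The PropBank-style role keys (`Arg0`, `Arg1`, `Arg2`, `ArgM-Loc`, ...), the fixed word order, and the modifier prepositions are all hypothetical choices made for this example.

```python
from typing import Dict

# Hypothetical modifier roles and the word placed before each one.
# Dict order is preserved, so modifiers are emitted in this order.
MODIFIER_PREFIX: Dict[str, str] = {
    "ArgM-Manner": "",
    "ArgM-Dir": "toward",
    "ArgM-Loc": "in",
}


def srl_to_caption(verb: str, roles: Dict[str, str]) -> str:
    """Fill a fixed template from one SRL frame (hypothetical role keys)."""
    parts = []
    # Agent first, if present.
    if "Arg0" in roles:
        parts.append(roles["Arg0"])
    parts.append(verb)
    # Core arguments follow the verb.
    for key in ("Arg1", "Arg2"):
        if key in roles:
            parts.append(roles[key])
    # Modifier roles get a short prefix so the caption reads naturally.
    for role, prefix in MODIFIER_PREFIX.items():
        if role in roles:
            parts.append(f"{prefix} {roles[role]}".strip())
    caption = " ".join(parts)
    # Capitalize the first character and end with a period.
    return caption[0].upper() + caption[1:] + "."


if __name__ == "__main__":
    frame = {
        "Arg0": "a man",
        "Arg1": "water",
        "Arg2": "into a glass",
        "ArgM-Loc": "the kitchen",
    }
    print(srl_to_caption("pours", frame))
```

Because every filled role is emitted in a fixed order, roles present in the frame are never dropped from the caption, which is one simple way a template can aim to cover most annotated concepts.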
29 Oct 2022 • Darshan Singh S, Anchit Gupta, C. V. Jawahar, Makarand Tapaswi
We formulate lecture segmentation as an unsupervised task that leverages visual, textual, and OCR cues from the lecture, while clip representations are fine-tuned on a self-supervised pretext task: matching the narration with the temporally aligned visual content.
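As a rough illustration of unsupervised temporal segmentation over clip representations, the sketch below splits a lecture into contiguous segments wherever the cosine similarity between adjacent clip embeddings drops below a threshold. This simple adjacent-similarity rule is a stand-in chosen for clarity, not the method described above; the function names, the `threshold` parameter, and the idea of feeding in per-clip vectors (for example, hypothetically concatenated visual, textual, and OCR features) are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_by_similarity(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Group clip indices into contiguous segments (hypothetical helper).

    A new segment starts whenever two neighbouring clips are less similar
    than `threshold`, so segments are always temporally contiguous.
    """
    if not embeddings:
        return []
    segments: List[List[int]] = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            segments.append([i])  # similarity dropped: open a new segment
        else:
            segments[-1].append(i)  # similar enough: extend current segment
    return segments


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(segment_by_similarity(clips))
```

Keeping segments contiguous reflects the fact that a lecture topic occupies a continuous stretch of time; a clustering approach with a temporal constraint would be a more robust alternative to a single fixed threshold.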