1 code implementation • 22 Nov 2023 • Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan
Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data.
1 code implementation • 18 May 2021 • Muhammad Maaz, Hanoona Abdul Rasheed, Dhanalaxmi Gaddam
The deconstruction learning forces the model to focus on local object parts, while reconstruction learning helps in learning the correlation between the parts.
Fine-Grained Visual Categorization Representation Learning +1