no code implementations • CVPR 2024 • Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte, Wei-An Lin, Hui Qu, Mingi Kwon, Ratheesh Kalarot
Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without additional training: this "plug-and-play" functionality drastically reduces inference computation while maintaining the visual fidelity of generated images.
no code implementations • 6 Jun 2022 • Fabio De Sousa Ribeiro, Kevin Duarte, Miles Everett, Georgios Leontidis, Mubarak Shah
The aim of this survey is to provide a comprehensive overview of the capsule network research landscape, which will serve as a valuable resource for the community going forward.
no code implementations • NeurIPS 2021 • Alec Kerrigan, Kevin Duarte, Yogesh Rawat, Mubarak Shah
Given a video and a set of action classes, our method predicts a set of confidence scores for each class independently.
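The independent per-class scoring described above can be sketched as applying a sigmoid to each class logit separately rather than a softmax across classes; this is a minimal illustration, and the logit values below are hypothetical, not from the paper:

```python
import math

def sigmoid(x: float) -> float:
    """Map a logit to an independent confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_confidences(logits):
    """One confidence score per action class, computed independently.

    Unlike a softmax, the scores need not sum to 1, so several
    actions can be detected in the same video."""
    return [sigmoid(z) for z in logits]

# Hypothetical per-class logits for one video clip
logits = [2.0, -1.0, 0.5]
scores = predict_confidences(logits)
```

Because each class is scored independently, thresholding the scores yields a multi-label prediction rather than a single forced choice.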
no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.
no code implementations • 22 May 2021 • Kevin Duarte, Yogesh S. Rawat, Mubarak Shah
By stochastically masking labels during loss computation, the method balances this ratio for each class, leading to improved recall on minority classes and improved precision on frequent classes.
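One way to read the masking idea: during a binary cross-entropy computation, each class's label term is kept only with some per-class probability, so over-represented classes contribute fewer terms to the loss. The keep-probabilities and loss form below are illustrative assumptions, not the paper's exact formulation:

```python
import math
import random

def masked_bce(labels, probs, keep_prob, rng):
    """Binary cross-entropy where each class's term is stochastically
    masked out according to a per-class keep probability."""
    total, kept = 0.0, 0
    for y, p, k in zip(labels, probs, keep_prob):
        if rng.random() < k:  # keep this class's term in the loss
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            kept += 1
    return total / max(kept, 1)

rng = random.Random(0)
labels = [1, 0, 1]            # multi-label ground truth (hypothetical)
probs = [0.9, 0.2, 0.6]       # hypothetical predicted confidences
keep_prob = [0.5, 1.0, 1.0]   # mask a frequent class half the time
loss = masked_bce(labels, probs, keep_prob, rng)
```

With all keep-probabilities set to 1.0 the function reduces to the ordinary mean binary cross-entropy, so the masking acts purely as a per-class reweighting of the gradient signal.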
1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.
1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang
Multimodal self-supervised learning is receiving growing attention, as it allows not only training large networks without human supervision but also searching and retrieving data across various modalities.
1 code implementation • CVPR 2021 • Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah
We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.
Ranked #1 on Action Detection on Multi-THUMOS
2 code implementations • ICLR 2021 • Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
Recent research in semi-supervised learning (SSL) is mostly dominated by consistency-regularization-based methods, which achieve strong performance.
no code implementations • 23 Apr 2020 • Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah
For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.
1 code implementation • ICCV 2019 • Kevin Duarte, Yogesh S Rawat, Mubarak Shah
In this work we propose a capsule-based approach for semi-supervised video object segmentation.
no code implementations • 2 Dec 2018 • Bruce McIntosh, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
The existing works on actor-action localization are mainly focused on localization in a single frame instead of the full video.
no code implementations • NeurIPS 2018 • Kevin Duarte, Yogesh S Rawat, Mubarak Shah
In this work, we present a more elegant solution for action detection based on the recently developed capsule network.