no code implementations • Neural Information Processing Systems 2024 • Nitesh Bharadwaj Gundavarapu, Luke Friedman, Raghav Goyal, Chaitra Hegde, Eirikur Agustsson, Sagar M. Waghmare, Mikhail Sirotenko, Ming-Hsuan Yang, Tobias Weyand, Boqing Gong, Leonid Sigal
Nevertheless, the majority of prior works that leverage MAE pre-training have focused on relatively short video representations (16 / 32 frames in length), largely because hardware memory and compute requirements scale poorly with video length under dense, memory-intensive self-attention decoding.
Ranked #1 on Action Recognition on Diving-48 (using extra training data)
no code implementations • 13 Dec 2023 • Raghav Goyal, Wan-Cyuan Fan, Mennatullah Siam, Leonid Sigal
In this work, we propose a novel, clip-based DETR-style encoder-decoder architecture, which focuses on systematically analyzing and addressing the aforementioned challenges.
no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.
no code implementations • CVPR 2023 • Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran
In this paper, we propose to study these problems in a joint framework for long video understanding.
2 code implementations • 13 Jan 2022 • Peyman Bateni, Jarred Barber, Raghav Goyal, Vaden Masrani, Jan-Willem van de Meent, Leonid Sigal, Frank Wood
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance-based classifier combined with a state-of-the-art neural adaptive feature extractor to achieve strong performance on the Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks.
no code implementations • CVPR 2021 • Siddhesh Khandelwal, Raghav Goyal, Leonid Sigal
Weakly-supervised approaches draw on image-level labels to build detectors/segmentors, while zero/few-shot methods assume abundant instance-level data for a set of base classes, and none to a few examples for novel classes.
no code implementations • 15 Jan 2020 • Shubham Agarwal, Raghav Goyal
This manuscript describes our approach for the Visual Dialog Challenge 2018.
2 code implementations • CVPR 2020 • Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal
Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data.
Ranked #2 on Few-Shot Image Classification on Mini-Imagenet 10-way (5-shot) (using extra training data)
5 code implementations • ICCV 2017 • Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, Joanna Materzyńska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, Peter Yianilos, Moritz Mueller-Freitag, Florian Hoppe, Christian Thurau, Ingo Bax, Roland Memisevic
Neural networks trained on datasets such as ImageNet have led to major advances in visual object classification.
Ranked #115 on Action Recognition on Something-Something V2
no code implementations • COLING 2016 • Raghav Goyal, Marc Dymetman, Eric Gaussier
Recently, Wen et al. (2015) proposed a Recurrent Neural Network (RNN) approach to the generation of utterances from dialog acts, and showed that although their model requires less effort to develop than a rule-based system, it is able to improve certain aspects of the utterances, in particular their naturalness.