1 code implementation • ACL 2022 • Santiago Castro, Ruoyao Wang, Pingxuan Huang, Ian Stewart, Oana Ignat, Nan Liu, Jonathan C. Stroud, Rada Mihalcea
We propose fill-in-the-blanks as a video understanding evaluation framework and introduce FIBER -- a novel dataset consisting of 28,000 videos and descriptions in support of this evaluation framework.
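As an illustration of how a fill-in-the-blank video evaluation could be scored, the sketch below masks one word in a video description and checks a model's prediction against a set of acceptable answers. The field names, the "_____" blank marker, and the exact-match metric are illustrative assumptions, not FIBER's actual schema or official metric.

```python
# Minimal sketch of fill-in-the-blank scoring (illustrative only; the data
# layout and exact-match metric are assumptions, not FIBER's actual format).

def normalize(token: str) -> str:
    """Lowercase and strip surrounding whitespace/punctuation for comparison."""
    return token.strip().strip(".,!?").lower()

def blank_filling_accuracy(examples, predict):
    """examples: dicts with a masked 'caption' (containing '_____') and a list
    of acceptable 'answers'; predict: callable (video_id, caption) -> token."""
    correct, total = 0, 0
    for ex in examples:
        pred = predict(ex["video_id"], ex["caption"])
        if normalize(pred) in {normalize(a) for a in ex["answers"]}:
            correct += 1
        total += 1
    return correct / max(total, 1)

# Example usage with a dummy predictor:
examples = [
    {"video_id": "v0001",
     "caption": "A person slices a _____ on a cutting board.",
     "answers": ["tomato", "vegetable"]},
]
print(blank_filling_accuracy(examples, lambda vid, cap: "tomato"))  # 1.0
```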
no code implementations • 29 Jul 2020 • Jonathan C. Stroud, Zhichao Lu, Chen Sun, Jia Deng, Rahul Sukthankar, Cordelia Schmid, David A. Ross
Based on this observation, we propose to use text as a supervisory signal for learning video representations.
no code implementations • 4 Dec 2019 • Jonathan C. Stroud, Ryan McCaffrey, Rada Mihalcea, Jia Deng, Olga Russakovsky
Temporal grounding entails establishing a correspondence between natural language event descriptions and their visual depictions.
1 code implementation • 19 Dec 2018 • Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar
State-of-the-art methods for video action recognition commonly use an ensemble of two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input.
Ranked #11 on Action Recognition on AVA v2.1
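For context on the two-stream setup described in this entry, below is a minimal sketch of generic late fusion between a spatial (RGB) stream and a temporal (optical-flow) stream. The backbone choice, softmax averaging, and weighting parameter are assumptions for illustration, not the method proposed in this paper.

```python
import torch
import torch.nn as nn

class TwoStreamEnsemble(nn.Module):
    """Generic late fusion of a spatial (RGB) and a temporal (flow) stream."""
    def __init__(self, spatial_net: nn.Module, temporal_net: nn.Module, alpha: float = 0.5):
        super().__init__()
        self.spatial_net = spatial_net    # consumes RGB frames
        self.temporal_net = temporal_net  # consumes stacked optical-flow fields
        self.alpha = alpha                # weight on the spatial stream

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Average class probabilities from the two streams (late fusion).
        p_spatial = self.spatial_net(rgb).softmax(dim=-1)
        p_temporal = self.temporal_net(flow).softmax(dim=-1)
        return self.alpha * p_spatial + (1.0 - self.alpha) * p_temporal

# Dummy usage with toy "backbones" (flattened inputs, 10 action classes):
spatial = nn.Sequential(nn.Flatten(), nn.LazyLinear(10))
temporal = nn.Sequential(nn.Flatten(), nn.LazyLinear(10))
model = TwoStreamEnsemble(spatial, temporal)
probs = model(torch.randn(2, 3, 224, 224), torch.randn(2, 10, 224, 224))
```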
no code implementations • CVPR 2017 • Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng
We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores.
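To make the window-scoring idea concrete, the sketch below scores every temporal window of arbitrary length as the sum of its frame-wise classification scores (computed with prefix sums) and returns the best window for a single action class. This brute-force maximization is an illustrative stand-in, not the paper's structured-prediction inference procedure.

```python
import numpy as np

def best_window(frame_scores: np.ndarray):
    """frame_scores: (T,) per-frame scores for one action class.
    Returns (start, end, score) of the highest-scoring window [start, end)."""
    T = len(frame_scores)
    prefix = np.concatenate([[0.0], np.cumsum(frame_scores)])  # prefix[i] = sum of first i frames
    best = (0, 1, float(frame_scores[0]))
    for start in range(T):
        for end in range(start + 1, T + 1):
            score = prefix[end] - prefix[start]  # sum of scores in [start, end)
            if score > best[2]:
                best = (start, end, score)
    return best

# Example: frames 2-4 carry positive evidence for the action.
scores = np.array([-0.5, -0.2, 1.0, 0.8, 0.6, -0.3])
print(best_window(scores))  # (2, 5, 2.4)
```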