no code implementations • 19 Jul 2024 • Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, Kwonjoon Lee
We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models.
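Since this entry hinges on discrete diffusion over motion tokens, a minimal sketch of one reverse (denoising) step may help. Everything here — the codebook size, the [MASK]-based absorbing corruption, the `denoise_logits` stub — is an assumption for illustration, not the M2D2M implementation.

```python
# Toy reverse step for an absorbing-state discrete diffusion model over
# motion tokens. Names and the corruption scheme are hypothetical.
import numpy as np

VOCAB_SIZE = 512          # size of the discrete motion-token codebook (assumed)
MASK_ID = VOCAB_SIZE      # extra [MASK] token for absorbing-state corruption

def reverse_step(tokens, t, denoise_logits, rng):
    """Re-sample masked positions from the denoiser's predicted categorical
    distribution, keeping already-revealed tokens fixed."""
    logits = denoise_logits(tokens, t)              # (seq_len, VOCAB_SIZE)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = tokens.copy()
    for i in np.where(tokens == MASK_ID)[0]:
        # reveal a masked token with probability 1/t, as in typical
        # absorbing-state discrete diffusion samplers
        if rng.random() < 1.0 / t:
            out[i] = rng.choice(VOCAB_SIZE, p=probs[i])
    return out
```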
no code implementations • CVPR 2024 • Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee
To address this limitation, we explore the generative capability of a large video-language model and further develop an understanding of plausibility in an action sequence by introducing two objective functions: a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss (a toy rendering of the repetition loss follows this entry).
Ranked #1 on Action Anticipation on EPIC-KITCHENS-100
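As referenced above, here is a hedged, toy rendering of what a long-horizon action repetition loss could look like: it penalises overlap between the predicted action distributions of consecutive future steps. The paper's exact formulation is not reproduced here.

```python
# Toy repetition penalty over a predicted action horizon (hypothetical form).
import torch

def action_repetition_loss(logits: torch.Tensor) -> torch.Tensor:
    """logits: (horizon, num_actions) predicted scores per future step.
    Penalise overlap between consecutive steps' action distributions."""
    probs = logits.softmax(dim=-1)                  # (H, A)
    # the inner product of consecutive distributions is high when the same
    # action dominates two steps in a row
    overlap = (probs[:-1] * probs[1:]).sum(dim=-1)  # (H-1,)
    return overlap.mean()

# usage: total_loss = ce_loss + lambda_rep * action_repetition_loss(pred_logits)
```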
no code implementations • 3 May 2024 • Olivier Jeunen, Jatin Mandav, Ivan Potapov, Nakul Agarwal, Sourabh Vaid, Wenzhe Shi, Aleksei Ustimenko
We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g., long-term user retention or growth).
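A minimal sketch of the scalarisation itself, assuming per-item engagement signals combined by a weighted sum; the signal names and weight values are illustrative only, not from the paper.

```python
# Scalarise several engagement signals into one ranking score; the weights
# are the "actions" a tuning policy would choose to maximise the North Star.
def scalarise(signals: dict, weights: dict) -> float:
    return sum(weights[k] * v for k, v in signals.items())

signals = {"click": 1.0, "dwell_time": 34.2, "share": 0.0}   # per-item signals
weights = {"click": 0.5, "dwell_time": 0.02, "share": 2.0}   # chosen by the policy
score = scalarise(signals, weights)   # ranking score for one item
```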
no code implementations • 7 Jan 2024 • Victoria M. Dax, Jiachen Li, Enna Sachdeva, Nakul Agarwal, Mykel J. Kochenderfer
The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.
no code implementations • CVPR 2024 • Hongji Guo, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee, Qiang Ji
The objective is for the two decoupled tasks to assist each other and ultimately improve the action anticipation task.
Ranked #1 on Action Anticipation on 50-Salads
1 code implementation • 22 Nov 2023 • Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
To interpret the important text evidence for question answering, we generalize the concept bottleneck model to work with tokens and nonlinear models, using hard attention to select a small subset of tokens from the free-form text as inputs to the LLM reasoner.
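A toy sketch of the hard-attention selection step, assuming dot-product scoring of token embeddings against a query vector; the names and the top-k rule are assumptions, not the paper's model.

```python
# Hard (discrete) selection of evidence tokens: keep only the top-k scoring
# tokens rather than a soft attention mixture.
import torch

def select_evidence_tokens(token_embeds, query, k=16):
    """token_embeds: (num_tokens, dim); query: (dim,).
    Returns indices of the k tokens with the highest attention scores."""
    scores = token_embeds @ query                    # (num_tokens,)
    topk = torch.topk(scores, k=min(k, scores.numel()))
    return topk.indices                              # hard selection, not a soft mix
```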
1 code implementation • 31 Oct 2023 • Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun
To recognize and predict human-object interactions, we use a Transformer-based neural architecture that allows the "retrieval" of relevant objects for action anticipation at various time scales (a minimal cross-attention sketch follows this entry).
Ranked #4 on Long Term Action Anticipation on Ego4D
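As referenced above, a minimal cross-attention sketch of the "retrieval" idea: learned action queries attend over object tokens sampled at several temporal strides. The dimensions, strides, and fusion-by-averaging are assumptions, not the paper's architecture.

```python
# Action queries "retrieve" object features at multiple time scales via
# cross-attention; all sizes below are placeholders.
import torch
import torch.nn as nn

dim = 256
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

action_queries = torch.randn(1, 4, dim)     # 4 learned anticipation queries
object_tokens = torch.randn(1, 120, dim)    # object features over 120 frames

retrieved = []
for stride in (1, 4, 16):                   # three time scales (assumed)
    keys = object_tokens[:, ::stride]       # coarser temporal sampling
    out, _ = attn(action_queries, keys, keys)
    retrieved.append(out)
context = torch.stack(retrieved).mean(0)    # fuse scales (simple average)
```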
1 code implementation • 12 Sep 2023 • Enna Sachdeva, Nakul Agarwal, Suhas Chundi, Sean Roelofs, Jiachen Li, Mykel Kochenderfer, Chiho Choi, Behzad Dariush
The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial.
1 code implementation • 31 Jul 2023 • Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the procedure needed to accomplish it (a toy autoregressive sketch of the bottom-up view follows this entry).
Ranked #2 on Long Term Action Anticipation on Ego4D
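As referenced above, a toy sketch of the bottom-up view: roll a next-action predictor forward autoregressively over the horizon. `next_action_logits` stands in for any sequence model and is not the paper's implementation.

```python
# Greedy autoregressive rollout of future actions from an observed prefix.
import torch

def predict_future_actions(observed, next_action_logits, horizon=20):
    """observed: list of past action ids. Greedily extend the sequence."""
    seq = list(observed)
    for _ in range(horizon):
        logits = next_action_logits(torch.tensor(seq))  # (num_actions,)
        seq.append(int(logits.argmax()))
    return seq[len(observed):]                          # the predicted future
```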
no code implementations • CVPR 2023 • Harshayu Girase, Nakul Agarwal, Chiho Choi, Karttikeya Mangalam
We present RAFTformer, a real-time action forecasting transformer for latency-aware, real-world action forecasting applications.
no code implementations • ICCV 2023 • Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Behzad Dariush
We present a novel method for weakly-supervised action segmentation and unseen error detection in anomalous instructional videos.
no code implementations • ICCV 2023 • Nakul Agarwal, Yi-Ting Chen
We introduce a novel representation called Ordered Atomic Activity for interactive scenario understanding.
no code implementations • CVPR 2023 • Hyung-gun Chi, Kwonjoon Lee, Nakul Agarwal, Yi Xu, Karthik Ramani, Chiho Choi
SALF is challenging because it requires understanding the underlying physics of video observations to predict future action locations accurately.
no code implementations • CVPR 2022 • Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Chiho Choi, Behzad Dariush
Experimental results show the efficacy of the proposed methods, both qualitatively and quantitatively, in two domains: cooking and assembly.
no code implementations • 19 Oct 2020 • Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming-Hsuan Yang
Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling of both spatial and temporal features.
no code implementations • 1 Aug 2018 • A. H. Abdul Hafez, Nakul Agarwal, C. V. Jawahar
This problem is solved by finding the maximum flow in a directed flow network whose vertices represent matches between frames in the test and reference sequences.
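The max-flow formulation can be illustrated with a toy bipartite variant, sketched below with `networkx`; for brevity, frames rather than matches serve as vertices here, so the actual graph construction in the paper is richer.

```python
# Toy max-flow selection of a consistent set of frame matches: unit
# capacities force each test and reference frame to be used at most once,
# so the maximum s-t flow yields a maximum matching.
import networkx as nx

G = nx.DiGraph()
test_frames = ["t0", "t1", "t2"]
ref_frames = ["r0", "r1", "r2", "r3"]
# candidate matches between test and reference frames (illustrative)
candidates = [("t0", "r0"), ("t1", "r1"), ("t1", "r2"), ("t2", "r2"), ("t2", "r3")]

for tf in test_frames:
    G.add_edge("s", tf, capacity=1)          # each test frame used once
for tf, rf in candidates:
    G.add_edge(tf, rf, capacity=1)
for rf in ref_frames:
    G.add_edge(rf, "t", capacity=1)          # each reference frame used once

flow_value, flow_dict = nx.maximum_flow(G, "s", "t")
matched = [(tf, rf) for tf, rf in candidates if flow_dict[tf][rf] > 0]
print(flow_value, matched)   # size of the matching and the selected pairs
```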