Search Results for author: Tushar Nagarajan

Found 26 papers, 12 with code

Human Action Anticipation: A Survey

no code implementations 17 Oct 2024 Bolin Lai, Sam Toyer, Tushar Nagarajan, Rohit Girdhar, Shengxin Zha, James M. Rehg, Kris Kitani, Kristen Grauman, Ruta Desai, Miao Liu

Predicting future human behavior is an increasingly popular topic in computer vision, driven by interest in applications such as autonomous vehicles, digital assistants, and human-robot interaction.

Action Anticipation • Autonomous Vehicles • +1

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

no code implementations 4 Oct 2024 Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha

In this work, we show that a strong off-the-shelf frozen pretrained visual encoder, together with a well-designed prediction model, can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning without pretraining the prediction model or requiring additional supervision from language or ASR.

Action Anticipation • Denoising • +1
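
The architecture described above lends itself to a compact sketch: embed observed clips with a frozen encoder, then train only a predictor to output the embeddings of future clips in latent space. The following is a minimal illustration, not the paper's code; the encoder interface, dimensions, and loss are assumptions.

    # Minimal sketch of latent prediction with a frozen encoder. Illustrative only:
    # the encoder interface, dimensions, and loss are assumptions, not VEDIT's code.
    import torch
    import torch.nn as nn

    class LatentPredictor(nn.Module):
        def __init__(self, frozen_encoder: nn.Module, dim: int = 768, n_future: int = 4):
            super().__init__()
            self.encoder = frozen_encoder.eval()  # pretrained visual encoder, kept frozen
            for p in self.encoder.parameters():
                p.requires_grad = False
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            self.predictor = nn.TransformerEncoder(layer, num_layers=6)
            self.queries = nn.Parameter(torch.randn(n_future, dim))  # slots for future steps

        def forward(self, past_clips: torch.Tensor) -> torch.Tensor:
            # past_clips: (batch, n_past, C, T, H, W) -> per-clip frozen embeddings
            b, n = past_clips.shape[:2]
            with torch.no_grad():
                z = self.encoder(past_clips.flatten(0, 1)).view(b, n, -1)
            x = torch.cat([z, self.queries.unsqueeze(0).expand(b, -1, -1)], dim=1)
            return self.predictor(x)[:, n:]  # predicted embeddings of future clips

    # Training compares predictions to frozen embeddings of the actual future clips,
    # e.g. loss = 1 - F.cosine_similarity(pred, target_z, dim=-1).mean()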

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

no code implementations 30 Sep 2024 Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang

Goal-oriented planning, or anticipating a series of actions that transition an agent from its current state to a predefined objective, is crucial for developing intelligent assistants that aid users in daily procedural tasks.
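
The title spells out the loop: an LLM proposes candidate next actions, an assessment scores partial plans against the goal, and a search keeps the most promising ones. A generic, hedged sketch of such a propose-assess-search loop follows; the propose and assess callables are hypothetical stand-ins, not the paper's actual components.

    # Generic propose-assess-search (beam search) sketch. The propose/assess
    # callables are hypothetical stand-ins, not the paper's actual components.
    from typing import Callable, List, Tuple

    def plan(goal: str, steps: int, beam: int,
             propose: Callable[[str, List[str]], List[str]],  # LLM: candidate next actions
             assess: Callable[[str, List[str]], float]) -> List[str]:  # plan quality score
        frontier: List[Tuple[float, List[str]]] = [(0.0, [])]
        for _ in range(steps):
            candidates = []
            for _, partial in frontier:
                for action in propose(goal, partial):
                    extended = partial + [action]
                    candidates.append((assess(goal, extended), extended))
            frontier = sorted(candidates, reverse=True)[:beam]  # keep the best partial plans
        return frontier[0][1]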

AMEGO: Active Memory from long EGOcentric videos

no code implementations 17 Sep 2024 Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta, Dima Damen

Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception.

Video Understanding

Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning

no code implementations 7 Aug 2024 Zi-Yi Dou, Xitong Yang, Tushar Nagarajan, Huiyu Wang, Jing Huang, Nanyun Peng, Kris Kitani, Fu-Jen Chu

We present EMBED (Egocentric Models Built with Exocentric Data), a method designed to transform exocentric video-language data for egocentric video representation learning.

Multi-Instance Retrieval • Representation Learning • +1

User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance

no code implementations 4 Aug 2024 Mrinal Verghese, Brian Chen, Hamid Eghbalzadeh, Tushar Nagarajan, Ruta Desai

Our research investigates the capability of modern multimodal reasoning models, powered by Large Language Models (LLMs), to facilitate vision-powered assistants for multi-step daily activities.

Action Anticipation • Benchmarking • +1

ExpertAF: Expert Actionable Feedback from Video

no code implementations 1 Aug 2024 Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman

Our method takes a video demonstration and its accompanying 3D body pose and generates (1) free-form expert commentary describing what the person is doing well and what they could improve, and (2) a visual expert demonstration that incorporates the required corrections.

Language Modelling • Video Retrieval

Step Differences in Instructional Video

1 code implementation CVPR 2024 Tushar Nagarajan, Lorenzo Torresani

Comparing a user video to a reference how-to video is a key requirement for AR/VR technology delivering personalized assistance tailored to the user's progress.

Language Modelling

Video ReCap: Recursive Captioning of Hour-Long Videos

2 code implementations CVPR 2024 Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius

We utilize a curriculum learning training scheme to learn the hierarchical structure of videos, starting from clip-level captions describing atomic actions, then focusing on segment-level descriptions, and concluding with generating summaries for hour-long videos.

Video Captioning • Video Understanding • +1
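
The recursive scheme is easy to picture: caption short clips, summarize groups of clip captions into segment descriptions, then summarize segments into a video-level recap, training the levels in that curriculum order. A toy sketch, with caption() as a hypothetical stand-in for the captioning model:

    # Toy sketch of recursive captioning: clip captions feed segment summaries,
    # which feed a video-level summary. caption() is a hypothetical stand-in for
    # the captioning model; the curriculum trains these levels in this order.
    from typing import List

    def caption(inputs: List[str], level: str) -> str:
        # placeholder for a captioning model conditioned on the hierarchy level
        return f"[{level} summary of {len(inputs)} inputs]"

    def recap(clip_features: List[str], seg_len: int = 10) -> str:
        clip_caps = [caption([f], "clip") for f in clip_features]  # atomic actions
        segments = [clip_caps[i:i + seg_len] for i in range(0, len(clip_caps), seg_len)]
        seg_caps = [caption(s, "segment") for s in segments]  # mid-level descriptions
        return caption(seg_caps, "video")  # summary of the hour-long video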

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

2 code implementations CVPR 2024 Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

1 code implementation 27 Sep 2023 Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, IMU motion sensor), and generates textual responses.

Language Modelling • Video Question Answering
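
A common recipe for this kind of any-modality augmentation, and a reasonable mental model here, is to project each modality encoder's output into the LLM's token-embedding space and prepend it to the text tokens. A hedged sketch; dimensions and module choices are assumptions, not AnyMAL's code.

    # Rough sketch of any-modality augmentation: project each modality encoder's
    # output into the LLM token-embedding space and prepend it to the text tokens.
    # Dimensions and module choices are assumptions, not AnyMAL's code.
    import torch
    import torch.nn as nn

    class ModalityAdapter(nn.Module):
        def __init__(self, enc_dim: int, llm_dim: int, n_tokens: int = 32):
            super().__init__()
            self.proj = nn.Linear(enc_dim, llm_dim * n_tokens)
            self.n_tokens, self.llm_dim = n_tokens, llm_dim

        def forward(self, feat: torch.Tensor) -> torch.Tensor:
            # feat: (batch, enc_dim) -> (batch, n_tokens, llm_dim) soft prompt
            return self.proj(feat).view(-1, self.n_tokens, self.llm_dim)

    # Usage: inputs_embeds = torch.cat([adapter(image_feat), text_embeds], dim=1),
    # then feed inputs_embeds to the LLM.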

Shaping embodied agent behavior with activity-context priors from egocentric video

no code implementations NeurIPS 2021 Tushar Nagarajan, Kristen Grauman

For a given object, an activity-context prior represents the set of other compatible objects that are required for activities to succeed (e.g., a knife and cutting board brought together with a tomato are conducive to cutting).
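
In the simplest reading, such a prior can be mined by counting which objects co-occur in clips of successfully completed activities. A toy sketch under that assumption; the clip format is invented for illustration.

    # Toy sketch of an activity-context prior: for each object, count which other
    # objects co-occur in clips of successfully completed activities, then rank
    # candidate partners. The clip format is invented for illustration.
    from collections import Counter, defaultdict
    from itertools import permutations

    def build_prior(successful_clips):
        # successful_clips: iterable of sets of object labels seen together
        prior = defaultdict(Counter)
        for objects in successful_clips:
            for a, b in permutations(objects, 2):
                prior[a][b] += 1
        return prior

    prior = build_prior([{"knife", "cutting board", "tomato"},
                         {"knife", "cutting board", "bread"}])
    print(prior["knife"].most_common(2))  # objects most compatible with a knife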

Ego4D: Around the World in 3,000 Hours of Egocentric Video

8 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification • Ethics

Environment Predictive Coding for Embodied Agents

no code implementations 3 Feb 2021 Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.

Self-Supervised Learning

Differentiable Causal Discovery Under Unmeasured Confounding

1 code implementation 14 Oct 2020 Rohit Bhattacharya, Tushar Nagarajan, Daniel Malinsky, Ilya Shpitser

In this work, we derive differentiable algebraic constraints that fully characterize the space of ancestral ADMGs, as well as more general classes of ADMGs, arid ADMGs and bow-free ADMGs, that capture all equality restrictions on the observed variables.

Causal Discovery
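
The paper's constraints generalize the differentiable penalties used for DAG discovery. As a flavor of the building block, the standard NOTEARS acyclicity measure h(W) = tr(exp(W ∘ W)) − d is zero exactly when the weighted adjacency matrix W encodes an acyclic directed graph; the paper's ancestral, arid, and bow-free ADMG constraints additionally involve the bidirected edge matrix. A minimal sketch of that building block only:

    # Building block only: the NOTEARS acyclicity measure for a weighted adjacency
    # matrix W, h(W) = tr(exp(W * W)) - d, which is zero iff the directed part is
    # acyclic. The paper's ancestral/arid/bow-free ADMG constraints go further and
    # also involve the bidirected edge matrix.
    import numpy as np
    from scipy.linalg import expm

    def acyclicity(W: np.ndarray) -> float:
        d = W.shape[0]
        return float(np.trace(expm(W * W)) - d)  # W * W is elementwise

    W = np.array([[0.0, 0.8], [0.0, 0.0]])  # single edge 0 -> 1: acyclic
    print(acyclicity(W))  # ~0.0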

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

1 code implementation NeurIPS 2020 Tushar Nagarajan, Kristen Grauman

We introduce a reinforcement learning approach for exploration for interaction, whereby an embodied agent autonomously discovers the affordance landscape of a new unmapped 3D environment (such as an unfamiliar kitchen).
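
One way to picture exploration for interaction is an intrinsic reward that pays out the first time each object-action interaction succeeds, steering the agent toward mapping the affordance landscape. A hedged sketch with a hypothetical environment and policy interface:

    # Sketch of exploration-for-interaction: reward the first success of each
    # (object, action) pair so the agent seeks out new affordances. The env and
    # policy interfaces here are hypothetical stand-ins.
    def explore(env, policy, episodes: int = 100):
        discovered = set()  # (object, action) interactions that have worked
        for _ in range(episodes):
            obs, done = env.reset(), False
            while not done:
                action = policy(obs)
                obs, success, info, done = env.step(action)  # hypothetical interface
                key = (info.get("object"), action)
                if success and key not in discovered:
                    discovered.add(key)
                    reward = 1.0  # novelty bonus: first success on this interaction
                else:
                    reward = 0.0
                policy.update(obs, action, reward)  # any RL update rule
        return discovered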

EGO-TOPO: Environment Affordances from Egocentric Video

1 code implementation CVPR 2020 Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman

We introduce a model for environment affordances that is learned directly from egocentric video.

Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)

no code implementations 3 Jun 2019 Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Human-Object Interaction Detection • Object • +1

Grounded Human-Object Interaction Hotspots from Video

1 code implementation ICCV 2019 Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Human-Object Interaction Detection • Object • +3

Attributes as Operators: Factorizing Unseen Attribute-Object Compositions

1 code implementation ECCV 2018 Tushar Nagarajan, Kristen Grauman

In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training.

Attribute • Compositional Zero-Shot Learning • +2
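
The core idea the title names is to embed each attribute as a learned matrix that transforms object embeddings, so unseen attribute-object pairs are composed by reusing learned operators. A minimal sketch; dimensions, initialization, and scoring are illustrative.

    # Sketch of attributes-as-operators: each attribute is a learned matrix applied
    # to an object embedding, so unseen attribute-object pairs are composed by
    # reusing learned operators. Dimensions, init, and scoring are illustrative.
    import torch
    import torch.nn as nn

    class AttributeOperators(nn.Module):
        def __init__(self, n_attrs: int, n_objs: int, dim: int = 300):
            super().__init__()
            eye = torch.eye(dim)
            self.attr_ops = nn.Parameter(eye.repeat(n_attrs, 1, 1))  # one matrix per attribute
            self.obj_emb = nn.Embedding(n_objs, dim)

        def compose(self, attr_idx: torch.Tensor, obj_idx: torch.Tensor) -> torch.Tensor:
            M = self.attr_ops[attr_idx]  # (batch, dim, dim) attribute operator
            v = self.obj_emb(obj_idx).unsqueeze(-1)  # (batch, dim, 1) object vector
            return (M @ v).squeeze(-1)  # embedding of "attribute(object)"

    # At test time, rank all attribute-object compositions (seen or unseen) by
    # similarity between compose(a, o) and the image embedding.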

BlockDrop: Dynamic Inference Paths in Residual Networks

1 code implementation CVPR 2018 Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, Rogerio Feris

Very deep convolutional neural networks offer excellent recognition results, yet their computational expense limits their impact for many real-world applications.

Reinforcement Learning
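
The mechanism behind dynamic inference paths: a lightweight policy network looks at the input and emits a binary keep/drop decision per residual block, and dropped blocks fall back to their identity shortcut. A hedged sketch of the inference path; architecture details are illustrative.

    # Sketch of dynamic inference paths: a small policy net emits a binary
    # keep/drop decision per residual block; dropped blocks reduce to their
    # identity shortcut. Details are illustrative, not BlockDrop's code.
    import torch
    import torch.nn as nn

    class DynamicResNet(nn.Module):
        def __init__(self, blocks: nn.ModuleList, policy: nn.Module):
            super().__init__()
            self.blocks = blocks  # residual branches, each preserving the feature shape
            self.policy = policy  # maps input x to (batch, n_blocks) keep logits

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            keep = (torch.sigmoid(self.policy(x)) > 0.5).float()  # hard decisions at test time
            for i, block in enumerate(self.blocks):
                m = keep[:, i].view(-1, 1, 1, 1)
                x = m * (x + block(x)) + (1 - m) * x  # skip the block where m == 0
            return x

    # BlockDrop trains the policy with reinforcement learning, rewarding correct
    # predictions that use fewer blocks.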

CANDiS: Coupled & Attention-Driven Neural Distant Supervision

no code implementations 26 Oct 2017 Tushar Nagarajan, Sharmistha, Partha Talukdar

The unsupervised nature of this technique allows it to scale to web-scale relation extraction tasks, at the expense of noise in the training data.

Relation • Relation Extraction
