1 code implementation • 17 Apr 2025 • Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Daniel Li, Piotr Dollár, Christoph Feichtenhofer
Together with the core contrastive checkpoint, our PE family of models achieves state-of-the-art performance on a wide variety of tasks, including zero-shot image and video classification and retrieval; document, image, and video Q&A; and spatial tasks such as detection, depth estimation, and tracking.
Ranked #1 on Object Detection on COCO minival (using extra training data)
no code implementations • 12 Feb 2025 • Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik
At its core, PAR represents the behavior of all agents as a sequence of tokens, each representing an agent's state at a specific timestep.
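As a rough illustration of this framing (not the paper's actual tokenizer), the sketch below quantizes each agent's 2D position into a discrete token id and emits the scene timestep-major; the vocabulary size and binning scheme are assumptions.

```python
# Hedged sketch: multi-agent behavior as a token sequence, one token per
# (agent, timestep) state. Binning and vocabulary are illustrative only.
import numpy as np

N_BINS = 32  # quantization bins per coordinate (assumed)

def state_to_token(x: float, y: float) -> int:
    """Quantize a normalized 2D agent position into one discrete token id."""
    xi = int(np.clip(x, 0.0, 0.999) * N_BINS)
    yi = int(np.clip(y, 0.0, 0.999) * N_BINS)
    return xi * N_BINS + yi

def tokenize_scene(trajectories: np.ndarray) -> list[int]:
    """trajectories: (timesteps, agents, 2) array of normalized positions.

    Emits tokens timestep-major, so each frame lists every agent's state
    before moving to the next frame.
    """
    tokens = []
    for frame in trajectories:
        for (x, y) in frame:
            tokens.append(state_to_token(x, y))
    return tokens

# Two agents over three timesteps.
traj = np.array([[[0.1, 0.2], [0.8, 0.8]],
                 [[0.2, 0.2], [0.7, 0.8]],
                 [[0.3, 0.3], [0.6, 0.7]]])
print(tokenize_scene(traj))
```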
no code implementations • 9 Jan 2025 • Jathushan Rajasegaran, Ilija Radosavovic, Rahul Ravishankar, Yossi Gandelsman, Christoph Feichtenhofer, Jitendra Malik
Our models are pre-trained on a diverse dataset of videos and images comprising over 1 trillion visual tokens.
no code implementations • 6 Jan 2025 • Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer, Jitendra Malik, Shiry Ginosar
Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly.
no code implementations • 12 Nov 2024 • Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks.
no code implementations • 6 Sep 2024 • Vongani Maluleke, Lea Müller, Jathushan Rajasegaran, Georgios Pavlakos, Shiry Ginosar, Angjoo Kanazawa, Jitendra Malik
Our contributions are a demonstration of the advantages of socially conditioned future motion prediction and an in-the-wild, couple dance video dataset to enable future research in this direction.
no code implementations • 9 Aug 2024 • Piraveen Sivakumar, Paul Janson, Jathushan Rajasegaran, Thanuja Ambegoda
In this paper, we address the challenge of generating novel views of real-world objects with limited multi-view images through our proposed approach, FewShotNeRF.
no code implementations • 15 Apr 2024 • Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann Lecun, Amir Globerson, Trevor Darrell
Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems.
no code implementations • 29 Feb 2024 • Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language.
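The framing is easy to make concrete: a trajectory of interleaved observation/action tokens is trained with the same shift-by-one objective used for language models. The token ids below are illustrative placeholders, not the paper's tokenization.

```python
# Minimal sketch of control as next-token prediction: predict token t+1
# from the tokens up to t, exactly as in language modeling.
import numpy as np

def make_training_pairs(trajectory_tokens: list[int]):
    """Shift-by-one framing: inputs are tokens <= t, targets are tokens t+1."""
    tokens = np.array(trajectory_tokens)
    inputs, targets = tokens[:-1], tokens[1:]
    return inputs, targets

# Interleaved [obs, act, obs, act, ...] token ids for one episode (assumed).
episode = [101, 7, 102, 9, 104, 7, 103]
x, y = make_training_pairs(episode)
for inp, tgt in zip(x, y):
    print(f"context ends with {inp} -> predict {tgt}")
```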
no code implementations • 19 Jan 2024 • Boyi Li, Junming Chen, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik
This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in 3D pose and to the input image in terms of visual similarity.
1 code implementation • ICCV 2023 • Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, Jitendra Malik
To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.
Ranked #3 on Pose Tracking on PoseTrack2018
1 code implementation • CVPR 2023 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik
Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets (a minimal fusion sketch follows below).
Ranked #1 on Action Recognition on AVA v2.2 (using extra training data)
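A minimal sketch of the fusion idea, with feature dimensions, concatenation, and temporal average pooling all assumed for illustration rather than taken from the paper's architecture:

```python
# Hedged sketch: fuse per-frame 3D pose features with contextualized
# appearance features over one tracklet, then classify the action.
import numpy as np

def fuse_tracklet(pose_feats: np.ndarray, appear_feats: np.ndarray) -> np.ndarray:
    """pose_feats: (T, Dp), appear_feats: (T, Da) for one tracklet."""
    fused = np.concatenate([pose_feats, appear_feats], axis=-1)  # (T, Dp+Da)
    return fused.mean(axis=0)  # temporal average pooling (assumed)

rng = np.random.default_rng(0)
pose = rng.normal(size=(16, 128))        # 16 frames of 3D pose features
appearance = rng.normal(size=(16, 256))  # 16 frames of appearance features
clip_feature = fuse_tracklet(pose, appearance)
W = rng.normal(size=(clip_feature.size, 80))  # 80 action classes (e.g., AVA)
logits = clip_feature @ W
print(int(logits.argmax()))
```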
no code implementations • 1 Feb 2022 • Jathushan Rajasegaran, Chelsea Finn, Sergey Levine
In this paper, we study how meta-learning can be applied to tackle online problems of this nature, simultaneously adapting to changing tasks and input distributions while meta-training the model so that it adapts more quickly in the future.
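A hedged sketch of the loop this describes: a few inner-loop gradient steps adapt to the current task, and an outer meta-update moves the shared initialization so that future adaptation is faster. Plain linear regression and a Reptile-style update stand in for the paper's model and method.

```python
# Toy online meta-learning loop: fast inner-loop adaptation on each task
# from a stream, plus a Reptile-style outer update of the initialization.
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for a linear model."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
meta_w = np.zeros(3)                     # shared initialization
for step in range(100):                  # stream of changing tasks
    true_w = rng.normal(size=3)          # current task parameters
    X = rng.normal(size=(16, 3))
    y = X @ true_w
    w = meta_w.copy()
    for _ in range(3):                   # inner loop: fast adaptation
        w -= 0.1 * mse_grad(w, X, y)
    meta_w += 0.05 * (w - meta_w)        # outer loop: Reptile-style meta-update
print(meta_w)
```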
no code implementations • CVPR 2022 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.
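One simple way to make such a similarity probabilistic is a Gaussian likelihood of the observation under the tracklet's predicted state; the diagonal-Gaussian assumption below is illustrative, not the paper's exact model.

```python
# Hedged sketch: score a single-frame detection against a tracklet's
# predicted state with a diagonal-Gaussian log-likelihood.
import numpy as np

def log_likelihood(pred_mean: np.ndarray, pred_var: np.ndarray,
                   observation: np.ndarray) -> float:
    """Log-likelihood of one observation under the predicted state."""
    resid = observation - pred_mean
    return float(-0.5 * np.sum(resid**2 / pred_var + np.log(2 * np.pi * pred_var)))

# Predicted location of a tracklet in the next frame, with uncertainty.
mean = np.array([120.0, 45.0])
var = np.array([9.0, 9.0])
print(log_likelihood(mean, var, np.array([122.0, 44.0])))  # close: high score
print(log_likelihood(mean, var, np.array([300.0, 10.0])))  # far: low score
```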
1 code implementation • NeurIPS 2021 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.
no code implementations • 1 Jan 2021 • Karttikeya Mangalam, Rohin Garg, Jathushan Rajasegaran, Taesung Park
Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator.
no code implementations • 19 Oct 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.
2 code implementations • 17 Jun 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second-stage distillation process (a toy distillation sketch follows below).
Ranked #13 on Few-Shot Image Classification on FC100 5-way (5-shot)
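A toy sketch of the second-stage idea, matching temperature-softened teacher and student distributions with a KL divergence; the temperature and the KL direction are assumed values, not the paper's settings.

```python
# Hedged sketch: distill a first-stage (self-supervised) teacher into a
# student by matching softened output distributions.
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

teacher = np.array([2.0, 0.5, -1.0])
student = np.array([1.5, 0.7, -0.8])
print(distillation_loss(student, teacher))
```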
no code implementations • 2 Jun 2020 • Naveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake, Suranga Seneviratne, Guillaume Jourjon
We show that a novel approach of combining content embeddings and style embeddings outperforms the baseline methods for image similarity such as SIFT, SURF, and various image hashing methods.
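A minimal sketch of the combination, assuming L2-normalized content and style embeddings concatenated before a cosine-similarity comparison (one reasonable choice, not necessarily the paper's):

```python
# Hedged sketch: one image-similarity score from content + style embeddings.
import numpy as np

def l2norm(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def combined_similarity(content_a, style_a, content_b, style_b) -> float:
    """Cosine similarity over concatenated content+style embeddings."""
    a = l2norm(np.concatenate([l2norm(content_a), l2norm(style_a)]))
    b = l2norm(np.concatenate([l2norm(content_b), l2norm(style_b)]))
    return float(a @ b)

rng = np.random.default_rng(0)
c1, s1 = rng.normal(size=512), rng.normal(size=512)
c2, s2 = c1 + 0.1 * rng.normal(size=512), s1 + 0.1 * rng.normal(size=512)
print(combined_similarity(c1, s1, c2, s2))  # near 1 for near-duplicate icons
```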
1 code implementation • CVPR 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
In this paper, we hypothesize that this problem can be avoided by learning a set of generalized parameters that are specific to neither old nor new tasks.
2 code implementations • 17 Mar 2020 • K J Joseph, Jathushan Rajasegaran, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian
In a real-world setting, object instances from new classes can be continuously encountered by object detectors.
1 code implementation • NeurIPS 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman H. Khan, Fahad Shahbaz Khan, Ling Shao
In order to maintain an equilibrium between previous and newly acquired knowledge, we propose a simple controller to dynamically balance the model plasticity (a toy controller sketch follows below).
Ranked #7 on Continual Learning on F-CelebA (10 tasks)
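A toy sketch of what such a controller could look like: a scalar in [0, 1] that reweights new-task and old-task losses and is dialed down as measured forgetting grows. The update rule is an assumption, not the paper's controller.

```python
# Hedged sketch: a scalar plasticity controller trading off learning the
# new task against preserving old-task knowledge.
def total_loss(new_task_loss: float, old_task_penalty: float,
               plasticity: float) -> float:
    """plasticity in [0, 1]: 1 favors the new task, 0 preserves old ones."""
    return plasticity * new_task_loss + (1.0 - plasticity) * old_task_penalty

def update_plasticity(plasticity: float, forgetting: float,
                      lr: float = 0.1) -> float:
    """Dial plasticity down when measured forgetting grows."""
    return min(1.0, max(0.0, plasticity - lr * forgetting))

p = 0.8
for forgetting in [0.0, 0.2, 0.5]:  # measured drop in old-task accuracy
    p = update_plasticity(p, forgetting)
    print(round(p, 2), round(total_loss(1.0, 2.0, p), 2))
```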
1 code implementation • 26 Nov 2019 • Hirunima Jayasekara, Vinoj Jayasundara, Mohamed Athif, Jathushan Rajasegaran, Sandaru Jayasekara, Suranga Seneviratne, Ranga Rodrigo
Capsule networks excel in understanding spatial relationships in 2D data for vision-related tasks.
1 code implementation • 3 Jun 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang
In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that it is expected to recognize at inference time.
5 code implementations • CVPR 2019 • Jathushan Rajasegaran, Vinoj Jayasundara, Sandaru Jayasekara, Hirunima Jayasekara, Suranga Seneviratne, Ranga Rodrigo
The capsule network is a promising concept in deep learning, yet its true potential has not been fully realized thus far, yielding sub-par performance on several key benchmark datasets with complex data.
3 code implementations • 17 Apr 2019 • Vinoj Jayasundara, Sandaru Jayasekara, Hirunima Jayasekara, Jathushan Rajasegaran, Suranga Seneviratne, Ranga Rodrigo
Our system is useful for character recognition in localized languages that lack large labeled training sets, and even in related, more general contexts such as object recognition.
Ranked #6 on Image Classification on EMNIST-Letters
no code implementations • 16 Oct 2018 • Sameera Ramasinghe, Jathushan Rajasegaran, Vinoj Jayasundara, Kanchana Ranasinghe, Ranga Rodrigo, Ajith A. Pasqual
We propose three schemas for combining static and motion components: based on a variance ratio, principal components, and Cholesky decomposition.
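A minimal sketch of the first schema, weighting the static and motion components by their variance ratio; the exact weighting formula here is an assumption for illustration.

```python
# Hedged sketch: combine static and motion feature components, with the
# share of each decided by the ratio of their average variances.
import numpy as np

def variance_ratio_combine(static: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """static, motion: (N, D) feature matrices over N samples."""
    vs = static.var(axis=0).mean()   # average variance of static features
    vm = motion.var(axis=0).mean()   # average variance of motion features
    w = vs / (vs + vm)               # variance ratio sets the mixing weight
    return w * static + (1.0 - w) * motion

rng = np.random.default_rng(0)
static = rng.normal(scale=1.0, size=(100, 64))
motion = rng.normal(scale=2.0, size=(100, 64))
print(variance_ratio_combine(static, motion).shape)  # (100, 64)
```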
no code implementations • 26 Apr 2018 • Jathushan Rajasegaran, Suranga Seneviratne, Guillaume Jourjon
We show that further performance increases can be achieved by combining style embeddings with content embeddings.