1 code implementation • 7 Jan 2025 • Siddharth Joshi, Besmira Nushi, Vidhisha Balachandran, Varun Chandrasekaran, Vibhav Vineet, Neel Joshi, Baharan Mirzasoleiman
Vision-language models (VLMs) are highly effective but often underperform on specialized tasks; for example, Llava-1.5 struggles with chart and diagram understanding due to scarce task-specific training data.
1 code implementation • 20 Dec 2024 • Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet
In this work, we systematically study audio event relation modeling in TTA generation models.
1 code implementation • NeurIPS 2023 • Rajat Modi, Vibhav Vineet, Yogesh Singh Rawat
This paper explores the impact of occlusions in video action detection.
no code implementations • 17 Oct 2024 • Mazda Moayeri, Vidhisha Balachandran, Varun Chandrasekaran, Safoora Yousefi, Thomas Fel, Soheil Feizi, Besmira Nushi, Neel Joshi, Vibhav Vineet
With models getting stronger, evaluations have grown more complex, testing multiple skills in one benchmark and even in the same instance at once.
1 code implementation • 13 Sep 2024 • Vidhisha Balachandran, Jingya Chen, Neel Joshi, Besmira Nushi, Hamid Palangi, Eduardo Salinas, Vibhav Vineet, James Woffinden-Luey, Safoora Yousefi
Second, we introduce Eureka-Bench as an extensible collection of benchmarks testing capabilities that (i) are still challenging for state-of-the-art models and (ii) represent fundamental but overlooked language and multimodal capabilities.
1 code implementation • 21 Aug 2024 • Shehreen Azad, Yash Jain, Rishit Garg, Yogesh S Rawat, Vibhav Vineet
We benchmark 17 state-of-the-art VLMs using these datasets and find that they consistently struggle with both depth and height perception.
1 code implementation • 21 Jun 2024 • Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi
Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains.
no code implementations • 16 Jun 2024 • Joykirat Singh, Akshay Nambi, Vibhav Vineet
Large Language Models (LLMs) have been applied to Math Word Problems (MWPs) with transformative impacts, revolutionizing how these complex problems are approached and solved in various domains including educational settings.
no code implementations • 29 Feb 2024 • Shresth Grover, Vibhav Vineet, Yogesh S Rawat
In this work we present a novel task of understanding unintentional human activities in videos.
no code implementations • 21 Dec 2023 • Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution.
1 code implementation • CVPR 2024 • Yash Jain, Anshul Nasery, Vibhav Vineet, Harkirat Behl
In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control.
1 code implementation • NeurIPS 2023 • Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet
Construction of a universal detector poses a crucial question: How can we most effectively train a model on a large mixture of datasets?
no code implementations • ICCV 2023 • Nishant Jain, Harkirat Behl, Yogesh Singh Rawat, Vibhav Vineet
A recent trend in deep learning algorithms has been towards training large-scale models with high parameter counts on big datasets.
1 code implementation • 12 Sep 2023 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet
A foreground-background segmentation algorithm is then used to generate foreground object masks.
no code implementations • 15 Jun 2023 • Madeline Chantry Schiappa, Shehreen Azad, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet
In this work, we perform a robustness analysis of Visual Foundation Models (VFMs) for segmentation tasks and focus on robustness against perturbations inspired by real-world distribution shifts.
no code implementations • 9 Jun 2023 • Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh Singh Rawat
In this work, we first provide a benchmark that enables a comparison of existing approaches on the same ground.
Ranked #3 on Self-Supervised Action Recognition on UCF101
no code implementations • 29 May 2023 • Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang
Control-GPT works by querying GPT-4 to write TikZ code, and the generated sketches are used as references alongside the text instructions for diffusion models (e.g., ControlNet) to generate photo-realistic images.
no code implementations • 15 Mar 2023 • Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov
A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations.
no code implementations • ICCV 2023 • Sruthi Sudhakar, Jon Hanzelka, Josh Bobillot, Tanmay Randhavane, Neel Joshi, Vibhav Vineet
An emerging alternative is to use synthetic data, but if the synthetic data is not similar enough to the real data, the performance is typically below that of training with real data.
no code implementations • CVPR 2023 • Madeline Chantry Schiappa, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, Yogesh S. Rawat
In this work, we perform a large-scale robustness analysis of these existing models for video action recognition.
1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang
We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
1 code implementation • 15 Dec 2022 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet
Finally, the third component creates a large-scale pseudo-labeled instance segmentation training dataset by compositing the foreground object masks onto the original and generated background images.
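The compositing step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `composite`, the nested-list image format, and the fixed paste position are all assumptions chosen for illustration. It pastes the masked foreground pixels onto a copy of a background and records a pseudo-label map at the same time.

```python
def composite(background, foreground, mask, top, left, instance_id):
    """Paste masked foreground pixels onto a copy of the background.

    Hypothetical helper: returns the composited image and a label map that
    holds instance_id wherever a foreground pixel was pasted (0 elsewhere).
    """
    image = [row[:] for row in background]
    labels = [[0] * len(row) for row in background]
    for r, mask_row in enumerate(mask):
        for c, m in enumerate(mask_row):
            y, x = top + r, left + c
            inside = 0 <= y < len(image) and 0 <= x < len(image[0])
            if m and inside:
                image[y][x] = foreground[r][c]
                labels[y][x] = instance_id
    return image, labels


# Toy data: a 3x4 empty background and a 2x2 object with an L-shaped mask.
background = [[0] * 4 for _ in range(3)]
foreground = [[9, 9], [9, 9]]
mask = [[1, 0], [1, 1]]
image, labels = composite(background, foreground, mask, top=1, left=2, instance_id=1)
```

Because the label map is produced by the same loop that writes the pixels, the segmentation annotation comes for free with each composite, which is what makes the resulting dataset pseudo-labeled.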
no code implementations • 22 Oct 2022 • Jinoh Cho, Minguk Kang, Vibhav Vineet, Jaesik Park
However, existing image completion methods tend to fill in the missing region with the surrounding texture instead of hallucinating a visual instance that is suitable in accordance with the context of the scene.
no code implementations • 22 Sep 2022 • Benoit Guillard, Sai Vemprala, Jayesh K. Gupta, Ondrej Miksik, Vibhav Vineet, Pascal Fua, Ashish Kapoor
Simulating realistic sensors is a challenging part of data generation for autonomous systems, often involving carefully handcrafted sensor design, scene properties, and physics modeling.
1 code implementation • 22 Jul 2022 • Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet
However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and is often suboptimal for the target domain.
1 code implementation • 11 Jul 2022 • Christopher Agia, Krishna Murthy Jatavallabhula, Mohamed Khodeir, Ondrej Miksik, Vibhav Vineet, Mustafa Mukadam, Liam Paull, Florian Shkurti
3D scene graphs (3DSGs) are an emerging description, unifying symbolic, topological, and metric scene representations.
1 code implementation • 11 Jul 2022 • Tyler LaBonte, Yale Song, Xin Wang, Vibhav Vineet, Neel Joshi
A critical object detection task is finetuning an existing model to detect novel objects, but the standard workflow requires bounding box annotations which are time-consuming and expensive to collect.
1 code implementation • 5 Jul 2022 • Madeline C. Schiappa, Shruti Vyas, Hamid Palangi, Yogesh S. Rawat, Vibhav Vineet
Joint visual and language modeling on large-scale datasets has recently shown good progress in multi-modal tasks when compared to single modal learning.
1 code implementation • 4 Jul 2022 • Madeline Chantry Schiappa, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, Yogesh Rawat
In this work, we perform a large-scale robustness analysis of these existing models for video action recognition.
no code implementations • 20 Jun 2022 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet
For foreground object mask generation, we use a simple textual template with object class name as input to DALL-E to generate a diverse set of foreground images.
1 code implementation • ICLR 2022 • Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry
Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools.
1 code implementation • ACL 2022 • Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy
In particular, models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description.
Ranked #1 on Image Retrieval on ImageCoDe
1 code implementation • 20 Mar 2022 • Eric Heiden, Ziang Liu, Vibhav Vineet, Erwin Coumans, Gaurav S. Sukhatme
Being able to reproduce physical phenomena ranging from light interaction to contact mechanics, simulators are becoming increasingly useful in more and more application domains where real-world interaction or labeled data are difficult to obtain.
no code implementations • 15 Mar 2022 • Sharath Girish, Debadeepta Dey, Neel Joshi, Vibhav Vineet, Shital Shah, Caio Cesar Teodoro Mendes, Abhinav Shrivastava, Yale Song
We conduct a large-scale study with over 100 variants of ResNet and MobileNet architectures and evaluate them across 11 downstream scenarios in the SSL setting.
1 code implementation • CVPR 2022 • Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance.
no code implementations • CVPR 2022 • Weizhe Liu, Bugra Tekin, Huseyin Coskun, Vibhav Vineet, Pascal Fua, Marc Pollefeys
To this end, we propose an approach to enforce temporal priors on the optimal transport matrix, which leverages temporal consistency, while allowing for variations in the order of actions.
no code implementations • 25 Jun 2021 • Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor
The ability to perform causal and counterfactual reasoning is a central property of human intelligence.
1 code implementation • 7 Jun 2021 • Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry
We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation.
1 code implementation • 9 Feb 2021 • Zhiwei Xu, Thalaiyasingam Ajanthan, Vibhav Vineet, Richard Hartley
In this work, we introduce a Resource Aware Neuron Pruning (RANP) algorithm that prunes 3D CNNs at initialization to high sparsity levels.
no code implementations • 21 Oct 2020 • Ziqi Fan, Vibhav Vineet, Chenshen Lu, T. W. Wu, Kyla McMullen
The present work proposes a method to infer object geometry from scattering features by training convolutional neural networks.
1 code implementation • 6 Oct 2020 • Zhiwei Xu, Thalaiyasingam Ajanthan, Vibhav Vineet, Richard Hartley
Specifically, the core idea is to obtain an importance score for each neuron based on its sensitivity to the loss function.
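One common way to measure such sensitivity is a connection-saliency score, |w · ∂L/∂w|, summed over a neuron's incoming weights. The sketch below assumes that scoring rule and a simple top-k selection; it is an illustration, not the paper's algorithm, and the names `neuron_importance` and `keep_mask` and the toy weights and gradients are hypothetical.

```python
def neuron_importance(weights, grads):
    """Score each neuron by summing |w * dL/dw| over its incoming weights.

    Assumption: each row of `weights` / `grads` belongs to one neuron.
    """
    return [
        sum(abs(w * g) for w, g in zip(w_row, g_row))
        for w_row, g_row in zip(weights, grads)
    ]


def keep_mask(scores, sparsity):
    """Return a 0/1 mask that keeps the highest-scoring neurons."""
    n_keep = max(1, int(len(scores) * (1.0 - sparsity)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


# Toy layer with three neurons, two incoming weights each (made-up values).
weights = [[0.5, -1.0], [0.1, 0.2], [2.0, 0.0]]
grads = [[0.2, 0.1], [0.05, -0.05], [-0.3, 0.4]]
scores = neuron_importance(weights, grads)
mask = keep_mask(scores, sparsity=0.5)
```

In this toy case the third neuron has the largest score, so it is the only one kept at 50% sparsity. In practice the gradients would come from a backward pass over a mini-batch at initialization.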
no code implementations • ECCV 2020 • Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet
Simulation is increasingly being used for generating large labelled datasets in many machine learning problems.
no code implementations • 21 Jan 2020 • Pallabi Ghosh, Vibhav Vineet, Larry S. Davis, Abhinav Shrivastava, Sudipta Sinha, Neel Joshi
Given color images and noisy and incomplete target depth maps, we optimize a randomly-initialized CNN model to reconstruct a depth map restored by virtue of using the CNN network structure as a prior combined with a view-constrained photo-consistency loss.
1 code implementation • 30 Oct 2019 • Ziqi Fan, Vibhav Vineet, Hannes Gamper, Nikunj Raghuvanshi
Diffracted scattering and occlusion are important acoustic effects in interactive auralization and noise control applications, typically requiring expensive numerical simulation.
2 code implementations • 16 Sep 2019 • Rogerio Bonatti, Ratnesh Madaan, Vibhav Vineet, Sebastian Scherer, Ashish Kapoor
We analyze the rich latent spaces learned with our proposed representations, and show that the use of our cross-modal architecture significantly improves control policy performance as compared to end-to-end learning or purely unsupervised feature extractors.
1 code implementation • 15 Mar 2019 • Ondrej Miksik, Vibhav Vineet
For each time step, our dynamic map maintains a relative pose of each volume with respect to the stationary background.
no code implementations • 25 Feb 2019 • Zihao W. Wang, Vibhav Vineet, Francesco Pittaluga, Sudipta Sinha, Oliver Cossairt, Sing Bing Kang
We propose a lens-free coded aperture camera system for human action recognition that is privacy-preserving.
no code implementations • 9 Feb 2019 • Tomas Hodan, Vibhav Vineet, Ran Gal, Emanuel Shalev, Jon Hanzelka, Treb Connell, Pedro Urbina, Sudipta N. Sinha, Brian Guenter
We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images.
2 code implementations • 7 Aug 2016 • Stephan R. Richter, Vibhav Vineet, Stefan Roth, Vladlen Koltun
Recent progress in computer vision has been driven by high-capacity models trained on large datasets.
no code implementations • CVPR 2016 • Rene Ranftl, Vibhav Vineet, Qifeng Chen, Vladlen Koltun
We present an approach to dense depth estimation from a single monocular camera that is moving through a dynamic scene.
1 code implementation • CVPR 2016 • Abhijit Kundu, Vibhav Vineet, Vladlen Koltun
We present an approach to long-range spatio-temporal regularization in semantic video segmentation.
no code implementations • 13 Oct 2015 • Stuart Golodetz, Michael Sapienza, Julien P. C. Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor A. Prisacariu, Olaf Kähler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H. S. Torr
We present an open-source, real-time implementation of SemanticPaint, a system for geometric reconstruction, object-class segmentation and learning of 3D scenes.
6 code implementations • ICCV 2015 • Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr
Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding.
Ranked #36 on Semantic Segmentation on PASCAL VOC 2012 test
no code implementations • CVPR 2014 • Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip H. S. Torr
The concepts of objects and attributes are both important for describing images precisely, since verbal descriptions often contain both adjectives and nouns (e.g. "I see a shiny red chair").
no code implementations • 25 Mar 2014 • Vibhav Vineet, Jonathan Warrell, Philip H. S. Torr
The algorithm converges to a local minimum for any general pairwise potential, and we give a theoretical analysis of the properties of the algorithm, characterizing the situations in which we can expect good performance.
no code implementations • NeurIPS 2013 • Vibhav Vineet, Carsten Rother, Philip Torr
Many methods have been proposed to recover the intrinsic scene properties such as shape, reflectance and illumination from a single image.
no code implementations • 16 Oct 2013 • Ming-Ming Cheng, Shuai Zheng, Wen-Yan Lin, Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Nigel Crook, Niloy Mitra, Philip Torr
This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images.