1 code implementation • 7 Jan 2025 • Siddharth Joshi, Besmira Nushi, Vidhisha Balachandran, Varun Chandrasekaran, Vibhav Vineet, Neel Joshi, Baharan Mirzasoleiman
Vision-language models (VLMs) are highly effective but often underperform on specialized tasks; for example, LLaVA-1.5 struggles with chart and diagram understanding due to scarce task-specific training data.
no code implementations • 29 Oct 2024 • Natasha Butt, Varun Chandrasekaran, Neel Joshi, Besmira Nushi, Vidhisha Balachandran
Evaluations are limited by benchmark availability.
no code implementations • 17 Oct 2024 • Mazda Moayeri, Vidhisha Balachandran, Varun Chandrasekaran, Safoora Yousefi, Thomas Fel, Soheil Feizi, Besmira Nushi, Neel Joshi, Vibhav Vineet
As models grow stronger, evaluations have become more complex, testing multiple skills within one benchmark, and even multiple skills at once within a single test instance.
1 code implementation • 13 Sep 2024 • Vidhisha Balachandran, Jingya Chen, Neel Joshi, Besmira Nushi, Hamid Palangi, Eduardo Salinas, Vibhav Vineet, James Woffinden-Luey, Safoora Yousefi
Second, we introduce Eureka-Bench as an extensible collection of benchmarks testing capabilities that (i) are still challenging for state-of-the-art models and (ii) represent fundamental but overlooked language and multimodal capabilities.
1 code implementation • 21 Jun 2024 • Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, Neel Joshi
Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains.
no code implementations • ICCV 2023 • Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys
Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community.
1 code implementation • 12 Sep 2023 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet
A foreground-background segmentation algorithm is then used to generate foreground object masks.
no code implementations • 29 May 2023 • Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang
Control-GPT works by querying GPT-4 to write TikZ code, and the generated sketches are used as references alongside the text instructions for diffusion models (e.g., ControlNet) to generate photo-realistic images.
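The pipeline described above can be sketched as a short function chain. This is an illustrative outline only: the prompt wording and the `query_llm`, `render_tikz`, and `diffusion` callables are hypothetical stand-ins, not the paper's actual API.

```python
# Hypothetical sketch of the Control-GPT pipeline: an LLM writes TikZ code,
# the rendered sketch then conditions a ControlNet-style diffusion model.
# All helper names here are illustrative assumptions.

def build_tikz_prompt(instruction: str) -> str:
    """Wrap a text instruction in a prompt asking the LLM for a TikZ sketch."""
    return (
        "Write TikZ code that draws a simple sketch of the following scene.\n"
        f"Scene: {instruction}\n"
        "Return only a compilable TikZ picture."
    )

def control_gpt(instruction: str, query_llm, render_tikz, diffusion):
    """Text instruction -> TikZ code -> rasterized sketch -> conditioned image."""
    tikz_code = query_llm(build_tikz_prompt(instruction))  # e.g., GPT-4
    sketch = render_tikz(tikz_code)                        # rasterize the sketch
    # The sketch serves as the control image alongside the original text prompt.
    return diffusion(prompt=instruction, control_image=sketch)
```

Keeping the three stages as injected callables makes the control flow explicit: the sketch is an intermediate artifact, not the final output.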
no code implementations • ICCV 2023 • Sruthi Sudhakar, Jon Hanzelka, Josh Bobillot, Tanmay Randhavane, Neel Joshi, Vibhav Vineet
An emerging alternative is to use synthetic data, but if the synthetic data is not similar enough to the real data, the performance is typically below that of training with real data.
no code implementations • 11 Nov 2022 • Gabriele Prato, Yale Song, Janarthanan Rajendran, R Devon Hjelm, Neel Joshi, Sarath Chandar
We show that our method is successful at enabling vision transformers to encode the temporal component of video data.
1 code implementation • 22 Jul 2022 • Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet
However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control, forcing large amounts of random data variations to be rendered, which is slow and often suboptimal for the target domain.
1 code implementation • 11 Jul 2022 • Tyler LaBonte, Yale Song, Xin Wang, Vibhav Vineet, Neel Joshi
A critical object detection task is finetuning an existing model to detect novel objects, but the standard workflow requires bounding box annotations which are time-consuming and expensive to collect.
no code implementations • 20 Jun 2022 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet
For foreground object mask generation, we use a simple textual template with object class name as input to DALL-E to generate a diverse set of foreground images.
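The textual-template idea above can be sketched in a few lines. The template wording and modifier list below are illustrative assumptions, not the authors' exact prompts; each generated prompt would be sent to DALL-E, and the returned images passed to a foreground/background segmenter to produce masks.

```python
# Hypothetical prompt templates for generating diverse foreground images
# of each object class; the modifier phrases are assumptions for illustration.
MODIFIERS = ["a photo of", "a close-up photo of", "a studio photo of"]

def foreground_prompts(class_names):
    """Build one prompt per (modifier, class) pair for text-to-image generation."""
    return {cls: [f"{m} a {cls}" for m in MODIFIERS] for cls in class_names}
```

Varying only a small set of modifiers keeps the prompts controllable while still yielding a diverse image set per class.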
1 code implementation • 23 Apr 2022 • Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.
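The two ingredients named above, recurrency and sparsity, are the same ones that appear in classical iterative sparse coding. A minimal pure-Python sketch of that idea (an ISTA-style recurrence, with a toy dictionary and step size as assumptions, not the paper's architecture):

```python
# Toy sketch of recurrent sparse reconstruction (ISTA-style iteration):
# repeatedly take a gradient step on the reconstruction error, then shrink,
# so the code z becomes sparse. Dictionary, step, and lam are toy assumptions.

def soft_threshold(v, lam):
    """Shrink v toward zero by lam; this is what induces sparsity in z."""
    return max(v - lam, 0.0) if v > 0 else min(v + lam, 0.0)

def sparse_reconstruct(x, dictionary, lam=0.1, step=0.1, iters=100):
    """Recurrent update: z <- shrink(z + step * D^T (x - D z), step * lam)."""
    n, k = len(dictionary), len(dictionary[0])
    z = [0.0] * k
    for _ in range(iters):
        # residual r = x - D z
        r = [x[i] - sum(dictionary[i][j] * z[j] for j in range(k))
             for i in range(n)]
        # gradient step on each code entry, followed by shrinkage
        z = [soft_threshold(
                 z[j] + step * sum(dictionary[i][j] * r[i] for i in range(n)),
                 step * lam)
             for j in range(k)]
    return z
```

With an identity dictionary the recurrence converges to a shrunken copy of the input, zeroing out components that never receive evidence.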
no code implementations • 15 Mar 2022 • Sharath Girish, Debadeepta Dey, Neel Joshi, Vibhav Vineet, Shital Shah, Caio Cesar Teodoro Mendes, Abhinav Shrivastava, Yale Song
We conduct a large-scale study with over 100 variants of ResNet and MobileNet architectures and evaluate them across 11 downstream scenarios in the SSL setting.
1 code implementation • CVPR 2022 • Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance.
no code implementations • 21 Jan 2020 • Pallabi Ghosh, Vibhav Vineet, Larry S. Davis, Abhinav Shrivastava, Sudipta Sinha, Neel Joshi
Given color images and noisy, incomplete target depth maps, we optimize a randomly initialized CNN to reconstruct the depth map, using the network structure itself as a prior combined with a view-constrained photo-consistency loss.
1 code implementation • 22 Oct 2019 • Mohammad Sadegh Norouzzadeh, Dan Morris, Sara Beery, Neel Joshi, Nebojsa Jojic, Jeff Clune
However, the accuracy of results depends on the amount, quality, and diversity of the data available to train models, and the literature has focused on projects with millions of relevant, labeled training images.
no code implementations • 11 Apr 2019 • Sara Beery, Yang Liu, Dan Morris, Jim Piavis, Ashish Kapoor, Markus Meister, Neel Joshi, Pietro Perona
The ability to detect and classify rare occurrences in images has important applications: for example, counting rare and endangered species when studying biodiversity, or detecting infrequent traffic scenarios that pose a danger to self-driving cars.
no code implementations • ECCV 2018 • Benjamin Hepp, Debadeepta Dey, Sudipta N. Sinha, Ashish Kapoor, Neel Joshi, Otmar Hilliges
We propose to learn a better utility function that predicts the usefulness of future viewpoints.
no code implementations • ICCV 2017 • Tae-Hyun Oh, Kyungdon Joo, Neel Joshi, Baoyuan Wang, In So Kweon, Sing Bing Kang
Cinemagraphs are a compelling way to convey dynamic aspects of a scene.
no code implementations • 20 Jun 2017 • Brian Guenter, Neel Joshi, Richard Stoakley, Andrew Keefe, Kevin Geary, Ryan Freeman, Jake Hundley, Pamela Patterson, David Hammon, Guillermo Herrera, Elena Sherman, Andrew Nowak, Randall Schubert, Peter Brewer, Louis Yang, Russell Mott, Geoff McKnight
In this work we demonstrate that commercial silicon CMOS image sensors can be thinned and formed into accurate, highly curved optical surfaces with undiminished functionality.
no code implementations • ICCV 2017 • Mike Roberts, Debadeepta Dey, Anh Truong, Sudipta Sinha, Shital Shah, Ashish Kapoor, Pat Hanrahan, Neel Joshi
Drones equipped with cameras are emerging as a powerful tool for large-scale aerial 3D scanning, but existing automatic flight planners do not exploit all available information about the scene, and can therefore produce inaccurate and incomplete 3D models.
no code implementations • 31 Mar 2017 • Wei-Sheng Lai, Yujia Huang, Neel Joshi, Chris Buehler, Ming-Hsuan Yang, Sing Bing Kang
We present a system for converting a fully panoramic ($360^\circ$) video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience.
no code implementations • 30 Jun 2015 • Libin Sun, Brian Guenter, Neel Joshi, Patrick Therien, James Hays
Unfortunately, custom lens design is costly (thousands to tens of thousands of dollars), time consuming (10-12 weeks typical lead time), and requires specialized optics design expertise.
no code implementations • CVPR 2014 • Huixuan Tang, Neel Joshi, Ashish Kapoor
The biggest hurdles to these efforts are: 1) the difficulty of generalizing across diverse types of distortions and 2) collecting the enormous amount of human-scored training data that is needed to learn the measure.