no code implementations • 15 Jul 2024 • Nirat Saini, Navaneeth Bodla, Ashish Shrivastava, Avinash Ravichandran, Xiao Zhang, Abhinav Shrivastava, Bharat Singh
This process begins with inserting the object into a single frame using a ControlNet-based inpainting diffusion model, and then generating subsequent frames conditioned on features from an inpainted frame as an anchor to minimize the domain gap between the background and the object.
no code implementations • 15 Jun 2024 • Bharat Singh, Viveka Kulharia, Luyu Yang, Avinash Ravichandran, Ambrish Tyagi, Ashish Shrivastava
Multimodal synthetic data generation is crucial in domains such as autonomous driving, robotics, augmented/virtual reality, and retail.
no code implementations • ICCV 2023 • Koutilya PNVR, Bharat Singh, Pallabi Ghosh, Behjat Siddiquie, David Jacobs
First, we show that the latent space of LDMs (z-space) is a better input representation compared to other feature representations like RGB images or CLIP encodings for text-based image segmentation.
1 code implementation • 10 Feb 2021 • Bharat Singh, Mahyar Najibi, Abhishek Sharma, Larry S. Davis
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
1 code implementation • 19 Jul 2020 • Rohun Tripathi, Vasu Singla, Mahyar Najibi, Bharat Singh, Abhishek Sharma, Larry Davis
The widely adopted sequential variant of Non-Maximum Suppression (or Greedy-NMS) is a crucial module for object-detection pipelines.
no code implementations • 12 May 2020 • Rohun Tripathi, Bharat Singh
To this end, RSO adds a perturbation to a weight in a deep neural network and tests if it reduces the loss on a mini-batch.
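The perturb-and-test idea in the sentence above can be sketched in a few lines. This is only an illustration on a toy linear model, not the authors' implementation: the names `mse_loss` and `random_search_step`, the Gaussian perturbation, the `step` scale, and the squared-error loss are all assumptions made for the example.

```python
import random


def mse_loss(weights, batch):
    """Mean squared error of a linear model on a mini-batch of (x, y) pairs."""
    total = 0.0
    for x, y in batch:
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(batch)


def random_search_step(weights, batch, step=0.1, rng=random):
    """Perturb one weight at a time; keep a change only if the loss drops.

    Hypothetical helper: the Gaussian perturbation and its scale `step`
    are assumptions for this sketch, not details taken from the source.
    """
    best = mse_loss(weights, batch)
    for i in range(len(weights)):
        old = weights[i]
        weights[i] = old + rng.gauss(0.0, step)  # add a perturbation
        trial = mse_loss(weights, batch)         # test it on the mini-batch
        if trial < best:
            best = trial                         # keep the improvement
        else:
            weights[i] = old                     # otherwise revert
    return best


# Usage: fit y = 2*x1 - 3*x2 without computing any gradients.
rng = random.Random(0)
batch = []
for _ in range(32):
    x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    batch.append((x, 2.0 * x[0] - 3.0 * x[1]))

weights = [0.0, 0.0]
initial_loss = mse_loss(weights, batch)
losses = [random_search_step(weights, batch, rng=rng) for _ in range(200)]
```

Because a perturbation is kept only when it lowers the loss, the recorded loss never increases from one step to the next.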
2 code implementations • 30 Dec 2019 • Zhe Wu, Zuxuan Wu, Bharat Singh, Larry S. Davis
Deep neural networks have been shown to suffer from poor generalization when small perturbations are added (like Gaussian noise), yet little work has been done to evaluate their robustness to more natural image transformations like photo filters.
no code implementations • 15 May 2019 • Chongyang Bai, Maksim Bolonkin, Judee Burgoon, Chao Chen, Norah Dunbar, Bharat Singh, V. S. Subrahmanian, Zhe Wu
Most work on automated deception detection (ADD) in video has two restrictions: (i) it focuses on a video of one person, and (ii) it focuses on a single act of deception in a one- or two-minute video.
no code implementations • 11 Apr 2019 • Hengduo Li, Bharat Singh, Mahyar Najibi, Zuxuan Wu, Larry S. Davis
We analyze how well their features generalize to tasks like image classification, semantic segmentation and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397, Flowers-102 etc.
no code implementations • 14 Dec 2018 • Xiyang Dai, Bharat Singh, Joe Yue-Hei Ng, Larry S. Davis
We present Temporal Aggregation Network (TAN) which decomposes 3D convolutions into spatial and temporal aggregation blocks.
no code implementations • CVPR 2019 • Mahyar Najibi, Bharat Singh, Larry S. Davis
We propose a novel approach for generating region proposals for performing face-detection.
1 code implementation • ICCV 2019 • Mahyar Najibi, Bharat Singh, Larry S. Davis
Instead of processing an entire image pyramid, AutoFocus adopts a coarse-to-fine approach and processes only regions that are likely to contain small objects at finer scales.
1 code implementation • 18 Jun 2018 • Zhe Wu, Navaneeth Bodla, Bharat Singh, Mahyar Najibi, Rama Chellappa, Larry S. Davis
Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster-RCNN only drops by 5% on the PASCAL VOC dataset.
no code implementations • CVPR 2018 • Bharat Singh, Larry S. Davis
On the COCO dataset, our single model performance is 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%.
4 code implementations • NeurIPS 2018 • Bharat Singh, Mahyar Najibi, Larry S. Davis
Our implementation based on Faster-RCNN with a ResNet-101 backbone obtains an mAP of 47.6% on the COCO dataset for bounding box detection and can process 5 images per second during inference with a single GPU.
Ranked #126 on Object Detection on COCO test-dev
no code implementations • 12 Dec 2017 • Zhe Wu, Bharat Singh, Larry S. Davis, V. S. Subrahmanian
We present a system for covert automated deception detection in real-life courtroom trial videos.
2 code implementations • CVPR 2018 • Bharat Singh, Hengduo Li, Abhishek Sharma, Larry S. Davis
Our approach is a modification of the R-FCN architecture in which position-sensitive filters are shared across different object classes for performing localization.
no code implementations • 22 Nov 2017 • Bharat Singh, Larry S. Davis
On the COCO dataset, our single model performance is 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%.
Ranked #134 on Object Detection on COCO test-dev
no code implementations • ICCV 2017 • Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen
For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and are input to a temporal convolutional neural network for classification.
Ranked #7 on Action Recognition on THUMOS’14
8 code implementations • ICCV 2017 • Navaneeth Bodla, Bharat Singh, Rama Chellappa, Larry S. Davis
To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with M. Hence, no object is eliminated in this process.
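A minimal sketch of the score-decay idea described above, assuming M denotes the highest-scoring detection still under consideration and using a Gaussian penalty as one possible continuous function of overlap. The function names, the `(x1, y1, x2, y2)` box format, and the `sigma` default are assumptions for illustration, not the authors' reference code.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5):
    """Return (index, rescored_score) pairs in the order boxes are selected.

    Scores of the remaining boxes are decayed by exp(-iou^2 / sigma), an
    assumed Gaussian penalty; no box is removed, so every input box appears
    in the output with a possibly lower score.
    """
    scores = list(scores)              # work on a copy of the scores
    remaining = list(range(len(boxes)))
    selected = []
    while remaining:
        # M: the highest-scoring box among those not yet selected.
        m = max(remaining, key=lambda i: scores[i])
        remaining.remove(m)
        selected.append((m, scores[m]))
        # Decay the other scores as a continuous function of overlap with M.
        for i in remaining:
            overlap = iou(boxes[m], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return selected


# Usage: two heavily overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this example the second box overlaps the top-scoring one, so its score is reduced and it is selected last, while the isolated box keeps its score unchanged.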
no code implementations • 9 Jan 2017 • Carlos Castillo, Soham De, Xintong Han, Bharat Singh, Abhay Kumar Yadav, Tom Goldstein
This work considers targeted style transfer, in which the style of a template image is used to alter only part of a target image.
no code implementations • CVPR 2017 • Seyed A. Esmaeili, Bharat Singh, Larry S. Davis
It is a fully-convolutional deep neural network, which learns specific filters for thumbnails of different sizes and aspect ratios.
no code implementations • CVPR 2016 • Bharat Singh, Tim K. Marks, Michael Jones, Oncel Tuzel, Ming Shao
We present a multi-stream bi-directional recurrent neural network for fine-grained action detection.
Action Recognition In Videos • Fine-Grained Action Detection
2 code implementations • 6 May 2016 • Gavin Taylor, Ryan Burmeister, Zheng Xu, Bharat Singh, Ankit Patel, Tom Goldstein
With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks.
no code implementations • 10 Dec 2015 • Xintong Han, Bharat Singh, Vlad I. Morariu, Larry S. Davis
VRFP is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the web after the query is known.
no code implementations • 15 Oct 2015 • Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor
In this paper, we attempt to overcome the two problems above by proposing an optimization method for training deep neural networks. The method uses learning rates that are both specific to each layer in the network and adaptive to the curvature of the function, increasing the learning rate at low-curvature points.
no code implementations • ICCV 2015 • Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, Larry S. Davis
Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos.