no code implementations • NAACL 2022 • Saiteja Kosgi, Sarath Sivaprasad, Niranjan Pedanekar, Anil Nelakanti, Vineet Gandhi
We present a method to control the emotional prosody of Text-to-Speech (TTS) systems by using phoneme-level intermediate features (pitch, energy, and duration) as levers.
no code implementations • 4 Feb 2025 • Rohit Girmaji, Bhav Beri, Ramanathan Subramanian, Vineet Gandhi
We present EditIQ, a completely automated framework for cinematically editing scenes captured via a stationary, large field-of-view and high-resolution camera.
no code implementations • 1 Feb 2025 • Rohit Girmaji, Siddharth Jain, Bhav Beri, Sarthak Bansal, Vineet Gandhi
This paper introduces ViNet-S, a 36MB model based on the ViNet architecture with a U-Net design, featuring a lightweight decoder that significantly reduces model size and parameters without compromising performance.
no code implementations • 25 Dec 2024 • Neil Shah, Ayan Kashyap, Shirish Karande, Vineet Gandhi
Previous real-time MRI (rtMRI)-based speech synthesis models depend heavily on noisy ground-truth speech.
no code implementations • 25 Dec 2024 • Neil Shah, Shirish Karande, Vineet Gandhi
To address this issue, we focus on learning phoneme-level alignments from paired whispers and text and employ a Text-to-Speech (TTS) system to simulate the ground-truth.
no code implementations • 25 Nov 2024 • Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi
Our next innovation is TIDE, a novel training scheme with a concept saliency alignment loss that ensures model focus on the right per-concept regions and a local concept contrastive loss that promotes learning domain-invariant concept representations.
1 code implementation • 12 Nov 2024 • Kawshik Manikantan, Makarand Tapaswi, Vineet Gandhi, Shubham Toshniwal
The benchmark also consists of a curated mixture of different mention types and corresponding entities, allowing for a fine-grained analysis of model performance.
no code implementations • 26 Jul 2024 • Neil Shah, Shirish Karande, Vineet Gandhi
Moreover, we present a methodology for augmenting the existing CSTR NAM TIMIT Plus corpus, setting a benchmark with a Word Error Rate (WER) of 42.57% to gauge the intelligibility of the synthesized speech.
1 code implementation • 20 Jun 2024 • Kawshik Manikantan, Shubham Toshniwal, Makarand Tapaswi, Vineet Gandhi
Rather than relying on this additional annotation, we propose an alternative referential task, Major Entity Identification (MEI), where we: (a) assume the target entities to be specified in the input, and (b) limit the task to only the frequent entities.
no code implementations • 16 Jun 2024 • Darshana Saravanan, Darshan Singh, Varun Gupta, Zeeshan Khan, Vineet Gandhi, Makarand Tapaswi
Compositionality is a fundamental aspect of vision-language understanding and is especially required for videos since they contain multiple entities (e.g., persons, actions, and scenes) interacting dynamically over time.
no code implementations • 7 Feb 2024 • Darshana Saravanan, Naresh Manwani, Vineet Gandhi
Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to not contain the true label, enhancing the practicality of the problem.
no code implementations • 27 Nov 2023 • Sudheer Achary, Rohit Girmaji, Adhiraj Anil Deshmukh, Vineet Gandhi
Eliminating time-consuming post-production processes and delivering high-quality videos in today's fast-paced digital landscape are the key advantages of real-time approaches.
no code implementations • 3 Jul 2023 • Neha Sahipjohn, Neil Shah, Vishal Tambrahalli, Vineet Gandhi
Significant progress has been made in speaker-dependent Lip-to-Speech synthesis, which aims to generate speech from silent videos of talking faces.
no code implementations • 19 May 2023 • Neil Shah, Vishal Tambrahalli, Saiteja Kosgi, Niranjan Pedanekar, Vineet Gandhi
We present MParrotTTS, a unified multilingual, multi-speaker text-to-speech (TTS) synthesis model that can produce high-quality speech.
no code implementations • 1 Mar 2023 • Neil Shah, Saiteja Kosgi, Vishal Tambrahalli, Neha Sahipjohn, Niranjan Pedanekar, Vineet Gandhi
We present ParrotTTS, a modularized text-to-speech synthesis model leveraging disentangled self-supervised speech representations.
1 code implementation • NeurIPS 2023 • Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi
We investigate the problem of reducing mistake severity for fine-grained classification.
no code implementations • 24 Sep 2022 • Kanishk Jain, Varun Chhangani, Amogh Tiwari, K. Madhava Krishna, Vineet Gandhi
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings.
1 code implementation • 24 Dec 2021 • Nivedita Rufus, Kanishk Jain, Unni Krishnan R Nair, Vineet Gandhi, K Madhava Krishna
We introduce a new dataset, Talk2Car-RegSeg, which extends the existing Talk2car dataset with segmentation masks for the regions described by the linguistic commands.
no code implementations • 7 Nov 2021 • Sarath Sivaprasad, Saiteja Kosgi, Vineet Gandhi
The proposed TTS system can generate speech from the text in any speaker's style, with fine control of emotion.
no code implementations • 15 Oct 2021 • Sarath Sivaprasad, Akshay Goindani, Vaibhav Garg, Ritam Basu, Saiteja Kosgi, Vineet Gandhi
We find that the presence of multiple domains incentivizes domain-agnostic learning and is the primary reason for generalization in Traditional DG.
1 code implementation • 24 Sep 2021 • Jeet Vora, Swetanjal Dutta, Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi
Multi-view Detection (MVD) is highly effective for occlusion reasoning in a crowded environment.
Ranked #2 on Multiview Detection on GMVD
no code implementations • 30 Jul 2021 • Vineet Gandhi, Jan Cech, Radu Horaud
Most of these systems suffer from noise in the range data and from a resolution mismatch between the range sensor and the color cameras, since the resolution of current range sensors is much lower than that of color cameras.
1 code implementation • Findings (ACL) 2022 • Kanishk Jain, Vineet Gandhi
We investigate Referring Image Segmentation (RIS), which outputs a segmentation map corresponding to the natural language description.
Ranked #3 on Referring Expression Segmentation on ReferIt
1 code implementation • 1 Apr 2021 • Shyamgopal Karthik, Ameya Prabhu, Puneet K. Dokania, Vineet Gandhi
There has been increasing interest in building deep hierarchy-aware classifiers that aim to quantify and reduce the severity of mistakes, and not just reduce the number of errors.
no code implementations • ICLR 2021 • Shyamgopal Karthik, Ameya Prabhu, Puneet K. Dokania, Vineet Gandhi
There has been increasing interest in building deep hierarchy-aware classifiers, aiming to quantify and reduce the severity of mistakes and not just count the number of errors.
1 code implementation • 11 Dec 2020 • Samyak Jain, Pradeep Yarlagadda, Shreyank Jyoti, Shyamgopal Karthik, Ramanathan Subramanian, Vineet Gandhi
We also explore a variation of ViNet architecture by augmenting audio features into the decoder.
no code implementations • 22 Oct 2020 • K L Bhanu Moorthy, Moneish Kumar, Ramanathan Subramaniam, Vineet Gandhi
We present GAZED: eye GAZe-guided EDiting for videos captured by a solitary, static, wide-angle and high-resolution camera.
1 code implementation • 13 Sep 2020 • Nivedita Rufus, Unni Krishnan R Nair, K. Madhava Krishna, Vineet Gandhi
In this paper, we present a simple baseline for visual grounding for autonomous driving which outperforms the state-of-the-art methods, while retaining minimal design choices.
Ranked #6 on Referring Expression Comprehension on Talk2Car
no code implementations • 9 Jun 2020 • Sarath Sivaprasad, Ankur Singh, Naresh Manwani, Vineet Gandhi
In this paper, we investigate a constrained formulation of neural networks where the output is a convex function of the input.
no code implementations • 4 Jun 2020 • Shyamgopal Karthik, Ameya Prabhu, Vineet Gandhi
Multi-object tracking has seen a lot of progress recently, albeit with substantial annotation costs for developing better and larger labeled datasets.
2 code implementations • 12 Mar 2020 • Aasheesh Singh, Aditya Kamireddypalli, Vineet Gandhi, K. Madhava Krishna
In this paper, we present a method to reliably detect such obstacles through a multi-modal framework of sparse LiDAR (VLP-16) and monocular vision.
1 code implementation • 10 Mar 2020 • Navyasri Reddy, Samyak Jain, Pradeep Yarlagadda, Vineet Gandhi
As a result, we propose two novel end-to-end architectures called SimpleNet and MDNSal, which are neater, minimal, more interpretable and achieve state-of-the-art performance on public saliency benchmarks.
no code implementations • 11 Dec 2019 • Sudheer Achary, K L Bhanu Moorthy, Syed Ashar Javed, Nikita Shravan, Vineet Gandhi, Anoop Namboodiri
Autonomous camera systems are often subjected to an optimization/filtering operation to smooth and stabilize the rough trajectory estimates.
no code implementations • 27 Oct 2019 • Shyamgopal Karthik, Abhinav Moudgil, Vineet Gandhi
Recent works have proposed several long-term tracking benchmarks and highlighted the importance of moving towards long-duration tracking to bridge the gap with application requirements.
1 code implementation • 3 Dec 2018 • Aryaman Gupta, Kalpit Thakkar, Vineet Gandhi, P. J. Narayanan
Monocular head pose estimation requires learning a model that computes the intrinsic Euler angles for pose (yaw, pitch, roll) from an input image of a human face.
Ranked #2 on Head Pose Estimation on AFLW
1 code implementation • ICASSP 2018 • Vatsal Shah, Vineet Gandhi
Uneven illumination and shadows in document images pose a challenge for digitization applications and automated workflows.
no code implementations • 17 Mar 2018 • Krishnam Gupta, Syed Ashar Javed, Vineet Gandhi, K. Madhava Krishna
We present a novel network architecture called MergeNet for discovering small obstacles in on-road scenes in the context of autonomous driving.
no code implementations • 17 Mar 2018 • Syed Ashar Javed, Shreyas Saxena, Vineet Gandhi
Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities.
1 code implementation • 4 Dec 2017 • Abhinav Moudgil, Vineet Gandhi
We propose a new long video dataset (called Track Long and Prosper - TLP) and benchmark for single object tracking.
no code implementations • 4 Mar 2017 • Rahul Anand Sharma, Bharath Bhat, Vineet Gandhi, C. V. Jawahar
The proposed method is fully automatic, in contrast to the current state of the art, which requires manual initialization of point correspondences between the image and the static model.
1 code implementation • 30 Aug 2015 • Remi Ronfard, Vineet Gandhi, Laurent Boiron, Vaishnavi Ameya Murukutla
The prose storyboard language is a formal language for describing movies shot by shot, where each shot is described with a unique sentence.
Graphics
no code implementations • CVPR 2013 • Vineet Gandhi, Remi Ronfard
We introduce a generative model for learning person and costume specific detectors from labeled examples.