1 code implementation • ECCV 2020 • Ayushi Dutta, Yashaswi Verma, C. V. Jawahar
Additionally, it provides a new perspective of looking at an unordered set of labels as equivalent to a collection of different permutations (sequences) of those labels, thus naturally aligning with the image annotation task.
no code implementations • 26 Sep 2023 • Thrupthi Ann John, Vineeth N Balasubramanian, C. V. Jawahar
Although current deep models for face tasks surpass human performance on some benchmarks, we do not understand how they work.
no code implementations • 4 Sep 2023 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively.
no code implementations • 23 Aug 2023 • Nitin Nilesh, Tushar Sharma, Anurag Ghosh, C. V. Jawahar
In this work, we propose an end-to-end framework for player movement analysis for badminton matches on live broadcast match videos.
no code implementations • 8 Jul 2023 • George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar
Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.
no code implementations • 5 Mar 2023 • Varun Gupta, Anbumani Subramanian, C. V. Jawahar, Rohit Saluja
MTSVD is challenging compared to previous work in two respects: (i) the traffic signs are generally not present in the vicinity of their cues, and (ii) the traffic sign cues are diverse and unique.
1 code implementation • 30 Dec 2022 • Prafful Kumar Khoba, Chirag Parikh, Rohit Saluja, Ravi Kiran Sarvadevabhatla, C. V. Jawahar
Along with providing baseline results for existing object detectors on FGVD Dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task.
no code implementations • 17 Dec 2022 • Ajoy Mondal, Rohit Saluja, C. V. Jawahar
The service providers encourage the users who provide data where the OCR model fails by rewarding them based on data complexity, readability, and available budget.
Handwritten Text Recognition
Optical Character Recognition (OCR)
1 code implementation • 15 Dec 2022 • Ajoy Mondal, C. V. Jawahar
We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize the Indic handwritten texts.
no code implementations • 2 Dec 2022 • Riya Gupta, C. V. Jawahar
Extracting the relevant information out of a large number of documents is a challenging and tedious task.
no code implementations • 10 Nov 2022 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.
no code implementations • 29 Oct 2022 • Darshan Singh S, Anchit Gupta, C. V. Jawahar, Makarand Tapaswi
We formulate lecture segmentation as an unsupervised task that leverages visual, textual, and OCR cues from the lecture, while clip representations are fine-tuned on a pretext self-supervised task of matching the narration with the temporally aligned visual content.
1 code implementation • 29 Oct 2022 • Bipasha Sen, Aditya Agarwal, Vinay P Namboodiri, C. V. Jawahar
In this work, we evaluate the space learned by INR-V on diverse generative tasks such as video interpolation, novel video generation, video inversion, and video inpainting against the existing baselines.
Ranked #1 on Video Inpainting on How2Sign
no code implementations • 23 Oct 2022 • Shubham Dokania, A. H. Abdul Hafez, Anbumani Subramanian, Manmohan Chandraker, C. V. Jawahar
Autonomous driving and assistance systems rely on annotated data from traffic and road scenarios to model and learn the various object relations in complex real-world scenarios.
no code implementations • 19 Oct 2022 • Zeeshan Khan, C. V. Jawahar, Makarand Tapaswi
Recently, Video Situation Recognition (VidSitu) has been framed as a task for structured prediction of multiple events, their relationships, and actions and various verb-role pairs attached to descriptive entities.
no code implementations • 1 Sep 2022 • Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar
With the help of multiple powerful discriminators that guide the training process, our generator learns to synthesize speech sequences in any voice for the lip movements of any person.
1 code implementation • 21 Aug 2022 • Aditya Agarwal, Bipasha Sen, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
To tackle this challenge, we introduce video-to-video (V2V) face-swapping, a novel task of face-swapping that can preserve (1) the identity and expressions of the source (actor) face video and (2) the background and pose of the target (double) video.
1 code implementation • 17 Aug 2022 • Sindhu B Hegde, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar
We show that when we process this $8\times8$ video with the right set of audio and image priors, we can obtain a full-length, $256\times256$ video.
1 code implementation • 16 Aug 2022 • Shubham Dokania, Anbumani Subramanian, Manmohan Chandraker, C. V. Jawahar
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation, mimicking real scene properties with high fidelity, along with mechanisms to diversify samples in a physically meaningful way.
1 code implementation • 22 Jul 2022 • Siddhant Bansal, Chetan Arora, C. V. Jawahar
Instead, we propose to use the signal provided by the temporal correspondences between key-steps across videos.
1 code implementation • 18 Apr 2022 • Aman Goyal, Dev Agarwal, Anbumani Subramanian, C. V. Jawahar, Ravi Kiran Sarvadevabhatla, Rohit Saluja
In many Asian countries with unconstrained road traffic conditions, driving violations such as not wearing helmets and triple-riding are a significant source of fatalities involving motorcycles.
no code implementations • 21 Jan 2022 • Jobin K. V., Ajoy Mondal, C. V. Jawahar
With this information, we build a Classroom Slide Narration System (CSNS) to help VI students understand the slide content.
1 code implementation • 17 Jan 2022 • Arpit Bahety, Rohit Saluja, Ravi Kiran Sarvadevabhatla, Anbumani Subramanian, C. V. Jawahar
We obtain TCDCA of 96.77% on the test videos, with a remarkable improvement of 22.58% over baseline, and demonstrate that our counting module's performance is close to human level.
no code implementations • 10 Jan 2022 • Sanjana Gunna, Rohit Saluja, C. V. Jawahar
WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets, respectively.
1 code implementation • 10 Jan 2022 • Sanjana Gunna, Rohit Saluja, C. V. Jawahar
Several controlled experiments are performed on English, by varying the number of (i) fonts to create the synthetic data and (ii) created word images.
1 code implementation • WACV 2022 • Vaishnavi Khindkar, Chetan Arora, Vineeth N Balasubramanian, Anbumani Subramanian, C. V. Jawahar
Qualitative results demonstrate the ability of ILLUME to attend to important object instances required for alignment.
no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges.
1 code implementation • 23 Oct 2021 • Prachi Garg, Rohit Saluja, Vineeth N Balasubramanian, Chetan Arora, Anbumani Subramanian, C. V. Jawahar
Recent efforts in multi-domain learning for semantic segmentation attempt to learn multiple geographical datasets in a universal, joint model.
no code implementations • 16 Oct 2021 • Anchit Gupta, Faizan Farooq Khan, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar
Our evaluations show a clear improvement in the efficiency of using human editors and an improved video generation quality.
3 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
no code implementations • 11 Sep 2021 • Harish Rithish, Raghava Modhugu, Ranjith Reddy, Rohit Saluja, C. V. Jawahar
Conventional approaches for addressing road safety rely on manual interventions or immobile CCTV infrastructure.
2 code implementations • Journal of Chemical Information and Modeling 2021 • Rishal Aggarwal, Akash Gupta, Vineeth Chelur, C. V. Jawahar, U. Deva Priyakumar
A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site.
no code implementations • 6 Aug 2021 • Bhavani Sambaturu, Ashutosh Gupta, C. V. Jawahar, Chetan Arora
We report a time saving of 2.8, 3.0, 1.9, 4.4, and 8.6 fold compared to other interactive segmentation techniques.
1 code implementation • 18 Mar 2021 • Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).
Key Information Extraction
Optical Character Recognition (OCR)
no code implementations • ICPR 2021 • Avijit Dasgupta, C. V. Jawahar, Karteek Alahari
Existing approaches decompose this task into feature learning and relational reasoning.
no code implementations • 26 Dec 2020 • Aditya Bharti, N. B. Vineeth, C. V. Jawahar
Few-shot learners aim to recognize new categories given only a small number of training samples.
1 code implementation • 20 Dec 2020 • Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
In this work, we re-think the task of speech enhancement in unconstrained real-world environments.
Ranked #1 on Speech Denoising on LRS3+VGGSound
1 code implementation • 27 Oct 2020 • Siddhant Bansal, Praveen Krishnan, C. V. Jawahar
We propose a novel scheme for improving the word recognition accuracy using word image embeddings.
1 code implementation • ECCV 2020 • Sachin Raja, Ajoy Mondal, C. V. Jawahar
We present an approach for table structure recognition that combines cell detection and interaction modules to localize the cells and predict their row and column associations with other detected cells.
Ranked #8 on Table Recognition on PubTabNet
1 code implementation • 25 Aug 2020 • Ranajit Saha, Ajoy Mondal, C. V. Jawahar
Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information contained in a document.
2 code implementations • 25 Aug 2020 • Madhav Agarwal, Ajoy Mondal, C. V. Jawahar
Localizing page elements/objects such as tables, figures, equations, etc.
Ranked #1 on Table Detection on ICDAR2013
4 code implementations • 23 Aug 2020 • K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
Ranked #1 on Unconstrained Lip-synchronization on LRS3 (using extra training data)
no code implementations • 20 Aug 2020 • Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar
For Task 1, a new dataset is introduced comprising 50,000 question-answer pairs defined over 12,767 document images.
no code implementations • 11 Aug 2020 • Jerin Philip, Shashank Siripragada, Vinay P. Namboodiri, C. V. Jawahar
Through this paper, we provide and analyse an automated framework to obtain such a corpus for Indian language neural machine translation (NMT) systems.
1 code implementation • 7 Aug 2020 • Ajoy Mondal, C. V. Jawahar
Reading mathematical expressions or equations in document images is very challenging due to the large variability of mathematical symbols and expressions.
no code implementations • 6 Aug 2020 • Ajoy Mondal, Peter Lipps, C. V. Jawahar
This dataset, IIIT-AR-13k, is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports.
no code implementations • ECCV 2020 • Aditya Arun, C. V. Jawahar, M. Pawan Kumar
Recent approaches for weakly supervised instance segmentation depend on two components: (i) a pseudo label generation model that provides instances which are consistent with a given annotation; and (ii) an instance segmentation model, which is trained in a supervised manner using the pseudo labels as ground-truth.
Ranked #4 on Image-level Supervised Instance Segmentation on PASCAL VOC 2012 val (using extra training data)
Image-level Supervised Instance Segmentation
Pseudo Label
2 code implementations • LREC 2020 • Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar
We present sentence aligned parallel corpora across 10 Indian Languages - Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English - many of which are categorized as low resource.
1 code implementation • 1 Jul 2020 • Siddhant Bansal, Praveen Krishnan, C. V. Jawahar
Recognition and retrieval of textual content from large document collections has been a powerful use case for the document image analysis community.
3 code implementations • 1 Jul 2020 • Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
The dataset consists of 50,000 questions defined on 12,000+ document images.
Ranked #1 on Visual Question Answering (VQA) on DocVQA test
no code implementations • 19 May 2020 • Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar
State-of-the-art methods for text detection, recognition and tracking are evaluated on the new dataset, and the results signify the challenges in unconstrained driving videos compared to existing datasets.
1 code implementation • CVPR 2020 • K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
Ranked #1 on Lip Reading on LRW
no code implementations • LREC 2020 • Nimisha Srivastava, Rudrabha Mukhopadhyay, Prajwal K R, C. V. Jawahar
We believe that one of the major reasons for this is the lack of large, publicly available text-to-speech corpora in these languages that are suitable for training neural text-to-speech systems.
1 code implementation • ACM Multimedia 2019 • Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar
As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.
Ranked #1 on Talking Face Generation on LRW (using extra training data)
no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.
no code implementations • 18 Nov 2019 • Vishnu Sashank Dorbala, A. H. Abdul Hafez, C. V. Jawahar
For an autonomous corridor following task where the environment is continuously changing, several forms of environmental noise prevent an automated feature extraction procedure from performing reliably.
no code implementations • WS 2019 • Jerin Philip, Shashank Siripragada, Upendra Kumar, Vinay Namboodiri, C. V. Jawahar
This paper describes the Neural Machine Translation systems used by IIIT Hyderabad (CVIT-MT) for the translation tasks part of WAT-2019.
no code implementations • 29 Jul 2019 • Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar
We present a simple, yet effective, Neural Machine Translation system for Indian languages.
no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.
3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.
no code implementations • 28 May 2019 • Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar
The word error rate of an OCR system is often higher than its character error rate.
no code implementations • PACLIC 2018 • Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar
This document describes the machine translation system used in the submissions of IIIT-Hyderabad CVIT-MT for the WAT-2018 English-Hindi translation task.
no code implementations • 31 Jan 2019 • Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
Cross-modal retrieval methods have been significantly improved in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.
1 code implementation • 9 Jan 2019 • Thrupthi Ann John, Isha Dua, Vineeth N. Balasubramanian, C. V. Jawahar
Do we know what the different filters of a face network represent?
1 code implementation • 26 Nov 2018 • Sudhir Yarram, Girish Varma, C. V. Jawahar
Road networks in cities are massive and are a critical component of mobility.
1 code implementation • ICCV 2019 • Tarun Kalluri, Girish Varma, Manmohan Chandraker, C. V. Jawahar
In recent years, the need for semantic segmentation has arisen across several different applications and environments.
Ranked #27 on Semantic Segmentation on DensePASS (using extra training data)
Semi-Supervised Semantic Segmentation
Unsupervised Domain Adaptation
2 code implementations • 26 Nov 2018 • Girish Varma, Anbumani Subramanian, Anoop Namboodiri, Manmohan Chandraker, C. V. Jawahar
It also reflects label distributions of road scenes significantly different from existing datasets, with most classes displaying greater within-class diversity.
no code implementations • CVPR 2019 • Aditya Arun, C. V. Jawahar, M. Pawan Kumar
This allows us to use a state-of-the-art discrete generative model that can provide annotation-consistent samples from the conditional distribution.
no code implementations • 11 Nov 2018 • Soham Saha, Girish Varma, C. V. Jawahar
Our method improves the median error in indoor as well as outdoor localization datasets compared to the previous best deep learning model known as PoseNet (with geometric re-projection loss) using the same feature extractor.
1 code implementation • 20 Aug 2018 • Soham Saha, Girish Varma, C. V. Jawahar
We propose an alternate architecture to the classifier network called the Latent Hierarchy (LH) Classifier and an end-to-end learned Class2Str mapping which discovers a latent hierarchy of the classes.
no code implementations • 1 Aug 2018 • A. H. Abdul Hafez, Nakul Agarwal, C. V. Jawahar
This problem is solved by finding the maximum flow in a directed graph flow-network, whose vertices represent the matches between frames in the test and reference sequences.
no code implementations • 24 Jul 2018 • Aditya Arun, C. V. Jawahar, M. Pawan Kumar
In order to avoid the high cost of full supervision, we propose to use a diverse data set, which consists of two types of annotations: (i) a small number of images are labeled using the expensive ground-truth pose; and (ii) other images are labeled using the inexpensive action label.
1 code implementation • 4 Jul 2018 • Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration.
no code implementations • 22 Jun 2018 • Nikitha Vallurupalli, Sriharsha Annamaneni, Girish Varma, C. V. Jawahar, Manu Mathew, Soyeb Nagori
We study the effectiveness of these techniques on a real-time semantic segmentation architecture like ERFNet for improving run time by over 5X.
1 code implementation • 3 Mar 2018 • Samyak Datta, Gaurav Sharma, C. V. Jawahar
Although faces extracted from videos have a lower spatial resolution than those which are available as part of standard supervised face datasets such as LFW and CASIA-WebFace, the former represent a much more realistic setting, e.g., in surveillance scenarios where most of the faces detected are very small.
no code implementations • 17 Feb 2018 • Praveen Krishnan, C. V. Jawahar
We present a framework for learning an efficient holistic representation for handwritten word images.
no code implementations • 4 Jan 2018 • Anurag Ghosh, C. V. Jawahar
In this paper, we demonstrate a score based indexing approach for tennis videos.
no code implementations • 23 Dec 2017 • Anurag Ghosh, Suriya Singh, C. V. Jawahar
Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large scale data mining and analytics.
no code implementations • 7 Nov 2017 • Mohit Jain, Minesh Mathew, C. V. Jawahar
For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data.
no code implementations • 7 Nov 2017 • Viral Parekh, Ramanathan Subramanian, Dipanjan Roy, C. V. Jawahar
The success of deep learning in computer vision has greatly increased the need for annotated image datasets.
no code implementations • CVPR 2017 • Vijay Kumar, Anoop Namboodiri, Manohar Paluri, C. V. Jawahar
Person recognition methods that use multiple body regions have shown significant improvements over traditional face-based recognition.
no code implementations • CVPR 2017 • Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible.
no code implementations • 4 Mar 2017 • Rahul Anand Sharma, Bharath Bhat, Vineet Gandhi, C. V. Jawahar
The proposed method is fully automatic in contrast to the current state of the art which requires manual initialization of point correspondences between the image and the static model.
no code implementations • WS 2016 • Priyam Bakliwal, Devadath V V, C. V. Jawahar
Multilingual language processing tasks like statistical machine translation and cross language information retrieval rely mainly on availability of accurate parallel corpora.
1 code implementation • 15 Aug 2016 • Praveen Krishnan, C. V. Jawahar
Generating synthetic images is an art which emulates the natural process of image generation as closely as possible.
no code implementations • CVPR 2016 • Suriya Singh, Chetan Arora, C. V. Jawahar
It can also be trained from the relatively small number of labeled egocentric videos that are available.
no code implementations • 19 May 2016 • Praveen Krishnan, C. V. Jawahar
We address the problem of predicting similarity between a pair of handwritten document images written by different individuals.
no code implementations • CVPR 2018 • Pritish Mohapatra, Michal Rolinek, C. V. Jawahar, Vladimir Kolmogorov, M. Pawan Kumar
We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP and NDCG based loss functions.
no code implementations • 7 Apr 2016 • Suriya Singh, Chetan Arora, C. V. Jawahar
Objects present in the scene and hand gestures of the wearer are the most important cues for first person action recognition but are difficult to segment and recognize in an egocentric video.
no code implementations • 13 Jan 2016 • Anand Mishra, Karteek Alahari, C. V. Jawahar
We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them.
1 code implementation • ICCV 2015 • Viresh Ranjan, Nikhil Rasiwasia, C. V. Jawahar
In this work, we address the problem of cross-modal retrieval in the presence of multi-label annotations.
no code implementations • ICCV 2015 • Vijay Kumar, Anoop Namboodiri, C. V. Jawahar
Contrary to traditional approaches that model face variations from a large and diverse set of training examples, exemplar-based approaches use a collection of discriminatively trained exemplars for detection.
no code implementations • 26 Nov 2015 • Mohak Sukhwani, C. V. Jawahar
In this work, we attempt to describe videos from a specific domain - broadcast videos of lawn tennis matches.
no code implementations • NeurIPS 2014 • Pritish Mohapatra, C. V. Jawahar, M. Pawan Kumar
The accuracy of information retrieval systems is often measured using average precision (AP).
no code implementations • CVPR 2014 • Aseem Behl, C. V. Jawahar, M. Pawan Kumar
The performance of binary classification tasks, such as action classification and object detection, is often measured in terms of the average precision (AP).
no code implementations • CVPR 2014 • Rashmi Tonge, Subhransu Maji, C. V. Jawahar
We propose an approach for segmenting the individual buildings in typical skyline images.
no code implementations • CVPR 2014 • Ramachandruni N. Sandeep, Yashaswi Verma, C. V. Jawahar
The notion of relative attributes as introduced by Parikh and Grauman (ICCV, 2011) provides an appealing way of comparing two images based on their visual properties (or attributes) such as "smiling" for face images, "naturalness" for outdoor images, etc.
no code implementations • CVPR 2013 • Mayank Juneja, Andrea Vedaldi, C. V. Jawahar, Andrew Zisserman
The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously learning the part appearance and identifying the part occurrences in images.
no code implementations • CVPR 2013 • Siddhartha Chandra, Shailesh Kumar, C. V. Jawahar
In this paper, we describe a feature learning scheme for natural images.