Consuming these documents linearly (by scrolling or navigating through the default table of contents) is time-consuming and challenging.
Chart Question Answering (CQA) is the task of answering natural language questions about the visualisation in a chart image.
SALAD has three key benefits: (i) it is task-agnostic, and can be applied across various visual tasks such as classification, segmentation and detection; (ii) it can handle shifts in output label space from the pre-trained source network to the target domain; (iii) it does not require access to source data for adaptation.
For videos captured in the wild, we perform a user study to demonstrate the preference for our method in comparison to state-of-the-art approaches.
We propose TALISMAN, a novel framework for Targeted Active Learning for object detectIon with rare slices using Submodular MutuAl iNformation.
We propose novel rewards to account for class imbalance and user feedback in the annotation interface, to improve the active learning method.
LEAF-QA, being constructed from real-world sources, requires a novel architecture to enable question answering.
With the explosion of video content on the Internet, there is a need for research on methods for video analysis which take human cognition into account.
Recognition of low-resolution face images is a challenging problem in many practical face recognition systems.
Many existing recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time.