Precise boundary annotations of image regions can be crucial for downstream applications that rely on region-class semantics.
Handwritten documents are often characterized by dense and uneven layouts.
We analyze the performance of representative crowd counting approaches across standard datasets, both per stratum and in aggregate.
State-of-the-art architectures for untrimmed video Temporal Action Localization (TAL) have considered only the RGB and Flow modalities, leaving the information-rich audio modality entirely unexploited.
Ranked #1 on Temporal Action Localization on THUMOS'14
Given a monocular colour image of a warehouse rack, we aim to predict the bird's-eye view layout for each shelf in the rack, a task we term multi-layer layout prediction.
The lack of fine-grained joints (facial joints, hand fingers) is a fundamental performance bottleneck for state-of-the-art skeleton action recognition models.
Ranked #1 on Skeleton Based Action Recognition on NTU60-X
We deploy SynSE for the task of skeleton-based action sequence recognition.
Ranked #1 on Zero Shot Skeletal Action Recognition on NTU RGB+D
In particular, our integration of VPR with SLAM by leveraging the robustness of deep-learned features and our homography-based extreme viewpoint invariance significantly boosts the performance of VPR, feature correspondence, and pose graph submodules of the SLAM pipeline.
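The homography-based viewpoint invariance mentioned above rests on the standard planar-homography model: points on a plane seen from two extreme viewpoints are related by a 3x3 projective transform. As an illustrative sketch (not the paper's implementation), a homography can be estimated from four point correspondences with the Direct Linear Transform; the function names here are hypothetical:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H mapping src -> dst (each a list of
    (x, y) pairs, at least 4) via the Direct Linear Transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the constraint A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # h is the null-space vector of A: the right singular vector with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, pt):
    """Apply homography H to a 2-D point (homogeneous divide)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

In practice, robust estimators (e.g. RANSAC over many deep-feature correspondences) would replace this minimal four-point solve.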
To study skeleton-action recognition in the wild, we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset.
We propose OPAL-Net, a novel hierarchical architecture for part-based layout generation of objects from multiple categories using a single unified model.
At the intermediate level, the map is represented as a Manhattan Graph, whose nodes and edges are characterized by Manhattan properties, and as a Pose Graph at the lowest level of detail.
To address this deficiency, we introduce Indiscapes, the first-ever dataset with multi-regional layout annotations for historical Indic manuscripts.
Therefore, target identifications made by an operator in a subset of cameras cannot be utilized to improve the target's ranking in the remaining cameras of the network.
Similarly, performance on multi-disciplinary tasks such as Visual Question Answering (VQA) is considered a marker for gauging progress in Computer Vision.
We propose SketchParse, the first deep-network architecture for fully automatic parsing of freehand object sketches.
A class of recent approaches for generating images, called Generative Adversarial Networks (GANs), has been used to generate impressively realistic images of objects, bedrooms, handwritten digits, and a variety of other image modalities.
In this paper, we analyze the results of a free-viewing gaze fixation study conducted on 3904 freehand sketches distributed across 160 object categories.
Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail.
In our work, we propose a recurrent neural network architecture for sketch object recognition which exploits the long-term sequential and structural regularities in stroke data in a scalable manner.
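To make the idea of exploiting sequential stroke structure concrete, here is a toy illustration (not the proposed architecture) of a vanilla recurrent cell consuming a sketch as a sequence of pen offsets; the input encoding (dx, dy, pen-lift flag) and all names are assumptions for illustration:

```python
import numpy as np

def rnn_classify_strokes(strokes, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over a stroke sequence (T x 3 array of
    dx, dy, pen-lift) and return class logits from the final hidden state."""
    h = np.zeros(Whh.shape[0])
    for x in strokes:
        # The hidden state accumulates long-term structure across strokes.
        h = np.tanh(Wxh @ x + Whh @ h + bh)
    return Why @ h + by
```

Gated cells (LSTM/GRU) would normally replace the tanh update to capture longer-range regularities, but the sequential-consumption pattern is the same.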
Current state-of-the-art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g., photos only, sketches only).
With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective.
Studies from neuroscience show that part-mapping computations are employed by the human visual system in the process of object recognition.
To provide a user-friendly interface for designing, training, and developing deep learning frameworks, we have developed Expresso, a GUI tool written in Python.
Therefore, analyzing such sparse sketches can aid our understanding of the neuro-cognitive processes involved in visual representation and recognition.