To address this challenge, we propose a multimodal teacher network that uses a cross-modality attention fusion strategy to improve segmentation accuracy by exploiting data from multiple modalities.
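A cross-modality attention fusion of this kind can be illustrated with a minimal sketch: features from one modality attend over features from another, and the attended summary is concatenated back onto the query stream. This is a generic scaled dot-product formulation, not the authors' exact architecture; the feature sizes and the two-modality setup (e.g. RGB and depth streams) are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modality_attention(query_feats, key_feats):
    """Attend from one modality (query) over another (key/value),
    then concatenate the attended summary onto the query features."""
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)   # (Nq, Nk) similarity
    weights = softmax(scores, axis=-1)                # rows sum to 1
    attended = weights @ key_feats                    # (Nq, d) fused summary
    return np.concatenate([query_feats, attended], axis=-1)

# hypothetical per-region features from two modalities
rgb = np.random.randn(4, 8)    # stand-in for an RGB-stream encoder
depth = np.random.randn(4, 8)  # stand-in for a depth-stream encoder
fused = cross_modality_attention(rgb, depth)  # (4, 16)
```

In a full model the fused features would feed a segmentation head; here only the fusion step is shown.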
With advances in data-driven machine learning research, a wide variety of prediction models have been proposed to capture spatio-temporal features for the analysis of video streams.
This paper presents a novel lightweight COVID-19 diagnosis framework using CT scans.
Machine learning-based medical anomaly detection is an important problem that has been extensively studied.
Gesture recognition is a widely studied research area with myriad real-world applications, including robotics and human-machine interaction.
Automating the analysis of imagery of the Gastrointestinal (GI) tract captured during endoscopy procedures has substantial potential benefits for patients, as it can provide diagnostic support to medical practitioners and reduce mistakes caused by human error.
The temporal segmentation of events is an essential task and a precursor to the automatic recognition of human actions in video.
Inspired by human neurological structures for action anticipation, we present an action anticipation model that enables the prediction of plausible future actions by forecasting both the visual and temporal future.
In this paper we address the problem of continuous fine-grained action segmentation, in which multiple actions are present in an unsegmented video stream.
The goal of both GANs is to generate similar `action codes', a vector representation of the current action.
The generator is fed with person-level and scene-level features that are mapped temporally through LSTM networks.
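The per-frame pipeline described above can be sketched with a minimal LSTM cell: person-level and scene-level feature vectors are concatenated at each timestep and passed through the recurrent update. The cell below is a generic single-layer LSTM with hypothetical dimensions, not the authors' trained generator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell (hypothetical sizes, untrained random weights)."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # one stacked weight matrix for the input, forget, cell, output gates
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        hd = self.hidden_dim
        i = sigmoid(z[:hd])            # input gate
        f = sigmoid(z[hd:2 * hd])      # forget gate
        g = np.tanh(z[2 * hd:3 * hd])  # candidate cell state
        o = sigmoid(z[3 * hd:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

# person-level and scene-level features are concatenated at each frame
T, person_dim, scene_dim, hidden = 5, 6, 4, 8
cell = LSTMCell(person_dim + scene_dim, hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(T):
    person = np.random.randn(person_dim)  # stand-in person-level features
    scene = np.random.randn(scene_dim)    # stand-in scene-level features
    h, c = cell.step(np.concatenate([person, scene]), h, c)
```

The final hidden state `h` would then drive the generator's output; only the temporal mapping is shown here.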
Our contribution in this paper is a deep fusion framework that more effectively combines spatial features from CNNs with temporal features from LSTM models.
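One simple way to realise such a fusion, shown purely as an assumed sketch rather than the paper's method, is to concatenate per-frame CNN (spatial) and LSTM (temporal) features and project the joint vector through a learned layer. All dimensions and the single-layer ReLU projection are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def fuse(spatial, temporal, W, b):
    """Concatenate per-frame spatial and temporal features,
    then project through one learned ReLU layer (illustrative only)."""
    joint = np.concatenate([spatial, temporal], axis=-1)  # (T, ds + dt)
    return np.maximum(joint @ W + b, 0.0)                 # ReLU projection

T, ds, dt, dout = 10, 16, 8, 4
spatial = rng.standard_normal((T, ds))    # stand-ins for CNN features
temporal = rng.standard_normal((T, dt))   # stand-ins for LSTM hidden states
W = rng.standard_normal((ds + dt, dout)) * 0.1
b = np.zeros(dout)
fused = fuse(spatial, temporal, W, b)     # (T, dout) fused representation
```

A frame-level classifier (e.g. a softmax over `fused`) would typically follow in a complete model.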