Anomaly detection in Minimally-Invasive Surgery (MIS) traditionally requires a human expert monitoring the procedure from a console.
Long-term complex activity recognition and localisation can be crucial for the decision-making process of several autonomous systems, such as smart cars and surgical robots.
no code implementations • 7 Apr 2021 • Vivek Singh Bawa, Gurkirt Singh, Francis KapingA, Inna Skarga-Bandurova, Elettra Oleari, Alice Leporini, Carmela Landolfo, Pengfei Zhao, Xi Xiang, Gongning Luo, Kuanquan Wang, Liangzhi Li, Bowen Wang, Shang Zhao, Li Li, Armando Stabile, Francesco Setti, Riccardo Muradore, Fabio Cuzzolin
For an autonomous robotic system, monitoring surgeon actions and assisting the main surgeon during a procedure can be very challenging.
2 code implementations • 1 Apr 2021 • Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, Tyler L. Hayes, Matthias De Lange, Marc Masana, Jary Pomponi, Gido van de Ven, Martin Mundt, Qi She, Keiland Cooper, Jeremy Forest, Eden Belouadah, Simone Calderara, German I. Parisi, Fabio Cuzzolin, Andreas Tolias, Simone Scardapane, Luca Antiga, Subutai Ahmad, Adrian Popescu, Christopher Kanan, Joost Van de Weijer, Tinne Tuytelaars, Davide Bacciu, Davide Maltoni
Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning.
2 code implementations • 23 Feb 2021 • Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Valentina Fontana, Reza Javanmard Alitappeh, Suman Saha, Kossar Jeddisaravi, Farzad Yousefi, Jacob Culley, Tom Nicholson, Jordan Omokeowa, Salman Khan, Stanislao Grazioso, Andrew Bradley, Giuseppe Di Gironimo, Fabio Cuzzolin
Humans approach driving in a holistic fashion which entails, in particular, understanding road events and their evolution.
Matching articulated shapes represented by voxel-sets reduces to maximal sub-graph isomorphism when each set is described by a weighted graph.
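As a toy illustration of that reduction (a minimal sketch with made-up graphs, not the paper's voxel descriptors or matching pipeline), networkx can test weighted subgraph isomorphism directly:

```python
# Sketch: match two weighted graphs via subgraph isomorphism with networkx.
# Node IDs, edges, and weights below are invented for illustration.
import networkx as nx
from networkx.algorithms import isomorphism

def build_shape_graph(edges):
    """Build a weighted graph whose nodes stand in for voxel clusters."""
    g = nx.Graph()
    for u, v, w in edges:
        g.add_edge(u, v, weight=w)
    return g

model = build_shape_graph([(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.5)])
scene = build_shape_graph([(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.5), (2, 3, 0.7)])

# Edge weights must agree up to a tolerance for a correspondence to count.
edge_match = isomorphism.numerical_edge_match("weight", 1.0, atol=1e-3)
gm = isomorphism.GraphMatcher(scene, model, edge_match=edge_match)
print(gm.subgraph_is_isomorphic())  # True: the model graph embeds in the scene
```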
no code implementations • 12 Jun 2020 • Vivek Singh Bawa, Gurkirt Singh, Francis KapingA, Inna Skarga-Bandurova, Alice Leporini, Carmela Landolfo, Armando Stabile, Francesco Setti, Riccardo Muradore, Elettra Oleari, Fabio Cuzzolin
In this work, we aim to increase the effectiveness of surgical assistant robots.
This is achieved by augmenting the previous Action Micro-Tube (AMTnet) action detection framework in three distinct ways: (1) by adding a parallel motion stream to the original appearance one in AMTnet; (2) in opposition to state-of-the-art action detectors, which train appearance and motion streams separately and use a test-time late fusion scheme to fuse RGB and flow cues, by jointly training both streams in an end-to-end fashion and merging RGB and optical flow features at training time; (3) by introducing an online action tube generation algorithm which works at video level, and in real time (when exploiting only appearance features).
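A hedged PyTorch sketch of the joint training idea in point (2) above; layer sizes and module names are illustrative, not the actual AMTnet architecture:

```python
# Sketch: RGB and optical-flow features are concatenated at training time
# and the whole network is trained end to end, instead of late-fusing
# separately trained streams at test time.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.rgb_stream = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.flow_stream = nn.Sequential(nn.Conv2d(2, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, num_classes, 1)  # trained jointly with both streams

    def forward(self, rgb, flow):
        fused = torch.cat([self.rgb_stream(rgb), self.flow_stream(flow)], dim=1)
        return self.head(fused)
```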
With the rapid growth of the applications of machine learning (ML) and other artificial intelligence (AI) techniques, adequate testing has become a necessity to ensure their quality.
Endowing computational systems with the ability to localize actions in video-based content has manifold applications.
Recently, three dimensional (3D) convolutional neural networks (CNNs) have emerged as dominant methods to capture spatiotemporal representations in videos, by adding to pre-existing 2D CNNs a third, temporal dimension.
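For intuition, a minimal PyTorch sketch of that added temporal dimension (tensor shapes are illustrative):

```python
# A 2D conv sees (C, H, W) per frame; its 3D counterpart adds a temporal
# axis and sees (C, T, H, W), so the kernel spans several frames at once.
import torch
import torch.nn as nn

conv2d = nn.Conv2d(3, 16, kernel_size=3, padding=1)          # per-frame
conv3d = nn.Conv3d(3, 16, kernel_size=(3, 3, 3), padding=1)  # spatio-temporal

frame = torch.randn(1, 3, 112, 112)    # a single frame
clip = torch.randn(1, 3, 8, 112, 112)  # an 8-frame clip
print(conv2d(frame).shape)             # torch.Size([1, 16, 112, 112])
print(conv3d(clip).shape)              # torch.Size([1, 16, 8, 112, 112])
```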
In this work, we present a method to predict an entire 'action tube' (a set of temporally linked bounding boxes) in a trimmed video just by observing a smaller subset of it.
The notion of belief likelihood function of repeated trials is introduced, whenever the uncertainty for individual trials is encoded by a belief measure (a finite random set).
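For independent (Bernoulli-style) trials, the factorisation one would expect is of the following form; this is a sketch consistent with the summary above, with Bel and Pl denoting the belief and plausibility measures of a single trial:

```latex
\underline{L}(x_1,\dots,x_n) = \prod_{i=1}^{n} Bel(\{x_i\}),
\qquad
\overline{L}(x_1,\dots,x_n) = \prod_{i=1}^{n} Pl(\{x_i\}),
```

i.e. the lower and upper belief likelihoods of a series of outcomes decompose into products of single-trial belief and plausibility values.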
At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network.
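A hedged sketch of what such a location-specific sparse transition structure can look like; the grid size, cell indices, and counts below are invented for illustration:

```python
# Sketch: transitions between feature-map cells across frames stored as a
# sparse matrix, so only cell-to-cell moves actually observed in training
# take up memory.
import torch

H = W = 7                       # feature-map grid size (illustrative)
n_cells = H * W

# Toy transition statistics observed at training time: (from_cell, to_cell, count).
observed = [(0, 0, 12.0), (0, 1, 3.0), (8, 9, 5.0)]
indices = torch.tensor([[f for f, t, c in observed],
                        [t for f, t, c in observed]])
values = torch.tensor([c for f, t, c in observed])

transitions = torch.sparse_coo_tensor(indices, values, (n_cells, n_cells))
# Row-normalising this matrix gives per-location transition probabilities.
```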
We present the new Road Event and Activity Detection (READ) dataset, designed and created from an autonomous vehicle perspective to take action detection challenges to autonomous driving.
Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame.
As such, our 3D-RPN is able to effectively encode the temporal aspect of actions purely by exploiting appearance, as opposed to methods which rely heavily on expensive flow maps.
In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently.
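A simplified sketch of such incremental, overlap-based tube construction (greedy association only; the actual joint-labelling algorithm is more involved):

```python
# Sketch: each new frame's detections are greedily appended to the open
# tube whose last box overlaps them most (IoU), or start a new tube.
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def extend_tubes(tubes, detections, thresh=0.3):
    """Incrementally grow space-time tubes with one frame's detections."""
    for box in detections:
        best = max(tubes, key=lambda t: iou(t[-1], box), default=None)
        if best is not None and iou(best[-1], box) >= thresh:
            best.append(box)
        else:
            tubes.append([box])  # no good match: start a new tube
    return tubes
```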
To the best of our knowledge, ours is the first real-time (up to 40 fps) system able to perform online spatio-temporal (S/T) action localisation and early action prediction on the untrimmed videos of UCF101-24.
In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap.
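A hedged sketch of that fusion rule, reusing the iou helper from the sketch above; the exact weighting used in the paper may differ:

```python
# Sketch: an appearance detection's score is boosted by the scores of
# motion-stream detections, each weighted by its spatial overlap (IoU).
def boost_appearance(app_box, app_score, motion_dets):
    """motion_dets: iterable of (box, score) pairs from the motion stream."""
    bonus = sum(iou(app_box, m_box) * m_score for m_box, m_score in motion_dets)
    return app_score + bonus
```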
We propose a simple yet effective method for the temporal detection of activities in untrimmed videos, with the help of untrimmed classification.
Recognising human activities from streaming videos poses unique challenges to learning algorithms: predictive models need to be scalable, incrementally trainable, and must remain bounded in size even when the data stream is arbitrarily long.
Consistent belief functions represent collections of coherent or non-contradictory pieces of evidence, but most of all they are the counterparts of consistent knowledge bases in belief calculus.
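One standard characterisation of consistency, stated here as a sketch (with Θ the frame of discernment, A_1, ..., A_k the focal elements of the belief function, and Pl its plausibility measure):

```latex
\bigcap_{i=1}^{k} A_i \neq \emptyset
\quad\Longleftrightarrow\quad
\max_{x \in \Theta} Pl(\{x\}) = 1 ,
```

i.e. the pieces of evidence are non-contradictory exactly when all focal elements share at least one outcome, so that some outcome is fully plausible.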
The recent trend in action recognition is towards larger datasets, an increasing number of action classes and larger visual vocabularies.
In an unsupervised context, i.e., when no prior model of the moving object(s) is available, such a structure has to be learned from the data in a bottom-up fashion.