A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images.
Accurate and reliable localization is a fundamental requirement for autonomous vehicles to use map information in higher-level tasks such as navigation or planning.
Estimating the uncertainty of a neural network plays a fundamental role in safety-critical settings.
We propose a method to identify features with predictive information in the input domain.
Existing automatic and interactive segmentation models for medical images only use data from a single time point (static).
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin and can be employed in 3D reconstruction pipelines.
In this work, we present MetaMedSeg, a gradient-based meta-learning algorithm that redefines the meta-learning task for the volumetric medical data with the goal to capture the variety between the slices.
Sickle cell disease (SCD) is a severe genetic hemoglobin disorder that results in premature destruction of red blood cells.
Scene graphs are representations of a scene, composed of objects (nodes) and inter-object relationships (edges), that have proven particularly well suited to this task, as they allow for semantic control over the generated content.
Directly regressing all 6 degrees-of-freedom (6DoF) of the object pose (e.g., the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem.
Ranked #1 on 6D Pose Estimation using RGB on Occlusion LineMOD
Scene graphs, composed of nodes as objects and directed-edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images.
no code implementations • 10 Aug 2021 • Markus Krönke, Christine Eilers, Desislava Dimova, Melanie Köhler, Gabriel Buschner, Lilit Mirzojan, Lemonia Konstantinidou, Marcus R. Makowski, James Nagarajah, Nassir Navab, Wolfgang Weber, Thomas Wendler
Conclusion: Tracked 3D ultrasound combined with a CNN segmentation significantly reduces interobserver variability in thyroid volumetry and increases the accuracy of the measurements with shorter acquisition times.
While self-supervised monocular depth estimation in driving scenarios has achieved comparable performance to supervised approaches, violations of the static world assumption can still lead to erroneous depth predictions of traffic participants, posing a potential safety issue.
no code implementations • 29 Jul 2021 • Matthias Keicher, Hendrik Burwinkel, David Bani-Harouni, Magdalini Paschali, Tobias Czempiel, Egon Burian, Marcus R. Makowski, Rickmer Braren, Nassir Navab, Thomas Wendler
Specifically, we introduce a multimodal similarity metric to build a population graph for clustering patients and an image-based end-to-end Graph Attention Network to process this graph and predict the COVID-19 patient outcomes: admission to ICU, need for ventilation and mortality.
Derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets.
We then use MSSG to introduce a dynamically generated graphical user interface tool for surgical procedure analysis which could be used for many applications including process optimization, OR design and automatic report generation.
In this work, we introduce Deep Direct Volume Rendering (DeepDVR), a generalization of DVR that allows for the integration of deep neural networks into the DVR algorithm.
Medical Ultrasound (US), despite its wide use, is characterized by artifacts and operator dependency.
The soft pseudo-labels are then used to train a deep student network for disease prediction of unseen test data for which the graph modality is unavailable.
We present our findings using publicly available chest pathologies (CheXpert, NIH ChestX-ray8) and COVID-19 datasets (BrixIA, and COVID-19 chest X-ray segmentation dataset).
Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays.
Is critical input information encoded in specific sparse pathways within the neural network?
The main novelty lies in the interpretable attention module (IAM), which directly operates on multi-modal features.
Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks.
Ranked #1 on Scene Graph Generation on 3R-Scan
Disentangled representations can be useful in many downstream tasks, help to make deep learning models more interpretable, and allow control over features of synthetically generated images, which can be useful for training other models that require large amounts of labelled or unlabelled data.
Hereditary hemolytic anemias are genetic disorders that affect the shape and density of red blood cells.
The method leverages the availability of labelled data in a different domain.
Chest computed tomography (CT) has played an essential diagnostic role in assessing patients with COVID-19 by showing disease-specific image features such as ground-glass opacity and consolidation.
1 code implementation • 12 Mar 2021 • Christina Bukas, Bailiang Jian, Luis F. Rodriguez Venegas, Francesca De Benetti, Sebastian Ruehling, Anjany Sekuboyina, Jens Gempt, Jan S. Kirschke, Marie Piraud, Johannes Oberreuter, Nassir Navab, Thomas Wendler
The framework uses the patient CT scan and the fractured vertebra label to build a virtual healthy spine using a high-level approach.
Federated learning has been recently introduced to train machine learning models in a privacy-preserved distributed fashion demanding annotated data at the clients, which is usually expensive and not available, especially in the medical field.
In this paper we introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences.
Due to the nature of such medical datasets, class imbalance is a familiar issue in the field of disease prediction.
For the former, we contributed our own dataset composed of five indoor scenes where it is unavoidable to capture images corresponding to views that are hard to uniquely identify.
In this context, we proposed a segmentation refinement method based on uncertainty analysis and graph convolutional networks.
In this work, we empirically show that two approaches for handling the gradient information, namely positive aggregation and positive propagation, break these methods.
First, a neural network is trained once to detect a set of anatomical landmarks on simulated X-rays.
To address these issues, unsupervised deep anomaly detection methods have recently been reported that train a model on a large number of normal scans and detect abnormal scans by calculating the reconstruction error.
no code implementations • 30 Oct 2020 • Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya, Keno März, Toby Collins, Anand Malpani, Johannes Fallert, Hubertus Feussner, Stamatia Giannarou, Pietro Mascagni, Hirenkumar Nakawala, Adrian Park, Carla Pugh, Danail Stoyanov, Swaroop S. Vedula, Kevin Cleary, Gabor Fichtinger, Germain Forestier, Bernard Gibaud, Teodor Grantcharov, Makoto Hashizume, Doreen Heckmann-Nötzel, Hannes G. Kenngott, Ron Kikinis, Lars Mündermann, Nassir Navab, Sinan Onogur, Raphael Sznitman, Russell H. Taylor, Minu D. Tizabi, Martin Wagner, Gregory D. Hager, Thomas Neumuth, Nicolas Padoy, Justin Collins, Ines Gockel, Jan Goedeke, Daniel A. Hashimoto, Luc Joyeux, Kyle Lam, Daniel R. Leff, Amin Madani, Hani J. Marcus, Ozanan Meireles, Alexander Seitel, Dogu Teber, Frank Ückert, Beat P. Müller-Stich, Pierre Jannin, Stefanie Speidel
We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.
Panoptic segmentation has recently unified semantic and instance segmentation, previously addressed separately, thus taking a step further towards creating more comprehensive and efficient perception systems.
We propose a framework that ameliorates this issue by performing scene reconstruction and semantic scene completion jointly in an incremental and real-time manner, based on an input sequence of depth maps.
3D Point clouds are a rich source of information that enjoy growing popularity in the vision community.
This work proposes an RGB-D SLAM system specifically designed for structured environments and aimed at improved tracking and mapping accuracy by relying on geometric features extracted from the surroundings.
We change this paradigm and reformulate the problem as an action decision process where an initial pose is updated in incremental discrete steps that sequentially move a virtual 3D rendering towards the correct solution.
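As a purely illustrative sketch (the greedy step selector below is a stand-in for the learned decision policy, and the scalar pose is a hypothetical simplification of the full 6DoF case), the action-decision view of pose refinement can be written as a loop of discrete incremental updates:

```python
import numpy as np

# Discrete pose-update actions (hypothetical increments); 0.0 means "stop".
ACTIONS = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])

def refine(pose, target, steps=50):
    """Iteratively move the pose toward the target in discrete action steps."""
    for _ in range(steps):
        # Greedy stand-in for the learned policy: pick the action whose
        # resulting pose is closest to the observation.
        best = ACTIONS[np.argmin(np.abs(pose + ACTIONS - target))]
        if best == 0.0:  # no action improves the alignment further
            break
        pose += best
    return pose

final = refine(pose=0.0, target=2.35)
```

The final pose lands within half of the smallest action increment of the target, mirroring how coarse-to-fine discrete steps can replace direct regression.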
In inference, we use this classifier to analyze a second graph, generated from artifact and polyp predictions given by region proposal networks.
In this paper, we propose a method for 3D object completion and classification based on point clouds.
Federated learning (FL) has been a promising approach in the field of medical imaging in recent years.
We propose to adjust the C-arm CBCT source trajectory during the scan to optimize reconstruction quality with respect to a certain task, i.e., verification of screw placement.
In this work, we evaluate FT and LwF for class incremental learning in multi-organ segmentation using the publicly available AAPM dataset.
In this paper a low-drift monocular SLAM method is proposed targeting indoor scenarios, where monocular SLAM often fails due to the lack of textured surfaces.
The core of these strategies is a scoring function ranking the training samples according to either difficulty or uncertainty.
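A minimal sketch of such a scoring function, assuming uncertainty is measured by predictive entropy (function names are illustrative, not from the paper):

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of per-sample class probabilities (higher = more uncertain)."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=1)

def rank_by_uncertainty(probs):
    """Return sample indices sorted from most to least uncertain."""
    return np.argsort(-predictive_entropy(probs))

# Three samples: confident, uniform (maximally uncertain), moderately confident.
p = np.array([[0.98, 0.01, 0.01],
              [1/3, 1/3, 1/3],
              [0.6, 0.3, 0.1]])
order = rank_by_uncertainty(p)  # uniform sample ranks first
```

A difficulty-based variant would simply swap the entropy score for a loss value or a teacher-model score; the ranking machinery stays the same.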
Feature based visual odometry and SLAM methods require accurate and fast correspondence matching between consecutive image frames for precise camera pose estimation in real-time.
In blood cell disorders, only a subset of all cells is morphologically altered and relevant for the diagnosis.
Surgical tool segmentation in endoscopic images is an important problem: it is a crucial step towards full instrument pose estimation and it is used for integration of pre- and intra-operative images into the endoscopic view.
To account for reduced accuracy of the discovered light-weight deep residual network and avoid adding any additional computational burden, we perform a differentiable search over dilation rates for residual units of our network.
Consequently, most MR applications that are centered around the user, such as virtual dressing rooms or learning of body movements, cannot be realized with HMDs.
Brain pathologies can vary greatly in size and shape, ranging from a few pixels (i.e., MS lesions) to large, space-occupying tumors.
Experimental results show that the proposed method could provide useful uncertainty information by Bayesian approximation with the efficient ensemble model generation and improve the predictive performance.
As a solution, we propose an end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC).
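For intuition only, a classical soft-impute baseline (not the paper's MGMC, which additionally learns on multiple graphs and classifies end-to-end) fills missing feature entries by iterative low-rank SVD soft-thresholding:

```python
import numpy as np

def soft_impute(X, mask, lam=0.05, iters=200):
    """X: data with zeros at missing entries; mask: True where observed."""
    Z = X.copy()
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        # Shrink singular values, then restore the observed entries.
        Z_low = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
        Z = np.where(mask, X, Z_low)
    return Z

# Rank-1 ground truth with one missing entry; completion should recover it.
truth = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
mask = np.ones_like(truth, dtype=bool)
mask[2, 1] = False
X = np.where(mask, truth, 0.0)
Z = soft_impute(X, mask)
```

The recovered entry converges close to the true value 3.0, since the underlying matrix is low-rank; MGMC extends this idea with graph structure and a joint classification objective.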
The cataract, a developing opacity of the human eye lens, constitutes the world's most frequent cause for blindness.
Due to the time-sensitive nature of these cases, doctors are required to propose a correct diagnosis and intervention within a minimal time frame.
In this work we propose a transfer learning framework, core of which is learning an explicit mapping between domains.
We propose a colon deformation estimation method using an RNN and obtain the colonoscope shape from electromagnetic sensors during its insertion into the colon.
We utilize the shape estimation network (SEN), which estimates deformed colon shape during colonoscope insertions.
6D object pose estimation is a fundamental problem in computer vision.
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined on the manifold of camera poses.
In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships modeled as edges.
Deep unsupervised representation learning has recently led to new approaches in the field of Unsupervised Anomaly Detection (UAD) in brain MRI.
In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes in the nodes or edges of a semantic graph that is generated from the image.
Our results show that spatio-temporal information in longitudinal data is a beneficial cue for improving segmentation.
Active learning is one of the solutions to this problem where an active learner is designed to indicate which samples need to be annotated to effectively train a target model.
Current deep-learning-based methods neither integrate easily into clinical protocols nor take full advantage of medical knowledge.
Computer-aided diagnosis (CADx) algorithms in medicine provide patient-specific decision support for physicians.
In this paper we introduce the first reinforcement learning (RL) based robotic navigation method which utilizes ultrasound (US) images as an input.
Recently, Graph Convolutional Networks (GCNs) have proven to be a powerful machine learning tool for Computer Aided Diagnosis (CADx) and disease prediction.
Automatic surgical phase recognition is a challenging and crucial task with the potential to improve patient safety and become an integral part of intra-operative decision-support systems.
Medical image segmentation is one of the major challenges addressed by machine learning methods.
We explicitly enforce the meaningful representation to be agnostic to sensitive information by entropy maximization.
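A hedged sketch of this idea (the weighting `beta` and function names are hypothetical): adding a negative-entropy term over a sensitive-attribute classifier's predictions rewards representations from which the attribute can only be guessed uniformly:

```python
import numpy as np

def entropy(p):
    """Shannon entropy per row of a probability matrix."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def objective(task_ce, sensitive_probs, beta=1.0):
    """Task loss minus sensitive-prediction entropy: minimizing this
    objective pushes the sensitive predictions toward uniform."""
    return task_ce - beta * entropy(sensitive_probs).mean()

uniform = np.full((4, 2), 0.5)         # attribute is unrecoverable: pure guessing
peaked = np.array([[0.99, 0.01]] * 4)  # representation leaks the attribute
```

With the same task loss, the uniform (agnostic) case yields a strictly lower objective than the leaky one, which is exactly the pressure entropy maximization applies during training.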
Contemporary monocular 6D pose estimation methods can only cope with a handful of object instances.
Stitching images acquired under perspective projective geometry is a relevant topic in computer vision with multiple applications ranging from smartphone panoramas to the construction of digital maps.
We present a novel methodology to detect imperfect bilateral symmetry in CT of human anatomy.
Suboptimal interaction with patient data and challenges in mastering 3D anatomy based on ill-posed 2D interventional images are essential concerns in image-guided therapies.
We trained 30 participants on how to set up a robotic arm in an environment mimicking clinical setup.
Processed force and ultrasound data are fused using a 1D Convolutional Network to compute the location of the vertebral levels.
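As a toy sketch of this fusion step (the feature traces and filter are invented for illustration; the actual network is learned), the two signals can be stacked as channels of a 1-D sequence and passed through a single 1-D convolution:

```python
import numpy as np

def conv1d(x, w, b):
    """Naive valid 1-D convolution. x: (channels, length),
    w: (out, channels, k) -> output of shape (out, length - k + 1)."""
    out_ch, in_ch, k = w.shape
    length = x.shape[1] - k + 1
    y = np.zeros((out_ch, length))
    for o in range(out_ch):
        for t in range(length):
            y[o, t] = np.sum(w[o] * x[:, t:t + k]) + b[o]
    return y

force = np.sin(np.linspace(0, 3, 16))    # toy force trace along the spine
us_feat = np.cos(np.linspace(0, 3, 16))  # toy per-frame ultrasound feature
x = np.stack([force, us_feat])           # (2, 16): two fused input channels
w = np.ones((1, 2, 3)) / 6.0             # one averaging filter over both channels
y = conv1d(x, w, b=np.zeros(1))          # shape (1, 14)
```

Stacking the modalities as channels lets even the first convolutional layer mix force and ultrasound evidence at each position along the scan.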
In many settings, such as those arising in medical and healthcare applications, this assumption is not necessarily true since the graph may be noisy, partially- or even completely unknown, and one is thus interested in inferring it from the data.
We then investigated different strategies, such as a learning without forgetting framework, to leverage artifact knowledge to improve automated polyp detection.
With recent advances in Virtual Reality (VR) technology, its deployment will dramatically increase in non-entertainment environments, such as professional education and training, manufacturing, service, or low-frequency/high-risk scenarios.
Attributing the output of a neural network to the contribution of given input elements is a way of shedding light on the black-box nature of neural networks.
Radar pulse streams exhibit increasingly complex temporal patterns and can no longer rely on a purely value-based analysis of the pulse attributes for the purpose of emitter classification.
Radar signals have been dramatically increasing in complexity, limiting the source separation ability of traditional approaches.
Data-driven computational approaches have evolved to enable the extraction of information from medical images with a reliability, accuracy, and speed that are already transforming their interpretation and exploitation in clinical practice.
Metal artifacts in computed tomography (CT) arise from a mismatch between physics of image formation and idealized assumptions during tomographic reconstruction.
Deep Learning sets the state-of-the-art in many challenging tasks showing outstanding performance in a broad range of applications.
Semantic segmentation is an important task in the medical field to identify the exact extent and orientation of significant structures like organs and pathology.
We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space.
Our approach aims at building up a Layered Depth Image (LDI) from a single RGB input, which is an efficient representation that arranges the scene in layers, including originally occluded regions.
In this work, we introduce the task of 3D object instance re-localization (RIO): given one or multiple objects in an RGB-D scan, we want to estimate their corresponding 6DoF poses in another 3D scan of the same environment taken at a later point in time.
We present a sampling-free approach for computing the epistemic uncertainty of a neural network.
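One common ingredient of sampling-free approaches is closed-form moment propagation; the sketch below shows it for a single linear layer with independent weight uncertainties (a simplified illustration, not the paper's full method):

```python
import numpy as np

def linear_moments(x, w_mean, w_var):
    """Propagate a deterministic input through a linear layer with uncertain
    weights. x: (d,), w_mean/w_var: (out, d) -> output mean and variance.
    For independent weights: Var(sum_i w_i x_i) = sum_i Var(w_i) * x_i^2."""
    mean = w_mean @ x
    var = w_var @ (x ** 2)
    return mean, var

x = np.array([1.0, 2.0])
w_mean = np.array([[0.5, -0.5]])
w_var = np.array([[0.1, 0.1]])
mean, var = linear_moments(x, w_mean, w_var)  # mean -0.5, variance 0.5
```

The output variance comes out in one forward pass, whereas a Monte Carlo estimate would need many sampled weight draws to approximate the same quantity.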
A solution to mitigate the small training set issue is to pre-train a denoising model with small training sets containing pairs of clean and synthesized noisy signals, produced from empirical noise priors, and fine-tune on the available small training set.
no code implementations • 23 Jul 2019 • Javad Fotouhi, Tianyu Song, Arian Mehrfard, Giacomo Taylor, Qiaochu Wang, Fengfang Xian, Alejandro Martin-Gomez, Bernhard Fuerst, Mehran Armand, Mathias Unberath, Nassir Navab
To overcome this challenge, we introduce a novel registration concept for intuitive alignment of AR content to its physical counterpart by providing a multi-view AR experience via reflective-AR displays that simultaneously show the augmentations from multiple viewpoints.
The lack of large-scale real datasets with annotations makes transfer learning a necessity for video activity understanding.
Chest X-ray radiography is one of the earliest medical imaging technologies and remains one of the most widely used for diagnosis, screening, and treatment follow-up of diseases related to the lungs and heart.
Fully Convolutional Neural Networks (F-CNNs) achieve state-of-the-art performance for image segmentation in medical imaging.
Recent work has shown that Generative Adversarial Networks (GANs) can be used to create realistic images of virtually stained slide images in digital pathology with clinically validated interpretability.
A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients.
We propose a new network architecture that exploits an inductive end-to-end learning approach for disease classification, where filters from both the CNN and the graph are trained jointly.
Recently, several works proposed geometric deep learning approaches to solve disease classification, by modeling patients as nodes in a graph, along with graph signal processing of multi-modal features.
To enhance the discriminative power of the classification model, we incorporate triplet embedding loss with a selective sampling routine.
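A minimal sketch of a triplet margin loss paired with a simple selective-sampling rule (hard-negative mining); the margin and the mining rule here are illustrative assumptions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def hardest_negative(anchor, candidates):
    """Selective sampling: pick the negative closest to the anchor,
    i.e., the most informative one for the loss."""
    dists = np.linalg.norm(candidates - anchor, axis=1)
    return candidates[np.argmin(dists)]

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negatives = np.array([[0.3, 0.0], [5.0, 5.0]])
neg = hardest_negative(anchor, negatives)       # picks [0.3, 0.0]
loss = triplet_loss(anchor, positive, neg, margin=0.5)
```

The far-away negative would contribute zero loss; selecting the nearby one keeps the gradient informative, which is the point of selective sampling.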
Trained on the dataset alone, we report Dice and Jaccard coefficients of $0.45 \pm 0.09$ and $0.30 \pm 0.07$, respectively, as well as an average distance of $0.78 \pm 0.36~mm$.
The analysis of the collaborative learning process is one of the growing fields of education research, which has many different analytic solutions.
Learning interpretable representations in medical applications is becoming essential for adopting data-driven models into clinical practice.
We implement label transfer from MRI to US, which is prone to a residual but inevitable registration error.
The proposed method displays both promising image reconstruction quality and acquisition frequency when integrated for live ultrasound scanning.
In this paper, we propose a novel interpretation method tailored to histological Whole Slide Image (WSI) processing.
Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements.
Despite recent advances on the topic of direct camera pose regression using neural networks, accurately estimating the camera pose of a single RGB image still remains a challenging task.
Geometric deep learning provides a principled and versatile manner for the integration of imaging and non-imaging modalities in the medical domain.
With the increasing computational power of today's workstations, real-time physically-based rendering is within reach, rapidly gaining attention across a variety of domains.
We develop a random walk semi-supervised loss that enables the network to learn representations that are compact and well-separated.
This representation is passed on to the segmenter arm that uses this information to segment the new query image.
We demonstrate the feasibility of a fully automatic computer-aided diagnosis (CAD) tool, based on deep learning, that localizes and classifies proximal femur fractures on X-ray images according to the AO classification.
Compared with traditional augmentation methods and with images synthesized by Generative Adversarial Networks, our method not only achieves state-of-the-art performance but also significantly improves the network's robustness.
Based upon the idea of aligning the quadric gradients with the surface normals, our first formulation is exact and requires as few as four oriented points.
A model capable of leveraging the individuality of each multi-modal data is required for better disease prediction.
For each object instance we predict multiple pose and class outcomes to estimate the specific pose distribution generated by symmetries and repetitive textures.
Next to voxel-wise uncertainty, we introduce four metrics to quantify structure-wise uncertainty in segmentation for quality control.
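Two such structure-wise measures can be sketched from Monte Carlo segmentation samples (the specific metric definitions below are illustrative, not necessarily the four used in the paper): the coefficient of variation of the structure's volume and the mean pairwise Dice agreement between samples.

```python
import numpy as np

def volume_cv(samples):
    """Coefficient of variation of structure volume across MC samples."""
    vols = samples.reshape(len(samples), -1).sum(axis=1)
    return vols.std() / vols.mean()

def mean_pairwise_dice(samples):
    """Mean Dice overlap over all sample pairs (low agreement = uncertain)."""
    n, scores = len(samples), []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = samples[i], samples[j]
            scores.append(2 * np.logical_and(a, b).sum() / (a.sum() + b.sum()))
    return float(np.mean(scores))

# Three MC samples of a 4x4 binary mask that mostly agree.
s = np.zeros((3, 4, 4), dtype=bool)
s[:, 1:3, 1:3] = True
s[2, 1, 3] = True  # one sample adds a single voxel
cv = volume_cv(s)
dice = mean_pairwise_dice(s)
```

High pairwise Dice with low volume variation signals a trustworthy segmentation of that structure; the reverse flags it for quality-control review.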
By providing a novel paradigm for the acquisition and reconstruction of tracked freehand 3D ultrasound, this work presents the concept of Computational Sonography (CS) to model the directionality of ultrasound information.
Further, we reformulate the problem of robotic grasping by replacing conventional grasp rectangles with grasp belief maps, which hold more precise location information than a rectangle and account for the uncertainty inherent to the task.
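A grasp belief map can be sketched as a 2-D Gaussian centered on the grasp point, whose spread encodes location uncertainty (sizes and parameters below are arbitrary illustrations):

```python
import numpy as np

def grasp_belief_map(h, w, center, sigma):
    """Gaussian heatmap: belief peaks at the grasp point and decays with
    distance; sigma encodes the location uncertainty a rectangle cannot."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

bmap = grasp_belief_map(32, 32, center=(16, 10), sigma=3.0)
peak = np.unravel_index(np.argmax(bmap), bmap.shape)  # recovers the grasp point
```

Unlike a rectangle, the map is a dense, differentiable target, so a network can be trained on it with a simple pixel-wise regression loss.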
Histopathological evaluation of tissue samples is a key practice in patient diagnosis and drug development, especially in oncology.
We present a novel, parameter-efficient and practical fully convolutional neural network architecture, termed InfiNet, aimed at voxel-wise semantic segmentation of infant brain MRI images at iso-intense stage, which can be easily extended for other segmentation tasks involving multi-modalities.
In this paper, we target the problem of fracture classification from clinical X-Ray images towards an automated Computer Aided Diagnosis (CAD) system.
Generative Adversarial Networks (GANs) and their extensions have carved open many exciting ways to tackle well known and challenging medical image analysis problems such as medical image de-noising, reconstruction, segmentation, data simulation, detection or classification.
Like many other machine-learning-driven medical image analysis tasks, skin image analysis suffers from a chronic lack of labeled data and skewed class distributions, which poses problems for training robust, well-generalizing models.
There is a high demand for 3D data for 360° panoramic images and videos, driven by the growing market availability of specialized hardware both for capturing (e.g., omnidirectional cameras) and for visualizing panoramic images and videos in 3D (e.g., head-mounted displays).
Towards this end, we introduce three variants of SE modules for segmentation, (i) squeezing spatially and exciting channel-wise, (ii) squeezing channel-wise and exciting spatially and (iii) joint spatial and channel 'squeeze & excitation'.
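The three variants can be sketched in a few lines of numpy; here the learned squeeze layers are replaced by identity stand-ins (means), so this only conveys the data flow, not the trained modules:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cse(x):
    """(i) Squeeze spatially, excite channel-wise: one gate per channel."""
    z = x.mean(axis=(1, 2))                  # (C,) channel descriptor
    return x * sigmoid(z)[:, None, None]

def sse(x):
    """(ii) Squeeze channel-wise, excite spatially: one gate per pixel."""
    q = x.mean(axis=0)                       # (H, W) map (1x1-conv stand-in)
    return x * sigmoid(q)[None, :, :]

def scse(x):
    """(iii) Joint spatial and channel 'squeeze & excitation'."""
    return np.maximum(cse(x), sse(x))

x = np.random.default_rng(0).normal(size=(4, 8, 8))  # (C, H, W) feature map
y = scse(x)
```

Channel excitation recalibrates *what* the network attends to, spatial excitation *where*; the joint variant keeps the stronger of the two responses at each location.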
3D geometry is a very informative cue when interacting with and navigating an environment.
This work proposes a general-purpose, fully-convolutional network architecture for efficiently processing large-scale 3D data.
Ranked #16 on Semantic Segmentation on ScanNet
Ophthalmic microsurgery is known to be a challenging operation, which requires very precise and dexterous manipulation.
Nevertheless, we believe that traditional approaches such as L2 distance or Dynamic Time Warping based on hand-crafted local pose metrics fail to appropriately capture the semantic relationship across motions and, as such, are not suitable for being employed as metrics within these tasks.
While conventional depth estimation can infer the geometry of a scene from a single RGB image, it fails to estimate scene regions that are occluded by foreground objects.
Increased information sharing through short- and long-range skip connections between layers in fully convolutional networks has demonstrated significant improvement in performance for semantic segmentation.
Cross-modal image synthesis is gaining significant interest for its ability to estimate target images of a different modality from a given set of source images (e.g., MR to MR, MR to CT, or CT to PET) without the need for an actual acquisition. Although such methods show potential for applications in radiation therapy planning, image super-resolution, atlas construction, and image segmentation, the synthesis results are not as accurate as an actual acquisition. In this paper, we address the problem of multi-modal image synthesis by proposing a fully convolutional deep learning architecture called SynNet. We extend the proposed architecture to various input-output configurations.
For C-arm repositioning to a particular target view, the recorded C-arm pose is restored as a virtual object and visualized in an AR environment, serving as a perceptual reference for the technician.
Registration of partial-view 3D US volumes with MRI data is influenced by initialization.
An estimation method for the colon deformations that occur during colonoscope insertion is necessary to reduce tracking errors.
Recent neural-network-based architectures for image segmentation make extensive usage of feature forwarding mechanisms to integrate information from multiple scales.
Scene coordinate regression has become an essential part of current camera re-localization methods.
One of the greatest challenges towards fully autonomous cars is the understanding of complex and dynamic scenes.
A key challenge in cancer immunotherapy biomarker research is quantification of pattern changes in microscopic whole slide images of tumor biopsies.
By combining the strengths of manifold learning using triplet loss and pose regression, we could either estimate the pose directly reducing the complexity compared to NN search, or use learned descriptor for the NN descriptor matching.
Structural data from Electronic Health Records can serve as complementary information to imaging data for disease prediction.
Segmentation of the left atrium and deriving its size can help to predict and detect various cardiovascular conditions.
We introduce inherent measures for effective quality control of brain segmentation based on a Bayesian fully convolutional neural network, using model uncertainty.
Generative Adversarial Networks (GANs) have been successfully used to synthesize realistically looking images of faces, scenery and even medical images.
Reliably modeling normality and differentiating abnormal appearances from normal cases is a very appealing approach for detecting pathologies in medical images.
The main challenge is to automatically estimate the desired plane of symmetry within the patient's pre-operative CT. We propose to estimate this plane with a non-linear optimization strategy that minimizes Tukey's biweight robust estimator, relying on the partial symmetry of the anatomy.
Tracking of rotation and translation of medical instruments plays a substantial role in many modern interventions.
Within medical imaging, manual curation of sufficient well-labeled samples is cost, time and scale-prohibitive.
Interaction and collaboration between humans and intelligent machines has become increasingly important as machine learning methods move into real-world applications that involve end users.
In this work, we follow up on the idea of modeling multi-modal disease classification as a matrix completion problem, with simultaneous classification and non-linear imputation of features.
In this paper, for the first time, we propose an evaluation method for deep learning models that assesses the performance of a model not only in an unseen test scenario, but also in extreme cases of noise, outliers and ambiguous input data.
Machine learning-based approaches outperform competing methods in most disciplines relevant to diagnostic radiology.
In percutaneous orthopedic interventions the surgeon attempts to reduce and fixate fractures in bony structures.