Robotic ultrasound (US) imaging aims at overcoming some of the limitations of free-hand US examinations, e.g., difficulty in guaranteeing intra- and inter-operator repeatability.
To enable this, we make use of realistic ultrasound simulation techniques that allow for instantiation of several independent speckle realizations that represent the exact same tissue, thus allowing for the application of image reconstruction techniques that work with pairs of differently corrupted data.
In this paper, we introduce DA$^2$, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods while requiring significantly fewer computations.
Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories.
Various morphological and functional parameters of peripheral nerves and their vascular supply are indicative of pathological changes due to injury or disease.
By considering the consistency information with the diversity in the consistency-based embedding scheme, the proposed method could select more informative samples for labeling in the semi-supervised learning setting.
We find that our proposed pre-training methods help in modeling the data at a patient and population level and improve performance in different fine-tuning tasks on all datasets.
Abdominal aortic aneurysm (AAA) is a vascular disease in which a section of the aorta enlarges, weakening its walls and potentially rupturing the vessel.
Deep learning models used in medical image analysis are prone to raising reliability concerns due to their black-box nature.
Inpainting has recently been proposed as a successful deep learning technique for unsupervised medical image model discovery.
Although most medical centers conduct similar medical imaging tasks, their differences, such as specializations, number of patients, and devices, lead to distinctive data distributions.
Here, we propose a cross-domain adapted autoencoder to extract features in an unsupervised manner on three different datasets of single white blood cells scanned from peripheral blood smears.
Light-sheet fluorescence microscopy (LSFM) is a cutting-edge volumetric imaging technique that allows for three-dimensional imaging of mesoscopic samples with decoupled illumination and detection paths.
To this end, we propose a multi-task method based on U-Net that takes T1-weighted MR images as an input to generate synthetic FDG-PET images and classifies the dementia progression of the patient into cognitive normal (CN), cognitive impairment (MCI), and AD.
We propose a novel method to automatically calibrate tracked ultrasound probes.
We validate TriMix on eight benchmark datasets consisting of natural and medical images, improving on the second-best models by 2.71% and 0.41% for the two data types, respectively.
Despite its broad availability, volumetric information acquisition from Bright-Field Microscopy (BFM) is inherently difficult due to the projective nature of the acquisition process.
Object pose estimation is crucial for robotic applications and augmented reality.
no code implementations • 16 May 2022 • Bailiang Jian, Mohammad Farid Azampour, Francesca De Benetti, Johannes Oberreuter, Christina Bukas, Alexandra S. Gersing, Sarah C. Foreman, Anna-Sophia Dietrich, Jon Rischewski, Jan S. Kirschke, Nassir Navab, Thomas Wendler
We specifically design these losses to depend only on the CT label maps since automatic vertebra segmentation in CT gives more accurate results contrary to MRI.
The results demonstrate that the proposed approach can effectively and accurately navigate the probe towards the longitudinal view of vessels.
Sophisticated machine learning (ML) techniques have yielded promising results, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process.
Automated segmentation of retinal optical coherence tomography (OCT) images has become an important recent direction in machine learning for medical applications.
Ranked #1 on Retinal OCT Layer Segmentation on Duke SD-OCT
One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced.
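A common remedy for imbalanced class frequencies is to reweight the loss by inverse class frequency. The following is a minimal numpy sketch of that idea, not any specific paper's implementation; the function names are illustrative.

```python
import numpy as np

def class_weights(labels):
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = np.bincount(labels)
    return len(labels) / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample is scaled by its class weight."""
    eps = 1e-12
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return np.mean(weights[labels] * per_sample)
```

With labels `[0, 0, 0, 1]`, the minority class 1 gets three times the weight of class 0, so errors on rare samples contribute proportionally more to the loss.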
In this work, we propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications that exploits the graph representation of the input data samples and their latent relation.
1 code implementation • 30 Mar 2022 • Paul Engstler, Matthias Keicher, David Schinz, Kristina Mach, Alexandra S. Gersing, Sarah C. Foreman, Sophia S. Goller, Juergen Weissinger, Jon Rischewski, Anna-Sophia Dietrich, Benedikt Wiestler, Jan S. Kirschke, Ashkan Khakzar, Nassir Navab
Do black-box neural network models learn clinically relevant features for fracture diagnosis?
Chest radiograph reporting is time-consuming, and numerous solutions to automate this process have been proposed.
We show that training the agent against the prediction model can significantly improve the semantic features extracted for downstream classification tasks.
We test our method on two medical datasets of patient records, TADPOLE and MIMIC-III, including imaging and non-imaging features and different prediction tasks.
Ranked #1 on Length-of-Stay prediction on MIMIC-III
In this work, we propose a novel data augmentation method for clinical audio datasets based on a conditional Wasserstein Generative Adversarial Network with Gradient Penalty (cWGAN-GP), operating on log-mel spectrograms.
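Log-mel spectrograms, the representation the cWGAN-GP operates on, are obtained by projecting a framed FFT power spectrum onto a triangular mel filterbank and log-compressing the result. Below is a minimal numpy sketch of that pipeline under simplified assumptions (Hann window, no pre-emphasis, no padding); it is illustrative, not the paper's feature-extraction code.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """Frame the signal, take the FFT power spectrum, project onto a
    triangular mel filterbank, and log-compress."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (T, n_fft//2+1)
    # filterbank center frequencies, equally spaced on the mel scale
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return np.log(power @ fbank.T + 1e-10)                    # (T, n_mels)
```

The GAN then treats each `(T, n_mels)` matrix as a single-channel image, which is what makes image-style convolutional generators applicable to audio.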
Towards this goal, for the first time, we propose using semantic scene graphs (SSG) to describe and summarize the surgical scene.
Ranked #1 on Scene Graph Generation on 4D-OR
For this purpose, longitudinal self-supervision schemes are explored on clinical longitudinal COVID-19 CT scans.
Dense methods also improved pose estimation in the presence of occlusion.
Existing datasets from OR room cameras are thus far limited in size or modalities acquired, leaving it unclear which sensor modalities are best suited for tasks such as recognizing surgical action from videos.
While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications.
Ranked #1 on 6D Pose Estimation on LineMOD (Mean ADD-S metric)
There have been numerous recently proposed methods for monocular depth prediction (MDP) coupled with the equally rapid evolution of benchmarking tools.
The video action segmentation task is regularly explored under weaker forms of supervision, such as transcript supervision, where a list of actions is easier to obtain than dense frame-wise labels.
We propose a novel variational Bayesian formulation for diffeomorphic non-rigid registration of medical images, which learns in an unsupervised way a data-specific similarity metric.
Despite training only on a standard dataset, such as KITTI, augmenting with our vector fields significantly improves the generalization to differently shaped objects and scenes.
Indirect Time-of-Flight (I-ToF) imaging is a widespread way of depth estimation for mobile devices due to its small size and affordable price.
We first present a small sequence of RGB-D images displaying a human-object interaction.
With the advent of deep learning, estimating depth from a single RGB image has recently received a lot of attention, being capable of empowering many different applications ranging from path planning for robotics to computational cinematography.
For this purpose, we present a platform for autonomous trocar docking that combines computer vision and a robotic setup.
We propose MIGS (Meta Image Generation from Scene Graphs), a meta-learning based approach for few-shot image generation from graphs that enables adapting the model to different scenes and increases the image quality by training on diverse sets of tasks.
A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images.
Accurate and reliable localization is a fundamental requirement for autonomous vehicles to use map information in higher-level tasks such as navigation or planning.
We propose a method to identify features with predictive information in the input domain.
Estimating the uncertainty of a neural network plays a fundamental role in safety-critical settings.
Existing automatic and interactive segmentation models for medical images only use data from a single time point (static).
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin and can be employed in 3D reconstruction pipelines.
In this work, we present MetaMedSeg, a gradient-based meta-learning algorithm that redefines the meta-learning task for the volumetric medical data with the goal to capture the variety between the slices.
Sickle cell disease (SCD) is a severe genetic hemoglobin disorder that results in premature destruction of red blood cells.
Scene graphs are representations of a scene, composed of objects (nodes) and inter-object relationships (edges), and have proven particularly well suited for this task, as they allow for semantic control over the generated content.
Directly regressing all 6 degrees-of-freedom (6DoF) for the object pose (e.g., the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem.
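One reason direct 6DoF regression is hard is the choice of rotation parameterization. A common option is the axis-angle vector, mapped to a rotation matrix via Rodrigues' formula; the sketch below is a generic illustration in numpy, not tied to any particular method above.

```python
import numpy as np

def axis_angle_to_matrix(r):
    """Rodrigues' formula: axis-angle vector r (angle = ||r||) -> 3x3 rotation."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta                         # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],     # cross-product (skew-symmetric) matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

For example, the vector `[0, 0, pi/2]` yields a 90-degree rotation about the z-axis, sending the x-axis onto the y-axis.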
Ranked #1 on 6D Pose Estimation using RGB on Occlusion LineMOD
Scene graphs, composed of nodes as objects and directed-edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images.
no code implementations • 10 Aug 2021 • Markus Krönke, Christine Eilers, Desislava Dimova, Melanie Köhler, Gabriel Buschner, Lilit Mirzojan, Lemonia Konstantinidou, Marcus R. Makowski, James Nagarajah, Nassir Navab, Wolfgang Weber, Thomas Wendler
Conclusion: Tracked 3D ultrasound combined with a CNN segmentation significantly reduces interobserver variability in thyroid volumetry and increases the accuracy of the measurements with shorter acquisition times.
While self-supervised monocular depth estimation in driving scenarios has achieved comparable performance to supervised approaches, violations of the static world assumption can still lead to erroneous depth predictions of traffic participants, posing a potential safety issue.
no code implementations • 29 Jul 2021 • Matthias Keicher, Hendrik Burwinkel, David Bani-Harouni, Magdalini Paschali, Tobias Czempiel, Egon Burian, Marcus R. Makowski, Rickmer Braren, Nassir Navab, Thomas Wendler
Specifically, we introduce a multimodal similarity metric to build a population graph for clustering patients and an image-based end-to-end Graph Attention Network to process this graph and predict the COVID-19 patient outcomes: admission to ICU, need for ventilation and mortality.
Derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets.
In this work, we introduce Deep Direct Volume Rendering (DeepDVR), a generalization of DVR that allows for the integration of deep neural networks into the DVR algorithm.
We then use MSSG to introduce a dynamically generated graphical user interface tool for surgical procedure analysis which could be used for many applications including process optimization, OR design and automatic report generation.
Medical Ultrasound (US), despite its wide use, is characterized by artifacts and operator dependency.
The soft pseudo-labels are then used to train a deep student network for disease prediction of unseen test data for which the graph modality is unavailable.
We present our findings using publicly available chest pathologies (CheXpert, NIH ChestX-ray8) and COVID-19 datasets (BrixIA, and COVID-19 chest X-ray segmentation dataset).
Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays.
Is critical input information encoded in specific sparse pathways within the neural network?
The main novelty lies in the interpretable attention module (IAM), which directly operates on multi-modal features.
Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks.
Ranked #1 on 3D Object Classification on 3R-Scan
Disentangled representations can be useful in many downstream tasks, help make deep learning models more interpretable, and allow control over features of synthetically generated images, which is useful for training other models that require large amounts of labelled or unlabelled data.
Hereditary hemolytic anemias are genetic disorders that affect the shape and density of red blood cells.
The method leverages the availability of labelled data in a different domain.
1 code implementation • 12 Mar 2021 • Christina Bukas, Bailiang Jian, Luis F. Rodriguez Venegas, Francesca De Benetti, Sebastian Ruehling, Anjany Sekuboyina, Jens Gempt, Jan S. Kirschke, Marie Piraud, Johannes Oberreuter, Nassir Navab, Thomas Wendler
The framework uses the patient CT scan and the fractured vertebra label to build a virtual healthy spine using a high-level approach.
Chest computed tomography (CT) has played an essential diagnostic role in assessing patients with COVID-19 by showing disease-specific image features such as ground-glass opacity and consolidation.
In this paper we introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences.
With few annotated data, FedPerl is on par with a state-of-the-art method in skin lesion classification in the standard setup while outperforming SSFLs and the baselines by 1.8% and 15.8%, respectively.
This is accomplished by associating a graph-based neural network to each class, which is responsible for weighting the class samples and changing the importance of each sample for the classifier.
For the former we contributed our own dataset composed of five indoor scenes where it is unavoidable to capture images corresponding to views that are hard to uniquely identify.
In this context, we proposed a segmentation refinement method based on uncertainty analysis and graph convolutional networks.
In this work, we empirically show that two approaches for handling the gradient information, namely positive aggregation, and positive propagation, break these methods.
First, a neural network is trained once to detect a set of anatomical landmarks on simulated X-rays.
To address these issues, recently, unsupervised deep anomaly detection methods that train the model on large-sized normal scans and detect abnormal scans by calculating reconstruction error have been reported.
no code implementations • 30 Oct 2020 • Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya, Keno März, Toby Collins, Anand Malpani, Johannes Fallert, Hubertus Feussner, Stamatia Giannarou, Pietro Mascagni, Hirenkumar Nakawala, Adrian Park, Carla Pugh, Danail Stoyanov, Swaroop S. Vedula, Kevin Cleary, Gabor Fichtinger, Germain Forestier, Bernard Gibaud, Teodor Grantcharov, Makoto Hashizume, Doreen Heckmann-Nötzel, Hannes G. Kenngott, Ron Kikinis, Lars Mündermann, Nassir Navab, Sinan Onogur, Raphael Sznitman, Russell H. Taylor, Minu D. Tizabi, Martin Wagner, Gregory D. Hager, Thomas Neumuth, Nicolas Padoy, Justin Collins, Ines Gockel, Jan Goedeke, Daniel A. Hashimoto, Luc Joyeux, Kyle Lam, Daniel R. Leff, Amin Madani, Hani J. Marcus, Ozanan Meireles, Alexander Seitel, Dogu Teber, Frank Ückert, Beat P. Müller-Stich, Pierre Jannin, Stefanie Speidel
We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.
Panoptic segmentation has recently unified semantic and instance segmentation, previously addressed separately, thus taking a step further towards creating more comprehensive and efficient perception systems.
We propose a framework that ameliorates this issue by performing scene reconstruction and semantic scene completion jointly in an incremental and real-time manner, based on an input sequence of depth maps.
3D Point clouds are a rich source of information that enjoy growing popularity in the vision community.
This work proposes an RGB-D SLAM system specifically designed for structured environments and aimed at improved tracking and mapping accuracy by relying on geometric features extracted from the surroundings.
We change this paradigm and reformulate the problem as an action decision process where an initial pose is updated in incremental discrete steps that sequentially move a virtual 3D rendering towards the correct solution.
In inference, we use this classifier to analyze a second graph, generated from artifact and polyp predictions given by region proposal networks.
In this paper, we propose a method for 3D object completion and classification based on point clouds.
Federated learning (FL) has been a promising approach in the field of medical imaging in recent years.
We propose to adjust the C-arm CBCT source trajectory during the scan to optimize reconstruction quality with respect to a certain task, i.e., verification of screw placement.
In this work, we evaluate FT and LwF for class incremental learning in multi-organ segmentation using the publicly available AAPM dataset.
In this paper, a low-drift monocular SLAM method is proposed for indoor scenarios, where monocular SLAM often fails due to the lack of textured surfaces.
In this paper, we propose a method for the automatic classification of proximal femur fractures into 3 and 7 AO classes based on a Convolutional Neural Network (CNN).
Feature based visual odometry and SLAM methods require accurate and fast correspondence matching between consecutive image frames for precise camera pose estimation in real-time.
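Correspondence matching of the kind described above is often done by brute-force nearest-neighbor search over descriptors with Lowe's ratio test to reject ambiguous matches. A minimal numpy sketch follows; it is a generic illustration, not the matcher used by any specific method listed here.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Brute-force matching with Lowe's ratio test: keep a match only if
    the best distance is clearly smaller than the second-best."""
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)  # (N1, N2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:
            matches.append((i, best))
    return matches
```

The ratio test discards features whose two nearest neighbors are almost equally close, which is where most false correspondences come from.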
In blood cell disorders, only a subset of all cells is morphologically altered and relevant for the diagnosis.
Surgical tool segmentation in endoscopic images is an important problem: it is a crucial step towards full instrument pose estimation and it is used for integration of pre- and intra-operative images into the endoscopic view.
To account for reduced accuracy of the discovered light-weight deep residual network and avoid adding any additional computational burden, we perform a differentiable search over dilation rates for residual units of our network.
Consequently, most MR applications that are centered around the user, such as virtual dressing rooms or learning of body movements, cannot be realized with HMDs.
Brain pathologies can vary greatly in size and shape, ranging from few pixels (i.e., MS lesions) to large, space-occupying tumors.
Experimental results show that the proposed method could provide useful uncertainty information by Bayesian approximation with the efficient ensemble model generation and improve the predictive performance.
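Ensemble-based Bayesian approximation typically scores uncertainty via the entropy of the averaged member predictions. The sketch below illustrates that scoring step only, under the assumption that member probabilities are already available; it is not the paper's model.

```python
import numpy as np

def ensemble_uncertainty(member_probs):
    """member_probs: (M, C) class probabilities from M ensemble members.
    Returns the mean prediction and its predictive entropy."""
    mean = member_probs.mean(axis=0)
    entropy = -np.sum(mean * np.log(mean + 1e-12))
    return mean, entropy
```

When the members agree, the entropy is near zero; when they disagree, the averaged distribution flattens and the entropy grows, flagging the input as uncertain.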
As a solution, we propose an end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC).
The cataract, a developing opacity of the human eye lens, constitutes the world's most frequent cause of blindness.
Due to the time-sensitive nature of these cases, doctors are required to propose a correct diagnosis and intervention within a minimal time frame.
In this work we propose a transfer learning framework, core of which is learning an explicit mapping between domains.
We propose a colon deformation estimation method using RNN and obtain the colonoscope shape from electromagnetic sensors during its insertion into the colon.
We utilize the shape estimation network (SEN), which estimates deformed colon shape during colonoscope insertions.
6D object pose estimation is a fundamental problem in computer vision.
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined on the manifold of camera poses.
In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships modeled as edges.
Deep unsupervised representation learning has recently led to new approaches in the field of Unsupervised Anomaly Detection (UAD) in brain MRI.
Our results show that spatio-temporal information in longitudinal data is a beneficial cue for improving segmentation.
In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes in the nodes or edges of a semantic graph that is generated from the image.
Active learning is one of the solutions to this problem where an active learner is designed to indicate which samples need to be annotated to effectively train a target model.
Current deep-learning based methods do not integrate easily into clinical protocols, nor do they take full advantage of medical knowledge.
Computer-aided diagnosis (CADx) algorithms in medicine provide patient-specific decision support for physicians.
In this paper we introduce the first reinforcement learning (RL) based robotic navigation method which utilizes ultrasound (US) images as an input.
Recently, Graph Convolutional Networks (GCNs) have proven to be a powerful machine learning tool for Computer-Aided Diagnosis (CADx) and disease prediction.
Automatic surgical phase recognition is a challenging and crucial task with the potential to improve patient safety and become an integral part of intra-operative decision-support systems.
Medical image segmentation is one of the major challenges addressed by machine learning methods.
Contemporary monocular 6D pose estimation methods can only cope with a handful of object instances.
We explicitly enforce the meaningful representation to be agnostic to sensitive information by entropy maximization.
Stitching images acquired under perspective projective geometry is a relevant topic in computer vision with multiple applications ranging from smartphone panoramas to the construction of digital maps.
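Under perspective projective geometry, stitching reduces to mapping pixels of one image into another through a 3x3 homography in homogeneous coordinates. The following numpy sketch shows that core operation for a set of 2D points; it is a textbook illustration, not a complete stitching pipeline.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points (N, 2) through a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide by the third coord
```

For a pure translation homography, points simply shift; in general, the division by the third coordinate is what produces the perspective distortion of a panorama.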
Suboptimal interaction with patient data and challenges in mastering 3D anatomy based on ill-posed 2D interventional images are essential concerns in image-guided therapies.
We present a novel methodology to detect imperfect bilateral symmetry in CT of human anatomy.
We trained 30 participants on how to set up a robotic arm in an environment mimicking clinical setup.
Processed force and ultrasound data are fused using a 1D Convolutional Network to compute the location of the vertebral levels.
We provide an extensive evaluation of applications from the domains of healthcare (disease prediction), brain imaging (age prediction), computer graphics (3D point cloud segmentation), and computer vision (zero-shot learning).
We then investigated different strategies, such as a learning without forgetting framework, to leverage artifact knowledge to improve automated polyp detection.
With recent advances in Virtual Reality (VR) technology, the deployment of such systems will dramatically increase in non-entertainment environments, such as professional education and training, manufacturing, service, or low-frequency/high-risk scenarios.
Attributing the output of a neural network to the contribution of given input elements is a way of shedding light on the black-box nature of neural networks.
Radar pulse streams exhibit increasingly complex temporal patterns and can no longer rely on a purely value-based analysis of the pulse attributes for the purpose of emitter classification.
Radar signals have been dramatically increasing in complexity, limiting the source separation ability of traditional approaches.
Data-driven computational approaches have evolved to enable extraction of information from medical images with a reliability, accuracy and speed which is already transforming their interpretation and exploitation in clinical practice.
Metal artifacts in computed tomography (CT) arise from a mismatch between physics of image formation and idealized assumptions during tomographic reconstruction.
Deep Learning sets the state-of-the-art in many challenging tasks showing outstanding performance in a broad range of applications.
Semantic segmentation is an important task in the medical field to identify the exact extent and orientation of significant structures like organs and pathology.
We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space.
Ranked #7 on 3D Semantic Scene Completion on NYUv2 (using extra training data)
Our approach aims at building up a Layered Depth Image (LDI) from a single RGB input, which is an efficient representation that arranges the scene in layers, including originally occluded regions.
In this work, we introduce the task of 3D object instance re-localization (RIO): given one or multiple objects in an RGB-D scan, we want to estimate their corresponding 6DoF poses in another 3D scan of the same environment taken at a later point in time.
We present a sampling-free approach for computing the epistemic uncertainty of a neural network.
A solution to mitigate the small training set issue is to pre-train a denoising model with small training sets containing pairs of clean and synthesized noisy signals, produced from empirical noise priors, and fine-tune on the available small training set.
no code implementations • 23 Jul 2019 • Javad Fotouhi, Tianyu Song, Arian Mehrfard, Giacomo Taylor, Qiaochu Wang, Fengfang Xian, Alejandro Martin-Gomez, Bernhard Fuerst, Mehran Armand, Mathias Unberath, Nassir Navab
To overcome this challenge, we introduce a novel registration concept for intuitive alignment of AR content to its physical counterpart by providing a multi-view AR experience via reflective-AR displays that simultaneously show the augmentations from multiple viewpoints.
The lack of large-scale real datasets with annotations makes transfer learning a necessity for video activity understanding.
Chest X-ray radiography is one of the earliest medical imaging technologies and remains one of the most widely-used for diagnosis, screening, and treatment follow up of diseases related to lungs and heart.
Fully Convolutional Neural Networks (F-CNNs) achieve state-of-the-art performance for image segmentation in medical imaging.
Recent work has shown Generative Adversarial Networks (GANs) can be used to create realistic images of virtually stained slide images in digital pathology with clinically validated interpretability.
A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients.
We propose a new network architecture that exploits an inductive end-to-end learning approach for disease classification, where filters from both the CNN and the graph are trained jointly.
Recently, several works proposed geometric deep learning approaches to solve disease classification, by modeling patients as nodes in a graph, along with graph signal processing of multi-modal features.
Trained on the dataset alone, we report a Dice and Jaccard coefficient of $0.45 \pm 0.09$ and $0.30 \pm 0.07$ respectively, as well as an average distance of $0.78 \pm 0.36$~mm.
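For reference, the two overlap metrics reported above are computed from binary masks as follows; this is the standard definition in numpy, not code from the paper.

```python
import numpy as np

def dice_and_jaccard(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|);  Jaccard = |A∩B| / |A∪B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    return dice, inter / union
```

The two are monotonically related (Jaccard = Dice / (2 - Dice)), which is why papers often report both from the same overlap counts.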
To enhance the discriminative power of the classification model, we incorporate triplet embedding loss with a selective sampling routine.
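The triplet embedding loss mentioned here penalizes an anchor that lies closer to a negative sample than to a positive one by less than a margin. A minimal single-triplet numpy sketch follows, with an illustrative default margin; the selective sampling routine itself is not reproduced.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the distance gap: push the positive closer than the
    negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)
```

When the margin is already satisfied the loss is zero, so training focuses on the hard triplets, which is exactly what a selective sampling routine seeks out.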
The analysis of the collaborative learning process is one of the growing fields of education research, which has many different analytic solutions.
Learning Interpretable representation in medical applications is becoming essential for adopting data-driven models into clinical practice.
We implement label transfer from MRI to US, which is prone to a residual but inevitable registration error.
The proposed method displays both promising image reconstruction quality and acquisition frequency when integrated for live ultrasound scanning.
In this paper, we propose a novel interpretation method tailored to histological Whole Slide Image (WSI) processing.
Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements.
Despite recent advances on the topic of direct camera pose regression using neural networks, accurately estimating the camera pose of a single RGB image still remains a challenging task.