Notably, this work may be the first attempt to combine CNNs and transformers for semi-supervised medical image segmentation, and it achieves promising results on a public benchmark.
Deep neural networks usually require a large number of accurate annotations to achieve outstanding performance in medical image segmentation.
Despite many efforts on this task, there are still few large image datasets that cover the whole abdominal region with accurate and detailed annotations for whole abdominal organ segmentation.
Computed Tomography (CT) plays an important role in monitoring radiation-induced Pulmonary Fibrosis (PF), where accurate segmentation of the PF lesions is highly desired for diagnosis and treatment follow-up.
First, we present a domain composition method that represents one certain domain by a linear combination of a set of basis representations (i.e., a representation bank).
For example, STFT improves the still image baseline FCOS by 10.6% and 20.6% on the comprehensive F1-score of the polyp localization task in CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by 3.6% and 8.0%, respectively.
To handle this problem, we propose a hybrid supervision learning framework for this kind of high-resolution images with sufficient image-level coarse annotations and a few pixel-level fine labels.
The recent vision transformer (i.e., for image classification) learns non-local attentive interaction of different patch tokens.
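The linear combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `compose_domain`, the plain-list representation of the bank, and the example coefficients are all assumptions made for illustration, and how the coefficients are obtained is left out entirely.

```python
from typing import List, Sequence


def compose_domain(bank: Sequence[Sequence[float]],
                   coeffs: Sequence[float]) -> List[float]:
    """Return sum_k coeffs[k] * bank[k], a linear combination of basis vectors.

    `bank` is a hypothetical representation bank holding K basis
    representations of equal length; `coeffs` holds one weight per basis.
    """
    if len(bank) == 0 or len(bank) != len(coeffs):
        raise ValueError("need one coefficient per basis representation")
    dim = len(bank[0])
    combined = [0.0] * dim
    for basis, weight in zip(bank, coeffs):
        if len(basis) != dim:
            raise ValueError("all basis representations must share one length")
        for i, value in enumerate(basis):
            combined[i] += weight * value
    return combined


# Two basis representations in a 2-D feature space (illustrative values).
bank = [[1.0, 0.0], [0.0, 2.0]]
domain_repr = compose_domain(bank, [0.5, 0.25])
```

In this toy setting each domain is summarized by its coefficient vector alone, while the bank is shared across domains.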
In this paper, we aim to boost the performance of semi-supervised learning for medical image segmentation with limited labels using a self-ensembling contrastive learning technique.
The study of multi-type Protein-Protein Interaction (PPI) is fundamental for understanding biological processes from a systematic perspective and revealing disease mechanisms.
To solve these problems, we propose a novel deep learning-based interactive segmentation method that not only has high efficiency due to only requiring clicks as user inputs but also generalizes well to a range of previously unseen objects.
To overcome these problems, we propose a 3D sphere representation-based center-points matching detection network that is anchor-free and automatically predicts the position, radius, and offset of nodules without the manual design of nodule/anchor parameters.
Despite the state-of-the-art performance achieved by Convolutional Neural Networks (CNNs) for automatic segmentation of OARs, existing methods do not provide uncertainty estimation of the segmentation results for treatment planning, and their accuracy is still limited by several factors, including the low contrast of soft tissues in CT, highly imbalanced sizes of OARs and large inter-slice spacing.
In this paper, we propose an annotation-efficient learning framework for segmentation tasks that avoids annotations of training images, where we use an improved Cycle-Consistent Generative Adversarial Network (GAN) to learn from a set of unpaired medical images and auxiliary masks obtained either from a shape model or public datasets.
As deep learning technologies advance, increasingly more data is necessary to generate general and robust models for various tasks.
In this paper, we propose a novel framework with Uncertainty Rectified Pyramid Consistency (URPC) regularization for semi-supervised NPC GTV segmentation.
To address this problem, we present a one-shot framework for organ and landmark localization in volumetric medical images, which does not need any annotation during the training stage and could be employed to locate any landmarks or organs in test images given a support (reference) image during the inference stage.
Inspired by Euler's Elastica model and recent active contour models introduced into the field of deep learning, we propose a novel active contour with elastica (ACE) loss function incorporating Elastica (curvature and length) and region information as geometrically-natural constraints for the image segmentation tasks.
Also, we propose a scale attention module implicitly emphasizing the most salient feature maps among multiple scales so that the CNN is adaptive to the size of an object.
Unlike recent neural architecture search (NAS) methods, which typically search for the optimal operators in each network layer but lack a good strategy for searching feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies and the block-wise operators in the encoder-decoder network.
In most scenarios, one might obtain annotations of a single or a few organs from one training set, and obtain annotations of the other organs from another set of training images.
To alleviate such tedious and manual effort, in this paper we propose a novel weakly supervised segmentation framework based on partial points annotation, i.e., only a small portion of nuclei locations in each image are labeled.
Experimental results showed that our framework achieved the top performance on the ISLES 2018 challenge and that: 1) our method using synthesized pseudo DWI outperformed methods segmenting the lesion from perfusion parameter maps directly; 2) the feature extractor exploiting additional spatiotemporal CTA images led to better synthesized pseudo DWI quality and higher segmentation accuracy; and 3) the proposed loss functions and network structure improved the pseudo DWI synthesis and lesion segmentation performance.
Experimental results show that: (1) our proposed CNN obtains uncertainty estimation in real time which correlates well with mis-segmentations, (2) the proposed interactive level set is effective and efficient for refinement, (3) UGIR obtains accurate refinement results with around 30% improvement of efficiency by using uncertainty to guide user interactions.
The segmentation of coronary arteries in X-ray angiograms by convolutional neural networks (CNNs) is promising yet limited by the requirement of precisely annotating all pixels in a large number of training images, which is extremely labor-intensive especially for complex coronary trees.
To this end, we have developed the SenseCare research platform for smart healthcare, which is designed to boost translational research on intelligent diagnosis and treatment planning in various clinical scenarios.
Our proposed CFEA is an interactive paradigm that presents an exquisite form of collaborative adaptation through both adversarial learning and ensembling weights.
However, effective and efficient delineation of all the knee articular cartilages in large-sized and high-resolution 3D MR knee data is still an open challenge.
This study presents a starting point toward a powerful tool for automatic classification of gait disorders and can be used as a basis for future applications of Deep Learning in clinical gait analysis.
In order to address these limitations, we present tree-structured ConvLSTM models for tree-structured image analysis tasks which can be trained end-to-end.
The hierarchical attention components of the residual attention subnet force our network to focus on the key components of the X-ray images and generate the final predictions as well as the associated visual supports, which is similar to the assessment procedure of clinicians.
Finally, to reduce the memory consumption and high precision operations both in training and testing, we further quantize weights, inputs, and gradients of our localization network to low bit-width numbers.
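To make "low bit-width numbers" concrete, the sketch below shows a generic uniform quantizer that maps a value in [0, 1] onto one of 2^bits evenly spaced levels. It is an assumption for illustration, not the scheme used in the paper: the name `quantize_uniform`, the clipping range, and the rounding rule are all hypothetical, and the quantization of gradients during training is not covered.

```python
import math


def quantize_uniform(x: float, bits: int) -> float:
    """Snap x to the nearest of 2**bits evenly spaced levels in [0, 1].

    Hypothetical helper: values outside [0, 1] are clipped first.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = (1 << bits) - 1            # number of steps between 0 and 1
    clipped = min(1.0, max(0.0, x))     # keep the input in the valid range
    # floor(v + 0.5) rounds half up, avoiding Python's round-half-to-even.
    return math.floor(clipped * levels + 0.5) / levels


# With 2 bits only four values are representable: 0, 1/3, 2/3 and 1.
two_bit_values = sorted({quantize_uniform(i / 100, 2) for i in range(101)})
```

Fewer bits mean fewer representable values, which is where the memory saving comes from, at the cost of a larger rounding error per value.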
Ranked #15 on Pose Estimation on MPII Human Pose
Generating multi-view images from a single-view input is an essential yet challenging problem.
In this paper, we introduce an interactive training method to improve the natural language conversation system for a visual grounding task.
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.
Ranked #4 on Text-to-Image Generation on Oxford 102 Flowers
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.
Ranked #2 on Text-to-Image Generation on Oxford 102 Flowers (Inception score metric)
Multispectral pedestrian detection is essential for around-the-clock applications, e.g., surveillance and autonomous driving.
In this paper, we propose a new CNN architecture that integrates semantic part detection and abstraction (SPDA-CNN) for fine-grained classification.
In this paper, we propose a novel visual tracking framework that intelligently discovers reliable patterns from a wide range of video to resist drift error for long-term tracking tasks.
However, previous studies have rarely focused on learning a fine-grained and structured feature representation that is able to locate similar images at different levels of relevance, e.g., discovering cars from the same make or the same model, both of which require high precision.
Inspired by the latest advance in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting.
Face alignment, especially on real-time or large-scale sequential images, is a challenging task with broad applications.