This approach enables us to optimally utilize the knowledge and reasoning capacities of large pre-trained language models for an array of tasks encompassing both language and vision.
1 code implementation • 20 Nov 2023 • Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao
Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes.
We first compose a multi-task training dataset comprising 13. 4 million instruction and ground-truth pairs (with approximately one million radiographs) for the customized tuning, involving both image- and pixel-level tasks.
1 code implementation • 23 Oct 2023 • Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao
These issues can hardly be addressed by fine-tuning SAM on medical data because the original 2D structure of SAM neglects 3D spatial information.
Image synthesis approaches, e. g., generative adversarial networks, have been popular as a form of data augmentation in medical image analysis tasks.
The different predictions in these duplicated heads are used to obtain pseudo labels for unlabeled target-domain images and their uncertainty to identify reliable pseudo labels.
Specifically, we employ a Triple-branch multi-Dilated network (TDNet) with one encoder and three decoders using different dilation rates to capture features from different receptive fields that are complementary to each other to generate high-quality soft pseudo labels.
2 code implementations • 7 Sep 2023 • Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao
To address these questions, we introduce A-Eval, a benchmark for the cross-dataset Evaluation ('Eval') of Abdominal ('A') multi-organ segmentation.
3 code implementations • 30 Aug 2023 • Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Junjun He, Shaoting Zhang, Min Zhu, Yu Qiao
To bridge this gap, we introduce SAM-Med2D, the most comprehensive studies on applying SAM to medical 2D images.
Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, leading to the inability to evaluate the quality and potential risks of medical LLMs, further hindering the application of LLMs in medical treatment scenarios.
The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements.
In this paper, we propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification.
Survival outcome assessment is challenging and inherently associated with multiple clinical factors (e. g., imaging and genomics biomarkers) in cancer.
Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation.
The proposed model was pretrained with 110k unannotated 3D CT volumes, and experiments with different downstream segmentation targets including head and neck organs, thoracic/abdominal organs showed that our pretrained model largely outperformed training from scratch and several state-of-the-art self-supervised training methods and segmentation models.
Our findings are twofold: 1) MedLAM is capable of directly localizing any anatomical structure using just a few template scans, yet its performance surpasses that of fully supervised models; 2) MedLSAM not only aligns closely with the performance of SAM and its specialized medical adaptations with manual prompts but achieves this with minimal reliance on extreme point annotations across the entire dataset.
Deep learning models often require large amounts of data for training, leading to increased costs.
Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image, which could relieve radiologists from the heavy burden of report writing.
To this end, we propose a novel weakly-supervised method with image-level labels based on semantic features and context information exploration.
1 code implementation • 16 Jun 2023 • Dequan Wang, Xiaosong Wang, Lilong Wang, Mengzhang Li, Qian Da, Xiaoqiang Liu, Xiangyu Gao, Jun Shen, Junjun He, Tian Shen, Qi Duan, Jie Zhao, Kang Li, Yu Qiao, Shaoting Zhang
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications.
This article discusses the opportunities, applications and future directions of large-scale pre-trained models, i. e., foundation models, for analyzing medical images.
To mitigate this challenge, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, termed MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat.
Inspired by the training of medical residents, we explore universal medical image segmentation, whose goal is to learn from diverse medical imaging sources covering a range of clinical targets, body regions, and image modalities.
To achieve this, we propose a Concept Embedding Search (ConES) approach by optimizing prompt embeddings -- without the need of the text encoder -- to capture the 'concept' of the image modality through a variety of task objectives.
In this paper, we propose a novel SSL method based on Cross Distillation of Multiple Attentions (CDMA) to effectively leverage unlabeled images.
In this work, we proposed an end-to-end deep learning framework, which could predict the coronary artery hemodynamics from CCTA images.
The responses generated by chatbots based on LLMs are recorded for blind evaluations by five licensed medical experts.
3) A dataset named CCA-200 is collected, consisting of 200 CCTA images with coronary artery disease.
As the first promptable foundation model for segmentation tasks, it was trained on a large dataset with an unprecedented number of images and annotations.
However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions.
Therefore, in this work, we further explore the possibility of leveraging pre-trained VLMs as medical foundation models for building general-purpose medical AI, where we thoroughly investigate three machine-learning paradigms, i. e., domain/task-specialized learning, joint learning, and continual learning, for training the VLMs and evaluate their generalization performance on cross-domain and cross-task test sets.
First, a disentangle network is proposed to decompose an image into a domain-invariant anatomical representation and a domain-specific style code, where the former is sent to a segmentation model that is not affected by the domain shift, and the disentangle network is regularized by a decoder that combines the anatomical and style codes to reconstruct the input image.
Existing toolkits mainly focus on fully supervised segmentation and require full and accurate pixel-level annotations that are time-consuming and difficult to acquire for segmentation tasks, which makes learning from imperfect labels highly desired for reducing the annotation cost.
To solve this problem, we propose Contrastive Semi-supervised learning for Cross Anatomy Domain Adaptation (CS-CADA) that adapts a model to segment similar structures in a target domain, which requires only limited annotations in the target domain by leveraging a set of existing annotated images of similar structures in a source domain.
The success of Convolutional Neural Networks (CNNs) in 3D medical image segmentation relies on massive fully annotated 3D volumes for training that are time-consuming and labor-intensive to acquire.
In this work, we propose the self-supervised and weight-preserving neural architecture search (SSWP-NAS) as an extension of the current NAS framework by allowing the self-supervision and retaining the concomitant weights discovered during the search stage.
Accurate segmentation of Anatomical brain Barriers to Cancer spread (ABCs) plays an important role for automatic delineation of Clinical Target Volume (CTV) of brain tumors in radiotherapy.
To tackle this deficiency, we propose Contrastive Domain Disentangle (CDD) network for generalizable medical image segmentation.
Medical image segmentation plays an irreplaceable role in computer-assisted diagnosis, treatment planning, and following-up.
Transformers have demonstrated remarkable performance in natural language processing and computer vision.
Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales.
Medical image segmentation has been widely recognized as a pivot procedure for clinical diagnosis, analysis, and treatment planning.
Notably, this work may be the first attempt to combine CNN and transformer for semi-supervised medical image segmentation and achieve promising results on a public benchmark.
Deep neural networks usually require accurate and a large number of annotations to achieve outstanding performance in medical image segmentation.
Deep learning-based medical image segmentation has shown the potential to reduce manual delineation efforts, but it still requires a large-scale fine annotated dataset for training, and there is a lack of large-scale datasets covering the whole abdomen region with accurate and detailed annotations for the whole abdominal organ segmentation.
Computed Tomography (CT) plays an important role in monitoring radiation-induced Pulmonary Fibrosis (PF), where accurate segmentation of the PF lesions is highly desired for diagnosis and treatment follow-up.
First, we present a domain composition method that represents one certain domain by a linear combination of a set of basis representations (i. e., a representation bank).
For example, STFT improves the still image baseline FCOS by 10. 6% and 20. 6% on the comprehensive F1-score of the polyp localization task in CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by 3. 6% and 8. 0%, respectively.
To handle this problem, we propose a hybrid supervision learning framework for this kind of high resolution images with sufficient image-level coarse annotations and a few pixel-level fine labels.
The recent vision transformer(i. e. for image classification) learns non-local attentive interaction of different patch tokens.
In this paper, we aim to boost the performance of semi-supervised learning for medical image segmentation with limited labels using a self-ensembling contrastive learning technique.
The study of multi-type Protein-Protein Interaction (PPI) is fundamental for understanding biological processes from a systematic perspective and revealing disease mechanisms.
To solve these problems, we propose a novel deep learning-based interactive segmentation method that not only has high efficiency due to only requiring clicks as user inputs but also generalizes well to a range of previously unseen objects.
To overcome these problems, we propose a 3D sphere representation-based center-points matching detection network that is anchor-free and automatically predicts the position, radius, and offset of nodules without the manual design of nodule/anchor parameters.
Despite the stateof-the-art performance achieved by Convolutional Neural Networks (CNNs) for automatic segmentation of OARs, existing methods do not provide uncertainty estimation of the segmentation results for treatment planning, and their accuracy is still limited by several factors, including the low contrast of soft tissues in CT, highly imbalanced sizes of OARs and large inter-slice spacing.
In this paper, we propose an annotation-efficient learning framework for segmentation tasks that avoids annotations of training images, where we use an improved Cycle-Consistent Generative Adversarial Network (GAN) to learn from a set of unpaired medical images and auxiliary masks obtained either from a shape model or public datasets.
As deep learning technologies advance, increasingly more data is necessary to generate general and robust models for various tasks.
In this paper, we propose a novel framework with Uncertainty Rectified Pyramid Consistency (URPC) regularization for semi-supervised NPC GTV segmentation.
To address this problem, we present a one-shot framework for organ and landmark localization in volumetric medical images, which does not need any annotation during the training stage and could be employed to locate any landmarks or organs in test images given a support (reference) image during the inference stage.
Inspired by Euler's Elastica model and recent active contour models introduced into the field of deep learning, we propose a novel active contour with elastica (ACE) loss function incorporating Elastica (curvature and length) and region information as geometrically-natural constraints for the image segmentation tasks.
Also, we propose a scale attention module implicitly emphasizing the most salient feature maps among multiple scales so that the CNN is adaptive to the size of an object.
Unlike the recent neural architecture search (NAS) methods that typically searched the optimal operators in each network layer, but missed a good strategy to search for feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies as well as the block-wise operators in the encoder-decoder network.
Concretely, we use a dual-task deep network that jointly predicts a pixel-wise segmentation map and a geometry-aware level set representation of the target.
In most scenarios, one might obtain annotations of a single or a few organs from one training set, and obtain annotations of the the other organs from another set of training images.
To alleviate such tedious and manual effort, in this paper we propose a novel weakly supervised segmentation framework based on partial points annotation, i. e., only a small portion of nuclei locations in each image are labeled.
Experimental results showed that our framework achieved the top performance on ISLES 2018 challenge and: 1) our method using synthesized pseudo DWI outperformed methods segmenting the lesion from perfusion parameter maps directly; 2) the feature extractor exploiting additional spatiotemporal CTA images led to better synthesized pseudo DWI quality and higher segmentation accuracy; and 3) the proposed loss functions and network structure improved the pseudo DWI synthesis and lesion segmentation performance.
Experimental results show that: (1) our proposed CNN obtains uncertainty estimation in real time which correlates well with mis-segmentations, (2) the proposed interactive level set is effective and efficient for refinement, (3) UGIR obtains accurate refinement results with around 30% improvement of efficiency by using uncertainty to guide user interactions.
The segmentation of coronary arteries in X-ray angiograms by convolutional neural networks (CNNs) is promising yet limited by the requirement of precisely annotating all pixels in a large number of training images, which is extremely labor-intensive especially for complex coronary trees.
To this end, we have developed SenseCare research platform, which is designed to facilitate translational research on intelligent diagnosis and treatment planning in various clinical scenarios.
Human-Computer Interaction Image and Video Processing
Our proposed CFEA is an interactive paradigm which presents an exquisite of collaborative adaptation through both adversarial learning and ensembling weights.
However, effective and efficient delineation of all the knee articular cartilages in large-sized and high-resolution 3D MR knee data is still an open challenge.
This study presents a starting point toward a powerful tool for automatic classification of gait disorders and can be used as a basis for future applications of Deep Learning in clinical gait analysis.
In order to address these limitations, we present tree-structured ConvLSTM models for tree-structured image analysis tasks which can be trained end-to-end.
The hierarchical attention components of the residual attention subnet force our network to focus on the key components of the X-ray images and generate the final predictions as well as the associated visual supports, which is similar to the assessment procedure of clinicians.
Finally, to reduce the memory consumption and high precision operations both in training and testing, we further quantize weights, inputs, and gradients of our localization network to low bit-width numbers.
Ranked #19 on Pose Estimation on MPII Human Pose
Generating multi-view images from a single-view input is an essential yet challenging problem.
In this paper, we introduce an interactive training method to improve the natural language conversation system for a visual grounding task.
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.
Ranked #5 on Text-to-Image Generation on Oxford 102 Flowers
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.
Ranked #3 on Text-to-Image Generation on Oxford 102 Flowers (Inception score metric)
Multispectral pedestrian detection is essential for around-the-clock applications, e. g., surveillance and autonomous driving.
In this paper, we propose a new CNN architecture that integrates semantic part detection and abstraction (SPDA-CNN) for fine-grained classification.
In this paper, we propose a novel visual tracking framework that intelligently discovers reliable patterns from a wide range of video to resist drift error for long-term tracking tasks.
However, previous studies have rarely focused on learning a fined-grained and structured feature representation that is able to locate similar images at different levels of relevance, e. g., discovering cars from the same make or the same model, both of which require high precision.
Inspired by the latest advance in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting.
Face alignment, especially on real-time or large-scale sequential images, is a challenging task with broad applications.