MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation

CVPR 2024 · Xiaolong Deng, Huisi Wu, Runhao Zeng, Jing Qin

We propose a novel echocardiography video segmentation model by adapting SAM to medical videos, addressing long-standing challenges in ultrasound video segmentation: (1) massive speckle noise and artifacts, (2) extremely ambiguous boundaries, and (3) large variations of target objects across frames. The core of our model is a temporal-aware and noise-resilient prompting scheme. Specifically, we employ a space-time memory that contains both spatial and temporal information to prompt the segmentation of the current frame; we therefore call the proposed model MemSAM. During prompting, the memory, carrying temporal cues, sequentially prompts the video segmentation frame by frame. Meanwhile, because the memory prompt propagates high-level features, it avoids the misidentification caused by mask propagation and improves representation consistency. To address the challenge of speckle noise, we further propose a memory reinforcement mechanism that leverages predicted masks to improve the quality of the memory before storing it. We extensively evaluate our method on two public datasets and demonstrate state-of-the-art performance compared with existing models. In particular, our model achieves performance comparable to fully supervised approaches with limited annotations. Code is available at https://github.com/dengxl0520/MemSAM.
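The sketch below illustrates the general idea described in the abstract, not the authors' implementation: a space-time memory is read via attention to produce a feature-level prompt for the current frame, and a memory reinforcement step weights features with the predicted mask before storing them. The modules `image_encoder` and `mask_decoder`, the tensor shapes, and the reinforcement weighting are hypothetical placeholders; see the official repository for the actual method.

```python
# Minimal sketch of memory-based prompting, assuming simplified shapes.
# `image_encoder` and `mask_decoder` are hypothetical stand-in callables.
import torch
import torch.nn.functional as F

def memory_read(query_key, mem_keys, mem_values):
    """Attend from the current frame to memory to build a temporal prompt.

    query_key:  (C, H, W)    key features of the current frame
    mem_keys:   (T, C, H, W) keys of previously stored frames
    mem_values: (T, D, H, W) values of previously stored frames
    returns:    (D, H, W)    memory prompt aligned to the current frame
    """
    C, H, W = query_key.shape
    T, D = mem_values.shape[0], mem_values.shape[1]
    q = query_key.reshape(C, H * W)                                   # (C, HW)
    k = mem_keys.reshape(T, C, H * W).permute(0, 2, 1).reshape(-1, C) # (THW, C)
    v = mem_values.reshape(T, D, H * W).permute(0, 2, 1).reshape(-1, D)
    attn = F.softmax(k @ q / C ** 0.5, dim=0)                         # (THW, HW)
    prompt = v.T @ attn                                               # (D, HW)
    return prompt.reshape(D, H, W)

def reinforce(value, pred_mask):
    """Memory reinforcement (assumed form): emphasise foreground features
    with the predicted mask before writing the frame into memory."""
    weight = 0.5 + 0.5 * pred_mask          # down-weight background / noise
    return value * weight

def segment_video(frames, image_encoder, mask_decoder, first_mask):
    """Segment a video frame by frame, prompting the decoder with memory."""
    mem_keys, mem_values, masks = [], [], []
    for t, frame in enumerate(frames):
        key, value = image_encoder(frame)   # per-frame key/value features
        if t == 0:
            mask = first_mask               # initial mask from a user prompt
        else:
            prompt = memory_read(key,
                                 torch.stack(mem_keys),
                                 torch.stack(mem_values))
            mask = mask_decoder(value, prompt)   # memory acts as the prompt
        mem_keys.append(key)
        mem_values.append(reinforce(value, mask))
        masks.append(mask)
    return masks
```

Propagating feature-level prompts rather than hard masks is what the abstract credits with avoiding misidentification from mask propagation; the reinforcement step is shown here only as a simple mask-based reweighting for illustration.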
