Segment as You Wish--Free-Form Language-Based Segmentation for Medical Images

Abstract

Medical imaging is crucial for diagnosing a patient's health condition, and accurate segmentation of these images is essential for isolating regions of interest to ensure precise diagnosis and treatment planning. Existing methods primarily rely on bounding boxes or point-based prompts, while few have explored text-related prompts, despite clinicians often describing their observations and instructions in natural language.

To address this gap, we first propose a RAG-based free-form text prompt generator that leverages the domain corpus to generate diverse and realistic descriptions. Then, we introduce FLanS, a novel medical image segmentation model that handles various free-form text prompts, including professional anatomy-informed queries, anatomy-agnostic position-driven queries, and anatomy-agnostic size-driven queries.

Our model also incorporates a symmetry-aware canonicalization module to ensure consistent, accurate segmentations across varying scan orientations and to reduce confusion between the anatomical position of an organ and its appearance in the scan.
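
The idea of canonicalizing a scan before segmenting it can be illustrated with a small sketch. This is not the FLanS module itself: it assumes square 2-D images, restricts the symmetries to the eight rotations and reflections of a square, and uses a caller-supplied `score` function as a stand-in for whatever learned criterion selects the canonical orientation. The names `d4_transforms`, `canonicalize`, and `segment_equivariant` are hypothetical.

```python
import numpy as np

def d4_transforms():
    """Return (forward, inverse) pairs for the 8 rotations/reflections of a square."""
    pairs = []
    for k in range(4):
        for flip in (False, True):
            def fwd(x, k=k, flip=flip):
                # Rotate by k * 90 degrees, then optionally mirror left-right.
                y = np.rot90(x, k)
                return np.fliplr(y) if flip else y

            def inv(y, k=k, flip=flip):
                # Undo the mirror first, then rotate back.
                x = np.fliplr(y) if flip else y
                return np.rot90(x, -k)

            pairs.append((fwd, inv))
    return pairs

def canonicalize(image, score):
    """Pick the orientation with the highest score (assumed to be unique)."""
    fwd, inv = max(d4_transforms(), key=lambda pair: score(pair[0](image)))
    return fwd(image), inv

def segment_equivariant(image, segment, score):
    """Segment in the canonical orientation, then map the mask back."""
    canonical, inv = canonicalize(image, score)
    return inv(segment(canonical))
```

Under these assumptions, even a `segment` function that is sensitive to orientation yields a mask that rotates or flips together with the input, because every orientation of the same scan is first mapped to the same canonical pose.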

FLanS is trained on a large-scale dataset of over 100k medical images from 7 public datasets. Comprehensive experiments demonstrate the model's superior language understanding and segmentation precision, along with a deep comprehension of the relationship between them, outperforming state-of-the-art (SOTA) baselines on both in-domain and out-of-domain datasets.

Anatomy-Informed Segmentation - Expert

EMR snippet 1 (pseudonymized)

EMR snippet 2 (pseudonymized)

Anatomy-Informed Segmentation - Normal Query

Normal Query 1

Normal Query 2

Normal Query 3

Normal Query 4

Anatomy-Agnostic Segmentation - Positional Info

Sample 1: ask for the smallest visible organ.

Sample 2: ask for the largest visible organ.
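
A size-driven query like the two samples above can be resolved against a multi-organ label map by counting pixels per organ. The sketch below shows one way a reference mask for such a query might be derived. It is an illustration, not the FLanS pipeline: it assumes an integer label map in which 0 is background and every other value is one organ, and the function name `organ_by_size` is hypothetical.

```python
import numpy as np

def organ_by_size(label_map, which="largest", background=0):
    """Return (label, mask) of the largest or smallest visible organ.

    Size is measured as the number of pixels carrying each label.
    If several organs tie, the one with the lowest label id is returned.
    """
    if which not in ("largest", "smallest"):
        raise ValueError("which must be 'largest' or 'smallest'")

    labels, counts = np.unique(label_map, return_counts=True)
    visible = labels != background
    labels, counts = labels[visible], counts[visible]
    if labels.size == 0:
        raise ValueError("no organ is visible in this label map")

    idx = counts.argmax() if which == "largest" else counts.argmin()
    chosen = labels[idx]
    return int(chosen), label_map == chosen
```

For example, in a slice where organ 2 covers more pixels than any other label, `organ_by_size(label_map, "largest")` returns label 2 together with its boolean mask.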

Corner Cases

Related Links

Links to the datasets we used:

1. Pancreas-CT: This dataset consists of 82 contrast-enhanced abdominal CT volumes. It provides only the pancreas label, annotated by an experienced radiologist, and none of the CT scans contain a pancreatic tumor.

2. LiTS: It contains 131 and 70 contrast-enhanced 3-D abdominal CT scans for training and testing, respectively. The dataset was acquired with different scanners and protocols at six different clinical sites, with in-plane resolution varying widely from 0.55 to 1.0 mm and slice spacing from 0.45 to 6.0 mm.

3. KiTS: It includes 210 training cases and 90 testing cases with annotations provided by the University of Minnesota Medical Center. Each CT scan has one or more kidney tumors.

4. AbdomenCT-1K: It consists of 1112 CT scans from five datasets with liver, kidney, spleen, and pancreas annotations.

5. CT-ORG: It is composed of 140 CT images from 8 different medical centers, covering 6 organ classes. Most of the images exhibit liver lesions, both benign and malignant.

6. CHAOS: It provides CT scans of 20 patients for multi-organ segmentation. None of the CT scans contain a liver tumor.

7. MSD CT: It includes liver, lung, pancreas, colon, hepatic vessel, and spleen tasks, for a total of 947 CT scans with 4 organs and 5 tumors.

8. BTCV: It consists of 50 abdominal CT scans from patients with metastatic liver cancer or post-operative ventral hernia, collected at the Vanderbilt University Medical Center.

9. AMOS22: The name abbreviates the Multi-Modality Abdominal Multi-Organ Segmentation Challenge 2022. The AMOS dataset contains 500 CT scans with voxel-level annotations of 15 abdominal organs.

10. WORD: It collects 150 CT scans from 150 patients before radiation therapy at a single center, all acquired on a SIEMENS CT scanner without appearance enhancement. Each CT volume consists of 159 to 330 slices of 512 × 512 pixels.

11. 3D-IRCADb: It contains 20 venous-phase enhanced CT scans. Each CT scan has various annotations, and only annotated organs are tested to validate the model's generalizability.

12. TotalSegmentator: It collects 1024 CT scans randomly sampled from PACS over a span of 10 years. The dataset contains CT images with different sequences (native, arterial, portal venous, late phase, dual-energy), with and without contrast agent, and with different bulb voltages, slice thicknesses, resolutions, and kernels.

13. FLARE22Train: The labeled training set of the MICCAI FLARE 2022 Challenge (https://flare22.grand-challenge.org/).

BibTeX

@article{da2024segment,
  title={Segment as You Wish--Free-Form Language-Based Segmentation for Medical Images},
  author={Da, Longchao and Wang, Rui and Xu, Xiaojian and Bhatia, Parminder and Kass-Hout, Taha and Wei, Hua and Xiao, Cao},
  journal={arXiv preprint arXiv:2410.12831},
  year={2024}
}


Demo of two types of free-form language-prompted segmentation.

Architecture: the FLanS architecture for free-form language-prompted segmentation of medical images.