18 Jul 2019

UM Data Science Research Seminar

The UM Data Science Research Seminar Series consists of monthly sessions organised by the Institute of Data Science on behalf of the UM Data Science Community, in collaboration with different departments across UM. The series aims to bring together data scientists from Maastricht University to discuss breakthroughs and research topics related to Data Science.

This session is organised in collaboration with The D-Lab and takes place on 18 July 2019 from 12.00 to 13.00.


12.00 - 12.30

Talk by Yvonka van Wijk

Title: Radiomics for the classification of breast lesions on contrast-enhanced mammography


Introduction: Radiomics extracts large amounts of quantitative information from medical images, which could aid tumor classification on contrast-enhanced spectral mammography (CESM) and thereby reduce unnecessary biopsies and overtreatment. We hypothesized that radiomics features extracted from delineated masses can help differentiate between benign and malignant tumors, as determined by pathology.

Methods: A CESM image is obtained by recombining a low-energy and a high-energy mammogram, acquired in rapid succession after injection of an iodine contrast agent. Delineated images, clinical parameters and pathology were available for 1065 patients. The five most relevant radiomics features were selected from 843 features extracted from the delineated masses. Selection used correlation bias reduction and recursive feature elimination. Random forest models were trained on three feature sets: radiomics features only, clinical features only, and combined clinical and radiomics features. Their outputs were compared against pathology as the gold standard. The dataset was split into training (70%) and validation (30%) sets. Model performance was evaluated as the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
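
The abstract does not detail the "correlation bias reduction" step. The sketch below shows one common way to prune redundant features before recursive feature elimination, as a rough illustration only. For every pair of features whose absolute Pearson correlation exceeds a threshold, it drops the feature that is less correlated with the label. The helper name `drop_correlated_features`, the 0.9 threshold and this tie-breaking rule are assumptions, not the authors' method.

```python
import numpy as np


def drop_correlated_features(X, y, threshold=0.9):
    """Return sorted column indices of X kept after pruning correlated pairs.

    Hypothetical helper: for each pair of features with |Pearson r| above
    `threshold`, the feature less correlated (in absolute value) with the
    label y is dropped.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Absolute feature-feature correlation matrix (columns are variables).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Absolute correlation of each feature with the label.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    keep = set(range(n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if i in keep and j in keep and corr[i, j] > threshold:
                # Drop the less label-relevant feature of the redundant pair.
                keep.discard(i if relevance[i] < relevance[j] else j)
    return sorted(keep)


# Synthetic data (not from the study): f1 is a near-duplicate of f0, f2 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
f0 = y + rng.normal(0, 0.5, size=200)
f1 = f0 + rng.normal(0, 0.01, size=200)
f2 = rng.normal(size=200)
X = np.column_stack([f0, f1, f2])
print(drop_correlated_features(X, y))
```

On this synthetic data, one of the two near-duplicate features is removed. The noise feature is kept because it is not strongly correlated with either of them. The surviving features would then go to recursive feature elimination, for example scikit-learn's `RFE` wrapped around a random forest.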

Results: Malignant lesions were found in 534 patients. The most predictive features involved the mean, median and 90th percentile of intensity. On the validation set, the model achieved an AUC of 0.91 using radiomics features only and 0.75 using clinical features only. Using both combined, it reached 0.93. The combined model had a positive predictive value of 0.80 and a negative predictive value of 0.89.

Conclusion: The model was highly predictive in distinguishing between benign and malignant lesions. Future studies will include validation on external data.
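
To make the reported metrics concrete, the sketch below computes two things in plain Python. The first is the ROC AUC, in its Mann-Whitney formulation: the probability that a malignant case is scored higher than a benign one. The second is the pair of positive and negative predictive values, computed from binary predictions. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions and do not come from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic.

    Fraction of (malignant, benign) pairs in which the malignant case gets
    the higher score; tied scores count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def predictive_values(labels, predictions):
    """Return (PPV, NPV): PPV = TP / (TP + FP), NPV = TN / (TN + FN)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for l, p in pairs if l == 1 and p == 1)
    fp = sum(1 for l, p in pairs if l == 0 and p == 1)
    tn = sum(1 for l, p in pairs if l == 0 and p == 0)
    fn = sum(1 for l, p in pairs if l == 1 and p == 0)
    return tp / (tp + fp), tn / (tn + fn)


# Toy example with made-up scores (1 = malignant, 0 = benign).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly: 0.888...
preds = [1 if s >= 0.5 else 0 for s in scores]
print(predictive_values(labels, preds))  # TP=2, FP=1, TN=2, FN=1: (0.666..., 0.666...)
```

The AUC does not depend on a threshold. PPV and NPV do, and they also depend on the proportion of malignant cases in the evaluated population.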

See presentation slides here.

12.30 - 13.00

Talk by Sergey Primakov

Title: Fully automated pipeline for Non-Small Cell Lung Cancer segmentation on CT images with RECIST functionality


Introduction: Contouring tumors in lung cancer patients is essential for radiotherapy administration and personalized treatment. However, manual contouring of the gross tumor volume (GTV) on medical images of lung cancer patients is laborious and time-consuming. Moreover, manual contouring is prone to inter-observer variability and has poor reproducibility. To overcome these shortcomings, we created a fully automated pipeline for segmenting GTV regions on computed tomography (CT) images using deep learning.

Methods: Data from 644 lung cancer patients from 3 centers were used:

  • 420 patients from the Maastro-CT-Lung-1 dataset
  • 40 from the UCL-CT-Lung dataset
  • 101 from the UCSF-CT-Lung dataset
  • 83 from the TCIA-CT-Lung3-Genomics dataset

Patients were randomly divided into training (613 patients) and test (31 patients) datasets. The Rider-CT multiple delineation dataset, containing 20 patients, was used for external validation. Contours delineated by physicians were used as the ground truth in all subsequent evaluations.

A three-step approach was developed, consisting of data preprocessing, lung extraction and tumor segmentation. To combine data from multiple medical centers and use them with a CNN-based approach, we developed a data preprocessing routine. CT scans in the resulting dataset were reconstructed with different protocols, so they differ in image-specific metadata and spatial resolution. To overcome this diversity and unify the images, we applied grey-level mapping using a lung window. We also normalized the spatial resolution of the image voxels and the intensity values.

To localize the region of interest and reduce the amount of data to be processed, we implemented a lung extraction algorithm. This step also allows CT scans of different lengths to be used, including whole-body scans as input images (see Figure 1).

The tumor segmentation step of the pipeline combines a mask prediction part and a final reconstruction part. To account for the volumetric nature of CT scans, the mask prediction part consists of three parallel 2D U-Net-type CNNs, one for each spatial plane. These CNNs generate a mask for every slice in the corresponding projection array. The masks are then consolidated into label volumes by applying connected component extraction to each projection array. For the reconstruction part, a 3D CNN selects the best ensemble of the label masks and generates the final GTV.
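
The grey-level mapping and intensity normalization of the preprocessing step can be sketched with NumPy as follows. The abstract does not give the window settings. The defaults below (window level -600 HU, width 1500 HU) are a commonly used lung window but are an assumption here. The function names are hypothetical, z-score scaling is only one possible intensity normalization, and spatial resampling is omitted.

```python
import numpy as np


def apply_lung_window(ct_hu, level=-600.0, width=1500.0):
    """Map CT values in Hounsfield units (HU) to [0, 1] with a lung window.

    Values below level - width/2 map to 0, values above level + width/2
    map to 1, and values in between are scaled linearly.
    The default level/width are assumed, not taken from the abstract.
    """
    low = level - width / 2.0
    high = level + width / 2.0
    clipped = np.clip(np.asarray(ct_hu, dtype=np.float32), low, high)
    return (clipped - low) / (high - low)


def zscore(volume):
    """Normalize intensities to zero mean and unit standard deviation."""
    volume = np.asarray(volume, dtype=np.float32)
    centered = volume - volume.mean()
    std = volume.std()
    return centered / std if std > 0 else centered


# Example HU values; air is about -1000 HU, values above the window saturate to 1.
hu = np.array([-1000.0, -700.0, 40.0, 700.0])
windowed = apply_lung_window(hu)
print(windowed)  # approx. [0.233, 0.433, 0.927, 1.0]
normalized = zscore(windowed)
```

Clipping to a fixed HU window before scaling makes scans from different centers comparable in grey level. Tissue far outside the window, such as dense bone, collapses to a single value.
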

Performance of the proposed pipeline was evaluated by comparing the generated volumes with the ground-truth volumes using the Dice similarity coefficient. In addition to automatic GTV segmentation, we implemented RECIST and volumetric RECIST functionality.
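
A minimal NumPy sketch of the Dice similarity coefficient used for evaluation: DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The function name is hypothetical, as is the convention of returning 1.0 when both masks are empty. The same function works for 3D volumes and for 2D slices.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape.

    DSC = 2 |A ∩ B| / (|A| + |B|); returns 1.0 when both masks are empty
    (an assumed convention).
    """
    a = np.asarray(pred, dtype=bool)
    b = np.asarray(truth, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


# Two toy 3D masks (slices x rows x columns), 32 voxels each.
truth = np.zeros((4, 8, 8), dtype=bool)
truth[1:3, 2:6, 2:6] = True
pred = np.zeros_like(truth)
pred[1:3, 3:7, 2:6] = True  # same box shifted by one row
print(dice_coefficient(pred, truth))  # 2 * 24 / (32 + 32) = 0.75
```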

Results: A mean Dice coefficient of 0.71 was achieved on the test set and 0.75 on the external validation set. Performance of the CNN was also evaluated using 2D Dice and Jaccard metrics.
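
The 2D Dice (D) and Jaccard (J) metrics are directly related. Let |A ∩ B| be the overlap between a predicted mask A and a ground-truth mask B. Using |A ∪ B| = |A| + |B| − |A ∩ B|:

```latex
D = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
J = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \qquad
J = \frac{D}{2 - D}, \qquad D = \frac{2J}{1 + J}
```

For a single mask pair, for example, D = 0.75 corresponds to J = 0.75 / 1.25 = 0.6. The conversion holds per mask pair. A mean Dice over several cases does not convert exactly into a mean Jaccard.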

Conclusion: The proposed pipeline could potentially provide low-cost, observer-independent and reproducible segmentation of GTV regions on lung CT images. Moreover, it can be used for automated tumor response evaluation using RECIST or volumetric RECIST.
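
The RECIST-based response evaluation mentioned above can be illustrated with a simplified plain-Python sketch. It assumes the RECIST 1.1 target-lesion thresholds:

  • partial response: a decrease of at least 30% from the baseline sum of diameters
  • progressive disease: an increase of at least 20% and at least 5 mm over the smallest recorded sum

The sketch ignores lymph-node rules, non-target and new lesions, and does not cover volumetric RECIST. Both function names are hypothetical. The diameter is measured between pixel centres of a single 2D segmentation slice, which slightly underestimates the physical extent.

```python
import itertools
import math


def longest_diameter_mm(mask_2d, spacing_mm=(1.0, 1.0)):
    """Longest in-plane distance (mm) between two foreground pixels of a 2D mask.

    Brute force over pixel centres (O(n^2) in the pixel count); returns 0.0
    for masks with fewer than two foreground pixels.
    """
    points = [
        (r * spacing_mm[0], c * spacing_mm[1])
        for r, row in enumerate(mask_2d)
        for c, value in enumerate(row)
        if value
    ]
    return max(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        default=0.0,
    )


def recist_response(baseline_sum, nadir_sum, current_sum):
    """Simplified RECIST 1.1 category from sums of longest diameters (mm)."""
    if current_sum == 0:
        return "CR"  # complete response: all target lesions disappeared
    increase = current_sum - nadir_sum
    if increase >= 0.2 * nadir_sum and increase >= 5.0:
        return "PD"  # progressive disease: checked first, it takes precedence
    if current_sum <= 0.7 * baseline_sum:
        return "PR"  # partial response: at least 30% below baseline
    return "SD"  # stable disease


mask = [[0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 1, 0]]
print(longest_diameter_mm(mask, spacing_mm=(0.8, 0.8)))
print(recist_response(baseline_sum=50.0, nadir_sum=30.0, current_sum=38.0))
```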

For questions or concerns, please contact us via info-ids[at]maastrichtuniversity[dot]nl.


The D-Lab

  • Philippe Lambin
  • Yvonka van Wijk
  • Sergey Primakov
  • Janita van Timmeren
  • Henry Woodruff
  • Rianne Herben