Multi-level dilated residual network for biomedical image segmentation

Finding optimal hyper-parameters using grid-search

We first performed a grid search36 over the model hyper-parameters, including batch size, training optimizer, momentum, and learning-rate scheduler, and the network architecture hyper-parameters, including depth, levels, and dilation rates, to find the optimal values for the proposed approach. The combinations of hyper-parameters explored in the grid search are presented in Table 3. We found that a batch size of 4, the Adam optimizer, a momentum of 0.9, and the reduce-learning-rate-on-plateau scheduler (ReduceLROnPlateau) with an initial learning rate of 0.001, together with a network architecture of depth 5, level 2, and dilation rates of [1, 3, 5], yield consistent accuracy across the models and imaging modalities. These optimal values were then used to train the models for each dataset in a 5-fold CV, and the test sets were used to evaluate the models against each fold. We initialized the convolutional layers with Xavier initialization37.
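
For illustration, the sketch below outlines this grid search, assuming a PyTorch implementation; the `MILDNet` constructor and the `train_and_validate` helper are hypothetical stand-ins for our pipeline, and the candidate values other than the reported optima are illustrative.

```python
# Minimal grid-search sketch (PyTorch assumed). MILDNet and
# train_and_validate are hypothetical stand-ins for the actual pipeline.
import itertools
import torch

grid = {
    "batch_size": [2, 4, 8],
    "optimizer": ["adam", "sgd"],
    "learning_rate": [0.01, 0.001],
    "depth": [3, 5],
    "levels": [1, 2],
    "dilation_rates": [[1, 2, 3], [1, 3, 5]],
}

best_score, best_cfg = -1.0, None
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    model = MILDNet(depth=cfg["depth"], levels=cfg["levels"],
                    dilation_rates=cfg["dilation_rates"])  # hypothetical
    # Xavier initialization for the convolutional layers.
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.xavier_uniform_(m.weight)
    if cfg["optimizer"] == "adam":
        opt = torch.optim.Adam(model.parameters(), lr=cfg["learning_rate"])
    else:
        opt = torch.optim.SGD(model.parameters(), lr=cfg["learning_rate"],
                              momentum=0.9)
    # Reduce the learning rate when the validation loss plateaus.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="min")
    score = train_and_validate(model, opt, sched, cfg["batch_size"])  # hypothetical
    if score > best_score:
        best_score, best_cfg = score, cfg
```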

Table 3 Combination of the hyper-parameter settings and their optimal values found using grid-search in a 5-fold CV.


Residual-of-residual skip connections (MLR blocks) improve the segmentation accuracy

To evaluate the impact of the MLR skip connections on the segmentation accuracy, we trained the proposed approach with and without the MLR blocks in the skip connections (i.e., applied prior to concatenating the features from the encoder unit with those of the corresponding decoder unit), using the optimal hyper-parameters and the validation sets, for 100 epochs.
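
The sketch below shows one plausible reading of such a residual-of-residual skip connection in PyTorch; the block composition (two 3×3 convolution, batch-norm, ReLU units) is an assumption for illustration, not the exact MILDNet definition.

```python
# Hedged sketch of a residual-of-residual (MLR) skip connection (PyTorch
# assumed); the layer choices here are illustrative.
import torch
import torch.nn as nn

def conv_unit(channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

class MLRBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.f1 = conv_unit(channels)
        self.f2 = conv_unit(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = x + self.f1(x)    # first (inner) residual unit
        y2 = y1 + self.f2(y1)  # second (inner) residual unit
        return y2 + x          # outer shortcut: a residual of residuals

# Applied to the encoder features before concatenation with the decoder
# (channel counts illustrative):
# skip = MLRBlock(64)(encoder_features)
# fused = torch.cat([skip, upsampled_decoder_features], dim=1)
```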

Table 4 shows that using the MLR blocks in the MILDNet (without data augmentation) slightly improves the segmentation accuracy, by an average relative improvement of 2% in terms of DC across all the datasets. A similar performance gain is observed when including the MLR blocks in the baseline U-Net. Figure 6 illustrates that the predicted segmentation masks are visually more similar to the ground-truth binary masks (Fig. 6b,f), especially in preserving the shape and the continuity of boundaries, when using the MLR blocks in the MILDNet (Fig. 6d,h) than with direct skip connections without the MLR blocks (Fig. 6c,g), for the MRI (Fig. 6a) and dermoscopy (Fig. 6e) images. A remarkable segmentation improvement is observed in the dermoscopy example, with IoU = 0.9017 using the MLR blocks (Fig. 6h) compared to IoU = 0.8374 without them (Fig. 6g).

Table 4 The impact of the residual-of-residual skip connections (MLR blocks) on the segmentation accuracy using the validation sets. ↑: The higher value is better; ↓: The lower value is better.


Figure 6

Two visual examples from the MRI15 (a) and the dermoscopy11,12 (e) images, with their corresponding ground-truth masks (b,f), showing that the presence of the MLR blocks in the skip connections of the MILDNet enhances the segmentation accuracy. The predicted masks for the skip connections with the MLR blocks (d,h) preserve the continuity of the boundaries, whereas the skip connections without the MLR blocks (c,g) lose some valuable information about the boundaries and the ROI shape.


The results suggest that the presence of the MLR blocks in the skip connections helps preserve the spatial and contextual information that is usually lost during the concatenation of the features from the encoder to the decoder units in the classical U-Net. We therefore incorporate the MLR blocks into the skip connections in the following experiments for enhanced semantic segmentation.

MILDNet outperforms the classical U-Net and other baselines in segmenting the biomedical images

Table 5 compares the segmentation accuracy of the MILDNet approach with and without data augmentation against the classical U-Net, the UNet++, the MultiResUNet, the ResDUnet, and the ResidualU-Net, using the test sets of the five biomedical datasets.

Table 5 MILDNet outperforms the classical U-Net and other baselines in segmenting the biomedical images using the test sets.


MILDNet with data augmentation results in slightly superior segmentation performance to MILDNet without data augmentation, in terms of IoU, on all except the MRI dataset. For consistency, hereafter we use MILDNet without data augmentation when comparing segmentation results and for visual assessment. MILDNet outperforms all the baselines in segmenting the biomedical images. In particular, MILDNet consistently outperforms the classical U-Net by relative improvements of 2%, 3%, 6%, 8%, and 14% in terms of DC, respectively for the MRI, the ISIC-2018 dermoscopy, the GlaS-2015 histopathology, the DSB-2018 cell nuclei microscopy, and the ISBI-2012 electron microscopy images. Similar performance gains are observed in the IoU and HD metrics. MILDNet also outperforms the recently proposed MultiResUNet approach by relative improvements of 1%, 1%, 1%, 4%, and 4% in terms of DC, respectively for the ISIC-2018 dermoscopy, the DSB-2018 cell nuclei microscopy, the ISBI-2012 electron microscopy, the MRI, and the GlaS-2015 histopathology datasets. Interestingly, the ResidualU-Net approach achieves higher segmentation accuracy than the classical U-Net on all except the MRI dataset.
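
For reference, the metrics quoted above can be computed as in the sketch below, assuming binary NumPy masks; the Hausdorff distance is taken here over foreground pixel coordinates rather than extracted contours, which is a simplifying assumption.

```python
# Sketch of the evaluation metrics (DC, IoU, HD) over binary NumPy masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + 1e-8)

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / (union + 1e-8)

def hausdorff(pred: np.ndarray, target: np.ndarray) -> float:
    # Symmetric Hausdorff distance over foreground pixel coordinates.
    p, t = np.argwhere(pred), np.argwhere(target)
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])

# The relative improvements quoted in the text follow (new - old) / old,
# e.g. a DC gain from 0.850 to 0.867 is a 2% relative improvement.
```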

Figure 7 illustrates the saliency maps of examples from the MRI, the dermoscopy, and the histopathology datasets for all the models. From these examples, we can see that MILDNet concentrates much better on the ROIs in images with complex backgrounds, as in the MRI and the histopathology datasets. For the dermoscopy images, which have a clearer distinction between foreground and background, all models attend favorably to the ROIs.
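
The text does not state which attribution method produced Figure 7; a vanilla gradient saliency map, sketched below in PyTorch, is one common choice for segmentation networks and is shown here only as an assumption.

```python
# Hedged sketch of a vanilla gradient saliency map (PyTorch assumed).
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """image: (1, C, H, W) input tensor; returns an (H, W) saliency map."""
    model.eval()
    image = image.clone().requires_grad_(True)
    logits = model(image)        # predicted segmentation logits
    logits.sum().backward()      # back-propagate the foreground score
    # Pixel importance: maximum absolute gradient across input channels.
    return image.grad.abs().max(dim=1)[0].squeeze(0)
```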

Figure 7

Saliency maps for the MRI, the dermoscopy, and the histopathology examples. Regions that have a high impact on the models’ final decision are highlighted.


Note that the variation observed in the relative changes from dataset to dataset may stem from the segmentation challenges associated with each biomedical imaging modality. For example, in the ISBI-2012 electron microscopy dataset, the ROI covers the majority of each image, so models may tend to over-segment. Illumination variation and the different types of texture present in the ISIC-2018 dermoscopy dataset make segmentation more difficult. For some images in the MRI dataset, it is difficult to visually distinguish tumors from the background due to vague ROI boundaries; in addition, brain tumors differ in size, shape, and structure, which makes the segmentation challenging. Similarly, the histopathology images contain irregular boundaries and structures separating the tumor and non-tumor regions. In the cell nuclei microscopy dataset, some images contain bright objects that resemble the cell nuclei (ground truth) and may act as outliers in the segmentation. The visual assessments of the segmentation results in a later section illustrate some of these challenges.

We also noticed a difference between the segmentation IoU values of our proposed method and those reported in the literature. For example, the IoU values of U-Net and UNet++ for DSB-2018 reported in7,9 are 90.57 ± 1.26 and 92.44 ± 1.20 (on a percentage scale), respectively, while in our study they are 0.79 ± 0.0004 and 0.89 ± 0.0003. This variation is due to the use of a different data-splitting protocol and different optimal hyper-parameters; furthermore, we did not apply any post-processing techniques, such as the watershed algorithm40,41, for separating clustered nuclei.
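
For context, the sketch below shows the kind of watershed post-processing we deliberately omitted, using scikit-image; the `min_distance` value is illustrative, not taken from any cited work.

```python
# Sketch of watershed-based separation of clustered nuclei (scikit-image),
# shown for context only; we did not apply this post-processing.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def separate_nuclei(binary_mask: np.ndarray) -> np.ndarray:
    """binary_mask: boolean (H, W) prediction; returns labeled instances."""
    distance = ndi.distance_transform_edt(binary_mask)
    # Local maxima of the distance map seed one marker per nucleus.
    coords = peak_local_max(distance, labels=binary_mask.astype(int),
                            min_distance=5)  # illustrative value
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return watershed(-distance, markers, mask=binary_mask)
```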

Finally, we performed a 5-fold CV on the entire datasets by merging the training, validation, and test sets of each biomedical dataset, and then ran a simple statistical significance analysis (t-test) to check whether the differences between the IoU values of the proposed and the baseline systems are statistically significant at p-value ≤ 0.05. The results in Fig. 8 show that the proposed MILDNet approach without data augmentation yields significant IoU improvements (p-value ≤ 0.05) over the classical U-Net on all except the MRI dataset, where it nevertheless shows a smaller standard deviation. Similarly, the IoU differences between MILDNet and the state-of-the-art MultiResUNet approach are statistically significant (p-value ≤ 0.05) on all except the DSB-2018 cell nuclei microscopy dataset.
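
This test can be reproduced with a few lines of SciPy, as sketched below; the fold-wise IoU values are illustrative, and since the text does not state whether the paired or unpaired variant was used, a paired test over matched folds is assumed.

```python
# Sketch of the fold-wise significance test (SciPy); IoU values illustrative.
from scipy import stats

mildnet_iou = [0.89, 0.90, 0.88, 0.91, 0.90]  # 5-fold CV, proposed model
unet_iou    = [0.85, 0.86, 0.84, 0.87, 0.85]  # 5-fold CV, baseline

# Paired t-test over matched folds (an assumption; see above).
t_stat, p_value = stats.ttest_rel(mildnet_iou, unet_iou)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant: {p_value <= 0.05}")
```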

Figure 8

Statistical significance of the differences in segmentation performance between MILDNet and the baseline approaches using the t-test. The differences between the IoU values of MILDNet and the baselines are statistically significant when p-value ≤ 0.05. The y-axis represents the overall IoU value of each model using a 5-fold CV on the entire dataset, obtained by merging the training, validation, and test sets of each biomedical dataset. The sub-figures (a–e) present box plots with the baseline approaches U-Net, UNet++, ResDUnet, MultiResUNet, and the proposed MILDNet on the x-axis and the IoU values on the y-axis, for all five biomedical datasets used in this work.


Visual assessment of the segmentation results

Here, we present visual examples from the segmentation results to further compare our proposed approach with the baseline models.

MILDNet is more reliable in outlining ROIs

MILDNet and the other baseline approaches perform favorably in segmenting medical images with a clear distinction between the background and the ROIs. Figure 9 illustrates images from the ISIC-2018 dermoscopy (Fig. 9a) and the MRI (Fig. 9f) datasets with their corresponding ground-truth masks (Fig. 9b,g), showing that, given a clear distinction between background and foreground, the classical U-Net (Fig. 9c,h), the MultiResUNet (Fig. 9d,i), and the MILDNet (Fig. 9e,j) all segment the ROIs visually close to the ground truths; nevertheless, MILDNet outperforms the other baselines in terms of IoU in both images.

Figure 9

Segmenting a dermoscopy11,12 image (a) and an MRI15 image (f) having well-distinguished background and foreground, with (b,g) showing their corresponding ground truth segmentation masks. The classical U-Net (c,h), the MultiResUNet (d,i), and the MILDNet (e,j) performed equally well in segmenting the ROIs, close to the ground truths.


MILDNet performs favorably in images with inconsistent foregrounds

Medical images often contain regions that appear similar to the background due to textural and structural similarities, irregularities, and noise. This similarity may lead to loss of information and false-negative segmentation. Figure 10a shows a relevant example of such a case. Although the ROI boundaries between the tumor and non-tumor regions are visually separable (see Fig. 10b), the staining color intensity and the textures within the tumor (ROI) and non-tumor (background) regions appear the same in some areas, making segmentation challenging. Figure 10c shows that the classical U-Net under-segments the ROIs, with an IoU of 0.5083, missing parts of the foreground. The MultiResUNet (Fig. 10d) and the MILDNet (Fig. 10e) perform better than the classical U-Net in preserving the spatial information, with IoUs of 0.8959 and 0.8996, respectively. We suggest that the MLR blocks allow the MILDNet to preserve the shape and the continuity of the ROIs, hence reducing the loss of spatial information during segmentation.

Figure 10

Segmenting a histopathology16 image (a) and the ground truth mask (b), in which the foreground is not consistent throughout. The same staining color intensity and textures in the tumor (ROI) also appear in some non-tumor regions (background). The MILDNet approach (e) segments this challenging image consistently better than the classical U-Net (c) and the MultiResUNet (d) approaches.


MILDNet segments ROIs with obscure boundaries

Sometimes in medical images, it is challenging to differentiate the ROIs from the background due to obscure boundaries. Figure 11a,f illustrates two such examples, from the dermoscopy and the MRI images respectively, with their corresponding ground-truth masks (Fig. 11b,g), in which no clear separating boundaries exist. The classical U-Net either over-segmented (Fig. 11c) or under-segmented (Fig. 11h) the ROIs. The MultiResUNet (Fig. 11d,i) and the MILDNet (Fig. 11e,j) both performed considerably better than the classical U-Net, although both struggled to match the ground truths exactly. In both examples, the MILDNet approach achieved superior segmentation accuracy over the baseline approaches, e.g. an IoU of 0.6181 for MILDNet compared to 0.5077 for the MultiResUNet on the challenging dermoscopy image in Fig. 11a.

Figure 11

Segmenting a dermoscopy11,12 image (a) and an MRI15 image (f) having no clear boundaries separating the foreground and the background, with (b,g) demonstrating the ground truth segmentation masks. The classical U-Net either over-segmented (c) or under-segmented (h) the images, while the MultiResUNet (d,i) and the MILDNet (e,j) performed considerably better in the segmentation.


Figure 12 further illustrates an extreme case from the MRI dataset (Fig. 12a) with its ground-truth mask (Fig. 12b), in which the ROI (tumor region) is very difficult to identify even for a human expert. In this example, all the models (Fig. 12c–e) struggled to properly segment the ROI, resulting in over-segmentation.

Figure 12

Segmenting a very challenging MRI15 image (a) with indistinguishable boundaries between the background and the foreground, with (b) being the ground truth. All models, including the proposed approach, have over-segmented the image (c–e).


MILDNet is robust against outliers

Biomedical image segmentation often suffers from outliers, which look very similar to the ROI but are not part of it. Segmentation models often fail to distinguish such outliers from the ROIs. Figure 13a illustrates an example from the MRI dataset, in which the non-tumor region contains small light green areas (outliers) that resemble the tumor region (ROI) (Fig. 13b). Similarly, Fig. 13f illustrates another example, from the cell nuclei microscopy dataset with ground-truth mask Fig. 13g, in which the background contains bright particles (outliers) that are very similar to the ROI (cell nuclei). In both examples, the classical U-Net mistakenly segmented some of the outliers, circled in red in Fig. 13c,h, as part of the predicted masks. The MultiResUNet (Fig. 13d,i) performed better than the classical U-Net in discarding outliers, but still misclassified small background regions. MILDNet (Fig. 13e,j) successfully discarded these outliers, achieving superior segmentation performance over the classical U-Net and the MultiResUNet in terms of IoU.

Figure 13

The non-tumor region in the MRI15 image (a) contains small bright green areas (outliers), which resemble the tumor region (ROI). The cell nuclei microscopy17 image (f) also contains some bright particles (outliers), which are visually very similar to the cell nuclei (ROI). MILDNet successfully discarded the outliers from the predicted masks (e,j), with (b,g) being the ground truths. Red circles show the outliers incorrectly segmented by the classical U-Net (c,h) and the MultiResUNet (d,i).


Outliers also exist in the other datasets, and we observed that our proposed approach robustly discards them from the predicted masks. The dilated convolutions used in the encoder and the decoder units likely contribute to this success by improving the localization of the ROIs, e.g. the nuclei and the tumor regions, thus providing more reliable segmentation.
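
To illustrate the receptive-field argument, the sketch below stacks 3×3 convolutions with the dilation rates [1, 3, 5] found in the grid search; PyTorch is assumed and the channel count is illustrative. Setting the padding equal to the dilation rate keeps the spatial size constant for a 3×3 kernel.

```python
# Dilated convolutions widen the receptive field at no extra parameter cost.
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)  # illustrative feature map
for rate in [1, 3, 5]:
    # For a 3x3 kernel, padding = dilation preserves the spatial size.
    x = nn.Conv2d(16, 16, kernel_size=3, dilation=rate, padding=rate)(x)
print(x.shape)  # torch.Size([1, 16, 64, 64]); receptive field is now 19x19
```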

MILDNet preserves connectivity in boundaries when the ROI is the majority class

Usually, ROIs occupy only a limited portion of medical images. The ISBI-2012 electron microscopy dataset provides an interesting segmentation challenge, in which the ROI covers the majority of each image (e.g. Fig. 14a, with ground-truth mask in Fig. 14b). Segmentation models may fail to properly distinguish the foreground from the background in such images and thus often tend to over-segment them unnecessarily. Figure 14c shows that the classical U-Net tended to over-segment the ROIs and often missed spatial information. The MultiResUNet (Fig. 14d) and the MILDNet (Fig. 14e) both succeeded in segmenting the majority of the ROIs; however, MILDNet preserved more contextual information by improving the connectivity between the boundary lines and being more robust to noise (compare the zoomed areas of the predicted masks in Fig. 14c–e).

Figure 14

The zoomed areas of the predicted masks in (c–e) show that the MILDNet approach successfully preserves connectivity in boundaries in an electron microscopy13,14 image (a) in which the ROI constitutes the majority class. The ground truth is given in (b).
