Yue Yang, Shiyu Liu, Shunbo Hu, Lintao Zhang, Jitao Li, Meng Li, Fuchun Zhang
Abstract: To improve the registration accuracy of brain magnetic resonance images (MRI), some deep learning registration methods use segmentation images to train the model. However, the segmentation values are constant within each label, so the gradient variation concentrates on the boundary. As a result, the dense deformation field (DDF) gathers on the boundary and may even fold. To fully leverage the label information, morphological opening and closing information maps are introduced to enlarge the non-zero gradient regions and improve the accuracy of DDF estimation. The opening information maps supervise the registration model to focus on smaller, narrow brain regions. The closing information maps supervise the registration model to pay more attention to complex boundary regions. Then, opening and closing morphology networks (OC_Net) are designed to generate the opening and closing information maps automatically, enabling end-to-end training. Finally, a new registration architecture, VMseg+oc, is proposed by combining OC_Net and VoxelMorph. Experimental results show that the registration accuracy of VMseg+oc is significantly improved on the LPBA40 and OASIS1 datasets. In particular, VMseg+oc improves registration accuracy in smaller and narrow brain regions.
Keywords: three-dimensional (3D) medical image registration; deep learning; opening operation; closing operation; morphology
Deep learning-based registration algorithms [1, 2] use convolutional neural networks (CNN) to estimate the dense deformation field (DDF) between floating images and fixed images. Balakrishnan et al. [3] proposed the well-known unsupervised registration network VoxelMorph (VM), in which the intensity similarity between the two to-be-aligned images and a smoothness regularization term on the DDF are combined into the loss function that optimizes the registration network. Subsequently, many works were built on VM, such as HyperMorph [4], CycleMorph [5], and TransMorph [6]. To further improve the registration accuracy, Hu et al. [7] proposed a weakly supervised registration framework based on segmentation labels for multimodal medical image registration. Subsequently, multi-dice similarity was presented as the loss function [8]. Balakrishnan et al. [9] used segmentation labels for auxiliary training to improve the registration accuracy of the VM model. Inspired by these works, we used global and local segmentation labels to drive the registration network [10, 11], simultaneously improving the registration accuracy in global and local regions. We also used erosion and dilation information to improve the registration results [12].
However, when the segmentation labels are used directly to optimize the registration model, the registration accuracy is enhanced at the cost of registration plausibility. The segmentation label provides non-zero gradients mainly at the boundary, which causes the DDF to concentrate and fold in these regions. Hence, a multi-supervised registration network is proposed that utilizes morphological information maps generated by opening and closing morphology networks (OC_Net), which implement the morphological opening and closing operations as networks. The experimental results show that the proposed method achieves good registration performance.
With the rise of deep learning, the convolutional neural network (CNN), a kind of deep neural network based on the convolution operation, has been widely used, particularly in the field of image processing.
A CNN mainly comprises an input layer, hidden layers, and an output layer. The input layer and the output layer are the input and output of the CNN. A hidden layer consists of a convolution layer, a pooling layer, and an activation layer. The convolution layer is mainly used to extract features efficiently. The pooling layer selects the features extracted by the convolution layer according to specific rules and reduces the feature dimension. The activation layer applies a nonlinear mapping function to the selected features. Many hidden layers are connected to form a deep network. Image registration with a convolutional neural network can be described as inputting the floating image and the fixed image into the CNN and estimating the dense deformation field (DDF) at the output layer.
Non-rigid medical image registration aims to estimate a DDF Φ that aligns the floating image M with the fixed image F, i.e., M(Φ) = F. The training process of the registration network is often defined as follows
where the function Lsim represents the image similarity measure between the fixed image and the deformed floating image, R(Φ) is the regularization term, θ is the network parameter, and λ is a hyperparameter.
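The display equation that these symbols refer to can be written in the form commonly used for VoxelMorph-style networks. The following is a reconstruction from the symbol definitions above, not a verbatim copy of the original formula; Φ denotes the DDF predicted by the network with parameters θ.

```latex
\hat{\theta} = \arg\min_{\theta} \; L_{sim}\bigl(F,\, M(\Phi)\bigr) + \lambda\, R(\Phi)
```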
The morphological opening and closing operations are applied to the deep learning registration of three-dimensional (3D) brain magnetic resonance images (MRI). The opening and closing operations are combinations of the morphological dilation and erosion operations. The opening operation first erodes and then dilates a binary image. Its purpose is to cut off narrow areas and eliminate small and sharp regions, which also has a smoothing effect on the boundary. The closing operation first dilates and then erodes. It can fuse narrow and elongated discontinuities and eliminate small cavities.
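The two definitions can be illustrated on a toy 3D label. The sketch below is only an illustration, assuming SciPy's `scipy.ndimage` binary morphology and a full 3×3×3 structuring element; the toy arrays and the choice of structuring element are not taken from the paper.

```python
import numpy as np
from scipy import ndimage


def opening(mask, structure=None):
    """Erode first, then dilate: removes small or narrow parts."""
    eroded = ndimage.binary_erosion(mask, structure=structure)
    return ndimage.binary_dilation(eroded, structure=structure)


def closing(mask, structure=None):
    """Dilate first, then erode: fills small gaps and cavities."""
    dilated = ndimage.binary_dilation(mask, structure=structure)
    return ndimage.binary_erosion(dilated, structure=structure)


# Assumed structuring element: the full 3x3x3 neighbourhood.
cube = ndimage.generate_binary_structure(3, 3)

# Toy data (hypothetical): a solid 5x5x5 block.
block = np.zeros((13, 13, 13), dtype=bool)
block[2:7, 2:7, 2:7] = True

# The block plus one isolated voxel: opening should remove the voxel.
with_speck = block.copy()
with_speck[10, 10, 10] = True

# The block with a one-voxel cavity: closing should fill the cavity.
with_cavity = block.copy()
with_cavity[4, 4, 4] = False

opened = opening(with_speck, cube)
closed = closing(with_cavity, cube)
```

In this toy case both results equal the solid block, and they match what `ndimage.binary_opening` and `ndimage.binary_closing` return for the same structuring element.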
As shown in Fig.1, (a) shows a label of cerebrospinal fluid (CSF), (b) is the opening information map of CSF, and (c) is the closing information map of CSF.The regions with red colors in (b) and (c) are the obtained regions of interest from multiple opening and closing operations.
Fig.1 Opening and closing information maps based on CSF segmentation label: (a) CSF label; (b) opening information map; (c) closing information map
Opening and closing information maps are obtained by opening and closing the segmentation label multiple times. The opening operation can eliminate small areas, separate objects at narrow points, and smooth the boundaries of large objects. Therefore, we assign values to the regions removed by successive opening operations, thus emphasizing the small or narrow regions of the segmentation label. The closing operation can smooth object contours with large curvature in the brain region, and the closing information map is obtained through multiple closing operations. Note that the closing operation also eliminates small holes, which introduces errors in these small hole areas. To solve this problem, the result is multiplied by the original segmentation label. The shade of red in (b) and (c) of Fig.1 represents the assigned value. As can be seen from Fig.1, the opening information map pays attention to small isolated areas and narrow areas, whereas the closing information map focuses on the boundary information of brain regions with high curvature.
Fig.2 shows the detailed steps for generating the opening and closing information maps for the CSF label. For convenience of illustration, only two successive opening and closing operations are performed in Fig.2. X represents a binary image of the CSF label, C1(X) is the result of the first closing operation on X, and C2(X) is the result of the second closing operation, performed on C1(X). [C1(X)-X] is the region added by the first closing operation, which is assigned the value (-1). [C2(X)-C1(X)] is the region added by the second closing operation, which is assigned the value (-2). Similarly, as the number of closing operations increases, [Cn(X)-Cn-1(X)] denotes the region added by the nth closing operation and is assigned the value (-n). Finally, the closing information map is obtained by adding all assigned regions to the original segmentation label.
Fig.2 Generation of the opening and closing information maps, taking the CSF segmentation image as an example
The steps for generating an opening information map are similar to those for a closing information map. O1(X) represents the result of an opening operation on X, and O2(X) is the result of an opening operation on O1(X). Thus, [X-O1(X)] denotes the region removed by the first opening operation, which is assigned the value (-1). [O1(X)-O2(X)] is the region removed by the second opening operation, which is assigned the value (-2). It follows that [On-1(X)-On(X)] is the region removed by the nth opening operation, which is assigned the value (-n). All assigned regions are added to the original segmentation label to obtain the opening information map.
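The procedure of Fig.2 can be sketched with SciPy as follows. This is an illustrative sketch under several assumptions that the text does not settle. First, opening or closing with a fixed structuring element is idempotent, so the kth operation is assumed here to use a k-times iterated structuring element (`iterations=k`). Second, the value of region k is taken literally as -k and is overlaid on the label rather than summed with it. Third, the multiplication step mentioned for the closing map is omitted. The helper names `successive_regions` and `information_map` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def successive_regions(label, n, mode, structure=None):
    """Return [region_1, ..., region_n] changed by the k-th operation.

    mode="open":  region_k = O_{k-1}(X) minus O_k(X)  (voxels removed)
    mode="close": region_k = C_k(X) minus C_{k-1}(X)  (voxels added)
    Assumption: the k-th operation iterates the structuring element k times.
    """
    if mode not in ("open", "close"):
        raise ValueError("mode must be 'open' or 'close'")
    x = np.asarray(label).astype(bool)
    op = ndimage.binary_opening if mode == "open" else ndimage.binary_closing
    prev, regions = x, []
    for k in range(1, n + 1):
        cur = op(x, structure=structure, iterations=k)
        regions.append(prev & ~cur if mode == "open" else cur & ~prev)
        prev = cur
    return regions


def information_map(label, regions, value=lambda k: -k):
    """Overlay value(k) on region k, starting from the binary label."""
    info = np.asarray(label).astype(np.int32)
    for k, region in enumerate(regions, start=1):
        info[region] = value(k)
    return info


cube = ndimage.generate_binary_structure(3, 3)  # assumed 3x3x3 element

# Toy label for opening: a 5x5x5 block plus one isolated voxel.
x = np.zeros((13, 13, 13), dtype=bool)
x[2:7, 2:7, 2:7] = True
x[10, 10, 10] = True
open_regions = successive_regions(x, 3, "open", cube)
open_map = information_map(x, open_regions)

# Toy label for closing: two slabs separated by a one-voxel gap at z = 5.
y = np.zeros((13, 13, 13), dtype=bool)
y[3:8, 3:8, 3:5] = True
y[3:8, 3:8, 6:8] = True
close_regions = successive_regions(y, 2, "close", cube)
close_map = information_map(y, close_regions)
```

In the toy opening case, the isolated voxel is removed by the first opening and the whole block only by the third, so small structures receive the earliest ordering values. In the closing case, the gap between the slabs is the only region added.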
To obtain the opening and closing information maps quickly, we construct the opening and closing network, OC_Net. The network architecture is shown in Fig.3. OC_Net is mainly composed of three parts: a 3D opening layer, a 3D closing layer, and the subtraction operation.
Fig.3 The network structure of OC_Net
As shown in Fig.3, the primary process of generating the opening information map and the closing information map is described as follows.
1) Input the segmentation label to OC_Net.
2) Open and close the segmentation label by the upper and lower subnetworks, respectively.
3) The changed regions are obtained by detecting the differences between the labels before and after the opening operation and the closing operation, and each region is assigned an ordinal value.
4) The final opening information map and closing information map are obtained by adding the assigned regions to the original segmentation label.
Fig.4 shows the proposed multi-supervised registration network, VMseg+oc, which is driven by the intensity image, the segmentation label, the opening information map, and the closing information map. In the training process, the inputs of VMseg+oc are the floating image M, the fixed image F, the floating segmentation label Mseg, and the fixed segmentation label Fseg. During the testing stage, the network requires only the two to-be-aligned images as inputs, and OC_Net is not required.
M and F are used for unsupervised registration. The registration network outputs the deformation field Φ, and then M and Φ are input into the spatial transform network (STN) to obtain the deformed floating image M(Φ). The intensity similarity between F and M(Φ) is taken as an unsupervised loss, as shown in part (a) of Fig.4.
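The resampling performed by the STN can be illustrated outside a deep learning framework. The sketch below assumes the VoxelMorph convention that the warped image at voxel p is the floating image sampled at p + Φ(p), with Φ stored in voxel units; it uses `scipy.ndimage.map_coordinates` and is not differentiable, unlike a real STN layer. The function name `warp` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def warp(image, ddf, order=1):
    """Resample `image` at p + ddf(p) for every voxel p.

    image: array of shape (D, H, W)
    ddf:   displacement field of shape (3, D, H, W), in voxel units
    order: 1 = trilinear (intensity images), 0 = nearest neighbour (labels)
    """
    grid = np.indices(image.shape).astype(np.float64)  # identity coordinates
    return ndimage.map_coordinates(image, grid + ddf, order=order, mode="nearest")


# Toy example (hypothetical data): shift by one voxel along the first axis.
image = np.arange(4 * 5 * 6, dtype=np.float64).reshape(4, 5, 6)
ddf = np.zeros((3, 4, 5, 6))
ddf[0] = 1.0
warped = warp(image, ddf)
```

The same function can be applied to a segmentation label or an information map with the same Φ, which is how the weakly supervised branches reuse the deformation field.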
Mseg and Φ are input into the STN to obtain the deformed floating segmentation label Mseg(Φ). The label similarity between Fseg and Mseg(Φ) is taken as the first weakly supervised loss function, as shown in part (b) of Fig.4.
Fseg and Mseg are input into OC_Net to obtain the floating opening information map Mseg_o, the floating closing information map Mseg_c, the fixed opening information map Fseg_o, and the fixed closing information map Fseg_c. Then Mseg_o and Mseg_c are transformed by the STN to obtain the deformed floating opening information map Mseg_o(Φ) and the deformed floating closing information map Mseg_c(Φ), respectively. The second weakly supervised loss function is computed between Mseg_c(Φ) and Fseg_c and between Mseg_o(Φ) and Fseg_o, as shown in part (c) of Fig.4.
The registration network structure of VMseg+oc is similar to that of VoxelMorph [3] and mainly consists of an encoder and a decoder. The encoder consists of 3D convolution layers and rectified linear unit (ReLU) activation functions. The stride of the convolution is set to 2 to realize down-sampling. The convolution kernel size is 3×3×3. The numbers of channels of the convolution layers in the encoder are [16, 32, 32, 32]. The decoder adopts 3D convolution layers and ReLU activation functions; the convolution kernel size is 3×3×3, the stride is 1, and up-sampling is realized through upsampling layers. The numbers of channels of the decoder are [32, 32, 32, 32, 16, 16]. Skip connections are added between the encoder and the decoder to prevent feature loss during the encoding and decoding process.
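As a quick check of the encoder geometry, the feature-map sizes can be computed by hand. The sketch below assumes that each of the four encoder convolutions uses stride 2 with "same" padding and that the input has the 160×192×160 size reported for the datasets; the helper name `encoder_shapes` is hypothetical.

```python
import math


def encoder_shapes(input_shape, channels=(16, 32, 32, 32), stride=2):
    """Return (channels, D, H, W) after each strided encoder convolution."""
    shapes = []
    size = tuple(input_shape)
    for c in channels:
        # 'same' padding assumed: output size = ceil(input size / stride)
        size = tuple(math.ceil(s / stride) for s in size)
        shapes.append((c, *size))
    return shapes


shapes = encoder_shapes((160, 192, 160))
# [(16, 80, 96, 80), (32, 40, 48, 40), (32, 20, 24, 20), (32, 10, 12, 10)]
```

Under these assumptions the coarsest feature map is 16 times smaller than the input along each axis, which the decoder's upsampling layers then have to recover.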
Fig.4 Multi-supervised registration network VMseg+oc: (a) unsupervised registration; (b), (c) two kinds of weakly-supervised registration
The total loss function of VMseg+oc is composed of four parts: an unsupervised loss term, two weakly supervised loss terms, and a regularization term. The unsupervised loss term measures the intensity similarity between the warped floating image and the fixed image, and is defined as follows
where F(p) represents the mean value over a 3×3×3 voxel neighborhood centered at point p of the fixed image, M(Φ(p)) represents the corresponding mean value on the warped floating image, and pi represents a voxel in the neighborhood of p.
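The description of local means over a 3×3×3 neighborhood matches the local normalized cross-correlation used by VoxelMorph. Assuming that this is the intended measure, the loss can be written as below. This is a reconstruction rather than the original formula: the bars mark the local means that the text writes as F(p) and M(Φ(p)), Ω (the image domain) is introduced here for illustration, and the leading minus sign assumes that the similarity is turned into a loss to be minimized.

```latex
L_{sim}\bigl(F, M(\Phi)\bigr) = -\sum_{p \in \Omega}
\frac{\Bigl(\sum_{p_i}\bigl(F(p_i)-\bar{F}(p)\bigr)\bigl(M(\Phi(p_i))-\bar{M}(\Phi(p))\bigr)\Bigr)^{2}}
     {\Bigl(\sum_{p_i}\bigl(F(p_i)-\bar{F}(p)\bigr)^{2}\Bigr)
      \Bigl(\sum_{p_i}\bigl(M(\Phi(p_i))-\bar{M}(\Phi(p))\bigr)^{2}\Bigr)}
```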
The weakly supervised loss terms include the Dice similarity coefficient (DSC) and the mean square error (MSE). DSC measures the label similarity between two segmentation labels, and is defined as
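In the standard form of the Dice coefficient, written with this paper's symbols for the fixed label and the warped floating label (a reconstruction, since the original display equation is not reproduced here):

```latex
\mathrm{DSC}\bigl(F_{seg},\, M_{seg}(\Phi)\bigr) =
\frac{2\,\bigl\lvert F_{seg} \cap M_{seg}(\Phi) \bigr\rvert}
     {\bigl\lvert F_{seg} \bigr\rvert + \bigl\lvert M_{seg}(\Phi) \bigr\rvert}
```

DSC ranges from 0 (no overlap) to 1 (identical labels), so a loss built on it is usually a decreasing function of DSC, such as 1 - DSC.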
MSE is used to measure the map similarity between two morphological information maps. In this paper, we use the generated Mseg_c(Φ), Mseg_o(Φ), Fseg_c, and Fseg_o to compute the second weakly supervised loss function.
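Both weak supervision measures are simple to compute numerically. The following NumPy sketch uses the standard definitions of DSC and MSE on hard (non-differentiable) arrays; a training loss would use differentiable counterparts inside the network framework. The function names and toy arrays are hypothetical.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary labels."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both labels empty: treated as a perfect match
    return 2.0 * np.logical_and(a, b).sum() / denom


def mse(u, v):
    """Mean square error between two information maps."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(np.mean((u - v) ** 2))


# Toy labels: two 32-voxel slabs that overlap in 16 voxels.
fixed = np.zeros((4, 4, 4), dtype=bool)
fixed[:2] = True
moved = np.zeros((4, 4, 4), dtype=bool)
moved[1:3] = True
dsc_value = dice(fixed, moved)                # 2 * 16 / (32 + 32) = 0.5
mse_value = mse([0, -1, 1], [0, -2, 1])       # (0 + 1 + 0) / 3
```

Unlike DSC, the MSE responds to the graded values of the information maps, so a mismatch in a region with a different assigned value contributes to the loss even away from the label boundary.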
The regularization term is used to ensure the topology and smoothness of DDF, which is defined as
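VoxelMorph-style networks typically use a diffusion regularizer that penalizes the spatial gradients of the deformation field. Assuming that the same regularizer is intended here, it can be written as below; this is a reconstruction, and Ω (the image domain) is introduced for illustration.

```latex
R(\Phi) = \sum_{p \in \Omega} \bigl\lVert \nabla \Phi(p) \bigr\rVert^{2}
```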
Hence, the definition of the total loss function is shown as
where α, β, γ, ε are hyperparameters.
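A form of the total loss that is consistent with the four parts listed above is sketched below. The pairing of each weight with each term follows the order in which the text lists the terms and is an assumption, as are the grouping of the opening and closing MSE terms under one weight and the notation L_DSC for the Dice-based loss.

```latex
L_{total} = \alpha\, L_{sim}\bigl(F, M(\Phi)\bigr)
          + \beta\, L_{DSC}\bigl(F_{seg}, M_{seg}(\Phi)\bigr)
          + \gamma\,\Bigl[\mathrm{MSE}\bigl(F_{seg\_o}, M_{seg\_o}(\Phi)\bigr)
                        + \mathrm{MSE}\bigl(F_{seg\_c}, M_{seg\_c}(\Phi)\bigr)\Bigr]
          + \varepsilon\, R(\Phi)
```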
Experiments were performed on two open datasets, LPBA40 and the Open Access Series of Imaging Studies (OASIS1). The LPBA40 dataset consists of 40 3D magnetic resonance (MR) scans. The OASIS1 dataset consists of T1-weighted MR scans from a group of 416 subjects ranging from 18 to 96 years old. The labels for both datasets include three global tissues: CSF, gray matter (GM), and white matter (WM). All images were rigidly registered with the FMRIB Software Library (FSL) tools.
The image size of LPBA40 is 160 mm×192 mm×160 mm, and the training set and the test set contain 30 and 9 images, respectively. The image size of OASIS1 is 160 mm×192 mm×160 mm, and the training set and the test set contain 365 and 50 images, respectively.
In the training process, the learning rate is set to 0.0001, Adam is used as the optimizer, and the number of iterations is 360. Since the image size is large, the batch size is set to 1 throughout. The four hyperparameters in the total loss function are set to 1, 1, 1.2, and 1.2, respectively.
To verify the influence of the number of opening and closing operations on different brain tissues, each brain tissue was evaluated separately in the experiment. The experimental results are shown in Fig.5, where the n-axis represents the number of opening and closing operations. From the variation of DSC, we find that the registration result is best when the CSF is opened and closed 8 times. For the GM region, the best registration result is obtained when the GM is opened and closed 13 times. For the WM region, the best registration result is obtained when the WM is opened and closed 15 times. The following experiments in this paper are all carried out with these optimal numbers.
Fig.5 Influence of the number of opening and closing operations on the registration accuracy: (a) number of opening and closing operations for CSF labels; (b) number of opening and closing operations for GM labels; (c) number of opening and closing operations for WM labels
In this paper, DSC is used to evaluate the registration accuracy between the warped segmentation label and the fixed segmentation label. A higher DSC represents better registration performance.
In this paper, we compare two traditional registration methods and four deep learning registration methods. The two traditional methods are Demons [13] and Advanced Normalization Tools (ANTs) [14]. The four deep learning methods are VM, VMseg, VMoc, and VMseg+oc. VM is an unsupervised registration network [9]. VMseg is the registration network that uses intensity images and segmentation labels for double supervision. VMoc is the double-supervised registration network that uses intensity images and opening and closing information maps. VMseg+oc is the registration network proposed in this paper, which is multi-supervised by intensity images, segmentation labels, and opening and closing information maps.
Tab.1 and Tab.2 show the DSC registration accuracy of CSF, GM, and WM obtained by seven methods on the LPBA40 and OASIS1 datasets, respectively. From Tab.1 and Tab.2, it can be concluded that our method is superior to the traditional registration methods and the other deep learning-based methods in terms of DSC accuracy.
Tab.1 Registration accuracy of DSC (%) on LPBA40
Tab.2 Registration accuracy of DSC (%) on OASIS1
Among the deep learning registration methods, the DSC results of VMoc are better than those of VMseg, since VMoc uses opening information maps and closing information maps to drive the registration network. On the LPBA40 dataset, DSC improves by 0.45% for CSF, 2.37% for GM, and 1.53% for WM. On the OASIS1 dataset, DSC improves by 2.31% for CSF, 2.03% for GM, and 0.19% for WM. Because the opening and closing information maps themselves expand the boundary information, the registration accuracy of VMoc, constrained by the two morphological information maps, is higher than that of VMseg, constrained only by the segmentation label.
In our VMseg+oc method, we use the intensity information, the opening information map, the closing information map, and the segmentation label at the same time. Compared with VMseg, DSC improves by 1.53% for CSF, 0.79% for GM, and 2.20% for WM on the LPBA40 data. On the OASIS1 data, DSC improves by 2.99% for CSF, 2.30% for GM, and 0.47% for WM. The DSC results of VMseg+oc are also better than those of VMoc. In summary, the registration accuracy of VMseg+oc is higher than that of VM, VMseg, and VMoc.
Finally, we compare the registered floating image and the registered floating segmentation label of our method with those of affine registration, Demons, and ANTs in Fig.6. The registered floating image and the registered floating segmentation label of our method are more similar to the fixed image and the fixed segmentation label, respectively.
In Fig.7, the registered images and the registered floating segmentation labels are compared among the four deep learning registration methods, i.e., VM, VMseg, VMoc, and VMseg+oc. The image registered by VMseg is more similar to the fixed image in the border regions than that registered by VM. The result of VMoc is more similar to the fixed image in narrow areas, but it is not as good as VMseg in border areas. Our VMseg+oc method ensures that the registered results are similar to the fixed image not only in the border regions but also in narrow regions. The reason is that our method assigns different values in the border regions and the narrow regions, which provides more information in these regions and thus improves the registration results.
Fig.6 Comparison of registered images and labels on LPBA40: (a) fixed image; (b) moving image; (c) ANTs; (d) Demons; (e) VMseg+oc
Fig.7 Comparison of registered images and labels on LPBA40: (a) fixed image; (b) moving image; (c) VM; (d) VMseg; (e) VMoc; (f) VMseg+oc
In this paper, the VMseg+oc method, which combines deep learning registration with opening and closing morphology, is proposed. The opening information maps and closing information maps are quickly generated by OC_Net, whose inputs are the segmentation labels. VMseg+oc is a multi-supervised registration network with VoxelMorph as the baseline network. VMseg+oc can alleviate the problem that a DDF driven only by the segmentation label may concentrate and fold at the boundary. The experimental results show that our VMseg+oc method obtains better registration results in edge regions and in narrow and small regions. In addition, the hyperparameter analysis of the number of opening and closing operations shows that the proposed method still has a limitation: the optimal numbers must be determined experimentally. In future work, a trainable OC_Net could be considered to further improve the registration results.
Journal of Beijing Institute of Technology, 2023, Issue 5