Table of contents
Plane specific atlas construction
RNN (Adaptive Dynamic Termination)
3D U-net (detect fetal brain landmarks)
Compare with state-of-the-art methods
Influence of Landmark Alignment Module
Analysis of Adaptive Dynamic Termination
significant difference analysis
Clinical biometric assessment from SP
Agent with Warm Start and Adaptive Dynamic Termination for Plane Localization in 3D Ultrasound [TMI 2021] [Official Code] | AgentSPL | Landmark alignment, reinforcement learning, CNN, RNN, SP localization |
Summary
In standard plane (SP) localization, the agent may fail to capture the target SP and keep exploring when no termination condition is defined.
The action space can be extended with an additional terminate action [6]. However, enlarging the action space can leave the agent insufficiently trained.
Some works terminate the agent search by detecting oscillation [7] or the lowest Q-value [9]. Although no extra action is introduced, these methods still require the agent to complete inference with the maximum number of steps, which is inefficient.
During the search the agent is often trapped in a local minimum, so it is difficult to estimate the optimal termination step.
Therefore, a dynamic termination strategy is highly desirable in SP localization to ensure both effectiveness and efficiency.
The newly designed adaptive dynamic termination extends the previous RL framework so that the agent search can stop early, saving up to 67% of the inference time and improving the accuracy and efficiency of the RL framework at the same time.
method
(This article only covers the improvements; see the previous RL framework for the parts that were not modified.)
framework
Figure 2 Framework
The framework consists of three modules:
1) Landmark-aware alignment provides a warm start for efficient agent search (left)
2) SP positioning based on deep RL (middle)
3) Learning-based adaptive termination to improve localization efficiency and accuracy (right).
landmark-aware alignment
Figure 4 Pipeline of the landmark-aware alignment module
A landmark-aware alignment module was proposed in [8] as a dedicated warm start for the search process that exploits prior knowledge.
A more concrete processing pipeline is detailed in this section. The landmark-aware module aligns the US volume to the atlas space, thereby reducing the diversity of fetal posture and ultrasound acquisition.
As shown in Fig. 4, the proposed alignment module consists of two steps: plane-specific atlas construction and test volume-atlas alignment. The details are described below.
Θ denotes the angle between the plane normal vectors; see Formula 5 in Section III-C.
Plane specific atlas construction
In this study, an atlas is constructed to initialize SP localization in the test volume through landmark-based registration.
The atlas selected from the training dataset therefore needs to contain both the landmarks for registration and the SP parameters for plane initialization. As shown in Fig. 4, a specific atlas is selected for each SP to improve localization accuracy, instead of a common anatomical model [3].
[3] Diagnostic Plane Extraction from 3D Parametric Surface of the Fetal Cranium
To ensure an effective initialization, the specific SP of the selected atlas should ideally be as close as possible to the SPs of the other training volumes.
Algorithm 1 shows how the atlas volume for a particular plane is determined from the training dataset based on the minimum plane error (i.e., the sum of the angle and distance between two planes).
In the training phase, each volume is first taken as a candidate atlas, and landmark-based rigid registration is then performed on the remaining volumes.
For each candidate atlas, the mean plane error between the registered planes and the ground-truth standard planes is measured, and the volume with the smallest error is selected as the final atlas.
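The selection step of Algorithm 1 can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the plane errors have already been measured and stored in a hypothetical matrix plane_error, where plane_error[i][j] is the error on volume j after registering it to candidate atlas i. The function name and the toy values are made up for illustration.

```python
def select_atlas(plane_error):
    """Return the index of the candidate atlas with the lowest mean plane error.

    plane_error[i][j] is the plane error (angle + distance) measured on
    volume j after registering it to candidate atlas i. The diagonal
    (a volume registered to itself) is ignored.
    """
    best_index, best_mean = None, float("inf")
    for i, row in enumerate(plane_error):
        others = [e for j, e in enumerate(row) if j != i]
        mean_error = sum(others) / len(others)
        if mean_error < best_mean:
            best_index, best_mean = i, mean_error
    return best_index


# Toy example with three training volumes (values are made up).
errors = [
    [0.0, 12.0, 9.0],
    [7.0, 0.0, 5.0],
    [8.0, 10.0, 0.0],
]
atlas_index = select_atlas(errors)
```

In this toy case the second volume has the lowest mean error over the other volumes and would be kept as the atlas for that plane.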
Test Volume Atlas Alignment
The alignment module is based on landmark detection and matching, which differs from direct regression.
Landmark detection is cast as a heatmap regression task [37] to avoid learning a highly abstract mapping function (i.e., from feature representations to landmark coordinates). A customized 3D U-net [38] is trained with the L2-norm regression loss between the predicted and ground-truth heatmaps,
where N = 3 is the number of landmarks, and H_i and Ĥ_i denote the i-th predicted landmark heatmap and ground-truth landmark heatmap, respectively.
The ground-truth landmark heatmaps are created by placing a Gaussian kernel at the corresponding landmark location.
During inference, the test volume is passed to the landmark detector to obtain the predicted landmark heatmaps.
The coordinate with the largest value in each landmark heatmap is taken as the final prediction.
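The heatmap target, the loss and the decoding step described above can be illustrated with NumPy. This is only a sketch under stated assumptions: the kernel width sigma, the volume size and the landmark coordinates are made up, and the exact form of the loss (here the L2 norm per landmark, averaged over the N landmarks) is an assumption because the formula is not reproduced in these notes.

```python
import numpy as np


def gaussian_heatmap(shape, center, sigma=2.0):
    """Build a 3D heatmap with a Gaussian kernel centred on one landmark."""
    zz, yy, xx = np.meshgrid(
        np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]), indexing="ij"
    )
    sq_dist = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def l2_heatmap_loss(predicted, target):
    """Mean over landmarks of the L2 norm between predicted and target heatmaps."""
    diffs = (predicted - target).reshape(predicted.shape[0], -1)
    return float(np.mean(np.linalg.norm(diffs, axis=1)))


def decode_landmark(heatmap):
    """Return the voxel coordinate holding the largest heatmap value."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(heatmap), heatmap.shape))


# Three made-up landmark positions in a small 16x16x16 volume.
landmarks = [(4, 5, 6), (10, 3, 8), (7, 7, 2)]
targets = np.stack([gaussian_heatmap((16, 16, 16), c) for c in landmarks])
decoded = [decode_landmark(h) for h in targets]
```

Decoding the ground-truth heatmaps recovers the original landmark coordinates, which is the behaviour the detector is trained to approximate.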
A bounded environment is created for the agent by mapping the volume to the atlas space with a transformation matrix computed from the landmarks.
In addition, the annotated target plane function of the atlas is used as the initial starting plane function of the agent.
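One common way to compute a rigid transformation from matched landmarks is the SVD-based least-squares fit (the Kabsch algorithm). The sketch below uses it as an illustration; the notes do not say which solver the authors use, so this choice, the function name and the coordinates are assumptions.

```python
import numpy as np


def rigid_transform(source, target):
    """Least-squares rotation R and translation t with R @ source_i + t ≈ target_i."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Cross-covariance of the centred point sets.
    covariance = (source - src_centroid).T @ (target - tgt_centroid)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against a reflection so that R is a proper rotation.
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = tgt_centroid - rotation @ src_centroid
    return rotation, translation


# Made-up example: atlas landmarks are the test landmarks rotated by 30 degrees
# about the z axis and then shifted.
angle = np.deg2rad(30.0)
true_rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
true_translation = np.array([5.0, -2.0, 1.5])
test_landmarks = np.array([[10.0, 0.0, 0.0], [0.0, 12.0, 3.0], [4.0, 4.0, 9.0]])
atlas_landmarks = test_landmarks @ true_rotation.T + true_translation
rotation, translation = rigid_transform(test_landmarks, atlas_landmarks)
```

Three non-collinear landmarks are enough to determine the rotation and translation, which matches the three landmarks used per fetal brain volume.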
Adaptive Dynamic Termination
Considering the sequential nature of the iterative interaction, an additional RNN model is used, as shown in Figure 2, to model the mapping between the sequence of Q-values and the optimal termination step.
The Q-values at iteration t are defined as q_t = {q_1, q_2, ..., q_8}, one for each of the 8 candidate actions;
the Q-value sequence is the time-series matrix Q = [q_1, q_2, ..., q_n], where n is the index of the iteration step.
Taking the Q-value sequence as input, the RNN model learns the optimal termination step, defined as the step with the highest angle and distance improvement (ADI).
(ADI is defined by Equation 7 in Section III-C.)
During training, subsequences are randomly sampled from the Q-value sequence as training data, and the step with the highest ADI in the sampled interval is taken as the ground truth.
Different from previous studies [8], [9], a dynamic termination strategy is designed to improve the inference efficiency of the reinforcement framework.
Specifically, the RNN model performs one inference every two iterations on the current zero-padded Q-value sequence,
which allows an early stop at the first iteration step that yields three repeated predictions.
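The inference loop described above can be sketched in plain Python. Everything here is illustrative: step_agent and predict_stop_step are hypothetical hooks standing in for the DDQN agent and the trained RNN, and "three repeated predictions" is interpreted as the three most recent predictions being identical, which is an assumption.

```python
def run_with_dynamic_termination(step_agent, predict_stop_step, max_steps=75,
                                 num_actions=8, check_every=2, repeats=3):
    """Run the agent and stop once the predicted termination step repeats.

    step_agent(t) returns the Q-values of the candidate actions at iteration t.
    predict_stop_step(padded) returns the predicted termination step for a
    zero-padded Q-value sequence of length max_steps.
    Returns (predicted_stop_step, iterations_actually_run).
    """
    q_sequence = []  # Q-values observed so far, one row per iteration
    recent = []      # most recent predictions of the termination step
    for t in range(1, max_steps + 1):
        q_sequence.append(list(step_agent(t)))
        if t % check_every != 0:
            continue
        # Zero-pad the sequence to the fixed maximum length.
        padding = [[0.0] * num_actions for _ in range(max_steps - t)]
        prediction = predict_stop_step(q_sequence + padding)
        recent = (recent + [prediction])[-repeats:]
        if len(recent) == repeats and len(set(recent)) == 1:
            return prediction, t
    # No agreement was reached: fall back to the last prediction.
    fallback = recent[-1] if recent else max_steps
    return fallback, max_steps


def toy_agent(t):
    """Made-up Q-values whose first entry peaks at iteration 4."""
    return [10.0 - (t - 4) ** 2] + [0.0] * 7


def toy_predictor(padded):
    """Stand-in for the RNN: pick the step with the largest first Q-value."""
    first_column = [row[0] for row in padded]
    return first_column.index(max(first_column)) + 1


stop_step, iterations_run = run_with_dynamic_termination(toy_agent, toy_predictor)
```

With these toy hooks the loop stops after 8 iterations instead of 75 and reports step 4 as the termination step, which illustrates where the inference-time saving comes from.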
A previous study [8] used the mean absolute error (MAE) loss function to train the RNN in the termination module.
However, MAE has a constant back-propagation gradient and cannot measure fine-grained errors.
This study replaces it with the mean squared error (MSE) loss function to address this and obtain a more stable training process.
Since the ground truth, that is, the optimal termination step, is usually much greater than 1 (e.g., 10∼75), the conventional MSE loss may be difficult to converge because of excessively large gradients during training.
An MSE loss function with a balancing hyperparameter δ, which rescales the target step G, is therefore used,
where w denotes the RNN parameters, x is the input sequence of the RNN, f(x; w) represents the RNN network, and G is the optimal termination step.
The balancing hyperparameter δ = 0.01 approximately normalizes the value range of the learned steps to [0, 0.75], which simplifies training. The RNN model is trained on the inference results obtained from the training volumes.
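A small numeric sketch of the rescaled loss follows. The formula itself is not reproduced in these notes, so the form used here, the squared difference between the network output and δ·G averaged over the batch, is an assumption inferred from the stated value range; the function names are hypothetical.

```python
def scaled_mse_loss(predictions, optimal_steps, delta=0.01):
    """MSE between network outputs and termination steps rescaled by delta."""
    squared = [(p - delta * g) ** 2 for p, g in zip(predictions, optimal_steps)]
    return sum(squared) / len(squared)


def to_step(prediction, delta=0.01):
    """Map a network output back to an iteration index."""
    return round(prediction / delta)


# Targets of 10 and 75 steps become 0.10 and 0.75 after rescaling.
perfect = scaled_mse_loss([0.10, 0.75], [10, 75])
# An error of 30 steps contributes 0.09 instead of 900 to the loss.
off_by_thirty = scaled_mse_loss([0.45], [75])
```

Keeping the targets below 1 keeps the squared errors, and hence the gradients, small, which is the motivation given above for introducing δ.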
experiment
Dataset Setting
The proposed framework was validated on three different 3D US datasets: fetal brain, fetal abdomen and uterus.
Specifically, the goal was to localize the following SPs:
- transventricular (TV), transthalamic (TT) and transcerebellar (TC) SPs in the fetal brain,
- the abdominal (AM) SP in the fetal abdomen,
- sagittal (S), transverse (T) and coronal (C) SPs in the uterus.
Three or four landmarks are selected from each fetal/uterine US volume:
- fetal brain: the genu and splenium of the corpus callosum and the center of the cerebellar vermis;
- fetal abdomen: the entrance of the umbilical vein, the center, and the neck of the gallbladder;
- uterus: the two endometrial uterine horns, the endometrial fundus and the fundus of the uterine wall.
A dataset of 1635 prenatal 3D US volumes was collected:
433 fetal brain, 519 fetal abdomen and 683 uterine US volumes.
The dataset was randomly split into training, validation and test sets:
fetal brain 313/20/100, fetal abdomen 389/20/110, uterus 519/20/144.
The average volume size is 270×207×235 for the fetal datasets and 261×175×277 for the uterus dataset.
All volumes were resampled to an isotropic voxel size of 0.5 × 0.5 × 0.5 mm³.
(Isotropic means that the voxel spacing is the same in every direction, e.g. 1 mm × 1 mm × 1 mm.
Anisotropic means that the voxel spacing differs between directions, e.g. 1 mm × 1 mm × 5 mm.)
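As a quick illustration of what isotropic resampling does to the grid, the helper below computes the voxel grid size after resampling to 0.5 mm spacing. It is a hypothetical helper for illustration only; the input shape and spacing are made up and the actual interpolation step is not shown.

```python
def isotropic_shape(shape, spacing, target=0.5):
    """Voxel grid size after resampling to `target` mm spacing along every axis."""
    return tuple(round(n * s / target) for n, s in zip(shape, spacing))


# An anisotropic 100x100x40 volume with 1 mm x 1 mm x 5 mm spacing.
resampled = isotropic_shape((100, 100, 40), (1.0, 1.0, 5.0))
```

The physical extent of the volume is unchanged; only the number of voxels along each axis changes.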
Four sonographers with 5 years of experience manually annotated the landmarks and SPs for all ultrasound volumes.
All annotations were reviewed under strict quality control by a senior expert with 20 years of experience.
All ultrasound volumes were anonymized and acquired by experts using a Mindray DC-9 ultrasound system with an integrated 3D probe, with approval from the local institutional review board.
- train is the training set.
- val is a test set used during training to monitor results while training, judge the learning status in time, check for overfitting and adjust training parameters. It does not overlap with train and does not contribute to the final trained model.
- test is the test set used to evaluate the model after training is finished.
Only train is strictly required for training; val is optional, and its ratio can be set very small.
test is not needed for model training, but some data is generally reserved for testing; a commonly recommended ratio is 8:1:1.
That said, many models are now trained without a validation set: mechanisms that prevent overfitting, such as Dropout and BN, are fairly mature, and in many cases fine-tuning from the original model is harder than training from scratch. A common practice is therefore to fix the number of training iterations and test the final model directly.
parameter
PyTorch
RL
The entire framework is trained with the Adam optimizer [40].
The DDQN is trained for 100 epochs (about 4 days).
The discount factor γ in the loss function (Equation 2) is set to 0.9 (γ ∈ [0, 1]).
The target Q network copies the parameters of the Q network every 1500 iterations.
The maximum number of iterations is 75 for the fetal datasets and 30 for the uterus dataset, leaving enough room for the agent to explore.
The ε of the ε-greedy selection strategy [17] is first set to 0.6
and multiplied by 1.01 every 10000 iterations during training until it reaches 0.95.
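The ε schedule just described can be written as a one-line function. This is a sketch of the stated rule only; the function name is hypothetical, and it assumes the multiplication is applied once per completed block of 10000 iterations.

```python
def epsilon_at(iteration, start=0.6, factor=1.01, every=10000, cap=0.95):
    """Greedy-selection probability after `iteration` training iterations."""
    return min(cap, start * factor ** (iteration // every))


initial = epsilon_at(0)
before_cap = epsilon_at(460000)
at_cap = epsilon_at(470000)
```

Under these assumptions ε reaches the 0.95 cap after 47 increases, that is, after 470000 iterations.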
# =============== define training
# batch size, INT
self.batch_size = 4
# target net weight update term, INT
# the target Q network copies the current Q network's parameters every 1500 iterations
self.target_step_counter = 1500
# learning rate, FLOAT
self.lr = 5e-5
# weight decay, FLOAT
self.weight_decay = 1e-4
# reward decay γ, FLOAT
self.gamma = 0.95
# memory capacity, INT (prioritized replay buffer)
self.memory_capacity = 15000
# epsilon for the greedy, FLOAT
self.epsilon = 0.6
# =============== define default
# gpu id
self.gpu_id = 0
# total epoch
self.num_epoch = 100
# max steps (default: fetal; use 30 for uterus)
self.max_step = 75
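The roles of the discount factor γ and of the periodically updated target network can be illustrated with the standard Double DQN target. This is a generic sketch, not the authors' code: the function names are hypothetical, the Q-values are made up, and gamma defaults to the 0.9 stated in the text (the listing above uses 0.95).

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def should_sync_target(iteration, every=1500):
    """True on iterations where the target network copies the online weights."""
    return iteration > 0 and iteration % every == 0


# The online network prefers action 1, so the target network's value for
# action 1 (0.4) is used, even though its own maximum is 0.8.
target_value = ddqn_target(1.0, [0.2, 0.9, 0.1], [0.5, 0.4, 0.8])
```

Decoupling action selection from action evaluation in this way is what distinguishes Double DQN from the original DQN target.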
RNN (Adaptive Dynamic Termination)
The RNN variants (vanilla RNN and LSTM (Long Short-Term Memory) [41]) are trained with the mini-batch stochastic gradient descent (SGD) [42] optimizer,
with batch size = 100, learning rate = 1e-4, momentum = 0.5 and 100 epochs, which takes about 45 minutes.
The number of hidden units is 64, and the number of RNN layers is 2.
3D U-net (detect fetal brain landmarks)
Adam optimizer, batch size = 1, learning rate = 0.001, momentum = 0.5, 40 epochs.
Limited by GPU memory, the ultrasound volume is rescaled by a factor of 0.4 for training.
Gaussian heatmaps of the landmarks are generated as ground truth.
A customized 3D U-net [38] is trained with the L2-norm regression loss (Formula 5) to detect three fetal brain landmarks, namely the genu of the corpus callosum, the splenium of the corpus callosum and the cerebellar vermis, shown as the three red dots in (a),
where N = 3 is the number of landmarks, and H_i and Ĥ_i denote the i-th predicted landmark heatmap and ground-truth landmark heatmap, respectively.
Hyperparameters were chosen on the validation set, and several metrics were used on the held-out test set to evaluate the performance of the method.
For each hyperparameter, the model was trained with different values and its performance on the validation set was evaluated.
The hyperparameter values with the best validation performance were used as the default settings for the training phase.
In this study, three high-impact hyperparameters were searched, including the size of Replay Buffer , γ and .
Evaluation Standards
Three criteria were used for evaluation.
Spatial similarity of plane localization:
1. The dihedral angle (Ang) between the two planes,
where n_p and n_g are the normals of the predicted plane and the target plane.
2. The difference in Euclidean distance from the origin to the two planes (Dis),
where d_p and d_g are the distances from the volume origin to the predicted plane and to the ground-truth plane.
(Ang and Dis are based on the plane sampling function cos(α)x + cos(β)y + cos(γ)z = d;
the effective voxel size is 0.5 mm³/voxel.)
Content similarity:
3. The structural similarity index (SSIM) [43].
In addition, the ADI at iteration t is defined as the sum of the cumulative changes in distance and angle from the starting plane.
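The spatial metrics and the ADI can be sketched from the descriptions above. The formulas themselves are not reproduced in these notes, so the following forms are assumptions: Ang as the arccosine of the normalized dot product of the normals, Dis as the absolute difference of the two distances, and ADI as the decrease in Ang plus the decrease in Dis relative to the starting plane. The function names and the numbers are made up.

```python
import math


def plane_angle(n_pred, n_gt):
    """Angle in degrees between two plane normals."""
    dot = sum(a * b for a, b in zip(n_pred, n_gt))
    norm = math.sqrt(sum(a * a for a in n_pred)) * math.sqrt(sum(b * b for b in n_gt))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding errors
    return math.degrees(math.acos(cosine))


def plane_distance(d_pred, d_gt):
    """Absolute difference of the two plane-to-origin distances."""
    return abs(d_pred - d_gt)


def adi(angles, distances, t):
    """Cumulative improvement in angle and distance from the start plane up to step t."""
    return (angles[0] - angles[t]) + (distances[0] - distances[t])


# Made-up error curves over four iterations (index 0 is the starting plane).
angles = [30.0, 20.0, 12.0, 15.0]
distances = [8.0, 5.0, 6.0, 7.0]
best_step = max(range(len(angles)), key=lambda t: adi(angles, distances, t))
```

In this toy run the ADI values are 0, 13, 20 and 16, so step 2 would be the optimal termination step used as the training target for the RNN.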
result
Table I. Comparison of the proposed method with other existing methods on the fetal datasets (mean ± std; best results in bold).
Ours:RL(WSADT)
Table II. Comparison of the proposed method with other existing methods on the uterus dataset (mean ± std; best results in bold).
Table III. Ablation study of the warm start on the fetal datasets (mean ± std; best results in bold).
Table IV. Ablation study of the warm start on the uterus dataset (mean ± std; best results in bold).
Table V. Ablation study of the termination strategies on the fetal datasets (mean ± std; best results in bold).
Table VI. Ablation study of the termination strategies on the uterus dataset (mean ± std; best results in bold).
Table VII. Average termination steps of adaptive dynamic termination and active termination.
Table VIII. Ablation study of the number of layers and hidden units of the LSTM on the fetal brain dataset.
Table IX. P-values of paired t-tests between each method and our method for the three performance metrics on the fetal datasets. Bold results indicate significant differences.
Table X. P-values of paired t-tests between each method and our method for the three performance metrics on the uterus dataset. Bold results indicate significant differences.
Table XI. Quantitative evaluation of segmentation performance and clinical biometrics.
Figure 7. Visualization of our method on sampled SPs from the fetal ultrasound datasets: (a) transcerebellar SP, (b) transventricular SP, (c) transthalamic SP, (d) abdominal SP. For each case, the upper left is the predicted standard plane, the upper right is the ground truth, the lower left is the inference curve of the termination module, and the lower right shows the 3D spatial positions of the predicted plane and the ground truth.
Figure 8. Visualization of our method on sampled SPs from the uterus ultrasound dataset: (a) mid-sagittal SP, (b) transverse SP, (c) coronal SP. For each case, the upper left is the predicted standard plane, the upper right is the ground truth, the lower left is the inference curve of the termination module, and the lower right shows the 3D spatial positions of the predicted plane and the ground truth.
Compare with state-of-the-art methods
To test the effectiveness of the proposed method in standard plane localization, we conduct comparative experiments with a classical learning-based regression method, denoted as Regression, the state-of-the-art automatic view planning method [9], denoted as AVP, and our previous method [8], denoted as RL-US. For a fair comparison, we used the default plane initialization strategies for both Regression and AVP, and retrained both compared models using a common implementation. We also tuned the training parameters to achieve the best localization results. As shown in Table I and Table II, our method achieves the highest accuracy on almost all metrics. This demonstrates the superior capability of our method in standard plane localization tasks.
Influence of Landmark Alignment Module
To verify the impact of the landmark-aware alignment module of the proposed method, we compare the performance of the framework with and without it. In the Pre-Regist method, we initialize the agent with random starting plane functions as in [9] and choose the lowest Q-value [9] as the termination step. The Regist method uses the alignment module alone, without any agent search. The Post-Regist method denotes the search result of an agent that is warm-started with the alignment module. We also use the lowest Q-value termination strategy in Post-Regist for a fair comparison. As shown in Table III and Table IV, the accuracy of Pre-Regist is significantly lower than that of Regist and Post-Regist. This demonstrates that the landmark-aware alignment module consistently improves plane detection accuracy. Figure 5 shows the three-dimensional spatial distribution of fetal brain landmarks before and after alignment. All landmarks are mapped to similar spatial locations, suggesting that all fetal poses are roughly aligned.
Analysis of Adaptive Dynamic Termination
To demonstrate the impact of the proposed Adaptive Dynamic Termination (ADT) strategy, we conduct comparative experiments with existing popular strategies: termination at the maximum iteration (Max-Step), termination at the lowest Q-value (Low-Q-value [9]) and active termination using LSTM [8] (AT-LSTM). We also compare the proposed ADT with different backbone networks, including a multi-layer perceptron (ADT-MLP), a vanilla RNN (ADT-RNN) and an LSTM (ADT-LSTM). The superscript ∗ indicates that the model uses the normalized MSE loss function (L_MSE, Equation 4). As shown in Table V and Table VI, with the adaptive dynamic termination strategy the agent avoids being trapped in a poor local minimum and obtains better performance. Moreover, Table VII shows that the proposed dynamic termination saves up to about 67% of the inference time, thus improving the efficiency of the reinforcement framework.
significant difference analysis
To investigate whether the differences between methods are statistically significant, we performed paired t-tests between the results of our method and those of Regression, AVP [9] and Regist. The tests were carried out for all performance metrics, including angle, distance and SSIM, with the significance level set at 0.05. The results are shown in Tables IX and X. Together, Tables I-IV and IX-X show that our method performs best among the existing methods (Regression, AVP [9] and Regist). Although the improvement of our method over AT-LSTM [8] is not statistically significant, our method saves up to 67% of the inference time, as shown in Table VII.
Clinical biometric assessment from SP
In this section, we further explore whether the detected planes can provide biometric measurements consistent with those obtained from manually acquired planes, which is of greater clinical concern. To obtain biometric measurements on the predicted planes (TT and AM), we segment the fetal head and abdomen using a pre-trained DeepLabv3+ [44]. Minimal ellipses are then fitted to the segmented fetal head or abdomen on the predicted plane and to the ground truth annotated on the target plane. We evaluate the biometric performance with three metrics: Dice score (Dice), absolute error (A-Error) and relative error (R-Error). As shown in Table XI, the proposed method achieves good Dice scores. The absolute and relative errors of this method are 1.125 mm and 2.05% for fetal head circumference and 3.608 mm and 3.25% for abdominal circumference. The p-values in Table XI also show that the predicted biometrics are not significantly different from the annotations. This is comparable to human-level performance [45], [46] and indicates that the proposed method has the potential to be applied in real clinical settings.
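The three biometric metrics can be sketched in a few lines. This uses the usual definitions of the Dice score, absolute error and relative error, which is an assumption since the notes do not spell them out; the masks and circumference values are made up and are not the reported results.

```python
def dice_score(pred_mask, gt_mask):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    intersection = sum(p * g for p, g in zip(pred_mask, gt_mask))
    total = sum(pred_mask) + sum(gt_mask)
    return 2.0 * intersection / total if total else 1.0


def absolute_error(predicted, reference):
    """Absolute difference between a predicted and a reference measurement."""
    return abs(predicted - reference)


def relative_error(predicted, reference):
    """Absolute error as a percentage of the reference measurement."""
    return 100.0 * abs(predicted - reference) / reference


# Made-up example: a tiny flattened mask pair and two circumferences in mm.
dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
a_error = absolute_error(330.0, 324.0)
r_error = relative_error(330.0, 324.0)
```

Here the circumference inputs stand for the perimeters of the ellipses fitted on the predicted and target planes.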
Qualitative evaluation
Figure 7 and Figure 8 visualize the results of the proposed method. They show the predicted planes, ground truth, termination curves and 3D spatial visualization for four randomly selected cases. The predictions are spatially close and visually similar to the ground truth. Furthermore, the method consistently reaches an ideal stopping point, whereas neither the maximum-iteration nor the lowest Q-value termination strategy finds the optimal termination step.
While RL was effective in localizing view planes in MRI [9], the same approach failed to localize SPs in 3D US. Without the alignment module and an early stopping setting, AVP needs to perform agent training and inference in a huge search space, whereas learning-based localization methods find SPs more easily within a limited search space. This might explain the relatively low performance of [9] in Tables III and IV. The proposed landmark-aware alignment module is designed to address exactly this concern. It aligns all volumes to the same atlas space using rigid registration, which constrains the environment as in MRI images. Furthermore, the proposed alignment can be seen as a prior-based initialization of the agent in the test US volumes, which reduces the search space to a fine-grained subspace.
In deep RL an appropriate termination strategy is essential, yet during the iterative search the agent is often trapped in local minima, which makes the optimal termination step difficult to estimate. Previous studies [7], [9] proposed several different termination strategies. However, as shown in Tables V and VI and in Figures 7 and 8, these empirical or prior-knowledge-based termination strategies fail to estimate the optimal termination step in this challenging task. Meanwhile, previous studies [9], [8] make the agent terminate at a fixed maximum step by default, which makes the localization system inefficient. Our previous study designed a learning-based active termination that uses an RNN to learn the mapping between Q-value sequences and optimal steps, but it also has to wait for the agent to complete inference. In contrast, the termination module here uses an RNN to learn the implicit relationship between the Q-value curve and the optimal termination step, and applies it dynamically during the agent search. The resulting RL framework yields more accurate and efficient predictions. Note that this learning-based termination strategy is a general approach that can be applied to other similar tasks.