Multiple Landmark Detection using Multi-Agent Reinforcement Learning

Table of contents

Abstract

Introduction

Contributions

Related Work

Method

Collaborative Agents

Experiments

Datasets

Training

Testing

Discussion

Computational Performance

Conclusion

References


Abstract

The detection of anatomical landmarks is an important step in medical image analysis and applications for diagnosis, interpretation, and guidance.

Manually annotating landmarks is a tedious process that requires domain-specific expertise and introduces inter-observer variability. This paper proposes a multi-agent reinforcement learning based multi-landmark detection method.

Our hypothesis is that in human anatomy, the positions of all anatomical landmarks are interdependent and non-random, so finding one landmark can help infer the positions of other landmarks.

Leveraging a Deep Q-Network (DQN) architecture, we build an environment and agents with implicit mutual communication, such that we can accommodate K agents acting and learning simultaneously while they try to detect K different landmarks.

During training, agents cooperate collectively by sharing their accumulated knowledge.

We compare our method with state-of-the-art architectures and achieve significantly better accuracy, reducing the detection error by 50% while requiring fewer computational resources and less training time. Code and visualizations are available at: https://github.com/thanosvlo/MARL-for-Anatomical-Landmark-Detection

Introduction

Precise localization of anatomical landmarks in medical images is a key requirement for many clinical applications, such as image registration and segmentation and applications in computer-aided diagnosis and intervention.

For example, in order to plan cardiac interventions, it is necessary to determine standardized planes of the heart, such as the short-axis and 2- and 4-chamber views [1].

It also plays a crucial role in prenatal fetal screening, where it is used to estimate biometric measures, such as fetal growth rate, to identify pathological development [17].

In addition, the midsagittal plane, commonly used for brain image registration and assessment of abnormalities, is identified based on landmarks such as the anterior commissure (AC) and posterior commissure (PC) [2].

Manually annotating landmarks is often a time-consuming and tedious task that requires extensive anatomical expertise and is subject to inter- and intra-observer errors.

On the other hand, the design of automatic methods is also challenging because the appearance and shape of different organs vary greatly.

Contributions

This work proposes a novel multi-agent reinforcement learning (MARL) approach to detect multiple landmarks efficiently and simultaneously by sharing the agents' experience.

The main contributions can be summarized as:

  (i) We introduce a new formulation of the multiple landmark detection problem in a MARL framework;

  (ii) we propose a new collaborative deep Q-network (DQN) trained using implicit communication between agents;

  (iii) we evaluate extensively on different datasets and compare with recently published methods (decision forests, convolutional neural networks (CNNs), and single-agent RL).

Related Work

In the literature, automatic landmark detection methods employ machine learning algorithms to learn a combination of appearance-based and image-based models, such as using regression forests [16] and statistical shape priors [6].

[19] propose to use two CNNs for landmark detection; the first network learns search paths by extracting candidate locations, and the second network learns to recognize landmarks by classifying candidate image patches.

Li et al. [13] proposed a patch-based iterative CNN that can detect single or multiple landmarks simultaneously.

Ghesu et al. [8] introduce a single deep RL agent to navigate to target landmarks in 3D images.

Artificial agents learn to efficiently search and detect landmarks in RL scenarios. This search can be performed using a fixed or multi-scale step strategy [7].

Alansary et al. [2] proposed to use different Deep Q-Network (DQN) architectures for landmark detection with a novel hierarchical action step.

The agent learns an optimal policy for navigating towards a target landmark from any starting point using sequential action steps in a 3D image (the environment).

In [2], reported experiments show that this method can achieve state-of-the-art results for detecting multiple landmarks from different datasets and imaging modalities. However, this approach is designed to learn a single agent for each landmark separately.

In [2], it is also shown that the performance of different strategies and architectures strongly depends on the anatomical location of the target landmarks. We therefore hypothesize that sharing information while detecting multiple landmarks simultaneously reduces these dependencies.

Method

Building on the work of [8] and [2], we extend the formulation of landmark detection as a Markov decision process (MDP), in which an artificial agent learns an optimal policy towards its target landmark; this defines a concurrent partially observable Markov decision process (co-POMDP) [9].

We consider our framework to be concurrent in that the agents are trained together, but each learns its own individual policy, mapping its private observations to an individual action [10].

We hypothesize that this is necessary because the localization of different landmarks requires learning partially heterogeneous policies, which rules out a centralized learning system.

Our RL framework is defined by the state of the environment, the behavior of agents, their reward functions, and terminal states.

We consider the environment to be a 3D scan of human anatomy, and define a state as a region of interest (ROI) centered on the agent's location. This makes our formulation a POMDP, since the agent only sees a subset of the environment [11].
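As a concrete illustration, the partially observable state can be thought of as a fixed-size crop around the agent's current position. The sketch below (the helper name `extract_roi` is hypothetical, NumPy only; the paper's actual implementation may differ) zero-pads the crop wherever it extends past the volume border:

```python
import numpy as np

def extract_roi(volume, center, size=45):
    """Crop a cubic size^3 region of interest centred on `center`,
    zero-padding where the crop extends past the volume border."""
    half = size // 2
    roi = np.zeros((size, size, size), dtype=volume.dtype)
    src, dst = [], []
    for axis in range(3):
        lo, hi = center[axis] - half, center[axis] - half + size
        src_lo, src_hi = max(lo, 0), min(hi, volume.shape[axis])
        src.append(slice(src_lo, src_hi))          # valid part of the volume
        dst.append(slice(src_lo - lo, src_lo - lo + (src_hi - src_lo)))
    roi[tuple(dst)] = volume[tuple(src)]
    return roi
```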

Figure 1: (a) single agent and (b) multiple agents interacting in an RL environment

We define a frame history as consisting of four ROIs.

In this setup, each agent can move along x, y, z axes, creating a set of six actions.

In our multi-agent framework, each agent computes its individual reward, since the agents' policies are disjoint.

During training, we consider the search to have converged when the agent arrives within 1mm of the target landmark.
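A minimal sketch of the action set and reward, assuming the common choice (as in [8]) of rewarding the reduction in Euclidean distance to the target; the 1 mm terminal criterion follows the text, with the simplifying assumption of 1 mm isotropic voxel spacing:

```python
import numpy as np

# The six unit moves: one step forward/backward along each of x, y, z.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def step(position, action, target):
    """Apply one action; the reward is the reduction in Euclidean distance
    to the target, so moving closer is rewarded and moving away penalized."""
    new_position = tuple(p + a for p, a in zip(position, ACTIONS[action]))
    d_old = float(np.linalg.norm(np.subtract(position, target)))
    d_new = float(np.linalg.norm(np.subtract(new_position, target)))
    reward = d_old - d_new
    done = d_new <= 1.0   # terminal: within 1 mm of the landmark
    return new_position, reward, done
```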

Episodic learning is used in both training and testing.

During training, an episode is defined as the period the agents need to find their landmarks, or until they have completed a predefined maximum number of steps.

If an agent finds its landmark before the others, we freeze its training and disable network updates derived from that agent, while allowing the remaining agents to continue exploring the environment.

During testing, we terminate an episode when an agent starts oscillating around a position or exceeds a predefined maximum number of steps.
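The episode logic with agent freezing could be sketched as follows, using a toy stand-in agent (`DummyAgent` is purely illustrative; the real agents are DQNs acting in the image volume):

```python
class DummyAgent:
    """Toy stand-in for a DQN agent; `steps_to_find` fakes the search length."""
    def __init__(self, steps_to_find):
        self.steps_to_find = steps_to_find
        self.updates = 0

    def update(self, transition):
        self.updates += 1    # stands in for one gradient step

def run_episode(agents, max_steps=200):
    """One training episode: an agent that finds its landmark is frozen
    (no further acting or network updates), while the others keep exploring."""
    active = [True] * len(agents)
    for t in range(1, max_steps + 1):
        for i, agent in enumerate(agents):
            if not active[i]:
                continue                   # frozen agents are skipped
            agent.update(transition=None)  # learn from this agent's transition
            if t >= agent.steps_to_find:   # toy terminal condition
                active[i] = False          # freeze: landmark found
        if not any(active):
            break                          # all landmarks found
    return [a.updates for a in agents]
```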

Collaborative Agents

Previous approaches [2], [7], and [8] to the problem of landmark detection consider a single agent to find a single landmark.

This means that each additional landmark needs to be trained with a separate instance of the agent, making large-scale applications infeasible.

Our hypothesis is that in human anatomy, the positions of all anatomical landmarks are interdependent and non-random, so finding one landmark can help infer the positions of other landmarks.

This knowledge is not exploited when using isolated agents.

Therefore, to reduce the computational load of locating multiple landmarks and improve accuracy through anatomical interdependence, we propose a collaborative multi-agent landmark detection framework (Collab-DQN).

For simplicity of representation, the following description will assume only two agents. However, our method can be extended to K agents. In our experiments, we use two, three and five agents trained together for evaluation.

Figure 2: Proposed collaborative DQN in the case of two agents; the convolutional layers and their weights are shared among all agents in the style of a Siamese architecture, while the fully connected layers of each agent remain independent

A DQN consists of three convolutional layers interleaved with max pooling layers, followed by three fully connected layers. Inspired by the Siamese architecture [3], our Collab-DQN builds K DQN networks whose weights are shared across the convolutional layers.

The fully connected layers remain independent as these will make the final action decision. In this way, the information required for browsing the environment is encoded into the shared layers, while the landmark-specific information is kept in the fully connected layers. In Figure 2, we graphically represent the proposed architecture for two agents. Sharing weights between convolutional layers helps the network learn broader features that can fit two inputs while adding implicit regularization to the parameters to avoid overfitting. The shared weights enable indirect knowledge transfer in the parameter space between agents, thus, we can consider this model as a special case of collaborative learning [10].
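Structurally, the weight sharing can be sketched as one shared backbone referenced by all agents plus per-agent heads. The sketch below uses a plain matrix in place of the convolutional layers (class name, dimensions, and layout are illustrative, not the paper's):

```python
import numpy as np

class CollabDQN:
    """Structural sketch: K agents share one 'backbone' (one set of weights,
    standing in for the conv layers) but keep individual fully connected heads."""
    def __init__(self, n_agents, feat_dim=128, n_actions=6, seed=0):
        rng = np.random.default_rng(seed)
        # Single shared parameter block: updating it affects every agent.
        self.shared_backbone = rng.standard_normal((feat_dim, feat_dim))
        # One independent head per agent: these make the final action decision.
        self.heads = [rng.standard_normal((n_actions, feat_dim))
                      for _ in range(n_agents)]

    def q_values(self, agent_idx, state_features):
        shared = self.shared_backbone @ state_features  # shared feature extraction
        return self.heads[agent_idx] @ shared           # agent-specific Q-values
```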

Experiments

Datasets

We evaluate our proposed framework and model on three tasks:

  (i) brain MRI landmark detection, with 728 training and 104 testing volumes [12];

  (ii) cardiac MRI landmark detection, with 364 training and 91 testing volumes [14];

  (iii) fetal brain ultrasound landmark detection, with 51 training and 21 testing volumes.

Each modality includes 7–14 anatomical landmark locations, annotated by expert clinicians [2].

Training

During training, an initial random location is chosen from the inner 80% of the volume to avoid sampling outside meaningful regions.

The initial ROI is 45 × 45 × 45 voxels around the randomly selected point.

The agents follow an ε-greedy exploration strategy: with probability ε they choose a random action from a uniform distribution, and otherwise they act greedily. We use episodic learning and freeze action updates for agents that have reached a terminal state until the end of the episode (see Section 2 for details).
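A minimal sketch of the ε-greedy rule described above (standard formulation, not code from the paper):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon take a uniformly random action,
    otherwise act greedily on the Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))         # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```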

Table 1: Results on brain MRI and fetal brain ultrasound, in millimeters. Our proposed Collab-DQN performs best in all cases except CSP, where we match the performance of the single agent.

Testing

For each agent, we fixed 19 different starting points to allow a fair comparison between methods. These points are located at 25%, 50%, and 75% of the volume size along each dimension, for all test volumes.

For each volume, the Euclidean distance between the final position and the target position is averaged over the 19 runs. The average distance in mm is taken as the agent's performance on that volume. We performed multiple tests using our proposed architecture.
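The evaluation metric reduces to averaging Euclidean distances over the fixed starting points; a sketch, assuming 1 mm isotropic spacing so that voxel distance equals millimetres:

```python
import numpy as np

def mean_detection_error(final_positions, target):
    """Mean Euclidean distance between the final positions reached from the
    fixed starting points and the ground-truth landmark (in mm, assuming
    1 mm isotropic voxel spacing)."""
    diffs = np.asarray(final_positions, float) - np.asarray(target, float)
    return float(np.linalg.norm(diffs, axis=1).mean())
```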

The performance is compared with multi-scale RL landmark detection [7], a fully supervised deep convolutional neural network (CNN) [13], and the single-agent DQN landmark detection algorithm [2]. For the cardiac landmarks, we compare with [16], which uses decision forests. DQN variants such as Double DQN or Dueling DQN were not evaluated, because their performance showed little improvement on the anatomical landmark detection task, as shown in [2].

Although our method can be extended to K agents where sufficient computational power is available, we restrict our comparison to the anterior commissure (AC) and posterior commissure (PC) of the brain; the apex (AP) and mitral valve centre (MV) of the heart; and the right cerebellum (RC), left cerebellum (LC), and cavum septum pellucidum (CSP) of the fetal brain.

These are common, diagnostically valuable landmarks used in clinical practice and in previously published landmark detection algorithms. For completeness and ease of future comparison, we also report results for training 3 and 5 agents simultaneously. In Table 1, we show the performance on brain MRI and fetal brain US landmarks using different methods. In Table 2, we show results for 3 and 5 agents trained simultaneously, and for cardiac MRI landmarks.

Discussion

As shown in Tables 1 and 2, our proposed method significantly outperforms the current state of the art in landmark detection. Paired Student's t-tests across all experiments yield p-values ranging from 0.01 to 0.0001. We conducted an ablation study by training a single agent for double the number of iterations and with double the batch size. The study was performed on the cardiac MRI landmarks, which show the greatest localization difficulty because of larger anatomical variation between subjects than observed in the brain data. Our results confirm that the agents share essential information, which helps them perform their tasks more efficiently.

Our hypothesis is that regularization effects gleaned from the experience and knowledge of multi-agent systems are beneficial.

Furthermore, we trained a single agent with double the replay memory, but the agent was unable to learn due to the random initialization of the experience memory.

Furthermore, including more agents leads to similar or improved results for all landmarks, as shown in Table 2(a).

It is worth noting that our method only matches, rather than exceeds, the performance of the single-agent DQN for the CSP landmark, while we perform better on all other landmarks.

Our theory is that this is due to the differing anatomical properties of the RC, LC, and CSP landmarks, which offer no advantage for joint detection.

In this paper, we choose to use DQN instead of existing policy gradient methods such as A3C because DQN is represented by a single deep CNN that interacts with a single environment.

A3C uses many agent instances that interact asynchronously and in parallel.

Running multiple A3C agents, each with multiple environment instances, is computationally expensive.

In future work, we will investigate the application of multi-landmark detection using cooperative or competitive agents.

Computational Performance

Simultaneously training multiple agents not only benefits the performance of landmark localization, but also reduces training time and memory requirements.

Sharing weights between the convolutional layers reduces the number of trainable parameters by about 5% compared with two independent networks, and by about 6% compared with three.

Furthermore, each additional agent in our architecture requires about 6% fewer parameters than an additional stand-alone agent.
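The savings arithmetic can be sketched directly: with the convolutional parameters shared once and the fully connected heads duplicated per agent, the saved fraction depends only on the conv/fc parameter split. The numbers below are illustrative, not the paper's actual layer sizes:

```python
def param_savings(n_agents, conv_params, fc_params):
    """Fraction of trainable parameters saved by sharing the convolutional
    layers among n_agents, versus n_agents fully independent networks."""
    independent = n_agents * (conv_params + fc_params)   # no sharing
    shared = conv_params + n_agents * fc_params          # conv layers shared once
    return 1 - shared / independent
```

For instance, with a hypothetical 10%/90% conv/fc split, two agents save exactly 5% relative to two independent networks, and the saving grows with each additional agent.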

Thanks to the positive regularization effects of training multiple agents together and the implicit knowledge transfer between them, our method needs on average 25,000–50,000 fewer time steps to converge than a single DQN, and each training epoch takes about 30 minutes less than training two epochs on a single DQN (NVIDIA Titan-X, 12 GB). Inference runs at ∼20 fps for a single agent.

Table 2: (a) Multi-agent training and testing performance in brain MRI; landmarks 3, 4, and 5 represent the lateral, inferior apex, and medial sides of the splenium of the corpus callosum, respectively; (b) multi-agent performance on the cardiac MRI dataset.

Conclusion

This paper formulates the problem of multiple anatomical landmark detection as a multi-agent reinforcement learning scenario and introduces Collab-DQN, a collaborative DQN for landmark detection in brain and cardiac MRI volumes and in 3D ultrasound.

Together we train K agents to find K landmarks. These agents share their convolutional layer weights.

In this way, we leverage the knowledge transferred by each agent to teach other agents.

Performance improves over the state-of-the-art approach [2], while requiring less time and memory than training K agents sequentially.

We believe that Bayesian exploration methods are a natural next step, which will be addressed in future work.

Brain MRI data: adni.loni.usc.edu;

Ultrasound data: available only after informed consent, ethical approval, and a formal data sharing agreement;

Cardiac data: digital-heart.org.

References

1. Alansary, A., Le Folgoc, L., Vaillant, G., Oktay, O., Li, Y., Bai, W., Passerat-Palmbach, J., Guerrero, R., Kamnitsas, K., Hou, B., McDonagh, S., Glocker, B., Kainz, B., Rueckert, D.: Automatic view planning with multi-scale deep reinforcement learning agents. In: MICCAI 2018. pp. 277–285 (2018)
2. Alansary, A., Oktay, O., Li, Y., Le Folgoc, L., Hou, B., Vaillant, G., Kamnitsas, K., Vlontzos, A., Glocker, B., Kainz, B., Rueckert, D.: Evaluating reinforcement learning agents for anatomical landmark detection. Medical Image Analysis 53, 156–164 (2019)
3. Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a "Siamese" time delay neural network. pp. 737–744 (1993)
4. Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. In: NIPS 29. pp. 2137–2145 (2016)
5. Foerster, J., Chen, R.Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., Mordatch, I.: Learning with opponent-learning awareness. In: Proc. 17th Intl. Conf. on Autonomous Agents and MultiAgent Systems. pp. 122–130. AAMAS '18 (2018)
6. Gauriau, R., Cuingnet, R., Lesage, D., Bloch, I.: Multi-organ localization with cascaded global-to-local regression and shape prior. Medical Image Analysis 23(1), 70–83 (2015)
7. Ghesu, F., Georgescu, B., Zheng, Y., Grbic, S., Maier, A., Hornegger, J., Comaniciu, D.: Multi-scale deep reinforcement learning for real-time 3D-landmark detection in CT scans. IEEE PAMI 41(1), 176–189 (2019)
8. Ghesu, F.C., Georgescu, B., Mansi, T., Neumann, D., Hornegger, J., Comaniciu, D.: An artificial agent for anatomical landmark detection in medical images. In: MICCAI 2016. pp. 229–237. Springer, Cham (2016)
9. Girard, J., Emami, R.: Concurrent Markov decision processes for robot team learning. Engineering Applications of Artificial Intelligence (2015)
10. Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Autonomous Agents and Multiagent Systems. pp. 66–83. Springer (2017)
