Summary of medical image data reading and preprocessing methods

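Source: 深度学习爱好者 (Deep Learning Enthusiasts)
This article is about 2,000 words; the suggested reading time is 9 minutes.
This article mainly introduces common ways to read medical images and common preprocessing methods.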

In the past two days, I have reviewed the reading and preprocessing methods of medical image data, and I will summarize them here.

To analyze medical image data with deep learning, for tasks such as lesion detection or tumor and organ segmentation, the first step is to get a general understanding of the data. When I first started working on medical image segmentation, I was confused: I didn't know what to do or what knowledge I needed, and only now have I built up a simple knowledge system. In my opinion, for medical image segmentation, and more specifically abdominal organ segmentation or liver tumor segmentation, you need two kinds of knowledge: (1) medical image preprocessing methods; (2) deep learning. The first is a prerequisite for the second, because you need to know what kind of data is being fed into the DL network.

This article mainly introduces common medical image reading methods and preprocessing methods.

1. Medical image data reading

1.1 ITK-SNAP software

First, let me introduce ITK-SNAP, a medical image visualization tool. It gives you an intuitive feel for the 3D structure of medical images, and it can also be used as an annotation tool for segmentation masks and detection bounding boxes. It is free and easy to use, and I recommend it. ITK-SNAP official download address: http://www.itksnap.org/pmwiki/pmwiki.php. Mango (http://ric.uthscsa.edu/mango/) is another very lightweight visualization tool that you can also try. I generally use ITK-SNAP.

For how to use ITK-SNAP, you can refer to this very concise blog post by an expert:

JunMa: Getting started with ITK-SNAP:

https://zhuanlan.zhihu.com/p/104381149

8e90e98e0aad3752fe29facd58831537.jpeg

ITK-SNAP interface

First, you need to be clear about how the views are oriented relative to the human body. The three windows correspond to three sections; the correspondence is shown in the figure below, and the letter labels around each window can be used as a guide. For example, the upper-left window is labeled R, A, L, P (right, anterior, left, posterior), so it is the transverse plane viewed from the soles of the feet toward the head (i.e., along the z direction). The other two windows can be read in the same way.


ddc664ebe617e81aa28a56472ff0081f.jpeg

The red section is the sagittal plane, the purple section is the coronal section, and the green section is the transverse section

You can also import the segmentation results at the same time for comparison and observation.

dec451acc033fb35163e19d6a2264e31.jpeg

You can also touch up places where the annotation is imprecise. Of course, the labels in most public datasets are quite good. Annotating data yourself works the same way. (If the display is unclear or the contrast is too low, adjust the window width and window level in the software.)
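The window width and window level adjustment mentioned above can also be reproduced in code when you plot slices yourself. The following is a minimal sketch, not how ITK-SNAP implements it: the function name `apply_window` and the level/width values are assumptions chosen for illustration.

```python
import numpy as np


def apply_window(image, level, width):
    """Clip intensities to [level - width/2, level + width/2] and rescale to [0, 1]."""
    low = level - width / 2.0
    high = level + width / 2.0
    windowed = np.clip(image.astype(np.float32), low, high)
    return (windowed - low) / (high - low)


# Hypothetical intensity values and window settings, for illustration only.
hu = np.array([-1000.0, -160.0, 40.0, 240.0, 1000.0])
display = apply_window(hu, level=40.0, width=400.0)
print(display)  # values below the window map to 0, values above it map to 1
```

A narrower width stretches a smaller intensity range over the full display range, which is why lowering it increases the visible contrast.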

1.2 SimpleITK

The most common medical images are CT and MRI scans, which are three-dimensional and therefore harder to handle than two-dimensional images. They are also stored in many formats; common ones are .dcm, .nii(.gz), .mha, and .mhd (+ .raw). All of these can be read with the Python package SimpleITK, and pydicom can read and modify .dcm files.

The purpose of reading is to extract the tensor data for each patient. Taking the .nii format above as an example, reading with SimpleITK looks like this:

import numpy as np
import os
import glob
import SimpleITK as sitk
from scipy import ndimage
import matplotlib.pyplot as plt  # load the required libraries


# Root paths: the data directory holds the volumes and the label directory
# holds the segmentations, all in .nii format
data_path = r'F:\LiTS_dataset\data'
label_path = r'F:\LiTS_dataset\label'


dataname_list = os.listdir(data_path)
dataname_list.sort()
print(dataname_list)  # list the volume file names
ori_data = sitk.ReadImage(os.path.join(data_path, dataname_list[3]))  # read one of the volumes
data1 = sitk.GetArrayFromImage(ori_data)  # extract the voxel data as a NumPy array
print(dataname_list[3], data1.shape, data1[100, 255, 255])  # print the file name, the shape, and the value of one element


plt.imshow(data1[100, :, :])  # visualize the slice at index 100
plt.show()

Output result:

 
  
['volume-0.nii', 'volume-1.nii', 'volume-10.nii', 'volume-11.nii',... 
volume-11.nii (466, 512, 512) 232.0

The output shows that the shape of the data is (466, 512, 512). Note that the axis order of the array is (z, y, x): z is the slice index, and y and x are the row (height) and column (width) indices within a slice.
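The axis layout can be tried out on a small synthetic array. This is a minimal sketch, assuming the array is indexed as [slice, row, column] and that the scan is stored in the usual axial orientation; the tiny shape (6, 4, 5) is made up so the example runs without any dataset.

```python
import numpy as np

# Small synthetic volume standing in for a CT scan: 6 slices, 4 rows, 5 columns.
volume = np.arange(6 * 4 * 5, dtype=np.int16).reshape(6, 4, 5)

transverse = volume[2, :, :]  # fix the slice index: one transverse slice
coronal = volume[:, 1, :]     # fix the row index: one coronal section
sagittal = volume[:, :, 3]    # fix the column index: one sagittal section

print(transverse.shape, coronal.shape, sagittal.shape)

# A single voxel is addressed as volume[slice, row, column].
voxel = volume[2, 1, 3]
print(voxel)
```

Fixing one index at a time is the array counterpart of the three section windows shown in ITK-SNAP.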

Plot results with z-index 100:

a3ab00c6ef0b6797e012efd9b1807e52.jpeg

The same slice visualized in ITK-SNAP (note that here (x, y, z) = (256, 256, 101), because ITK-SNAP starts indexing from 1 by default):

4a668d6c83fabee2778a89fd345dadeb.jpeg

Comparing the two pictures, the x-axis is the same but the y-axis is flipped vertically. This is only a difference in how matplotlib displays the image; the data that was read is not misaligned.
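If you want the matplotlib picture to match the orientation seen in ITK-SNAP, one option is to flip the slice vertically before displaying it. The sketch below uses a tiny made-up array rather than a real slice; it only changes how the slice is shown and should not be applied to the data fed to the network unless the labels are flipped too.

```python
import numpy as np

# Tiny stand-in for one 2D slice (rows are y, columns are x).
slice_2d = np.array([[1, 2, 3],
                     [4, 5, 6]])

# Reverse the row order: the x-axis is unchanged, the y-axis is flipped.
flipped = np.flipud(slice_2d)
print(flipped)

# With matplotlib, passing origin="lower" to plt.imshow has a similar visual
# effect without modifying the array.
```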

For processing DICOM and .mhd files, you can refer to this blog post:

Tan Qingbo: Common medical scan image processing steps:

https://zhuanlan.zhihu.com/p/52054982


2. Medical image preprocessing

This part is less systematic, because preprocessing usually differs a great deal between tasks and datasets. The basic idea, however, is always to make the processed data easier for the network to train on. Some preprocessing methods for two-dimensional images can be borrowed, such as contrast enhancement, denoising, and cropping. In addition, prior knowledge about the medical images themselves can be used. For example, different CT values (unit: HU, Hounsfield units) in CT images correspond to different tissues and organs of the human body.

49911b7cc9032084e59e11db60ba9921.jpeg

Tissues and organs corresponding to different CT values (HU)

Based on the table above, the original data can be normalized:

MIN_BOUND = -1000.0
MAX_BOUND = 400.0


def norm_img(image):  # normalize pixel values to [0, 1], clipping out-of-range values to the bounds
    image = (image - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    image[image > 1] = 1.
    image[image < 0] = 0.
    return image

The data can also be zero-centered, which shifts its mean to the origin:

image = image - mean  # mean: the mean pixel value, for example computed over the dataset
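Where the mean comes from is not specified above. A minimal sketch, assuming it is computed over all voxels of the (already normalized) training volumes; the helper name `zero_center` and the two tiny arrays are hypothetical.

```python
import numpy as np


def zero_center(image, mean):
    """Subtract a precomputed mean so the data is centered on zero."""
    return image.astype(np.float32) - mean


# Two tiny stand-ins for normalized training volumes (values in [0, 1]).
volumes = [
    np.array([[0.0, 0.5], [1.0, 0.5]]),
    np.array([[0.25, 0.75], [0.25, 0.75]]),
]

# Assumption: one global mean over every voxel in the training set.
dataset_mean = np.concatenate([v.ravel() for v in volumes]).mean()

centered = zero_center(volumes[0], dataset_mean)
print(dataset_mean)
print(centered)
```

The same mean should be reused for validation and test data so that all inputs are shifted consistently.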

The normalization above is applicable to most datasets. Other operations are optional and depend on the specific data, as do the values chosen for MIN_BOUND and MAX_BOUND. It is best to refer to how preprocessing is done in the open-source code of well-regarded papers.

It is recommended to save the preprocessed dataset locally, which reduces resource consumption during training. Data augmentation steps such as random cropping and linear transformations still need to be performed during training.
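The split between offline preprocessing and on-the-fly augmentation can be sketched as follows. This is only an illustration under stated assumptions: the `.npz` file layout, the temporary directory, the file name `case_000.npz`, the patch size, and the helper `random_crop` are all hypothetical, and the random array stands in for a real preprocessed volume.

```python
import os
import tempfile

import numpy as np


def random_crop(volume, label, crop_shape, rng):
    """Cut the same randomly placed (z, y, x) patch from a volume and its label."""
    starts = [int(rng.integers(0, dim - size + 1))
              for dim, size in zip(volume.shape, crop_shape)]
    region = tuple(slice(s, s + size) for s, size in zip(starts, crop_shape))
    return volume[region], label[region]


# Offline step: store a preprocessed volume and its label once.
volume = np.random.default_rng(0).random((8, 16, 16)).astype(np.float32)
label = (volume > 0.5).astype(np.uint8)  # synthetic label, for illustration only
save_dir = tempfile.mkdtemp()            # hypothetical output location
save_path = os.path.join(save_dir, "case_000.npz")
np.savez_compressed(save_path, volume=volume, label=label)

# Training step: load the cached arrays and draw a fresh random patch.
loaded = np.load(save_path)
rng = np.random.default_rng(42)
patch, patch_label = random_crop(loaded["volume"], loaded["label"], (4, 8, 8), rng)
print(patch.shape, patch_label.shape)
```

Cropping the volume and the label with the same region keeps the segmentation aligned with the image, which is the main thing to get right when augmenting segmentation data.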

References:

https://zhuanlan.zhihu.com/p/77791840

JunMa: Getting started with ITK-SNAP: https://zhuanlan.zhihu.com/p/104381149

Tan Qingbo: Common medical scan image processing steps: https://zhuanlan.zhihu.com/p/52054982

Editor: 黄继彦

Origin blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/131297634