[3D Image Segmentation] 3D Image Segmentation with PyTorch, Part 6 (data preprocessing: LIDC-IDRI XML annotation dump and annotation count statistics)

Since the LUNA16 data processing pipeline compiled by a previous Bilibili author was too cumbersome, this article reorganizes the LUNA16 data from scratch. The final data and format are almost the same; the main difference is that the code logic here is simpler and easier to follow.

For background on the LUNA16 data set, you can refer here: [3D Image Classification] 3D Stereo Image Classification 3 (LIDC-IDRI Pulmonary Nodule XML Feature Tag PKL Dump)

The main steps and central content of this article include the following parts:

  1. mask generation: extract the annotated nodule position coordinates for the corresponding series from the xml file (a nodule may be marked multiple times by multiple readers), and generate a corresponding mask array file whose size matches the image array;
  2. Lung parenchyma extraction: using the lung-region segmentation data, multiply the original image with the mask image, and fill in or remove the non-lung regions;
  3. resample operation: resample according to spacing, either in all three z, y, x dimensions, or only along the z direction to 1 mm (I have seen papers do something similar); a sketch follows this list;
  4. From the mask, obtain the z, y, x center-point coordinates and the radius of each nodule.
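For reference, here is a minimal sketch of what the resample in step 3 might look like, based on scipy.ndimage.zoom. The function name, the 1 mm default target, and the linear interpolation order are my assumptions, not the article's final code (that comes in a later part):

import numpy as np
from scipy.ndimage import zoom

def resample(img, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a zyx-ordered volume from `spacing` to `new_spacing` (mm per voxel)."""
    spacing = np.asarray(spacing, dtype=np.float64)
    new_spacing = np.asarray(new_spacing, dtype=np.float64)
    new_shape = np.round(np.array(img.shape) * spacing / new_spacing)
    real_factor = new_shape / np.array(img.shape)  # zoom factor actually applied after rounding
    resampled = zoom(img, real_factor, order=1)    # order=1: linear interpolation
    real_spacing = spacing / real_factor           # spacing actually achieved
    return resampled, real_spacing

To resample only along z, as mentioned in step 3, you would pass new_spacing=(1.0, spacing[1], spacing[2]) so y and x stay untouched.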

At this point, we will have the following files:

  1. a file containing the ct image data;
  2. the corresponding mask data;
  3. a file recording the z, y, x center-point coordinates and radius.

Compared with the data format given by luna16, the current data is easier to understand and view. Whether for visualization or for subsequent data processing and training, it is more intuitive and clear. This will be expanded on step by step later.

Because the amount of code is still relatively large, there is a lot to deal with, and many files are involved, this may be expanded into several chapters. In this article, we first process the xml files and dump them into a form that is easy to view. This involves the xml file format and its handling, for which I wrote a separate article; link for reference: [Medical Imaging Data Processing] Summary of XML file format processing

1. xml file dump

1.1. Understanding the annotation xml file

For an introduction to what each field in the LIDC-IDRI data set xml means, you can refer to my other article, click here: [LIDC-IDRI] CT pulmonary nodule XML tag feature benign and malignant tag PKL dump (1)


In this article, we focus on the structure of this xml data and the meaning of each recorded tag. I believe that after reading this, you will have a deeper understanding of how this data set is processed.

Most of the code is the same as the content introduced in the link above. You can refer to this GitHub repository: NoduleNet - utils - LIDC

Some content has not been introduced there, so I will briefly supplement it here.

  • ResponseHeader: This is the header part, which records the information of this case (that is, the CT image of a single patient).

For convenience in viewing and learning xml files, you can refer to this article: [Medical Imaging Data Processing] Summary of XML file format processing. We will convert the xml into a dictionary to make it easier to view. The following shows a before-and-after comparison of the conversion:

An excerpt of the original xml data is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<LidcReadMessage uid="1.3.6.1.4.1.14519.5.2.1.6279.6001.1308168927505.0" xmlns="http://www.nih.gov" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd">
  <ResponseHeader>
    <Version>1.7</Version>
    <MessageId>1148851</MessageId>
    <DateRequest>2005-11-03</DateRequest>
    <TimeRequest>12:25:10</TimeRequest>
    <RequestingSite>removed</RequestingSite>
    <ServicingSite>removed</ServicingSite>
    <TaskDescription>Second unblinded read</TaskDescription>
    <CtImageFile>removed</CtImageFile>
    <SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.131939324905446238286154504249</SeriesInstanceUid>
    <StudyInstanceUID>1.3.6.1.4.1.14519.5.2.1.6279.6001.303241414168367763244410429787</StudyInstanceUID>
    <DateService>2005-11-03</DateService>
    <TimeService>12:25:40</TimeService>
    <ResponseDescription>1 - Reading complete</ResponseDescription>
    <ResponseComments></ResponseComments>
  </ResponseHeader>

Converted into dictionary form (easier to view):

{
  "LidcReadMessage": {
    "@uid": "1.3.6.1.4.1.14519.5.2.1.6279.6001.1308168927505.0",
    "@xmlns": "http://www.nih.gov",
    "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "@xsi:schemaLocation": "http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd",
    "ResponseHeader": {
      "Version": "1.7",
      "MessageId": "1148851",
      "DateRequest": "2005-11-03",
      "TimeRequest": "12:25:10",
      "RequestingSite": "removed",
      "ServicingSite": "removed",
      "TaskDescription": "Second unblinded read",
      "CtImageFile": "removed",
      "SeriesInstanceUid": "1.3.6.1.4.1.14519.5.2.1.6279.6001.131939324905446238286154504249",
      "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6279.6001.303241414168367763244410429787",
      "DateService": "2005-11-03",
      "TimeService": "12:25:40",
      "ResponseDescription": "1 - Reading complete",
      "ResponseComments": null
    },
}
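For reference, the dictionary view above can be produced with the xmltodict package. This is my assumption of a minimal way to do it (the linked XML-processing article covers the details); the file path is illustrative:

import json
import xmltodict

with open('example.xml', 'r', encoding='utf-8') as f:
    doc = xmltodict.parse(f.read())  # xml -> nested dict; attributes get an '@' prefix
print(json.dumps(doc, indent=2))     # pretty-print, matching the view above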

1.2. Convert the xml records to npy files by series

LIDC-IDRI has 1018 cases, but the annotation folder tcia-lidc-xml contains 6 subfolders with 1318 xml files. Moreover, the names of these xml files do not correspond one-to-one with the series names of the images.

Therefore, it is necessary to reorganize the information in the xml annotations and convert it into content that is easy to understand. Moreover, if each annotation file corresponds one-to-one with an image file, subsequent processing becomes much easier.

What this section does is extract from the xml files only the content we care about, setting aside the unimportant parts for now.

Below is the processing code, the main steps are outlined below:

  1. Traverse all xml files and process them one by one;
  2. For a single xml file, parse out the seriesuid and the annotated nodule coordinates;
  3. Store the result in an npy file named after the seriesuid; the stored content is the coordinates of each nodule.

The complete code is as follows:

from tqdm import tqdm
import sys
import os
import numpy as np

from pylung.utils import find_all_files
from pylung.annotation import parse

def xml2mask(xml_file):
    header, annos = parse(xml_file)  # get one xml info

    ctr_arrs = []
    for i, reader in enumerate(annos):
        for j, nodule in enumerate(reader.nodules):
            ctr_arr = []
            for k, roi in enumerate(nodule.rois):
                z = roi.z
                for roi_xy in roi.roi_xy:
                    ctr_arr.append([z, roi_xy[1], roi_xy[0]])  # [[[z, y, x], [z, y, x]]]
            ctr_arrs.append(ctr_arr)

    seriesuid = header.series_instance_uid
    return seriesuid, ctr_arrs

def annotation2masks(annos_dir, save_dir):
    # get all xml file path
    files = find_all_files(annos_dir, '.xml')
    for f in tqdm(files, total=len(files)):
        print(f)
        try:
            seriesuid, masks = xml2mask(f)
            np.save(os.path.join(save_dir, '%s' % (seriesuid)), masks)  # save the parsed 3D contours [[z, y, x], [z, y, x], ...]
        except Exception:
            print("Unexpected error:", sys.exc_info()[0])


if __name__ == '__main__':
    annos_dir = './LUNA16/annotation/LIDC-XML-only/tcia-lidc-xml'     # .xml annotation files
    ctr_arr_save_dir = './LUNA16/annotation/noduleCoor'  # where the nodule coordinates parsed from each annotator are saved

    os.makedirs(ctr_arr_save_dir, exist_ok=True)

    # dump the xml info to npy (temporary files)
    annotation2masks(annos_dir, ctr_arr_save_dir)

Open one of the npy files to view it. The recorded content, shown below, is the polygon coordinate points of all the nodules marked by all doctors for this series:

[list([[-299.8, 206, 42], [-299.8, 207, 41], [-299.8, 208, 41], [-299.8, 209, 40], [-299.8, 210, 40], [-299.8, 211, 41], [-299.8, 212, 41], [-299.8, 213, 42], [-299.8, 214, 42], [-299.8, 215, 43], [-299.8, 216, 44], [-299.8, 216, 45], [-299.8, 215, 46], [-299.8, 215, 47], [-299.8, 215, 48], [-299.8, 214, 49], [-299.8, 213, 49], [-299.8, 212, 49], [-299.8, 211, 49], [-299.8, 210, 49], [-299.8, 209, 49], [-299.8, 208, 48], [-299.8, 207, 47], [-299.8, 207, 46], [-299.8, 206, 45], [-299.8, 206, 44], [-299.8, 206, 43], [-299.8, 206, 42], [-298.0, 206, 46], [-298.0, 207, 45], [-298.0, 207, 44], [-298.0, 208, 43], [-298.0, 209, 42], [-298.0, 209, 41], [-298.0, 210, 40], [-298.0, 211, 40], [-298.0, 212, 39], [-298.0, 213, 40], [-298.0, 214, 41], [-298.0, 215, 42], [-298.0, 215, 43], [-298.0, 216, 44], [-298.0, 216, 45], [-298.0, 216, 46], [-298.0, 216, 47], [-298.0, 215, 48], [-298.0, 214, 48], [-298.0, 213, 48], [-298.0, 212, 48], [-298.0, 211, 48], [-298.0, 210, 48], [-298.0, 209, 48], [-298.0, 208, 48], [-298.0, 207, 47], [-298.0, 206, 46], [-296.2, 209, 42], [-296.2, 210, 41], [-296.2, 211, 40], [-296.2, 212, 40], [-296.2, 213, 41], [-296.2, 214, 42], [-296.2, 215, 43], [-296.2, 216, 44], [-296.2, 216, 45], [-296.2, 216, 46], [-296.2, 216, 47], [-296.2, 216, 48], [-296.2, 215, 49], [-296.2, 214, 49], [-296.2, 213, 49], [-296.2, 212, 49], [-296.2, 211, 48], [-296.2, 210, 47], [-296.2, 209, 46], [-296.2, 209, 45], [-296.2, 209, 44], [-296.2, 209, 43], [-296.2, 209, 42]])
 list([[-227.8, 151, 405], [-227.8, 152, 404], [-227.8, 153, 403], [-227.8, 154, 402], [-227.8, 155, 402], [-227.8, 156, 402], [-227.8, 157, 403], [-227.8, 157, 404], [-227.8, 157, 405], [-227.8, 158, 406], [-227.8, 158, 407], [-227.8, 158, 408], [-227.8, 157, 409], [-227.8, 156, 409], [-227.8, 155, 409], [-227.8, 154, 408], [-227.8, 153, 408], [-227.8, 152, 407], [-227.8, 151, 406], [-227.8, 151, 405], [-226.0, 152, 405], [-226.0, 153, 404], [-226.0, 154, 404], [-226.0, 155, 403], [-226.0, 156, 404], [-226.0, 157, 405], [-226.0, 157, 406], [-226.0, 157, 407], [-226.0, 156, 408], [-226.0, 155, 408], [-226.0, 154, 408], [-226.0, 153, 408], [-226.0, 152, 407], [-226.0, 152, 406], [-226.0, 152, 405]])
 list([[-226.0, 158, 407], [-226.0, 157, 408], [-226.0, 156, 409], [-226.0, 155, 409], [-226.0, 154, 409], [-226.0, 153, 409], [-226.0, 152, 408], [-226.0, 151, 407], [-226.0, 152, 406], [-226.0, 153, 405], [-226.0, 153, 404], [-226.0, 154, 403], [-226.0, 155, 402], [-226.0, 156, 402], [-226.0, 157, 403], [-226.0, 158, 404], [-226.0, 158, 405], [-226.0, 158, 406], [-226.0, 158, 407], [-227.8, 159, 407], [-227.8, 158, 408], [-227.8, 157, 409], [-227.8, 156, 410], [-227.8, 155, 410], [-227.8, 154, 410], [-227.8, 153, 409], [-227.8, 152, 408], [-227.8, 151, 407], [-227.8, 151, 406], [-227.8, 151, 405], [-227.8, 152, 404], [-227.8, 153, 403], [-227.8, 154, 402], [-227.8, 155, 402], [-227.8, 156, 402], [-227.8, 157, 403], [-227.8, 158, 404], [-227.8, 158, 405], [-227.8, 158, 406], [-227.8, 159, 407]])
 list([[-296.2, 214, 46], [-296.2, 213, 47], [-296.2, 212, 47], [-296.2, 211, 47], [-296.2, 210, 46], [-296.2, 209, 45], [-296.2, 208, 44], [-296.2, 208, 43], [-296.2, 208, 42], [-296.2, 209, 41], [-296.2, 210, 42], [-296.2, 211, 42], [-296.2, 212, 43], [-296.2, 213, 44], [-296.2, 214, 45], [-296.2, 214, 46], [-298.0, 216, 47], [-298.0, 215, 48], [-298.0, 214, 49], [-298.0, 213, 49], [-298.0, 212, 49], [-298.0, 211, 49], [-298.0, 210, 49], [-298.0, 209, 48], [-298.0, 208, 47], [-298.0, 207, 46], [-298.0, 207, 45], [-298.0, 207, 44], [-298.0, 208, 43], [-298.0, 208, 42], [-298.0, 209, 41], [-298.0, 210, 41], [-298.0, 211, 41], [-298.0, 212, 41], [-298.0, 213, 41], [-298.0, 214, 42], [-298.0, 215, 43], [-298.0, 216, 44], [-298.0, 216, 45], [-298.0, 216, 46], [-298.0, 216, 47], [-299.8, 216, 50], [-299.8, 215, 51], [-299.8, 214, 51], [-299.8, 213, 50], [-299.8, 212, 50], [-299.8, 211, 50], [-299.8, 210, 49], [-299.8, 209, 48], [-299.8, 208, 47], [-299.8, 207, 46], [-299.8, 207, 45], [-299.8, 207, 44], [-299.8, 208, 43], [-299.8, 209, 42], [-299.8, 210, 42], [-299.8, 211, 41], [-299.8, 212, 41], [-299.8, 213, 42], [-299.8, 214, 42], [-299.8, 215, 43], [-299.8, 216, 44], [-299.8, 216, 45], [-299.8, 216, 46], [-299.8, 216, 47], [-299.8, 216, 48], [-299.8, 216, 49], [-299.8, 216, 50]])
 list([[-226.0, 158, 407], [-226.0, 157, 408], [-226.0, 156, 409], [-226.0, 155, 409], [-226.0, 154, 409], [-226.0, 153, 409], [-226.0, 152, 409], [-226.0, 151, 409], [-226.0, 151, 408], [-226.0, 151, 407], [-226.0, 151, 406], [-226.0, 151, 405], [-226.0, 152, 404], [-226.0, 152, 403], [-226.0, 153, 403], [-226.0, 154, 402], [-226.0, 154, 401], [-226.0, 155, 401], [-226.0, 156, 401], [-226.0, 157, 401], [-226.0, 157, 402], [-226.0, 158, 403], [-226.0, 158, 404], [-226.0, 158, 405], [-226.0, 158, 406], [-226.0, 158, 407], [-227.8, 159, 407], [-227.8, 158, 408], [-227.8, 158, 409], [-227.8, 157, 409], [-227.8, 156, 410], [-227.8, 155, 410], [-227.8, 154, 409], [-227.8, 153, 409], [-227.8, 152, 409], [-227.8, 151, 408], [-227.8, 151, 407], [-227.8, 151, 406], [-227.8, 151, 405], [-227.8, 151, 404], [-227.8, 152, 403], [-227.8, 152, 402], [-227.8, 153, 401], [-227.8, 154, 401], [-227.8, 155, 401], [-227.8, 156, 401], [-227.8, 157, 401], [-227.8, 158, 402], [-227.8, 158, 403], [-227.8, 159, 404], [-227.8, 159, 405], [-227.8, 159, 406], [-227.8, 159, 407]])
 list([[-296.2, 215, 47], [-296.2, 214, 48], [-296.2, 213, 48], [-296.2, 212, 48], [-296.2, 211, 48], [-296.2, 210, 47], [-296.2, 209, 47], [-296.2, 208, 46], [-296.2, 208, 45], [-296.2, 207, 44], [-296.2, 207, 43], [-296.2, 208, 42], [-296.2, 209, 42], [-296.2, 210, 42], [-296.2, 211, 42], [-296.2, 212, 43], [-296.2, 213, 43], [-296.2, 214, 44], [-296.2, 215, 45], [-296.2, 215, 46], [-296.2, 215, 47], [-298.0, 216, 47], [-298.0, 215, 48], [-298.0, 214, 49], [-298.0, 214, 50], [-298.0, 213, 50], [-298.0, 212, 50], [-298.0, 211, 49], [-298.0, 210, 49], [-298.0, 209, 48], [-298.0, 208, 48], [-298.0, 207, 47], [-298.0, 207, 46], [-298.0, 207, 45], [-298.0, 207, 44], [-298.0, 207, 43], [-298.0, 207, 42], [-298.0, 207, 41], [-298.0, 208, 41], [-298.0, 209, 41], [-298.0, 210, 41], [-298.0, 211, 41], [-298.0, 212, 41], [-298.0, 213, 41], [-298.0, 214, 41], [-298.0, 215, 42], [-298.0, 215, 43], [-298.0, 216, 44], [-298.0, 216, 45], [-298.0, 216, 46], [-298.0, 216, 47], [-299.8, 217, 46], [-299.8, 216, 47], [-299.8, 216, 48], [-299.8, 215, 49], [-299.8, 214, 50], [-299.8, 213, 50], [-299.8, 212, 50], [-299.8, 211, 50], [-299.8, 210, 50], [-299.8, 209, 49], [-299.8, 208, 48], [-299.8, 208, 47], [-299.8, 207, 46], [-299.8, 207, 45], [-299.8, 207, 44], [-299.8, 208, 43], [-299.8, 209, 42], [-299.8, 209, 41], [-299.8, 210, 41], [-299.8, 211, 41], [-299.8, 212, 41], [-299.8, 213, 41], [-299.8, 214, 42], [-299.8, 215, 42], [-299.8, 215, 43], [-299.8, 216, 44], [-299.8, 217, 45], [-299.8, 217, 46], [-301.6, 214, 45], [-301.6, 213, 46], [-301.6, 212, 47], [-301.6, 211, 47], [-301.6, 210, 46], [-301.6, 209, 45], [-301.6, 210, 44], [-301.6, 211, 43], [-301.6, 212, 43], [-301.6, 213, 44], [-301.6, 214, 45]])
 list([[-296.2, 209, 43], [-296.2, 209, 44], [-296.2, 210, 45], [-296.2, 211, 46], [-296.2, 212, 47], [-296.2, 212, 48], [-296.2, 213, 48], [-296.2, 214, 48], [-296.2, 215, 47], [-296.2, 215, 46], [-296.2, 215, 45], [-296.2, 214, 44], [-296.2, 213, 43], [-296.2, 212, 43], [-296.2, 211, 43], [-296.2, 210, 43], [-296.2, 209, 43], [-298.0, 208, 42], [-298.0, 208, 43], [-298.0, 208, 44], [-298.0, 208, 45], [-298.0, 208, 46], [-298.0, 208, 47], [-298.0, 209, 47], [-298.0, 210, 48], [-298.0, 211, 48], [-298.0, 211, 49], [-298.0, 212, 49], [-298.0, 213, 48], [-298.0, 214, 48], [-298.0, 215, 47], [-298.0, 216, 46], [-298.0, 216, 45], [-298.0, 216, 44], [-298.0, 215, 43], [-298.0, 214, 43], [-298.0, 213, 42], [-298.0, 212, 42], [-298.0, 212, 41], [-298.0, 211, 41], [-298.0, 210, 41], [-298.0, 209, 42], [-298.0, 208, 42], [-299.8, 210, 43], [-299.8, 209, 43], [-299.8, 208, 44], [-299.8, 207, 44], [-299.8, 207, 45], [-299.8, 207, 46], [-299.8, 208, 47], [-299.8, 209, 48], [-299.8, 210, 49], [-299.8, 211, 49], [-299.8, 212, 49], [-299.8, 213, 50], [-299.8, 214, 49], [-299.8, 215, 48], [-299.8, 215, 47], [-299.8, 216, 46], [-299.8, 216, 45], [-299.8, 215, 44], [-299.8, 215, 43], [-299.8, 214, 43], [-299.8, 214, 42], [-299.8, 213, 42], [-299.8, 212, 41], [-299.8, 211, 41], [-299.8, 210, 42], [-299.8, 210, 43]])] <class 'numpy.ndarray'>
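To reproduce this view yourself, loading one of the dumped files is straightforward; the seriesuid in the path below is a placeholder:

import numpy as np

arr = np.load('./LUNA16/annotation/noduleCoor/<seriesuid>.npy', allow_pickle=True)
print(len(arr))         # number of annotated nodule contours (all readers combined)
print(arr, type(arr))   # the [z, y, x] polygon points, as printed above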

2. Annotation count statistics and mask array generation

The generated npy file is not the final form of the annotation information, for the following reasons:

  1. The nodule coordinates in the xml file were marked independently by multiple doctors, so the annotations overlap (a nodule may be marked repeatedly; many reads were done back-to-back, with no doctor knowing what the others had marked). The content marked by multiple readers must therefore be merged to leave the final nodule coordinates;
  2. The npy file holds only coordinate points; we still need to generate mask files with the same shape as the corresponding image.

Based on the above, generating the final mask file requires the following steps:

  1. Convert the annotated nodule z coordinates from world-coordinate z values to slice indices (instanceNum) on the corresponding image;
  2. Merge the nodules marked by multiple doctors, keeping the final nodules according to an IoU overlap rule (a toy example follows this list);
  3. Draw the remaining nodule coordinates onto the mask and save it.
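To make the IoU rule in step 2 concrete, here is a toy example on two small boolean masks; the 0.4 threshold matches the implementation below:

import numpy as np

a = np.zeros((4, 4), dtype=bool)
a[1:3, 1:3] = True  # nodule outline from reader 1
b = np.zeros((4, 4), dtype=bool)
b[1:3, 2:4] = True  # nodule outline from reader 2

iou = np.logical_and(a, b).sum() / np.logical_or(a, b).sum()
print(iou)  # 2/6 ≈ 0.33, below the 0.4 threshold, so these count as two different nodules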

The implementation code is as follows:

import nrrd
import SimpleITK as sitk
import cv2
import os
import numpy as np

def load_itk_image(filename):
    """
    Return img array and [z,y,x]-ordered origin and spacing
    """
    # sitk.ReadImage returns an image ordered x, y, z; GetArrayFromImage flips it to z, y, x
    itkimage = sitk.ReadImage(filename)
    numpyImage = sitk.GetArrayFromImage(itkimage)

    numpyOrigin = np.array(list(reversed(itkimage.GetOrigin())))
    numpySpacing = np.array(list(reversed(itkimage.GetSpacing())))

    return numpyImage, numpyOrigin, numpySpacing


def arrs2mask(img_dir, ctr_arr_dir, save_dir):
    cnt = 0
    consensus = {1: 0, 2: 0, 3: 0, 4: 0}  # how many nodules reach each consensus level

    # create one save directory per consensus level
    for k in consensus.keys():
        if not os.path.exists(os.path.join(save_dir, str(k))):
            os.makedirs(os.path.join(save_dir, str(k)))

    for f in os.listdir(img_dir):
        if f.endswith('.mhd'):
            pid = f[:-4]
            print('pid:', pid)
            # ct image
            img, origin, spacing = load_itk_image(os.path.join(img_dir, '%s.mhd' % (pid)))

            # nodule contour coordinates dumped in section 1.2
            ctr_arrs = np.load(os.path.join(ctr_arr_dir, '%s.npy' % (pid)), allow_pickle=True)
            cnt += len(ctr_arrs)

            nodule_masks = []
            # process each annotated nodule in turn
            for ctr_arr in ctr_arrs:
                z_origin = origin[0]
                z_spacing = spacing[0]

                ctr_arr = np.array(ctr_arr)
                # ctr_arr[:, 0] holds the z-axis values; convert world z coordinates
                # to slice indices, e.g. [-50, -40, -30] --> [2, 3, 4]
                ctr_arr[:, 0] = np.absolute(ctr_arr[:, 0] - z_origin) / z_spacing
                ctr_arr = ctr_arr.astype(np.int32)

                # each annotated nodule gets its own temporary mask, the same size as img
                mask = np.zeros(img.shape)
                # iterate over the annotated z slices (np.unique removes duplicates and sorts ascending)
                for z in np.unique(ctr_arr[:, 0]):
                    ctr = ctr_arr[ctr_arr[:, 0] == z][:, [2, 1]]
                    ctr = np.array([ctr], dtype=np.int32)
                    mask[z] = cv2.fillPoly(mask[z], ctr, color=(1,))
                nodule_masks.append(mask)

            i = 0
            visited = []
            d = {}
            masks = []
            while i < len(nodule_masks):
                # if matched before, no need to create a new mask
                if i in visited:
                    i += 1
                    continue
                same_nodules = []
                mask1 = nodule_masks[i]
                same_nodules.append(mask1)
                d[i] = {}
                d[i]['count'] = 1
                d[i]['iou'] = []

                # find annotations pointing to the same nodule:
                # compute IoU between nodule mask i and every later, unmatched nodule mask
                for j in range(i + 1, len(nodule_masks)):
                    # skip masks already matched to a previous nodule
                    if j in visited:
                        continue
                    mask2 = nodule_masks[j]
                    iou = float(np.logical_and(mask1, mask2).sum()) / np.logical_or(mask1, mask2).sum()

                    # if IoU exceeds the threshold, nodule i is counted as marked once more
                    if iou > 0.4:
                        visited.append(j)
                        same_nodules.append(mask2)
                        d[i]['count'] += 1
                        d[i]['iou'].append(iou)

                masks.append(same_nodules)
                i += 1

            # only 4 readers, so cap the consensus count at 4
            for k, v in d.items():
                if v['count'] > 4:
                    print('WARNING:  %s: %dth nodule, iou: %s' % (pid, k, str(v['iou'])))
                    v['count'] = 4
                consensus[v['count']] += 1

            # number of annotators agreeing on each nodule
            num = np.array([len(m) for m in masks])
            num[num > 4] = 4  # cap repeated annotations above 4 at 4

            if len(num) == 0:
                continue
            # iterate from the nodules with the most consensus
            for n in range(num.max(), 0, -1):
                mask = np.zeros(img.shape, dtype=np.uint8)

                for i, index in enumerate(np.where(num >= n)[0]):
                    same_nodules = masks[index]
                    m = np.logical_or.reduce(same_nodules)
                    # give each nodule a distinct label value; for pure segmentation you
                    # could set them all to 1 (or unify them to 1 afterwards)
                    mask[m] = i + 1
                nrrd.write(os.path.join(save_dir, str(n), pid + '.nrrd'), mask)  # save the merged mask

    print(consensus)
    print(cnt)

if __name__ == '__main__':
    img_dir = r'./LUNA16/image_combined'        # image data

    ctr_arr_save_dir = r'./LUNA16/annotation/noduleCoor'  # where the per-annotator nodule coordinates were dumped
    noduleMask_save_dir = r'./LUNA16/nodule_masks'  # folder for the merged nodule masks

    # generate masks from the dumped temporary files
    arrs2mask(img_dir, ctr_arr_save_dir, noduleMask_save_dir)
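As noted in the comments above, the saved mask labels each nodule with a distinct integer. If you only need a binary segmentation target, collapsing the labels takes one line; the seriesuid in the path is a placeholder:

import nrrd
import numpy as np

mask, header = nrrd.read('./LUNA16/nodule_masks/1/<seriesuid>.nrrd')
binary = (mask > 0).astype(np.uint8)  # collapse all nodule labels into one foreground class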

At this point, a mask with the same shape as the image has been generated. Use itk-snap to open and view the processed results, as shown below:

(Screenshot: the image and the mask opened separately in itk-snap.)

The image and mask opened above are both in nrrd format. To convert an mhd image to nrrd, you can refer to the following code:

import os
import itk
import nrrd

nii_path = os.path.join(r'./LUNA16/image_combined', '1.3.6.1.4.1.14519.5.2.1.6279.6001.184412674007117333405073397832.mhd')
image = itk.array_from_image(itk.imread(nii_path))

nrrd.write(r'./image.nrrd', image)
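As a quick sanity check that the converted image and the generated mask really share the same shape, something like the following can be used; the mask path and seriesuid are illustrative:

import nrrd

image, _ = nrrd.read(r'./image.nrrd')
mask, _ = nrrd.read(r'./LUNA16/nodule_masks/1/<seriesuid>.nrrd')
assert image.shape == mask.shape, (image.shape, mask.shape)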

3. Summary

The data formats in the lidc-idri data set are ones we do not encounter often, especially the mhd and raw files, which represent two different parts of a single volume (header and pixel data) and are also rarely seen elsewhere.

For beginners, this data form can be a bit unfamiliar, but I believe it becomes clear through this series. This article also stores the results as nrrd files, which is my preferred array storage format: simple and easy to work with.

At this point, you have image and mask files in a new one-to-one correspondence, which is much easier to understand than reading the xml files. In the next section, we will combine the obtained image and mask with the lung segmentation and perform the resample operation to bring the data to a unified scale.

Origin: blog.csdn.net/wsLJQian/article/details/134071282