Machine Learning Notes - PyTorch + UNet + Data Science Bowl Medical Image Segmentation

An overview of the dataset

1. The Data Science Bowl

        The dataset comes from the 2018 Data Science Bowl competition on Kaggle. Hosted by Booz Allen and Kaggle, the Data Science Bowl is billed as the world's premier data science competition for social good.

        The Data Science Bowl brings together data scientists, technologists, domain experts and organizations to meet the world's data and technology challenges. It's a platform through which people can harness their passions, unleash their curiosity, and expand their reach to bring about change on a global scale.

        To showcase the competition, Booz Allen partnered with Kaggle, the leading online data science competition community with over 1 million members worldwide. Over a 90-day period, participants, either individually or in teams, have access to unique datasets to develop algorithms for specific challenges. Each year, the competition awards cash prizes to top teams.

        In 2015, participants examined more than 100,000 underwater images provided by the Hatfield Marine Science Center to assess ocean health at tremendous speed and scale. Over 1,000 teams participated, submitting more than 17,000 solutions to the challenge. The winning team, Deep Ocean, developed a classification algorithm that outperformed current state-of-the-art algorithms by more than 10 percent, achieving human-level performance in some categories.

        In 2016, the competition turned to cardiology, aiming to change how heart function is assessed. Although the challenge was clearly more complex than the previous year's, the competition received nearly 9,300 entries from more than 1,100 teams. In fact, the winning team, Tencia Lee and Qi Liu, were hedge fund traders, not traditional data scientists. The NIH is further studying the results and sharing successful approaches with the medical and research communities.

        In 2017, nearly 10,000 participants worked to improve lung cancer screening techniques, submitting more than 18,000 algorithms. Preliminary results show a 10% reduction in false positives, while improving accuracy by 10% over state-of-the-art. A follow-up competition, sponsored by the Bonnie J. Addario Lung Cancer Foundation and DrivenData.org, is underway to take the 2017 Data Science Bowl algorithm advances from concept to clinic. 

        The dataset can be downloaded from the 2018 competition page:

2018 Data Science Bowl | Kaggle - Find the nuclei in divergent images to advance medical discovery: https://www.kaggle.com/competitions/data-science-bowl-2018/overview

2. Dataset overview

        After the data is downloaded and decompressed, you will see the following files. For now we only need stage1_train.zip and stage1_test.zip.

         One of the sample images.

         Part of the masks corresponding to that sample image.
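
        Inside stage1_train, each sample has its own folder named after the image ID, containing an images/ subfolder with the original PNG and a masks/ subfolder with one binary PNG per nucleus. A minimal sketch for peeking at that layout (the input/ path is an assumption that matches the download step below):

import os

# Assumed location after unzipping stage1_train.zip into input/ (see step 2 of the reference code section)
train_dir = 'input/stage1_train'

for sample_id in sorted(os.listdir(train_dir))[:3]:            # peek at the first few samples
    image_dir = os.path.join(train_dir, sample_id, 'images')   # the original microscopy image
    mask_dir = os.path.join(train_dir, sample_id, 'masks')     # one binary PNG per nucleus
    print(sample_id,
          len(os.listdir(image_dir)), 'image,',
          len(os.listdir(mask_dir)), 'masks')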

2. Reference code

        Reference code download

https://github.com/4uiiurz1/pytorch-nested-unet

1. Code structure

        The purpose of each .py file can basically be inferred from its name.

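        For orientation, the files used in the rest of these notes are:

preprocess_dsb2018.py   # merges the per-nucleus masks and resizes the images (step 3)
train.py                # training entry point (step 4)
val.py                  # validation / prediction on the validation split (step 5)
archs.py                # NestedUNet (UNet++) and related network definitions, imported later by pth_2onnx.py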

 2. Download the dataset to input/ and unzip it

 3. Image preprocessing

        Run the preprocessing script; its main job is to merge the per-nucleus masks and resize the images.

python preprocess_dsb2018.py

         After processing completes, the resized images and merged masks are written out as the dsb2018_96 dataset used for training below.

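        Concretely, the merging step combines all of a sample's per-nucleus mask PNGs into one mask. A minimal sketch of that idea, assuming the input/stage1_train layout above (the actual repo script also writes the results out under its own dataset folder and naming scheme):

import os
from glob import glob

import cv2
import numpy as np


def merge_masks(sample_dir, img_size=96):
    """Combine all per-nucleus masks of one sample into a single resized mask."""
    merged = None
    for path in glob(os.path.join(sample_dir, 'masks', '*.png')):
        m = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        merged = m if merged is None else np.maximum(merged, m)   # union of all nuclei
    return cv2.resize(merged, (img_size, img_size))


# Example (hypothetical sample ID):
# mask = merge_masks('input/stage1_train/<sample_id>')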

 4. Train

python train.py --dataset dsb2018_96 --arch NestedUNet

        After training completes, the trained model is saved in the models folder (here models/dsb2018_96_NestedUNet_woDS/).
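
        The command above trains NestedUNet (UNet++) without deep supervision, hence the woDS suffix in the model folder. The config loaded later in pth_2onnx.py carries a deep_supervision entry, so the trainer should also accept a corresponding option; a hedged example, assuming the flag is spelled this way in your checkout:

python train.py --dataset dsb2018_96 --arch NestedUNet --deep_supervision True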

 5. Validate

        Run the following script for validation; the predicted masks are written to the outputs folder.

python val.py --name dsb2018_96_NestedUNet_woDS

        The predicted masks are still somewhat blurry here because the model was not trained for enough epochs.
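
        A longer run should sharpen the predictions. If your checkout of train.py exposes an epochs option (treat the flag name as an assumption; otherwise adjust the default inside train.py), for example:

python train.py --dataset dsb2018_96 --arch NestedUNet --epochs 400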

3. Converting to ONNX

        Call pth_2onnx.py to convert the trained .pth model to an ONNX model.

import torch

import archs                   # network definitions (NestedUNet / UNet++) from the pytorch-nested-unet repo
from train import parse_args   # assumption: reuse the training script's argument parser for the config


def pth_2onnx():
    """Convert the trained PyTorch model to an ONNX model."""
    # Load the state dict saved by train.py
    torch_model = torch.load('./models/dsb2018_96_NestedUNet_woDS/model.pth')

    # The config is expected to contain 'arch', 'num_classes', 'input_channels' and 'deep_supervision'
    config = vars(parse_args())
    model = archs.__dict__[config['arch']](config['num_classes'],
                                           config['input_channels'],
                                           config['deep_supervision'])
    model.load_state_dict(torch_model)

    batch_size = 1             # batch size of the dummy input
    input_shape = (3, 96, 96)  # channels, height, width expected by the model

    # set the model to inference mode
    model.eval()
    print(model)

    x = torch.randn(batch_size, *input_shape)  # dummy input tensor used for tracing
    export_onnx_file = "model.onnx"            # target ONNX file name
    torch.onnx.export(model,
                      x,
                      export_onnx_file,
                      # note: opset version 11 is used here
                      opset_version=11)
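
        Before using the exported file, it can be sanity-checked with the onnx package; a small optional check:

import onnx

# Load the exported model and run ONNX's structural checker
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
print([i.name for i in onnx_model.graph.input],
      [o.name for o in onnx_model.graph.output])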

        Calling the ONNX model for inference:

import albumentations as alb
import cv2
import onnxruntime as ort
import torch
from torchvision import transforms

# Load the exported ONNX model and look up its input name
ort_session = ort.InferenceSession('model.onnx')
input_name = ort_session.get_inputs()[0].name

# Read a test image
img = cv2.imread('ba0c9e776404370429e80.png')

# Mirror the repo's preprocessing: albumentations Normalize, then scale by 255 and resize to 96x96
nor = alb.Normalize()
img = nor(image=img)['image']
img = img.astype('float32') / 255
img = cv2.resize(img, (96, 96))

# HWC float image -> 1x3x96x96 input tensor
tensor = transforms.ToTensor()(img)
tensor = tensor.unsqueeze_(0)

# Run inference with ONNX Runtime
ort_outs = ort_session.run(None, {input_name: tensor.cpu().numpy()})

# The network outputs logits; apply sigmoid to get a probability map
img_out = ort_outs[0]
img_out = torch.from_numpy(img_out)
img_out = torch.sigmoid(img_out).cpu().numpy()

# Save the probability map scaled to 0-255
cv2.imwrite('result.png', (img_out[0][0] * 255).astype('uint8'))
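
        The saved result.png is the raw sigmoid probability map scaled to 0-255; to obtain a clean binary mask you would typically threshold it (0.5 is an assumed cutoff):

# Binarize the probability map at an assumed 0.5 threshold
binary = (img_out[0][0] > 0.5).astype('uint8') * 255
cv2.imwrite('result_mask.png', binary)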


Original post: https://blog.csdn.net/bashendixie5/article/details/123282703