How to convert PNG and JPG to DICOM (dcm): the pitfalls I ran into over the years (Python version)

        As the standard data format for medical imaging, DICOM is a hurdle that nobody working seriously in medical AI can avoid. I am only a novice working on algorithm deployment, but contact with this kind of data is inevitable. Recently a colleague on the algorithm side handed me an algorithm, and I needed a public dataset to test it. Public DICOM datasets are hard to come by (PS: I could not even gather 1,000 images, which left me speechless), so I had to turn to PNG and JPG datasets instead (this does not apply to algorithms trained directly on PNG or JPG).

        Converting PNG and JPG data to DICOM is not easy, though. If you are not careful, you end up with "non-standard DICOM". I tried several tutorials from the Internet. The converted DICOM files either displayed as black or could not be recognized at all, or the tool was written in C++ and had to be compiled over and over, which was annoying. I also tried taking an existing DICOM file and replacing its Pixel Data with the PNG or JPG data. All in vain!

        So I buckled down and studied (borrowing a bit from here and a bit from there) until I had this Python version of PNG and JPG to DICOM conversion.

Table of contents

1. Introduction to Dicom data format

2. PNG, JPG to Dicom (take PNG as an example)

3. Further improve Dicom

4. Result display


1. Introduction to Dicom data format

        Before converting PNG and JPG files to DICOM, it helps to understand the basic layout of a DICOM file.

(1) Preamble: Not important for the conversion. These are the first 128 bytes of the file; their content may be all zeros, and they exist mainly for compatibility with other file formats.

(2) Prefix: The four characters "DICM" that follow the preamble. A reader uses them to confirm that the file is a DICOM file. Files without the preamble and prefix do exist, but they do not conform to the DICOM file format, and a reader may refuse them unless forced (for example with force=True in pydicom).

(3) File Meta Information (file meta information header): Important!!! The file meta information header is a mandatory part of a DICOM file. It carries key information such as the file meta information version, the SOP Class and SOP Instance UIDs, and the Transfer Syntax UID, which determines the byte order and the encoding of the data elements that follow.

(4) Data Elements: Important!!! This is the portion of the DICOM file that contains the actual medical image (Pixel Data) and the related information.
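The preamble and prefix described in (1) and (2) can be checked without any DICOM library. The following is a minimal sketch; the helper name has_dicm_prefix is made up for this illustration and is not part of pydicom.

```python
def has_dicm_prefix(path):
    """Return True if the file starts with a 128-byte preamble followed by b'DICM'."""
    with open(path, "rb") as f:
        header = f.read(132)  # 128-byte preamble + 4-byte prefix
    return len(header) == 132 and header[128:132] == b"DICM"
```

Calling has_dicm_prefix("Your_Dicom_Path") on a converted file is a quick way to tell whether the writer produced the standard file header or a "non-standard" file that only opens with force=True.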

2. PNG, JPG to Dicom (take PNG as an example)

        OK, now that we know the structure of a DICOM file, we can build its main parts for our PNG and JPG images. Without further ado, here is the code! If you do not care about the analysis that follows, you only need to change the paths in the main block.

import os
from pydicom.dataset import FileDataset, FileMetaDataset
from pydicom.uid import generate_uid
from PIL import Image

def png_to_dicom(input_png_path, output_dcm_path, patient_name="Anonymous", study_description="PNG to DICOM"):
    os.makedirs(output_dcm_path, exist_ok=True)
    for file_name in sorted(os.listdir(input_png_path)):
        # Skip anything that is not a PNG file
        if not file_name.lower().endswith(".png"):
            continue
        input_filepath = os.path.join(input_png_path, file_name)
        output_dcmpath = os.path.join(output_dcm_path, os.path.splitext(file_name)[0] + ".dcm")

        # Read the PNG image and convert it to an 8-bit grayscale (single-channel) image
        img = Image.open(input_filepath).convert("L")

        # Every file needs its own SOP Instance UID
        sop_class_uid = '1.2.840.10008.5.1.4.1.1.1.1'
        sop_instance_uid = generate_uid()

        # Build the File Meta Information header
        file_meta = FileMetaDataset()
        file_meta.FileMetaInformationGroupLength = 184  # placeholder, pydicom recalculates it on write
        file_meta.FileMetaInformationVersion = b'\x00\x01'
        file_meta.MediaStorageSOPClassUID = sop_class_uid
        file_meta.MediaStorageSOPInstanceUID = sop_instance_uid
        file_meta.TransferSyntaxUID = '1.2.840.10008.1.2'
        file_meta.ImplementationClassUID = '1.2.276.0.7230010.3.0.3.5.4'
        file_meta.ImplementationVersionName = 'ANNET_DCMBK_100'

        # Create the dataset; passing a 128-byte preamble makes pydicom write the "DICM" prefix
        ds = FileDataset(output_dcmpath, {}, file_meta=file_meta, preamble=b"\0" * 128)

        # Add the DICOM data elements
        ds.SOPClassUID = sop_class_uid
        ds.SOPInstanceUID = sop_instance_uid
        ds.PatientName = patient_name
        ds.StudyDescription = study_description
        ds.Columns, ds.Rows = img.size  # PIL reports (width, height)
        ds.SamplesPerPixel = 1
        ds.BitsAllocated = 8
        ds.BitsStored = 8
        ds.HighBit = 7
        ds.PixelRepresentation = 0
        # How the pixel values are displayed (0 = black)
        ds.PhotometricInterpretation = "MONOCHROME2"
        ds.PixelData = img.tobytes()  # use the raw bytes of the grayscale image directly

        # Encoding used when saving; must match TransferSyntaxUID (Implicit VR Little Endian).
        # pydicom 2.x needs these two attributes; pydicom 3.x deprecates them and
        # relies on the transfer syntax in file_meta instead.
        ds.is_little_endian = True
        ds.is_implicit_VR = True

        ds.save_as(output_dcmpath)
        print(output_dcmpath)


if __name__ == "__main__":
    # Input folder with the PNG images and output folder for the DICOM files
    input_png_path = "Your_Input_PNG_Path"
    output_dcm_path = "Your_Output_Dicom_Path"

    # Convert PNG to DICOM
    png_to_dicom(input_png_path, output_dcm_path)

Let's analyze this part of the code in detail:

(1) FileMetaInformationGroupLength: The length in bytes of the File Meta Information part. You do not have to work it out by hand: when the element is present, pydicom recalculates it while writing the file, so the value in the code is only a placeholder.

(2) FileMetaInformationVersion: The version of the File Meta Information header.

(3) MediaStorageSOPClassUID: Defines the type of the stored image. Each type has its own UID. For example, "1.2.840.10008.5.1.4.1.1.1.1" represents "Digital X-Ray Image Storage - For Presentation".

(4) MediaStorageSOPInstanceUID: Uniquely identifies a specific image instance, so no two files should share the same value.

(5) TransferSyntaxUID: The transfer syntax of the DICOM data, which specifies how the data is encoded (byte order, implicit or explicit VR, compression) both in the file and in network transfer. Each transfer syntax has its own UID. For example, "1.2.840.10008.1.2" represents "Implicit VR Little Endian".

(6) ImplementationClassUID: A unique identifier of the application or device that implements the DICOM standard and wrote the file.

(7) ImplementationVersionName: The version name of that application or device.
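Besides the file meta header, the function above also sets the image attributes Rows, Columns, SamplesPerPixel and BitsAllocated, and these have to agree with the number of bytes placed in PixelData. A mismatch (for example writing RGB bytes while declaring one sample per pixel) is one plausible cause of the black or unreadable results mentioned at the start. The sketch below shows the arithmetic for uncompressed, single-frame images; both function names are made up for this illustration.

```python
def expected_pixel_bytes(rows, columns, samples_per_pixel=1, bits_allocated=8):
    """Length in bytes of one uncompressed frame described by these attributes."""
    if bits_allocated % 8 != 0:
        raise ValueError("this sketch only handles whole-byte pixels")
    return rows * columns * samples_per_pixel * (bits_allocated // 8)


def pixel_data_matches(pixel_bytes, rows, columns, samples_per_pixel=1, bits_allocated=8):
    """Check that the raw pixel bytes fit the declared image attributes."""
    return len(pixel_bytes) == expected_pixel_bytes(
        rows, columns, samples_per_pixel, bits_allocated
    )


# A 3 x 2 grayscale image: 6 bytes fit the attributes used in the article
print(pixel_data_matches(bytes(6), rows=2, columns=3))   # True
# The same image left as RGB has 18 bytes, which no longer fits
print(pixel_data_matches(bytes(18), rows=2, columns=3))  # False
```

In the conversion function, the equivalent check would compare len(img.tobytes()) with the value computed from img.size after the convert("L") call.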

        You may ask, "How am I supposed to know what all these UID numbers mean?" I have already thought of that. Take any standard DICOM file and run the following code:

import pydicom
dataset = pydicom.dcmread("Your_Dicom_Path", force=True)
print(dataset.file_meta)

        This prints the File Meta Information of that file, with the same fields as listed above. If you want to change MediaStorageSOPClassUID or TransferSyntaxUID, you have to look up the corresponding UID yourself, so I do not recommend changing these values unless you know exactly what you are doing.

3. Further improve Dicom

        Hahaha, you did not expect there to be more, did you? After step 2 you already have a DICOM file that can be displayed. But that is all it can do. If you want to run algorithms on it or, like me, verify someone else's algorithm, this next step is essential.

        In step 2 we added the File Meta Information (file meta information header) and a few Data Elements (mainly Pixel Data) to the new DICOM file, so it can be read and viewed normally. For algorithm training or verification, however, each DICOM file also needs identifying attributes (patient, study, series) that make it unique.

        To keep things easy to follow, I put the code that makes each DICOM file unique in a separate py file:

import os
import pydicom
from pydicom.uid import generate_uid

# Source and target folder paths
source_folder = 'Your_Input_Dicom_Path'
target_folder = 'Your_Output_Dicom_Path'

patient_pid = 20230726001
accession_number = 202307261001
seriesNumber = 1
modality = "CR"
pixelSpacing = [0.160145, 0.160114]
instanceNumber = 1
bodyPartExamined = "CHEST"

os.makedirs(target_folder, exist_ok=True)

# Iterate over the files in the source folder
for filename in sorted(os.listdir(source_folder)):
    if filename.endswith('.dcm'):
        # Build the source and target file paths
        source_file = os.path.join(source_folder, filename)
        target_file = os.path.join(target_folder, filename)

        # Load the source DCM file
        dcm_data = pydicom.dcmread(source_file, force=True)

        # Each file is treated as its own study and series, so both UIDs must be
        # new, valid DICOM UIDs (a bare counter is not a proper UID, and one
        # series UID must not be shared between different studies)
        study_uid = generate_uid()
        seriesInstanceUID = generate_uid()

        # Add the patient ID, Accession Number, Study UID and related attributes
        dcm_data.PatientID = str(patient_pid)
        dcm_data.AccessionNumber = str(accession_number)
        dcm_data.StudyInstanceUID = study_uid
        dcm_data.SeriesNumber = seriesNumber
        dcm_data.SeriesInstanceUID = seriesInstanceUID
        dcm_data.Modality = modality
        dcm_data.PixelSpacing = pixelSpacing
        dcm_data.BodyPartExamined = bodyPartExamined
        dcm_data.InstanceNumber = instanceNumber

        # Use the file name as the patient name
        file_name_without_extension = os.path.splitext(filename)[0]
        dcm_data.PatientName = file_name_without_extension

        # Save the modified DCM file to the target folder
        dcm_data.save_as(target_file)

        # Increment the counters
        patient_pid += 1
        accession_number += 1
    else:
        print("Skipped (not a .dcm file):", filename)

Similarly, let's go through this part of the code in detail.

(1) patient_pid: The unique identifier of the patient. Write whatever you like, as long as it differs between patients.

(2) accession_number: An identification number assigned to a patient's examination, which identifies a specific examination or set of medical images. Write whatever you like, as long as it differs between examinations.

(3) study_uid: The unique identifier of the study (StudyInstanceUID). Unlike the two IDs above, it cannot be arbitrary text: it has to be a valid DICOM UID (digits separated by dots, at most 64 characters) and unique for each study.

(4) seriesNumber: The number of the series to which the image belongs. It is recommended to follow my approach.

(5) seriesInstanceUID: Uniquely identifies an image series. It also has to be a valid DICOM UID that is not reused for another series. Follow my format, refer to a standard DICOM file, or generate one (for example with pydicom.uid.generate_uid()).

(6) modality: The imaging modality used to acquire the images. It is recommended to follow my approach, or to refer to how a standard DICOM file writes it.

(7) pixelSpacing: The physical spacing between pixels in the row and column directions, in millimeters. It is recommended to follow my approach, or to refer to how a standard DICOM file writes it.

(8) instanceNumber: A number that identifies an image within its series. It is usually used to distinguish the different images of one series. Write whatever you like.

(9) bodyPartExamined: The body part examined. Fill it in according to the actual data; it may also be left out.
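Because the UID fields in (3) and (5) cannot be free text, it can help to see what a valid value looks like. The sketch below uses only the standard library: is_valid_uid checks the syntax rules (dot-separated numeric components without leading zeros, at most 64 characters), and new_uid builds a UUID-derived UID under the "2.25" root, which the DICOM standard permits. Both function names are invented for this illustration; pydicom's generate_uid() serves the same purpose as new_uid.

```python
import re
import uuid

# Dot-separated numeric components, none with a leading zero unless it is "0"
UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_valid_uid(value):
    """Check the DICOM UID syntax: allowed characters, no leading zeros, max 64 chars."""
    return len(value) <= 64 and UID_PATTERN.fullmatch(value) is not None


def new_uid():
    """Create a new UID from a random UUID under the '2.25' root."""
    return "2.25." + str(uuid.uuid4().int)


print(is_valid_uid("1.2.410.200048.2858.20230529094313.1"))  # True
print(is_valid_uid("1.02.3"))                                # False (leading zero)
print(is_valid_uid(new_uid()))                               # True
```

Passing the syntax check does not make a value unique. Uniqueness only comes from generating a fresh UID for every study, series and image.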

4. Result display

        One thing has to be said: some loss is inevitable when PNG, JPG or other image types are converted to DICOM, because the conversion cannot bring back information the image never contained (for example the original bit depth or the acquisition metadata). If you have enough DICOM data, use DICOM for both algorithm training and verification (unless the algorithm is trained directly on PNG). Converting PNG or JPG to DICOM is really a last resort!
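One source of such loss can be shown with plain arithmetic. The sketch below assumes, purely for illustration, that the PNG was exported from 12-bit image data by linear scaling to 8 bits (the article does not say how its datasets were produced). It counts how many gray levels survive.

```python
def to_8bit(value, bits_stored=12):
    """Linearly scale a value from the original bit depth down to 0..255."""
    max_in = (1 << bits_stored) - 1
    return round(value * 255 / max_in)


levels_12bit = range(1 << 12)                        # 4096 possible input values
levels_8bit = {to_8bit(v) for v in levels_12bit}     # distinct values after scaling
print(len(levels_12bit), "->", len(levels_8bit))     # 4096 -> 256

# Neighbouring 12-bit values collapse into the same 8-bit value
print(to_8bit(100), to_8bit(101))                    # 6 6
```

Writing those 8-bit values into a DICOM file does not recover the 4096 original levels, which is why the converted data is only a stand-in for real DICOM.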

Origin blog.csdn.net/m0_46303486/article/details/131938281