1. Introduction to medical imaging
Medical imaging studies how a medium (such as X-rays, electromagnetic fields, or ultrasound) interacts with the human body, and presents the structure and density of internal tissues and organs as images. Diagnosticians use these images to judge a person's state of health. The field comprises two relatively independent research directions: medical imaging systems and medical image processing.
The main instruments include X-ray equipment, CT (conventional and spiral CT), positron emission tomography (PET), ultrasound (B-mode ultrasound, color Doppler ultrasound, cardiac color Doppler ultrasound, and three-dimensional color Doppler ultrasound), magnetic resonance imaging (MRI), electrocardiogram (ECG) equipment, electroencephalogram (EEG) equipment, and so on.
2. Introduction to DICOM files
DICOM (Digital Imaging and Communications in Medicine) is the international standard (ISO 12052) for medical images and related information. It is widely used in radiology and cardiovascular imaging and by radiological diagnostic equipment (X-ray, CT, MRI, ultrasound, etc.), and it is increasingly used in other medical fields such as ophthalmology and dentistry. Patients' medical images are generally stored as DICOM files. A DICOM file contains PHI (protected health information) about the patient, such as name, sex, and age, as well as other image-related information, such as details of the device that acquired and generated the image and some medical context. Imaging equipment generates the DICOM files, and doctors open them with DICOM viewers (software capable of displaying DICOM images) to read the images and diagnose the problems they reveal.
The standard currently in use is DICOM 3.0. Each image carries a large amount of information, which can be divided into four categories: (a) Patient, (b) Study, (c) Series, and (d) Image. Each DICOM tag is identified by a pair of hexadecimal numbers, the group and the element. For example, tag (0010,0010) is Patient's Name, which stores the name of the patient in the DICOM image.
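To make the (group, element) notation concrete, here is a minimal stdlib-only Python sketch that packs a tag into a single 32-bit integer, with the group in the high 16 bits and the element in the low 16 bits. This is a common representation, and it turns the ascending-tag ordering of data elements into a plain integer comparison. The helper names `make_tag`, `format_tag`, and `parse_tag` are hypothetical and not part of any library.

```python
def make_tag(group: int, element: int) -> int:
    """Pack a (group, element) pair into one 32-bit integer."""
    return (group << 16) | element

def format_tag(tag: int) -> str:
    """Render a tag in the conventional (GGGG,EEEE) hexadecimal form."""
    return f"({tag >> 16:04X},{tag & 0xFFFF:04X})"

def parse_tag(text: str) -> int:
    """Parse a string such as "(0010,0010)" back into a 32-bit tag."""
    group, element = text.strip("() ").split(",")
    return make_tag(int(group, 16), int(element, 16))

PATIENT_NAME = make_tag(0x0010, 0x0010)
print(format_tag(PATIENT_NAME))                  # (0010,0010)
print(parse_tag("(0010,0010)") == PATIENT_NAME)  # True
```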
DCMTK (C++), dcm4che (Java), and pydicom (Python) are all excellent third-party libraries for parsing DICOM data. Adding one of them to a project spares developers the low-level parsing work and improves development efficiency.
At present, modalities such as CT, MRI, and ultrasound scan a part of the body one cross-section after another (CT, for example, uses precisely collimated X-ray beams together with highly sensitive detectors), so a scan produces a multi-slice image. Stacking the slices along the z-axis forms a three-dimensional image (this involves the problem of 3D reconstruction). Each slice is stored in its own DICOM file, which holds not only pixel data but also a large amount of header information. As shown in the figure below, our goal is to read this header information and pixel data from a series of DICOM files.
A DICOM file is a medical file stored in accordance with the DICOM standard. It generally consists of a DICOM file header and a DICOM data set.
The file header consists of:
- A 128-byte file preamble;
- The 4-byte DICOM prefix: a file can be identified as DICOM by checking whether these 4 bytes equal the string "DICM";
- The File Meta Information elements (group 0002).
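The prefix check described above can be sketched in a few lines of Python, assuming the file carries the standard 128-byte preamble defined by the DICOM file format (Part 10). Files written without the preamble, such as some older non-Part 10 files, will not pass this check. `has_dicom_prefix` and `is_dicom_file` are hypothetical names.

```python
PREAMBLE_LENGTH = 128   # bytes of preamble at the start of the file
DICM_PREFIX = b"DICM"   # 4-byte prefix that follows the preamble

def has_dicom_prefix(data: bytes) -> bool:
    """True if `data` holds a 128-byte preamble followed by b"DICM"."""
    end = PREAMBLE_LENGTH + len(DICM_PREFIX)
    return len(data) >= end and data[PREAMBLE_LENGTH:end] == DICM_PREFIX

def is_dicom_file(path: str) -> bool:
    """Read only the first 132 bytes of the file and check the prefix."""
    with open(path, "rb") as f:
        return has_dicom_prefix(f.read(PREAMBLE_LENGTH + len(DICM_PREFIX)))

# Usage with an in-memory header: 128 zero bytes followed by "DICM".
print(has_dicom_prefix(bytes(128) + b"DICM"))  # True
print(has_dicom_prefix(b"DICM"))               # False
```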
3. Detailed explanation of DICOM internal information (DICOM Tag and VR)
The DICOM data set is the main component of a DICOM file and consists of DICOM data elements arranged in a specified order. The data element is the basic unit of the data set: data elements are arranged in ascending tag order, and each data element corresponds to one tag. A data element consists of four parts:
- Tag: 4 bytes, a 2-byte group number followed by a 2-byte element number. For example, (0010,0040) is the patient's sex. By group number, 0002 holds the file meta information, 0008 holds general identifying parameters (dates, modality, UIDs), 0010 holds patient information, and 0028 holds image presentation parameters. Whenever a piece of data is needed from a DICOM file, it is looked up by its tag.
- Value Representation (VR): two characters giving the data type of the element's value, for example LO (Long String), IS (Integer String), and DA (Date); there are 27 data types in total. The VR is written explicitly only in explicit-VR encodings; in implicit-VR encodings it is implied by the tag.
- Value Length: the length of the value field, in bytes.
- Value Field: the value of the data element.
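As a minimal sketch of this four-part layout, the Python function below reads one data element from raw bytes. It assumes the Explicit VR Little Endian encoding (the encoding always used for the group 0002 file meta information). In that encoding, OB, OW, OF, SQ, UT, and UN elements carry 2 reserved bytes followed by a 4-byte length, while the other VRs carry a 2-byte length. Implicit-VR and big-endian encodings and undefined lengths are out of scope, and `read_element` is a hypothetical helper.

```python
import struct

# VRs that use 2 reserved bytes plus a 4-byte length in Explicit VR Little
# Endian (only the VRs covered in this article are listed).
LONG_LENGTH_VRS = {"OB", "OW", "OF", "SQ", "UT", "UN"}

def read_element(buf: bytes, offset: int = 0):
    """Parse one Explicit VR Little Endian data element starting at `offset`.

    Returns ((group, element), vr, value_bytes, next_offset).
    Undefined lengths (0xFFFFFFFF, used by some sequences) are not handled.
    """
    group, element = struct.unpack_from("<HH", buf, offset)    # 4-byte tag
    vr = buf[offset + 4:offset + 6].decode("ascii")            # 2-char VR
    if vr in LONG_LENGTH_VRS:
        (length,) = struct.unpack_from("<I", buf, offset + 8)  # skip 2 reserved bytes
        start = offset + 12
    else:
        (length,) = struct.unpack_from("<H", buf, offset + 6)
        start = offset + 8
    value = buf[start:start + length]                           # value field
    return (group, element), vr, value, start + length

# Patient's Sex (0010,0040), VR CS, value "M " (padded to an even length).
elem = struct.pack("<HH", 0x0010, 0x0040) + b"CS" + struct.pack("<H", 2) + b"M "
print(read_element(elem))  # ((16, 64), 'CS', b'M ', 10)
```

The returned `next_offset` lets a caller walk the data set element by element.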
Metadata can be divided into four categories according to the information it describes: Patient, Study, Series, and Image. A patient can have multiple studies, a study contains multiple series (for example, different body parts examined), and each series contains one or more images.
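This hierarchy can be sketched as nested dictionaries keyed by the identifying attribute of each level. The record format below (one dict per file with `PatientID`, `StudyInstanceUID`, `SeriesInstanceUID`, and `SOPInstanceUID` keys) is a hypothetical stand-in for header values already read from a set of files, and the UIDs in the usage example are made up.

```python
from collections import defaultdict

def build_hierarchy(records):
    """Group per-image records into patient -> study -> series -> [SOP Instance UIDs]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for r in records:
        series = tree[r["PatientID"]][r["StudyInstanceUID"]][r["SeriesInstanceUID"]]
        series.append(r["SOPInstanceUID"])
    return tree

def rec(patient, study, series, sop):
    """Build one hypothetical header record (made-up values for illustration)."""
    return {"PatientID": patient, "StudyInstanceUID": study,
            "SeriesInstanceUID": series, "SOPInstanceUID": sop}

records = [
    rec("P1", "1.2.3", "1.2.3.1", "1.2.3.1.1"),
    rec("P1", "1.2.3", "1.2.3.1", "1.2.3.1.2"),
    rec("P1", "1.2.3", "1.2.3.2", "1.2.3.2.1"),
]
tree = build_hierarchy(records)
print(list(tree["P1"]["1.2.3"]))       # ['1.2.3.1', '1.2.3.2']
print(tree["P1"]["1.2.3"]["1.2.3.1"])  # ['1.2.3.1.1', '1.2.3.1.2']
```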
1. Common tags
(1) Patient Tag
Group | Element | Tag Description | Explanation | Data Type (VR) |
---|---|---|---|---|
0010 | 0010 | Patient's Name | Patient name | PN |
0010 | 0020 | Patient ID | Patient ID | LO |
0010 | 0030 | Patient's Birth Date | Patient's date of birth | DA |
0010 | 0032 | Patient's Birth Time | Patient's time of birth | TM |
0010 | 0040 | Patient's Sex | Patient sex | CS |
0010 | 1030 | Patient's Weight | Patient weight, in kg | DS |
0010 | 21C0 | Pregnancy Status | Pregnancy status | US |
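Patient's Name uses the PN value representation, whose components are separated by carets (^) in the order family name, given name, middle name, name prefix, name suffix. Below is a minimal Python sketch that splits a PN value into named components. `parse_person_name` and the dictionary keys are hypothetical, and only the first (alphabetic) component group, before any `=`, is handled.

```python
PN_COMPONENTS = ("family", "given", "middle", "prefix", "suffix")

def parse_person_name(value: str) -> dict:
    """Split a PN value on '^' into its five components.

    Only the alphabetic component group (before any '=') is handled;
    the ideographic and phonetic groups are ignored in this sketch.
    """
    alphabetic = value.strip().split("=")[0]
    parts = alphabetic.split("^")
    parts += [""] * (len(PN_COMPONENTS) - len(parts))  # pad missing components
    return dict(zip(PN_COMPONENTS, parts))

print(parse_person_name("SMITH^JOHN"))
# {'family': 'SMITH', 'given': 'JOHN', 'middle': '', 'prefix': '', 'suffix': ''}
```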
(2) Study Tag
Group | Element | Tag Description | Explanation | Data Type (VR) |
---|---|---|---|---|
0008 | 0050 | Accession Number: A RIS generated number that identifies the order for the Study. | Accession number: a number generated by the RIS that identifies the order for the study | SH |
0020 | 0010 | Study ID | Study ID | SH |
0020 | 000D | Study Instance UID: Unique identifier for the Study. | Study instance UID: a unique identifier distinguishing different studies | UI |
0008 | 0020 | Study Date: Date the Study started. | Study date: the date the study started | DA |
0008 | 0030 | Study Time: Time the Study started. | Study time: the time the study started | TM |
0008 | 0061 | Modalities in Study | The types of examination (modalities) included in the study | CS |
0018 | 0015 | Body Part Examined | Body part examined | CS |
0008 | 1030 | Study Description | Study description | LO |
0010 | 1010 | Patient's Age | The patient's age at the time of the study, not the patient's current age | AS |
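Study Date, Study Time, and Patient's Age are stored as DA, TM, and AS strings, whose formats are described in the VR table later in this article. The minimal Python sketch below converts them to structured values. It assumes the current formats (`YYYYMMDD`, `HHMMSS.FRAC`, `nnnD/W/M/Y`) and does not handle legacy forms with separators or time-zone offsets. The helper names are hypothetical.

```python
from datetime import date, time

def parse_da(value: str) -> date:
    """DA: 'YYYYMMDD' -> datetime.date."""
    v = value.strip()
    return date(int(v[0:4]), int(v[4:6]), int(v[6:8]))

def parse_tm(value: str) -> time:
    """TM: 'HHMMSS.FRAC' -> datetime.time (minutes, seconds, fraction optional)."""
    main, _, frac = value.strip().partition(".")
    hour = int(main[0:2])
    minute = int(main[2:4]) if len(main) >= 4 else 0
    second = int(main[4:6]) if len(main) >= 6 else 0
    micro = int(frac.ljust(6, "0")[:6]) if frac else 0  # fraction -> microseconds
    return time(hour, minute, second, micro)

AGE_UNITS = {"D": "days", "W": "weeks", "M": "months", "Y": "years"}

def parse_as(value: str) -> tuple:
    """AS: 'nnnU' -> (n, unit name), e.g. '018M' -> (18, 'months')."""
    v = value.strip()
    return int(v[:3]), AGE_UNITS[v[3]]

print(parse_da("20050822"))   # 2005-08-22
print(parse_tm("183200.00"))  # 18:32:00
print(parse_as("018M"))       # (18, 'months')
```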
(3) Series Tag
Group | Element | Tag Description | Explanation | Data Type (VR) |
---|---|---|---|---|
0020 | 0011 | Series Number: A number that identifies this Series. | Series number: a number identifying the series | IS |
0020 | 000E | Series Instance UID: Unique identifier for the Series. | Series instance UID: a unique identifier distinguishing different series | UI |
0008 | 0060 | Modality | Modality of the series, e.g. MR, CT, CR, DX | CS |
0008 | 103E | Series Description | Series description | LO |
0008 | 0021 | Series Date | Series date | DA |
0008 | 0031 | Series Time | Series time | TM |
0020 | 0032 | Image Position (Patient): The x, y and z coordinates of the upper left hand corner of the image, in mm. | Image position: the x, y, z coordinates of the upper-left corner of the image in the patient coordinate system, in mm; for a series, this refers to the upper-left corner of its first image | DS |
0020 | 0037 | Image Orientation (Patient): The direction cosines of the first row and the first column with respect to the patient. | Image orientation: direction cosines of the image rows and columns | DS |
0018 | 0050 | Slice Thickness: Nominal slice thickness, in mm. | Slice thickness, in mm | DS |
0018 | 0088 | Spacing Between Slices | Distance between slices, in mm | DS |
0020 | 1041 | Slice Location: Relative position of exposure expressed in mm. | Relative position of the slice, in mm | DS |
0018 | 0023 | MR Acquisition Type | MR acquisition type (2D or 3D) | CS |
0018 | 0015 | Body Part Examined | Body part examined | CS |
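When stacking slices into a volume, one commonly used approach is to sort them by projecting Image Position (Patient) onto the slice normal. The normal is the cross product of the row and column direction cosines in Image Orientation (Patient). Below is a minimal stdlib-only sketch. The dict keys `orientation` and `position` are hypothetical stand-ins for the parsed (0020,0037) and (0020,0032) values.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def slice_position(orientation, position):
    """Distance of a slice along its normal.

    orientation: six values of Image Orientation (Patient), row cosines then column cosines.
    position: three values of Image Position (Patient), in mm.
    """
    normal = cross(orientation[:3], orientation[3:])
    return sum(n * p for n, p in zip(normal, position))

def sort_slices(slices):
    """Order slices along the slice normal so they can be stacked into a volume."""
    return sorted(slices, key=lambda s: slice_position(s["orientation"], s["position"]))

# Illustrative axial slices given out of order.
axial = [1, 0, 0, 0, 1, 0]
slices = [{"orientation": axial, "position": [0, 0, z], "name": n}
          for z, n in [(-5.0, "b"), (-10.0, "a"), (0.0, "c")]]
print([s["name"] for s in sort_slices(slices)])  # ['a', 'b', 'c']
```

The differences between consecutive `slice_position` values give the actual distance between slices, which need not equal Slice Thickness (0018,0050).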
(4) Image Tag
Group | Element | Tag Description | Explanation | Data Type (VR) |
---|---|---|---|---|
0008 | 0008 | Image Type: Image identification characteristics. | Image type | CS |
0008 | 0018 | SOP Instance UID | SOP instance UID | UI |
0008 | 0023 | Content Date: The date the image pixel data creation started. | Date the image content was created | DA |
0008 | 0033 | Content Time | Time the image content was created | TM |
0020 | 0013 | Image/Instance Number: A number that identifies this image. | Instance number: a number identifying the image | IS |
0028 | 0002 | Samples Per Pixel: Number of samples (planes) in this image. | Number of samples (channels) per pixel, e.g. 1 for grayscale and 3 for RGB | US |
0028 | 0004 | Photometric Interpretation: Specifies the intended interpretation of the pixel data. | Photometric interpretation: for CT images, the enumerated values MONOCHROME1 and MONOCHROME2 indicate grayscale (MONOCHROME1 displays the minimum value as white, MONOCHROME2 displays it as black), while RGB indicates a true-color image | CS |
0028 | 0010 | Rows: Number of rows in the image. | Total number of rows in the image (row resolution) | US |
0028 | 0011 | Columns: Number of columns in the image. | Total number of columns in the image (column resolution) | US |
0028 | 0030 | Pixel Spacing: Physical distance in the patient between the center of each pixel. | Pixel spacing: physical distance between pixel centers, in mm | DS |
0028 | 0100 | Bits Allocated: Number of bits allocated for each pixel sample. Each sample shall have the same number of bits allocated. | Number of bits allocated to store each pixel sample; the same for every sample | US |
0028 | 0101 | Bits Stored: Number of bits stored for each pixel sample. Each sample shall have the same number of bits stored. | Number of bits actually used by each pixel sample, typically 12 to 16; the same for every sample | US |
0028 | 0102 | High Bit: Most significant bit for pixel sample data. Each sample shall have the same high bit. | High bit: position of the most significant bit, normally Bits Stored - 1 | US |
0028 | 0103 | Pixel Representation: Data representation of the pixel samples. Each sample shall have the same pixel representation. Enum: 0000H = unsigned integer, 0001H = 2's complement. | Representation of the pixel data, an enumerated value: 0000H = unsigned integer, 0001H = two's complement (signed) | US |
0028 | 1050 | Window Center | Window center (window level) | DS |
0028 | 1051 | Window Width | Window width | DS |
0028 | 1052 | Rescale Intercept: The value b in relationship between stored values (SV) and the output units. Output units = m*SV + b. Required if Modality LUT Sequence (0028,3000) is not present. | Intercept: when no Modality LUT is present, stored values are converted to output values with Units = m*SV + b; this tag holds b | DS |
0028 | 1053 | Rescale Slope: m in the equation specified by Rescale Intercept (0028,1052). Required if Rescale Intercept is present. | Slope: the m in Units = m*SV + b | DS |
0028 | 1054 | Rescale Type: Specifies the output units of Rescale Slope (0028,1053) and Rescale Intercept (0028,1052). Enum: US = Unspecified. Required if Photometric Interpretation is MONOCHROME2 and Bits Stored is greater than 1. This specifies an identity Modality LUT transformation. | Unit of the output values, an enumerated value | LO |
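The pixel tags above work together. Bits Allocated and Pixel Representation say how to decode the raw samples, Rescale Slope and Rescale Intercept map stored values to output units (Hounsfield units in the case of CT), and Window Center and Window Width select the range to display. Below is a minimal Python sketch of that pipeline. It assumes uncompressed 16-bit little-endian samples. The windowing step is a simple linear mapping to 0..255, a simplification of the exact formula in the standard. All function names and the numeric values in the usage example are illustrative assumptions.

```python
import struct

def decode_pixels(raw: bytes, bits_allocated: int, pixel_representation: int) -> list:
    """Decode uncompressed little-endian samples into Python ints.

    pixel_representation follows Pixel Representation (0028,0103):
    0 = unsigned integer, 1 = two's complement (signed). Only
    Bits Allocated = 16 is handled; masking by Bits Stored/High Bit is omitted.
    """
    if bits_allocated != 16:
        raise ValueError("this sketch only handles Bits Allocated = 16")
    code = "h" if pixel_representation == 1 else "H"
    return list(struct.unpack(f"<{len(raw) // 2}{code}", raw))

def rescale(stored: list, slope: float = 1.0, intercept: float = 0.0) -> list:
    """Output units = m * SV + b, with m = Rescale Slope and b = Rescale Intercept."""
    return [slope * sv + intercept for sv in stored]

def apply_window(values: list, center: float, width: float) -> list:
    """Simplified linear window: center - width/2 maps to 0 and
    center + width/2 maps to 255; values outside are clamped."""
    low = center - width / 2
    return [round(min(max((v - low) / width, 0.0), 1.0) * 255) for v in values]

# Illustrative values: two signed stored samples, intercept -1024, window 40/400.
raw = struct.pack("<2h", 0, 1024)
hu = rescale(decode_pixels(raw, 16, 1), slope=1.0, intercept=-1024.0)
print(hu)                                      # [-1024.0, 0.0]
print(apply_window(hu, center=40, width=400))  # [0, 102]
```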
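2. VR data types
VR is what the DICOM standard uses to describe data types. The version covered here defines 27 values; later editions of the standard added more (for example UC, UR, and OD).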
The 27 data types
VR | Description | Allowed characters | Length |
---|---|---|---|
CS - Code String | A string in which leading and trailing spaces are insignificant, e.g. "CD123_4" | Uppercase letters, 0-9, space, and underscore | Up to 16 characters |
SH - Short String | A short string, e.g. a telephone number or an ID | | Up to 16 characters |
LO - Long String | A character string that may be padded with leading and/or trailing spaces, e.g. "Introduction to DICOM" | | Up to 64 characters |
ST - Short Text | A string that may contain one or more paragraphs | | Up to 1024 characters |
LT - Long Text | A string that may contain one or more paragraphs; like ST, but can be longer | | Up to 10240 characters |
UT - Unlimited Text | A string containing one or more paragraphs; similar to LT | | Up to 2^32 - 2 characters |
AE - Application Entity | A string naming a device; leading and trailing spaces are insignificant, e.g. "MyPC01" | | Up to 16 characters |
PN - Person Name | A person's name, with the caret (^) as the component separator, e.g. "SMITH^JOHN" or "Morrison-Jones^Susan^^^Ph.D., Chief Executive Officer" | | Up to 64 characters |
UI - Unique Identifier (UID) | A string containing a UID that uniquely identifies an item, e.g. "1.2.840.10008.1.1" | 0-9 and period (.) | Up to 64 characters |
DA - Date | A string in the format YYYYMMDD, where YYYY is the year, MM the month, and DD the day, e.g. "20050822" is August 22, 2005 | 0-9 | 8 characters |
TM - Time | A string in the format HHMMSS.FRAC, where HH is the hour ("00"-"23"), MM the minute ("00"-"59"), SS the second, and FRAC the fraction of a second down to one millionth, e.g. "183200.00" is 6:32 PM | 0-9 and period (.) | Up to 16 characters |
DT - Date Time | A concatenated date-time string in the format YYYYMMDDHHMMSS.FFFFFF; from left to right: year YYYY, month MM, day DD, hour HH, minute MM, second SS, fraction of a second FFFFFF, e.g. "20050812183000.00" is 18:30:00 on August 12, 2005 | 0-9, plus (+), minus (-), and period (.) | Up to 26 characters |
AS - Age String | A string in one of the formats nnnD, nnnW, nnnM, or nnnY, where nnn is a number of days (D), weeks (W), months (M), or years (Y), e.g. "018M" is 18 months | 0-9, D, W, M, Y | 4 characters |
IS - Integer String | A string representing an integer, e.g. "-1234567" | 0-9, plus (+), minus (-) | Up to 12 characters |
DS - Decimal String | A string representing a fixed-point or floating-point number, e.g. "12345.67", "-5.0e3" | 0-9, plus (+), minus (-), E, e, and period (.) | Up to 16 characters |
SS - Signed Short | Signed binary integer, 16 bits long | | 2 bytes |
US - Unsigned Short | Unsigned binary integer, 16 bits long | | 2 bytes |
SL - Signed Long | Signed binary integer, 32 bits long | | 4 bytes |
UL - Unsigned Long | Unsigned binary integer, 32 bits long | | 4 bytes |
AT - Attribute Tag | Ordered pair of 16-bit unsigned integers holding the tag of a data element | | 4 bytes |
FL - Floating Point Single | Single-precision binary floating-point number | | 4 bytes |
FD - Floating Point Double | Double-precision binary floating-point number | | 8 bytes |
OB - Other Byte String | A string of bytes ("other" means the content is not defined by another VR) | | |
OW - Other Word String | A string of 16-bit (2-byte) words | | |
OF - Other Float String | A string of 32-bit (4-byte) floating-point words | | |
SQ - Sequence of Items | A sequence of items | | |
UN - Unknown | A string of bytes whose content encoding is unknown | | |
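The length and character constraints in this table lend themselves to simple validation. Below is a minimal Python sketch covering a few string VRs. The rules come from the table above, and padding spaces (and trailing NUL bytes, as used for UI) are stripped before the characters are checked. `check_value` and `VR_RULES` are hypothetical names. The patterns check allowed characters, not full syntax, so a DS value such as "1..2" would still pass.

```python
import re

# (maximum length, allowed-character pattern) for a few string VRs, from the table above.
VR_RULES = {
    "CS": (16, r"[A-Z0-9 _]*"),
    "UI": (64, r"[0-9.]+"),
    "DA": (8, r"[0-9]{8}"),
    "AS": (4, r"[0-9]{3}[DWMY]"),
    "IS": (12, r"[0-9+\-]+"),
    "DS": (16, r"[0-9+\-Ee.]+"),
}

def check_value(vr: str, value: str) -> bool:
    """True if `value` fits the maximum length and allowed characters of `vr`."""
    max_len, pattern = VR_RULES[vr]
    core = value.strip(" \x00")  # drop padding before checking characters
    return len(value) <= max_len and re.fullmatch(pattern, core) is not None

print(check_value("CS", "CD123_4"))            # True
print(check_value("UI", "1.2.840.10008.1.1"))  # True
print(check_value("AS", "18M"))                # False (needs three digits)
print(check_value("DA", "2005-08-22"))         # False (too long, '-' not allowed)
```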