Data Standardization for Real-World Studies (RWS)

In the previous article, we introduced a method for processing unstructured text in real-world research. Although that method converts unstructured data into structured data, the result is still some distance from data that can be analyzed directly. Data standardization matters a great deal here: in RWS we often need to process data from multiple sources, and these data are usually unstructured or semi-structured, which means the same information may appear in several different formats and under different naming conventions.

If we do not standardize these data, the resulting dataset may be inaccurate or incomplete. Data standardization is therefore essential for building high-quality, reliable, and scalable data pipelines and analyses, and it is a necessary step in any RWS project.

Today we will introduce how to use Zhiwu AI to standardize data.

The following is an excerpt from a medical record published on Lilac Garden:

The patient, a 50-year-old male, was admitted to the hospital mainly because of "disturbance of consciousness". Twenty-six days earlier, the patient had received arsenic trioxide (10 mg/d) for acute promyelocytic leukemia, and he now presents with disturbance of consciousness. Physical examination revealed a pulse of 92 beats/min and a BP of 102/63 mmHg. The admission electrocardiogram is shown in Figure 1. Electrolytes: blood potassium 3.9 mmol/L, blood magnesium 1.10 mmol/L, blood calcium 2.5 mmol/L.

We follow a divide-and-conquer approach, working from coarse to fine and handling the data layer by layer.

The first step is to extract the laboratory test information.
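As a minimal illustration of this extraction step (a hand-written regular expression rather than Zhiwu AI's actual method, and the `pattern` below is an assumption tailored to this one excerpt), pulling the electrolyte values out of the record might look like:

```python
import re

# The electrolyte sentence from the medical record excerpt above.
record = ("Electrolytes: blood potassium was 3.9 mmol/L, "
          "blood magnesium was 1.10 mmol/L, and blood calcium was 2.5 mmol/L.")

# Capture "<item> was <value> <unit>" triples from the free text.
# A real pipeline would need far more robust patterns and unit handling.
pattern = re.compile(r"(blood \w+) was ([\d.]+) (mmol/L)")

results = [{"item": item, "value": float(value), "unit": unit}
           for item, value, unit in pattern.findall(record)]
```

Running this yields one structured row per test item, which is the input to the standardization step that follows.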

Obviously, "blood potassium", "blood magnesium", and "blood calcium" are not standardized names; different hospitals and data channels use different names for the same test item. The next step is to standardize these names.

The second step is to standardize the extracted information.

After standardization, every test item that represents "potassium" will be uniformly called "potassium (K)".

Through the example above, you should now have a clear picture of what this step involves and how it is done. What looks like a simple function in fact directly affects the quality of the data and whether the subsequent statistical work can proceed smoothly.

Disclaimer: The AI shown in the figures in this article comes from "Knowledge AI Q&A", an all-in-one system for "smart question and answer", "knowledge acquisition", and "content generation". If you are interested, you can follow the public accounts "Yunzhi Borui" or "Yunzhi AI Assistant".

Origin blog.csdn.net/cloudwizdom/article/details/130359530