A method for parsing curve data graphs in Python

In data analysis work we often have no access to the underlying data and only a plot of it is available. For a curve graph, the data then has to be recovered from the image itself.

For example, for the figure below, the accompanying document tells us that the abscissa ranges over (0, 175) and the ordinate over (0, 156). How can the curve be converted into data we can work with? The steps are as follows:
[Figure: the original curve graph]

  1. Use a drawing tool, such as Paint on Windows, to crop the figure along the coordinate boundary of the curve;
  2. Use the API provided by OpenCV to extract the curve directly; see "Opencv-python Icon and Watermark Removal Scheme Practice" for details;
  3. Use the grayscale conversion and binarization API provided by OpenCV to obtain the target image;
  4. Analyze the binarized image to obtain the data.

1. Crop the picture with a drawing tool

[Figure: the cropped curve image]
We save the cropped picture as heart1.JPG.

2. OpenCV solution for directly extracting curves

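import numpy as np
import matplotlib.pyplot as plt
import cv2

# A forward slash avoids the invalid "\h" escape sequence in the path
img = cv2.imread('img/heart1.JPG')
if img is None:
    raise FileNotFoundError('img/heart1.JPG could not be read')

# OpenCV stores channels in B, G, R order. Keep only the pixels whose color
# lies inside the curve's range and paint every other pixel white.
lower = np.array([135, 185, 45])
upper = np.array([175, 220, 129])
in_range = np.all((img >= lower) & (img <= upper), axis=2)
img[~in_range] = 255

# Convert BGR to RGB so that matplotlib displays the true colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))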

For how to determine the color range values used above, refer to the article "Opencv-python Icon and Watermark Removal Scheme Practice".
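
The referenced article is not reproduced here. As a rough sketch of one way such a range could be obtained, assume that a few pixel positions known to lie on the curve have been picked by hand; the per-channel minimum and maximum of those pixels can then be widened by a small margin. The function name `estimate_color_range`, the `margin` parameter, and the synthetic image are illustrative assumptions and are not taken from the original scheme.

```python
import numpy as np

def estimate_color_range(img, points, margin=10):
    """Return (lower, upper) per-channel bounds covering the sampled pixels.

    img    : H x W x 3 uint8 array in OpenCV's B, G, R channel order
    points : iterable of (row, col) positions known to lie on the curve
    margin : hypothetical tolerance added on both sides of the observed range
    """
    rows, cols = zip(*points)
    # Widen the dtype first so that subtracting the margin cannot wrap around
    samples = img[list(rows), list(cols)].astype(np.int16)  # shape (n, 3)
    lower = np.clip(samples.min(axis=0) - margin, 0, 255)
    upper = np.clip(samples.max(axis=0) + margin, 0, 255)
    return lower, upper

# Synthetic 4 x 4 white image with three "curve" pixels of similar color
demo = np.full((4, 4, 3), 255, dtype=np.uint8)
demo[1, 0] = (150, 200, 90)
demo[2, 1] = (160, 210, 100)
demo[1, 2] = (155, 190, 80)

lower, upper = estimate_color_range(demo, [(1, 0), (2, 1), (1, 2)], margin=5)
print(lower, upper)
```

The resulting bounds play the same role as the hard-coded limits in the listing above; in practice they usually need manual tuning against the actual image.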

3. Use OpenCV image grayscale processing and binarization API

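# Convert to grayscale (cv2.imread returns a 3-channel BGR image)
image_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(image_gray, cmap=plt.cm.gray)
# Binarize: pixels brighter than 200 become 255 (white), the rest 0 (black)
thresh, new_img = cv2.threshold(image_gray, 200, 255, cv2.THRESH_BINARY)

cv2.imshow('NEW_IMG', new_img)
cv2.waitKey()
cv2.destroyAllWindows()
# Widen the dtype so the array can later hold coordinate values above 255
new_img = new_img.astype(np.int16)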

Note: The image data type is uint8, whose maximum value is 255. The array must be converted to int16 so that it can store the actual coordinate values, which may exceed 255.
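
A small check of why the conversion matters, using the coordinate value 281 as an arbitrary example above 255: cast to uint8 it wraps around modulo 256, while int16 keeps it intact.

```python
import numpy as np

# A row coordinate larger than 255 (arbitrary example value)
height_coordinate = np.array([281])

as_uint8 = height_coordinate.astype(np.uint8)   # wraps modulo 256
as_int16 = height_coordinate.astype(np.int16)   # int16 covers -32768..32767

print(as_uint8[0], as_int16[0])
```

Here 281 becomes 25 as uint8 (281 - 256), which would silently corrupt the parsed curve.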

The threshold() method of OpenCV's binarization API takes the following parameters:

  • Source image matrix (grayscale)
  • Threshold value
  • Maximum value, used by THRESH_BINARY and THRESH_BINARY_INV
  • Thresholding type

Thresholding types:

THRESH_BINARY Values above the threshold become the maximum value (255 here); all others become 0.
THRESH_BINARY_INV Values above the threshold become 0; all others become the maximum value (255 here).
THRESH_TRUNC Values above the threshold are truncated to the threshold; all others are unchanged. The maximum value is ignored.
THRESH_TOZERO Values above the threshold are unchanged; all others become 0.
THRESH_TOZERO_INV Values above the threshold become 0; all others are unchanged.
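
To make the five rules concrete, the sketch below restates them in plain NumPy and applies each to three sample pixel values. The helper `threshold_np` is a hypothetical illustration of the semantics only; it is not OpenCV's implementation and takes the type as a string rather than a cv2 constant.

```python
import numpy as np

def threshold_np(src, thresh, maxval, kind):
    """NumPy restatement of the five thresholding rules (illustrative only)."""
    above = src > thresh
    if kind == "THRESH_BINARY":
        return np.where(above, maxval, 0)
    if kind == "THRESH_BINARY_INV":
        return np.where(above, 0, maxval)
    if kind == "THRESH_TRUNC":
        return np.where(above, thresh, src)
    if kind == "THRESH_TOZERO":
        return np.where(above, src, 0)
    if kind == "THRESH_TOZERO_INV":
        return np.where(above, 0, src)
    raise ValueError(f"unknown thresholding type: {kind}")

# One value below, one equal to, and one above the threshold of 200
pixels = np.array([100, 200, 250])
for kind in ("THRESH_BINARY", "THRESH_BINARY_INV", "THRESH_TRUNC",
             "THRESH_TOZERO", "THRESH_TOZERO_INV"):
    print(kind, threshold_np(pixels, 200, 255, kind))
```

Note that a pixel exactly equal to the threshold counts as "not above" in every rule.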

4. Analyze the binarized grayscale image to obtain data

The principle of parsing the data:

First, fix the coordinate system. The image is a two-dimensional array whose origin (0, 0) is the top-left corner, which corresponds to the point (0, h) on the actual graph, where h is the image height. In other words, row indices run in the reverse direction of the vertical axis.

Second, mark the curve pixels. After binarization, the pixels of the black curve have the value 0. Each of them is replaced by its vertical-axis coordinate (remember the reversed order), and all other pixels are set to 0. This gives the two-dimensional matrix to be processed.

Finally, for each column, the mean of the nonzero coordinates is taken as the target data value.
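
The three steps can be traced on a toy example before applying them to the real image. The sketch below uses a hand-made 4 x 4 "binarized" array and the flipped coordinate height - row, which is a simplification of the offset used in the full listing; all names and values are illustrative assumptions.

```python
import numpy as np

# Toy binarized image: 255 = white background, 0 = black curve pixel.
# The last column contains no curve pixel, as can happen at a cropped border.
binary = np.array([
    [255, 255,   0, 255],
    [255,   0,   0, 255],
    [  0, 255, 255, 255],
    [255, 255, 255, 255],
], dtype=np.int16)

height = binary.shape[0]
rows = np.arange(height).reshape(-1, 1)   # row index, broadcast over columns

# Steps 1 and 2: flip the row index so larger values mean "higher on the
# graph", keep it only on curve pixels, and use 0 everywhere else.
coords = np.where(binary == 0, height - rows, 0)

# Step 3: per-column mean of the nonzero coordinates, skipping empty columns.
counts = (coords > 0).sum(axis=0)
sums = coords.sum(axis=0)
has_curve = counts > 0
column_values = sums[has_curve] / counts[has_curve]

print(column_values)
```

The third column holds two curve pixels (coordinates 4 and 3), so its value is their mean, 3.5; the empty fourth column is dropped.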

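r, c = new_img.shape
# Row index of every pixel, shaped (r, 1) so that it broadcasts over columns
rows = np.arange(r).reshape(-1, 1)
# White background (255) becomes 0; curve pixels get their flipped row
# coordinate. 280 is the value hard-coded here, presumably the height of
# heart1.JPG.
new_img = np.where(new_img == 255, 0, 280 - rows + 1).astype(np.int16)

# Column-wise mean over curve pixels only; columns without any become NaN
dat = new_img.mean(axis=0, where=new_img > 0)

# Drop the empty columns left at the borders by the manual cropping
dat = dat[~np.isnan(dat)]
# Rescale so that the maximum matches the ordinate maximum of 156
dat = dat * 156 / np.max(dat)
# Parsed data; no conversion of measurement units has been applied
new_dat = dat.astype(np.int16)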

# Replay the parsed data as a plot
plt.figure(figsize=(12, 6))
plt.xlim(0, 570)
# Tick labels in axis units and their positions in pixel columns
x_label = ['0', '30', '60', '90', '120', '150', '180']
pos_list = [0, 94, 188, 281, 375, 469, 563]

plt.xticks(pos_list, x_label)
plt.ylim(0, 180)
plt.plot(new_dat)
plt.ylabel("Heart rate")

plt.show()

Plotting the parsed data gives the following result:
[Figure: replay plot of the parsed data]
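
The replayed data is still indexed by pixel column. As a sketch of the remaining unit conversion, assume that both axes are linear and that the cropped image spans exactly the documented ranges (0, 175) and (0, 156). The helper `to_axis_units`, the five sample heights, and the pixel height of 280 are hypothetical values chosen for illustration.

```python
import numpy as np

def to_axis_units(pixel_values, pixel_max, axis_min, axis_max):
    """Linearly map values in [0, pixel_max] onto [axis_min, axis_max]."""
    pixel_values = np.asarray(pixel_values, dtype=float)
    return axis_min + pixel_values * (axis_max - axis_min) / pixel_max

# Hypothetical parsed heights in pixels for five image columns
heights_px = np.array([0, 70, 140, 210, 280])
columns = np.arange(heights_px.size)

# Abscissa: first column maps to 0, last column maps to 175
x = to_axis_units(columns, columns.size - 1, 0, 175)
# Ordinate: pixel height 0 maps to 0, pixel height 280 maps to 156
y = to_axis_units(heights_px, 280, 0, 156)

print(x)
print(y)
```

If the cropped image does not align exactly with the axis limits, the pixel positions of two known tick marks per axis would be needed to calibrate the mapping instead.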

5. Summary

Parsing curve data graphs is still fairly laborious. The first hurdle is image processing, although common curve graphs are generally clean and therefore relatively easy to process. The second is coordinate conversion: during parsing, pixel positions must be mapped back to the data's units of measurement.

Suggestions for other methods are welcome.

Reference:

Xiao Yongwei. Opencv-python Icon and Watermark Removal Scheme Practice. CSDN Blog. 2023.09

Origin blog.csdn.net/xiaoyw/article/details/132918834