A fully automatic paper ECG digitization algorithm based on deep learning


Application of Deep Learning in Medical Detection


There is growing interest in applying deep learning methods to electrocardiograms (ECGs), and recent studies have shown that neural networks (NNs) can predict future heart failure or atrial fibrillation from ECGs alone. However, training neural networks requires large numbers of ECGs, and many ECGs exist only on paper, a format unsuitable for neural network training. In this context, a fully automated online ECG digitization tool was developed to convert scanned paper ECGs into digital signals.

The algorithm uses automatic horizontal and vertical anchor point detection to segment the ECG image into individual images of the 12 leads, and then applies a dynamic morphology algorithm to extract the signal of interest. We then validated the algorithm's performance on 515 digital ECGs, 45 of which were printed, scanned and re-digitized. After excluding ECGs with lead signal overlap, the automated digitization tool achieved 99.0% correlation between digitized signals and ground-truth ECGs (n = 515 standard 3 × 4 ECGs). Without exclusion, the average correlation across all leads of the 3 × 4 ECGs was 90% to 97%. After excluding ECGs with lead signal overlap, the correlation was 97% for the 12 × 1 and 3 × 1 ECG formats. Without exclusion, the average correlation of some leads in the 12 × 1 ECGs was 60% to 70%, while the average correlation reached 80% to 90% in the 3 × 1 ECGs.

For ECGs that were printed, scanned and re-digitized, our tool achieved 96% correlation with the original signal. We developed and validated a fully automated, user-friendly online ECG digitization tool. Unlike other available tools, it does not require any manual segmentation of the ECG signal. Our tool can facilitate rapid and automated digitization of large paper-based ECG repositories, enabling their use in deep learning projects.

Method overview


Figure 1: ECG digitization algorithm

Figure 1 outlines our automatic ECG digitization algorithm: paper ECG images are first preprocessed to remove any redacted regions and gridlines, and then converted to binary images, enabling subsequent detection of the ECG baseline. Once the ECG baseline has been detected, vertical anchor points are used to determine the upper and lower boundaries of each ECG lead signal. This step also allows the algorithm to determine the layout (i.e. the number of rows) of the ECG leads on the printed ECG. Next, lead name detection provides the horizontal anchor points of each lead, namely the left and right boundaries of the ECG signal to be digitized, representing its start and end respectively; these are used to crop and extract the signal for each lead of the 12-lead ECG. Finally, the signal in each lead is digitized individually.

Data sources

Our online ECG digitization tool was developed using 12-lead ECGs recorded from patients at Imperial College London NHS Trust. These ECGs were initially printed on paper and provided to the research team as anonymized scans in Portable Document Format (PDF), which were subsequently reformatted as 250 dpi Portable Network Graphics (PNG) files. These ECGs are usually in a traditional 3 × 4 lead configuration with a lead II rhythm strip. This database contains only paper ECGs and no digital ECG ground-truth data.

For validation, we used anonymized 12-lead ECGs from Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA, USA, provided as PNG files in 3 × 4, 12 × 1 and 3 × 1 lead configurations. This second database contains both ECG images and digital ECG ground-truth data. All ECGs used in the development and testing of our digitization tool were calibrated to 1 mV = 10 mm and recorded at a paper speed of 25 mm/s.

Both Imperial College and BIDMC provided ethics review for the project. All methods were performed in accordance with relevant guidelines and regulations. Ethical approval for the data collection used in this study was granted by the London (Hampstead) Research Ethics Committee (protocol number 20HH5967, REC reference 20/HRA/2467, sponsor Imperial College London). Informed consent was obtained from all subjects and/or their legal guardians. This study complies with the Declaration of Helsinki.

Method steps

Preprocessing

In the database used for development, all ECGs contained a header of black pixels where patient information had been redacted, which could adversely affect digitization of the ECG trace. Therefore, the redacted areas of each ECG were automatically removed before the digitization process was applied. The redacted region is black, so the average pixel intensity of each row within it is zero, whereas the average pixel intensity is a positive value in the region of interest to be digitized. This enables redacted regions to be reliably identified and removed prior to digitization of the ECG signal.
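The row-mean rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name and the use of OpenCV for loading are assumptions.

```python
import cv2  # OpenCV, used here only to load the scanned ECG
import numpy as np

def remove_redacted_rows(image_path):
    """Drop rows whose average intensity is zero (fully black, i.e. the
    redacted header) and keep the rest of the scan for digitization.
    Illustrative sketch only."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    row_means = gray.mean(axis=1)       # average pixel intensity per row
    keep = row_means > 0                # redacted rows average to exactly zero
    return gray[keep, :]
```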


The figure above gives an overview of the automatic ECG digitization algorithm. Step 1: preprocess the 12-lead ECG image to remove the redacted portion of the ECG and the ECG grid, then determine the ECG baselines with the help of the vertical anchor points to obtain the ECG configuration. Step 2: after determining the horizontal and vertical anchor points and the lead configuration, crop the 12 lead signals. Step 3: extract the ECG signal from each single-lead ECG image. Step 4: build the user interface using the dashboard tool.

ECGs are usually printed on paper that includes gridlines, which are removed prior to the digitization process. Given that the gridlines contain red pixels, the red channel of the image is set to 1 and the image is converted to grayscale. A threshold of 0.94 is then used to distinguish the pixels making up the ECG signal from the gridlines: pixels with intensity > 0.94 are discarded, and those ≤ 0.94 are treated as ECG signal or lead name text. In this way, the ECG and lead name information are extracted into a binary image, and the background and gridlines are removed. The processed binary image is shown in Figure 2A,B.
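A minimal sketch of this grid-removal step, assuming the image is loaded with OpenCV (BGR channel order) and scaled to [0, 1]; the helper name and exact channel handling are illustrative rather than the authors' implementation.

```python
import cv2
import numpy as np

def binarize_ecg(image_path, threshold=0.94):
    """Suppress the red gridlines and return a boolean image where True marks
    ECG trace or lead-name pixels, following the 0.94 threshold rule above."""
    img = cv2.imread(image_path).astype(np.float32) / 255.0   # BGR in [0, 1]
    img[:, :, 2] = 1.0                                        # saturate the red channel (index 2 in BGR)
    gray = cv2.cvtColor((img * 255).astype(np.uint8), cv2.COLOR_BGR2GRAY) / 255.0
    return gray <= threshold     # > 0.94 discarded; <= 0.94 kept as signal / lead names
```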

ECG baseline detection and ECG configuration determination

After preprocessing, the first step in the automated digitization process requires the algorithm to detect the signal baselines and determine the number of rows of ECG signals, which together establish the ECG configuration. We consider the ECG baseline to be the horizontal line with the highest ECG signal intensity along the horizontal axis.

The Hough transform is a coordinate transformation that maps an image from Cartesian to polar coordinates and has long been used for feature extraction in computer vision. Here, we apply the Hough transform to identify the ECG baselines. To limit the number of plausible solutions and avoid inaccurate baseline identification, two constraints were implemented. First, given that the ECG baseline is expected to be close to horizontal, only lines within ±2.5° of the x-axis are considered. Second, since the baseline is expected to extend across nearly the entire image, any line shorter than 80% of the printed ECG width is discarded. Where there are gaps between ECG lead waveforms, line segments are merged if the inter-lead gap is no greater than 15% of the total image width. This ensures that ECG signals from adjacent leads remain independent and are not combined during digitization. This approach also helps determine the number of baselines on the printed ECG and, in combination with the subsequent vertical anchor detection, provides information on the lead configuration.
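The baseline search could be implemented with OpenCV's probabilistic Hough transform, as in the hedged sketch below. The ±2.5° angle tolerance and the 80% / 15% width constraints come from the description above; the accumulator threshold, the merge tolerance and the function name are assumptions.

```python
import cv2
import numpy as np

def detect_baselines(binary):
    """Return approximate y-coordinates of ECG baselines in a binary image
    (True = signal), keeping only near-horizontal lines spanning >= 80% of
    the image width and merging detections of the same baseline."""
    h, w = binary.shape
    img = binary.astype(np.uint8) * 255
    lines = cv2.HoughLinesP(img, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=int(0.8 * w),
                            maxLineGap=int(0.15 * w))    # bridge inter-lead gaps
    candidates = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
            if abs(angle) <= 2.5:                        # near-horizontal only
                candidates.append((y1 + y2) // 2)
    candidates.sort()
    baselines = []
    for y in candidates:                                 # merge nearby detections
        if not baselines or y - baselines[-1] > 0.01 * h:
            baselines.append(y)
    return baselines
```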


Figure 2: Individual ECG signal images cropped for each lead. (A) Raw 12-lead ECG scan with patient-identifiable information redacted; (B) baseline detection used to determine the vertical distance between leads; (C) lead name detection used to determine the horizontal distance between leads; (D) cropping to obtain the ECG signal for each lead. The width of the crop is the distance from the end of a lead name to the start of the adjacent lead name, and the height of the crop is 1.4 times the vertical distance between detected baselines, centered on the baseline.

Automatic anchor point detection

Vertical anchor detection

Baseline detection is used to locate each ECG signal vertically; vertical anchor points then define the upper and lower boundaries of the signal in each ECG lead, delimiting the signal to be digitized. The vertical cropping length is shown in Fig. 2B. The upper and lower boundaries are defined as 0.7 times the distance between two adjacent ECG baselines, above and below the ECG baseline respectively.
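Under the 0.7 × spacing rule above, the vertical crop boundaries of each lead row could be derived from the detected baselines as sketched below; applying the factor to each neighboring gap separately is an assumption made for non-uniform layouts.

```python
def vertical_crop_bounds(baselines, img_height, factor=0.7):
    """Return (top, bottom) crop boundaries for each baseline, set at `factor`
    times the distance to the neighboring baselines above and below."""
    if len(baselines) < 2:
        return [(0, img_height)]               # single row: keep the full height
    bounds = []
    for i, y in enumerate(baselines):
        gap_up = y - baselines[i - 1] if i > 0 else baselines[i + 1] - y
        gap_down = baselines[i + 1] - y if i < len(baselines) - 1 else gap_up
        top = max(0, int(y - factor * gap_up))
        bottom = min(img_height, int(y + factor * gap_down))
        bounds.append((top, bottom))
    return bounds
```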

Horizontal anchor detection

Horizontal anchor points are used to define the left and right boundaries of the ECG signals to be digitized, indicating their start and end, respectively. In the horizontal plane, the end of a lead's name marks the start of its ECG signal, and the start of the next lead's name marks its end. For the rightmost lead in each row, which has no adjacent lead name to its right, the right-hand border is defined by the maximum horizontal extent of the ECG signals of the other leads in the same ECG.

Our text recognition model failed to detect lead names when they were very close to the ECG baseline. In these cases, the ECG baseline was removed to allow the digitization tool to recognize the lead names. Furthermore, morphological dilation and erosion were applied to the images to make the lead names more distinguishable from the surrounding signal, making it easier for the text recognition model to identify these cases. Dilation is an iterative region-growing operation that thickens lines, and erosion is an iterative region-shrinking operation that thins them, making objects of interest more recognizable to automated processes. All candidate objects in the image are then filtered to exclude those with an aspect ratio > 5 and those with a width or height < 5 or > 500 pixels.
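The dilation/erosion and geometric filtering could look like the sketch below, using OpenCV connected components; the 3 × 3 structuring element is an assumption, while the aspect-ratio and size limits follow the rules above.

```python
import cv2
import numpy as np

def filter_candidate_objects(binary):
    """Dilate then erode the binary image so lead-name text stands out, then
    keep only components that could plausibly be a lead name: aspect ratio
    <= 5 and width/height between 5 and 500 pixels."""
    kernel = np.ones((3, 3), np.uint8)
    img = cv2.dilate(binary.astype(np.uint8), kernel)   # thicken strokes
    img = cv2.erode(img, kernel)                        # thin them back
    n, _, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=8)
    boxes = []
    for i in range(1, n):                               # label 0 is the background
        x, y, w, h, _ = stats[i]
        aspect = max(w, h) / max(min(w, h), 1)
        if aspect <= 5 and 5 <= w <= 500 and 5 <= h <= 500:
            boxes.append((x, y, w, h))
    return boxes
```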

Thereafter, a trained text character recognition deep learning model32 is used to detect lead names among the remaining filtered objects. The input to the model consists of the 12-lead ECG binary image and the 12 ground-truth lead name text strings

('I', 'II', 'III', 'avr', 'avl', 'avf', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6').

The output comprises any text detected by the model, its corresponding bounding box, and a confidence score. The confidence score threshold for detecting lead names is set such that identifying one of the text strings causes the confidence score to exceed the threshold. In this way, the lead name objects and their position, height and width are identified for use as horizontal anchors. The process of obtaining horizontal distances from lead name detection is shown in Figure 2C. If some lead names are not successfully detected, the horizontal anchors are determined from the distances between the other lead names that were successfully identified in the same ECG. After successful identification of the horizontal and vertical anchor points, the ECG segment of each lead is cropped, as shown in Figure 2D.
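The paper's text-recognition model and its API are not specified here, so the sketch below assumes a generic detector yielding (text, bounding box, confidence) tuples and shows how its output could be matched to the 12 lead-name strings and kept as horizontal anchors; the confidence threshold is illustrative.

```python
LEAD_NAMES = ['I', 'II', 'III', 'avr', 'avl', 'avf',
              'v1', 'v2', 'v3', 'v4', 'v5', 'v6']

def lead_name_anchors(detections, conf_threshold=0.5):
    """Keep detections whose text matches a lead name with sufficient
    confidence. `detections` is assumed to be an iterable of
    (text, (x, y, w, h), score) tuples from a text-recognition model;
    the returned bounding boxes serve as horizontal anchors."""
    lookup = {name.lower(): name for name in LEAD_NAMES}
    anchors = {}
    for text, box, score in detections:
        name = lookup.get(text.strip().lower())
        if name is not None and score >= conf_threshold:
            anchors[name] = box
    return anchors
```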

Single lead ECG extraction

Extracting the ECG signal from the cropped image requires removal of "salt-and-pepper" noise, consisting of sparse white and black pixels, as well as any partial ECG signal from other leads. The latter is particularly relevant for large-amplitude ECG traces, which can encroach on the cropped images of adjacent leads, as shown in Figure 3. To this end, we first use image dilation to connect any discontinuities in the ECG signal of interest, in a way that also avoids spurious connections to noise or adjacent signals. Thereafter, we consider the largest detectable object in the image to be the ECG signal of interest and all other objects to be artifacts. This process is illustrated in Figure 3, which shows that this method preserves the signal of interest and removes the other objects contained in the cropped image.
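A hedged sketch of this cleaning step using OpenCV connected components: dilation bridges small gaps in the trace, and only the largest object is kept. The kernel size is an assumption and would need to stay small enough not to merge the trace with intruding neighboring signals.

```python
import cv2
import numpy as np

def keep_largest_object(cropped):
    """Dilate the cropped binary lead image to connect discontinuities in the
    trace, then keep only the largest connected component and discard
    salt-and-pepper noise and fragments of neighboring leads."""
    img = cv2.dilate(cropped.astype(np.uint8), np.ones((3, 3), np.uint8))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=8)
    if n <= 1:                                   # nothing but background found
        return np.zeros_like(img)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # skip background label 0
    return (labels == largest).astype(np.uint8)
```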

The next step involves converting the extracted binary ECG image into a one-dimensional digital ECG signal. The ECG signal in the binary image consists of a set of pixels with x (time) and y (voltage) coordinates, calibrated at 25 mm/s and 10 mm/mV. For any given point in time (x-axis), several pixels can contribute to the corresponding amplitude. Given that a digital ECG signal can have only one y-coordinate for each x-coordinate, we reconstruct the digital ECG signal using the median amplitude pixel (y-axis) at each x-coordinate of the binary image. This generates a digital ECG signal with x and y coordinates in pixels. To assign time and voltage values to the digital ECG signal, we used the rhythm (or longest signal) strip in each ECG to determine the time and voltage resolution. Given that a standard 12-lead ECG has a duration of 10 seconds, the temporal resolution is calculated as 10 seconds divided by the number of pixels on the x-axis. The standard voltage-time resolution is 0.1 mV / 40 ms = 0.0025 mV/ms, which allows the voltage resolution to be determined by multiplying the time resolution (ms per pixel) by the voltage-time resolution (0.0025 mV/ms). Thus, the time of the digital ECG signal is the pixel index on the x-axis multiplied by the time resolution, and the amplitude is the number of pixels on the y-axis multiplied by the voltage resolution.
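The pixel-to-signal conversion described above could be sketched as follows. The 10-second duration and the 0.0025 mV/ms voltage-time resolution come from the text; referencing amplitudes to the middle row of the crop and interpolating empty columns are assumptions made for illustration.

```python
import numpy as np

def trace_to_signal(mask, duration_s=10.0, mv_per_ms=0.0025):
    """Convert a cleaned single-lead binary mask into (time [s], voltage [mV])
    arrays: take the median trace row per column, then scale using the
    calibration described above (rhythm strip assumed to span `duration_s`)."""
    h, w = mask.shape
    ys = np.full(w, np.nan)
    for x in range(w):
        rows = np.flatnonzero(mask[:, x])
        if rows.size:
            ys[x] = np.median(rows)              # median amplitude pixel per time step
    idx = np.arange(w)
    ok = ~np.isnan(ys)
    ys = np.interp(idx, idx[ok], ys[ok])         # fill columns with no trace pixels
    time_res_ms = duration_s * 1000.0 / w        # ms per pixel on the x-axis
    volt_res_mv = time_res_ms * mv_per_ms        # mV per pixel on the y-axis
    t = idx * time_res_ms / 1000.0               # time in seconds
    v = (h / 2.0 - ys) * volt_res_mv             # image rows grow downward, so invert
    return t, v
```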


Figure 3: Cleaning process for cropped ECG images. After region-of-interest cropping, the dilation process connects possible breakpoints horizontally to obtain the full ECG signal. Thereafter, the labeling process identifies the largest object as the signal of interest. Finally, artifacts in the cropped image are removed, preserving the signal of interest.

Dashboard online tool development

We developed an online tool in Python using Plotly Dash. The following steps provide end users with instructions for using the online tool. First, users scan and upload an ECG image; users are reminded to fully redact and anonymize all confidential or patient-identifiable data. Images are read with the OpenCV function "cv2.imread", so any image format supported by "cv2.imread" can be used. After the image is uploaded, it is displayed with a fixed height of 600 pixels (px). Next, a drop-down bar provides the option to visualize each digitized ECG signal, with the option to change the resolution by zooming in or out. The digitized ECG can be downloaded as a 13-column spreadsheet, in which the first column contains the time axis data and the remaining 12 columns contain the ECG signal voltage data.
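A minimal Plotly Dash layout along these lines might look like the sketch below. Component IDs, callback wiring and download logic are illustrative, not the authors' implementation; only the upload step, the fixed 600 px display, the lead drop-down and the 13-column download mirror the description above.

```python
import dash
from dash import dcc, html

LEADS = ['I', 'II', 'III', 'aVR', 'aVL', 'aVF',
         'V1', 'V2', 'V3', 'V4', 'V5', 'V6']

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Upload(id='upload-ecg', children=html.Button('Upload scanned ECG')),
    html.Img(id='ecg-image', style={'height': '600px'}),      # fixed display height
    dcc.Dropdown(id='lead-select', value='II',
                 options=[{'label': l, 'value': l} for l in LEADS]),
    dcc.Graph(id='digitized-signal'),                         # zoomable digitized trace
    html.Button('Download CSV', id='download-btn'),
    dcc.Download(id='download-csv'),                          # time column + 12 lead columns
])

# Callbacks that run the digitization pipeline on the uploaded image and
# populate the figure and the download are omitted from this sketch.

if __name__ == '__main__':
    app.run(debug=True)
```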

Statistical Analysis

We validated our tool using the Pearson correlation coefficient and the root mean square error (RMSE) to quantify the agreement between the ground-truth ECG signals and the digitized ECG signals generated by our tool. Validation was performed on the independent database obtained from BIDMC. The Pearson correlation coefficient and RMSE were computed in Python ("scipy.stats.pearsonr" for the Pearson correlation coefficient, "sklearn.metrics.mean_squared_error" for the RMSE).
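The two metrics could be computed per lead as in this short sketch, assuming the ground-truth and digitized signals have already been aligned and resampled to the same length; the helper name is illustrative.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error

def validate_lead(ground_truth, digitized):
    """Return the Pearson correlation coefficient and RMSE between a
    ground-truth digital lead and its digitized counterpart."""
    r, _ = pearsonr(ground_truth, digitized)
    rmse = np.sqrt(mean_squared_error(ground_truth, digitized))
    return r, rmse
```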

Results

The first step in the digitization process requires the algorithm to detect the lead configuration of the printed ECG using horizontal and vertical anchor points so that each lead can be cropped in turn. Other digitization tools28 have developed a similar interface using line detection algorithms for horizontal and vertical anchor point detection, which works for ECGs printed in a 6 × 2 configuration. Although our tool employs a similar approach to vertical anchor detection, we also apply a deep learning-based text recognition model to detect the lead names for horizontal anchor detection. This has the advantage of allowing the software to extract data from any ECG configuration. While it is possible to identify horizontal anchor points by dividing the ECG image in half, this approach may not be accurate for ECG configurations in which the leads are not equidistant, and it is only applicable to the 6 × 2 ECG configuration. Other digitization tools also require manual marking of anchor points and are limited in their application by the ECG configuration. They are also user dependent, requiring manual selection of each lead prior to the digitization process. In contrast, our digitization tool can be used with ECGs of different configurations and does not require manual input prior to digitization. We envision that this will facilitate its application in both clinical and non-clinical settings, enabling larger volumes of printed ECGs to be digitized in less time.

Following lead detection and individual lead cropping, our digitization tool provides an efficient method for ECG signal extraction. Similar to other digitization interfaces, we apply connectivity algorithms to label and delete small objects. However, other existing digitization methods cannot remove all non-ECG artifacts or parts of the ECG signal from other leads, which requires additional processes, such as iteratively selecting pixels from left to right across the image. While this method can extract an ECG, it can be a complex and time-consuming process. In contrast, we leverage a dynamic morphology approach to connect any discontinuities in the ECG signal before identifying the largest labeled object as the ECG signal of interest. This effectively removes noise without further computational processing.

Traditionally, many existing ECG digitization tools required manual segmentation, gridline removal, and processing to extract the digital signal. Ravi Chandran et al. and Lobodzinski et al. applied optical character recognition to scan printed text and compare it against databases of predefined character templates, or to store demographic data. Beyond traditional methods, others have used end-to-end deep learning techniques for ECG digitization. However, their techniques are limited in generalizability to different ECG image databases, especially those with different configurations.

The motivation for developing our tool was to enable users to quickly and easily generate large numbers of digital ECGs from paper copies, images or scans. We anticipate this will be particularly useful for individuals wishing to use ECGs in machine learning applications. While this can be achieved without digitizing the ECG, for example by using paper ECGs or images of them30, any output from these approaches is inherently dependent on the quality of the input. In contrast, our tool digitizes paper ECGs of different configurations to generate standardized inputs for machine learning algorithms.

Overall, our digitization tool has the following advantages:

1. It is fully automatic: users do not need to manually segment single-lead signals.

2. Text-recognition-based lead name detection allows the tool to generalize across ECG images or paper ECG scans of different configurations.

3. An efficient ECG extraction algorithm enables fast digitization when needed.

4. Pearson correlation and root mean square error between ground-truth digital ECGs and digitized ECG waveforms provide a robust validation of the ECG digitization tool.

References

Tuncer, T., Dogan, S., Plawiak, P. & Subasi, A. A novel discrete wavelet-concatenated mesh tree and ternary chess pattern based ECG signal recognition method. Biomed. Signal Process. Control 72, 103331 (2022).

Tuncer, T., Dogan, S., Pławiak, P. & Acharya, U. R. Automated arrhythmia detection using novel hexadecimal local pattern and multilevel wavelet transform with ECG signals. Knowl. Based Syst. 186, 104923 (2019).

Baygin, M., Tuncer, T., Dogan, S., Tan, R.-S. & Acharya, U. R. Automated arrhythmia detection with homeomorphically irreducible tree technique using more than 10,000 individual subject ECG records. Inf. Sci. 575, 323–337 (2021).

Kobat, M. A., Karaca, O., Barua, P. D. & Dogan, S. Prismatoidpatnet54: an accurate ECG signal classification model using prismatoid pattern-based learning architecture. Symmetry 13, 1914 (2021).

Attia, Z. I. et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867 (2019).

Raghunath, S. et al. Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation-related stroke. Circulation 143, 1287–1298 (2021).
