The 17th China Postgraduate Mathematical Modeling Competition (2020), Problem C: EEG signal analysis and discriminant modeling for rehabilitation engineering

EEG signal analysis and discrimination model for rehabilitation engineering
Background and significance
The brain is the center of higher neural activity in the human body. It contains billions of interconnected neurons, through which it transmits and processes the body's information. EEG signals can be divided into evoked and spontaneous signals according to how they are generated. Evoked EEG signals are brain electrical activity in which an external stimulus causes potential changes in the brain; spontaneous EEG signals are brain electrical activity that the brain produces on its own, without any particular external stimulus.
(1) Evoked EEG signals (P300 brain-computer interface)
In daily life, the human brain controls perception, thinking, movement, language and other functions, and issues instructions to the rest of the body through the peripheral nerves. When peripheral nerves or muscles are damaged, the pathway that carries the brain's instructions is blocked: the body can no longer carry them out and loses the ability to communicate with and control the outside world. Studies have found that the brain can still function normally when the peripheral nerves fail, and that part of the information in the instructions it issues can still be expressed through certain pathways. Brain-computer interface technology aims to let the brain communicate with external assistive devices without relying on the normal output pathway formed by peripheral nerves and muscle tissue.
The P300 event-related potential is an evoked EEG signal: a positive peak (a deflection above the baseline) that appears about 300 milliseconds after a low-probability stimulus. The latency of the P300 varies between individuals; Figure 1, for example, shows a P300 waveform about 450 milliseconds after the stimulus. As an endogenous component, the P300 is not affected by the physical characteristics of the stimulus. It is related to perceptual and cognitive activity and is closely tied to processes such as attention, memory, and intelligence. The advantage of a P300-based brain-computer interface is that users can reach high recognition accuracy without complicated training, and that the signal is stably time-locked with high temporal precision.

Figure 1 P300 waveform diagram
(2) Spontaneous EEG signals (Sleep EEG)
Sleep is an important part of the body's rest and energy accumulation, and the quality of sleep also has a significant impact on people's physical and mental state. How to improve the quality of sleep and reduce the impact of sleep-related diseases on health has received increasing attention. The EEG signals collected during sleep are spontaneous EEG signals. Spontaneous sleep EEG signals can reflect changes in the body's own state, and are also an important basis for the diagnosis and treatment of related diseases.
Sleep is a complex, dynamically changing process. The international R&K sleep staging standard divides it into states as follows: apart from the awake period, the sleep cycle alternates between two states, the non-rapid eye movement period and the rapid eye movement period. The non-rapid eye movement period is further divided, as sleep progresses from light to deep, into sleep stage I, sleep stage II, sleep stage III and sleep stage IV; sleep stages III and IV can be combined into the deep sleep period. Figure 2 shows EEG time series for the different sleep stages; from top to bottom they are awake, sleep stage I, sleep stage II, deep sleep, and rapid eye movement. Figure 2 shows that the characteristics of the EEG signal differ between sleep stages. Automatic staging based on EEG signals can reduce the manual workload of experts and physicians, and is an important auxiliary tool for evaluating sleep quality and for diagnosing and treating sleep-related diseases.

(a) Awake period

(b) Sleep stage I

(c) Sleep stage II

(d) Deep sleep period

(e) Rapid eye movement phase
Figure 2 Sleep EEG signal time sequence of each sleep stage
Problem tasks
This problem comes with two attachments (data files) and four tasks, described below.
Attachment 1: P300 brain-computer interface experimental data
Attachment 1 provides P300 brain-computer interface experimental data from 5 healthy adult subjects (S1-S5) with an average age of 20. During the experiment, each subject is required to stay focused. The experiment is designed as follows: each subject views a character matrix of 36 characters, shown in Figure 3, which flashes one row or one column at a time (6 rows and 6 columns in total). Each round proceeds as follows. First, the subject is prompted to look at the "target character", such as the gray character "A" that appears above the character matrix in Figure 3. Next, the character matrix enters flashing mode: one row or one column flashes at a time in random order, each flash lasting 80 milliseconds with an interval of 80 milliseconds between flashes. Finally, the round ends once every row and column has flashed once. While the subject is looking at the target character, a P300 potential appears in the EEG signal when the row or column containing the target character flashes, and does not appear when any other row or column flashes. This procedure constitutes 1 round and is repeated for 5 rounds in total.

Figure 3 Character matrix interface

The P300 EEG data of each subject consists of 4 files:
train_data: training data;
train_event: event labels of the training data;
test_data: test data;
test_event: event labels of the test data.
The training data contains the data of 12 known target characters (char01-char12), and the test data contains the data of 10 target characters to be recognized (char13-char22). For each character, the EEG data table has 20 columns, one per recording channel, numbered in order (Table 1 lists the identifiers of the recording channels, and Figure 5 shows their locations). Each row of the EEG data table is one sample point, and the sampling frequency is 250 Hz. The acquisition device has a reference electrode and a ground electrode, so the signal of each recording channel is the difference between the active electrode and the reference electrode.
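To make the 250 Hz sampling rate concrete, here is a minimal sketch of how a flash onset could be turned into a fixed-length epoch and how repeated epochs could be averaged. The 800 ms window, the function names, and the 0-based onset index are assumptions for illustration; the problem statement does not prescribe a window length, and it does not say whether the sampling point numbers in the event files start at 0 or 1.

```python
FS_HZ = 250  # sampling frequency stated in the problem


def ms_to_samples(ms, fs_hz=FS_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * fs_hz / 1000)


def extract_epoch(eeg_rows, onset_index, window_ms=800, fs_hz=FS_HZ):
    """Return the rows covering window_ms starting at onset_index.

    eeg_rows is a list of rows, each row holding one value per channel.
    onset_index is assumed to be 0-based; convert the file's sampling
    point number first if it turns out to be 1-based.
    Returns None when the recording ends before the window is complete.
    """
    n = ms_to_samples(window_ms, fs_hz)
    end = onset_index + n
    if onset_index < 0 or end > len(eeg_rows):
        return None
    return eeg_rows[onset_index:end]


def average_epochs(epochs):
    """Point-by-point mean of epochs that all share the same shape."""
    count = len(epochs)
    n_samples = len(epochs[0])
    n_channels = len(epochs[0][0])
    return [
        [sum(e[t][c] for e in epochs) / count for c in range(n_channels)]
        for t in range(n_samples)
    ]
```

With these numbers, one 80 ms flash spans 20 samples and an 800 ms window spans 200 samples. Averaging the epochs of the same row or column over several rounds is a common way to raise the signal-to-noise ratio of the P300, at the cost of using more rounds.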

Table 1 Identifiers of the recording channels
Identifier  Channel name  Identifier  Channel name
1 Fz 11 CP5
2 F3 12 CP6
3 F4 13 Pz
4 Cz 14 P3
5 C3 15 P4
6 C4 16 P7
7 T7 17 P8
8 T8 18 Oz
9 CP3 19 O1
10 CP4 20 O2

Figure 5 EEG acquisition channels
The event label file of the training data is organized into sub-tables that correspond to those of the experimental data. Each sub-table is named "charXX (Y)", where XX is the sequence number of the character and Y is the actual target character. Each sub-table has two columns: the first is the label and the second is the sampling point number. The first label of each round is the identifier of the target character (the identifiers of the 36 characters in the matrix are listed in Table 2; for example, "101" stands for "A"), followed by the identifiers of the rows or columns that flash (see Figure 6; for example, "2" stands for the second row and "9" for the third column). The last label of each round is "100". In other words, the event label file of the training data first gives the identifier of the target character with its sampling point number, then the randomly ordered row and column identifiers with their sampling point numbers, and finally the "100" identifier; this pattern is repeated 5 times in total.
The event label file of the test data is likewise organized into sub-tables that correspond to those of the experimental data. Each sub-table is named "charXX", where XX is the sequence number of the character. In the event label file of the test data, the first line gives the identifier of the target character to be recognized, which is uniformly written as "666". The target character is recognized by analyzing the EEG signal to find the row and the column for which the P300 potential appears.
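The label layout described above can be sketched as a small parser that groups (label, sampling point) pairs into rounds. The function name and the in-memory list of pairs are hypothetical, and the assumption that flash identifiers run from 1 to 12 is inferred from the examples ("2" is the second row, "9" is the third column), since Figure 6 is not reproduced here.

```python
ROUND_END = 100  # label that closes each round, as stated in the text


def split_rounds(events):
    """Group (label, sample) pairs into rounds.

    Returns a list of (start_label, flashes) tuples, where start_label is
    the target identifier (101-136 in training data, 666 in test data) and
    flashes is the list of (flash_id, sample) pairs seen in that round.
    Flash ids are assumed to lie in 1..12.
    """
    rounds = []
    start_label = None
    flashes = []
    for label, sample in events:
        if label == ROUND_END:
            if start_label is not None:
                rounds.append((start_label, flashes))
            start_label, flashes = None, []
        elif 1 <= label <= 12:
            flashes.append((label, sample))
        else:
            # Anything else is treated as the label that opens a round.
            start_label = label
            flashes = []
    return rounds
```

For a complete character, the result should contain 5 rounds of 12 flashes each, which is a cheap sanity check to run before any signal processing.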
Table 2 Identifier of character matrix
A 101 B 102 C 103 D 104 E 105 F 106
G 107 H 108 I 109 J 110 K 111 L 112
M 113 N 114 O 115 P 116 Q 117 R 118
S 119 T 120 U 121 V 122 W 123 X 124
Y 125 Z 126 1 127 2 128 3 129 4 130
5 131 6 132 7 133 8 134 9 135 0 136

Figure 6 Identifiers in rows/columns
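The mapping between Table 2 and the row/column identifiers can be written down directly. The sketch below assumes that the on-screen matrix is laid out exactly like Table 2 and that rows are numbered 1-6 and columns 7-12, which is consistent with the examples in the text ("2" is the second row, "9" is the third column) but cannot be checked against Figure 6 here. All function names are illustrative.

```python
# Character matrix, assumed to match the layout of Table 2.
MATRIX = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ1234", "567890"]


def identifier_to_char(identifier):
    """Map a Table 2 identifier (101..136) to its character."""
    index = identifier - 101
    if not 0 <= index < 36:
        raise ValueError(f"not a character identifier: {identifier}")
    return MATRIX[index // 6][index % 6]


def char_to_identifier(ch):
    """Map a character to its Table 2 identifier (101..136)."""
    for r, row in enumerate(MATRIX):
        c = row.find(ch)
        if c != -1:
            return 101 + 6 * r + c
    raise ValueError(f"character not in matrix: {ch!r}")


def decode(row_id, col_id):
    """Character at the intersection of a row id (1..6) and a column id (7..12)."""
    if not (1 <= row_id <= 6 and 7 <= col_id <= 12):
        raise ValueError("expected row id in 1..6 and column id in 7..12")
    return MATRIX[row_id - 1][col_id - 7]


def target_flash_ids(ch):
    """Row id and column id whose flashes should evoke a P300 for ch."""
    index = char_to_identifier(ch) - 101
    return index // 6 + 1, index % 6 + 7
```

In training, target_flash_ids gives the two flashes per round that should be labeled as P300 responses; in testing, decode turns the detected row and column back into a character.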
Attachment 2: Sleep EEG data
Attachment 2 provides 3000 sleep EEG feature samples with their labels, taken from whole-night sleep recordings of different healthy adults. The first column is the "known label", which encodes the sleep stage as a number: awake (6), rapid eye movement (5), sleep stage I (4), sleep stage II (3), deep sleep (2). The second to fifth columns are feature parameters computed from the original time series, in order "Alpha", "Beta", "Theta", and "Delta": the percentage of the EEG signal's energy in the "8-13 Hz", "14-25 Hz", "4-7 Hz" and "0.5-4 Hz" frequency bands, respectively. The features are expressed in percent.
Based on the data sources and experimental data given in the above attachments, please study the following questions:
Question 1: A brain-computer interface system must not only classify the target accurately but also guarantee a reasonable information transfer rate. Based on the data given in Attachment 1, design or adopt a method that uses as few rounds of test data as possible (at most 5 rounds) to identify the 10 target characters to be recognized in the test set of each of the 5 subjects. Give the specific classification and recognition process, and compare against several other methods to justify the design.
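The trade-off between accuracy and the number of rounds can be quantified with an information transfer rate. The sketch below uses the Wolpaw definition, which is widely used for brain-computer interfaces; the problem statement does not say which definition it has in mind, so this choice is an assumption. The selection time counts only the flashes (12 per round at 80 ms plus 80 ms each) and ignores the cue and any pause between rounds, which is also an assumption.

```python
import math


def bits_per_selection(n_classes, accuracy):
    """Wolpaw bits per selection for n_classes equiprobable targets."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and accuracy in [0, 1]")
    p = accuracy
    bits = math.log2(n_classes)
    if p > 0:
        bits += p * math.log2(p)
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n_classes - 1))
    return bits


def seconds_per_selection(rounds, flashes_per_round=12, flash_ms=80, gap_ms=80):
    """Flashing time needed to select one character (cue time ignored)."""
    return rounds * flashes_per_round * (flash_ms + gap_ms) / 1000


def itr_bits_per_minute(n_classes, accuracy, rounds):
    """Information transfer rate in bits per minute."""
    return bits_per_selection(n_classes, accuracy) * 60 / seconds_per_selection(rounds)
```

Under these assumptions one round takes 1.92 s, so halving the number of rounds doubles the rate only if the accuracy holds up. Comparing itr_bits_per_minute(36, accuracy, rounds) for 1 to 5 rounds is one way to argue for a particular number of rounds.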
Question 2: Because a large amount of raw EEG data is collected, the signals inevitably contain redundant information. Among the 20 EEG acquisition channels shown in Figure 5 and Table 1, irrelevant or redundant channels not only increase the complexity of the system but also degrade the accuracy and performance of classification and recognition. Analyze the data given in Attachment 1 and design a channel selection algorithm that gives, for each subject, a combination of channel names that is more favorable for classification (the number of channels in the combination must be fewer than 20 and more than 10; the selected channels may differ between subjects; see Figure 5 and Table 1 for the channel names). Based on the per-subject results, further determine one optimal channel combination that suits all subjects, and give the specific analysis process. To help contestants choose the optimal channel combination, the problem gives the results for the test data char13-char17: the characters are M, F, 5, 2, and I.
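One possible shape for a channel selection algorithm is greedy backward elimination: start from all 20 channels and repeatedly drop the channel whose removal hurts a validation score the least. This is only a sketch of one wrapper-style approach, not a method required by the problem. The score_fn argument is hypothetical (for example, cross-validated accuracy of a classifier trained on the given channels), and the bounds of 11 and 19 channels read the stated constraint as "more than 10 and fewer than 20".

```python
# Channel names in the order of Table 1.
CHANNELS = [
    "Fz", "F3", "F4", "Cz", "C3", "C4", "T7", "T8", "CP3", "CP4",
    "CP5", "CP6", "Pz", "P3", "P4", "P7", "P8", "Oz", "O1", "O2",
]


def backward_elimination(channels, score_fn, min_keep=11, max_keep=19):
    """Greedily drop channels while the score does not get worse.

    score_fn(list_of_channel_names) -> float is supplied by the caller;
    higher is better. Channels are dropped unconditionally until at most
    max_keep remain, then only while the best candidate subset scores at
    least as well as the current one, and never below min_keep channels.
    Returns (selected_channels, score_of_selection).
    """
    selected = list(channels)
    current = score_fn(selected)
    while len(selected) > min_keep:
        trials = []
        for drop in selected:
            subset = [c for c in selected if c != drop]
            trials.append((score_fn(subset), subset))
        best_score, best_subset = max(trials, key=lambda t: t[0])
        if best_score < current and len(selected) <= max_keep:
            break  # every further removal makes the score worse
        selected, current = best_subset, best_score
    return selected, current
```

Running this once per subject gives per-subject channel sets; counting how often each channel survives across the 5 subjects is one simple way to propose a common set. Each pass calls score_fn once per remaining channel, so a cheap classifier is advisable.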
Question 3: In a P300 brain-computer interface system, obtaining labeled samples to train the model often takes a long time. To reduce the training time, select an appropriate number of samples from the data given in Attachment 1 as labeled samples and treat the remaining training samples as unlabeled. Using the optimal channel combination obtained in Question 2, design a learning method, verify its effectiveness on the test data from Question 2 (char13-char17), and use it to identify the remaining targets to be recognized in the test set (char18-char22).
Question 4: Based on the feature samples given in Attachment 2, design a sleep staging prediction model that achieves relatively high prediction accuracy with as few training samples as possible. State how the training and test data are selected and in what ratio they are split, explain the specific classification and recognition process, and analyze the prediction performance using classification performance metrics.
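For Question 4, the split and the evaluation can be sketched independently of whichever classifier is chosen. The code below shows a stratified split, which keeps the proportion of each sleep stage the same in training and test data, together with a confusion matrix, overall accuracy and per-class recall. The function names, the seed, and the choice of stratification are assumptions for illustration; the problem leaves the selection method and ratio open.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_indices, test_indices), sampling each class separately."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        k = max(1, round(train_fraction * len(idx)))  # at least one per class
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    pos = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def recall_per_class(matrix):
    """Recall of each class; 0.0 for a class with no true samples."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
```

Repeating the split for several training fractions (for example 5%, 10%, 20%) and plotting accuracy against the fraction is one way to show how few training samples are enough. Per-class recall matters here because the stages are not guaranteed to be equally frequent.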
Note: When working on Question 1, the results for the 5 test characters provided in Question 2 must not be used as known conditions. Using them will be judged as cheating.

Origin blog.csdn.net/shanlijia/article/details/108640686