eye tracking

I have written two articles on this before; today I will write a complete one.

There are several options for eye tracking:

1. Electrode-based eye tracking: This technology measures eye movement with electrodes placed on the skin around the eye. It can provide very high accuracy and temporal resolution, but it requires physical contact, so it is not suitable for long-term use or for scenarios that require non-contact measurement.

2. Infrared eye tracking: This technology uses infrared cameras to observe the position and movement of the eyes. Since it does not need to touch the eye, it is well suited for long-term use and for scenarios that require non-contact measurement. Its accuracy and resolution are generally lower than electrode-based eye tracking.

3. Magnetic resonance eye tracking: This technique uses magnetic resonance imaging (MRI) to measure eye position and movement. It can provide very high spatial resolution but low temporal resolution, making it less suitable for studying rapid eye movements.

4. Wearable eye tracking: This technology uses small sensors or cameras that can be mounted on glasses or helmets and carried around, making it suitable for mobile scenarios. But due to the size and weight constraints of wearable devices, accuracy and resolution are usually lower.

5. Retinal tracking: This technology uses retinal images to track the position and movement of the eyeball. It can provide very high accuracy and resolution, but it can only be used under specific experimental conditions, such as observing a single point light source in a dark environment.

Based on the book I am currently reading, I will still use a camera-based visual solution, implemented as follows:

1. Feature extraction: Select appropriate features to describe the shape, color, texture and other information of the eyes.

For example, a Haar cascade detector can be used to extract eye contour features, or a color distribution model can be used to extract eyeball color features. This step mainly relies on traditional computer vision methods.

2. Object detection: Use machine learning or computer vision techniques to detect the position and orientation of the eyes.

Because real-world scenes are too complicated, direct detection on the raw image is often inaccurate. Cascade classifiers or support vector machines (SVMs) can be used to locate eye position and orientation, and convolutional neural networks (CNNs) can be used to classify eye movement types (a minimal detection sketch follows this list).

3. Tracking and estimation: Starting from a successful detection, use tracking and estimation algorithms to follow the eye's position and trajectory from frame to frame. A Kalman filter or particle filter can estimate eye position and velocity, and an optical flow algorithm can estimate the eye's trajectory (see the Kalman sketch below).

4. Data analysis: Based on the eye tracking results, perform data analysis and visualization. Statistics such as fixation location, fixation duration, and fixation count can be computed, and the eye movement data can be visualized with heatmaps or trajectory plots. This is also a focus of this article.
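To make steps 1 and 2 concrete, here is a minimal sketch using the Haar cascade for eyes that ships with opencv-python; the camera index and ROI coordinates are assumptions for illustration, not values from my actual setup:

import cv2

# Load OpenCV's bundled Haar cascade for eye detection
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture(0)  # assumed camera index
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Restrict detection to a predefined ROI (assumed values) to save compute
    rx, ry, rw, rh = 100, 100, 300, 200
    roi = gray[ry:ry + rh, rx:rx + rw]
    # Tune scaleFactor / minNeighbors for your own setup
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in eyes:
        cv2.rectangle(frame, (rx + x, ry + y),
                      (rx + x + w, ry + y + h), (0, 255, 0), 2)
    cv2.imshow("eyes", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()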

Here we mainly hand the detector a predefined ROI, as the sketch above does, to reduce the computational load.
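And for step 3, a minimal constant-velocity Kalman filter sketch built on cv2.KalmanFilter; the state is (x, y, vx, vy), the measurement is the detected eye center, and the matrix values are generic textbook choices rather than anything specific to this project:

import cv2
import numpy as np

# 4 state variables (x, y, vx, vy), 2 measured variables (x, y)
kf = cv2.KalmanFilter(4, 2)
kf.measurementMatrix = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0]], np.float32)
kf.transitionMatrix = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3

def track(measured_x, measured_y):
    """Feed one detected eye center in, get the smoothed estimate back."""
    kf.correct(np.array([[measured_x], [measured_y]], np.float32))
    predicted = kf.predict()
    return predicted[0, 0], predicted[1, 0]  # estimated x, y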

So easy, folks!

But this program is too simple: it just looks for features, which is a bit naive. Let's switch libraries this time:

Dlib is a machine learning library written in C++ that provides algorithms for tasks such as face detection, keypoint (landmark) detection, and pose estimation, including pieces that are useful for eye tracking. Dlib also provides a Python interface, so its algorithms can be used to implement eye tracking in Python.

dlib provides a method that maps a face image to a 128-dimensional vector. If two images come from the same person, their vectors will be very close; otherwise they will be far apart. So you can determine whether two pictures show the same person by mapping both to 128-dimensional vectors and checking whether their Euclidean distance is small enough. But I don't want faces, I want eyes.
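As a minimal sketch, the eye landmarks can be pulled out of dlib's standard 68-point face model; this assumes you have downloaded shape_predictor_68_face_landmarks.dat from dlib.net, in which points 36-41 are one eye and 42-47 the other:

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# The 68-point model file must be downloaded separately from dlib.net
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_points(gray):
    """Return the 12 eye landmarks (x, y) for the first detected face."""
    faces = detector(gray)
    if not faces:
        return []
    shape = predictor(gray, faces[0])
    # Landmarks 36-41: one eye, 42-47: the other
    return [(shape.part(i).x, shape.part(i).y) for i in range(36, 48)]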

 

First update pip, then install the dependencies (the CMake download is there because building dlib from source requires it):

python.exe -m pip install --upgrade pip
pip install opencv-python
https://cmake.org/download/
c:/Users/yunswj/AppData/Local/Programs/Python/Python310/python.exe -m pip install ipykernel -U --user --force-reinstall

 

 

Pupil size: between 2.0 and 8.0 mm

Pupil size refers to the diameter of the circular opening in the center of the iris. It is affected by factors such as light, age, race, refractive status, target distance, and emotion; the normal range is between 2.0 and 8.0 mm. Pupils constrict under strong light and dilate in the dark, which is a normal physiological response. Unequal pupil sizes or an abnormal response to light can be a sign of brain or eye disease.

 

In most cases, we want to detect in real time: 
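Here is a minimal sketch of such a real-time loop: it thresholds the dark pupil region and fits a minimum enclosing circle. The camera index 0 and the threshold value 30 are assumptions you would tune for your own lighting:

import cv2

cap = cv2.VideoCapture(0)  # assumed camera index
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (7, 7), 0)
    # The pupil is the darkest region; the threshold value is a tunable guess
    _, thresh = cv2.threshold(blurred, 30, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Fit a circle to the largest dark blob and assume it is the pupil
        largest = max(contours, key=cv2.contourArea)
        (x, y), r = cv2.minEnclosingCircle(largest)
        cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)
    cv2.imshow("pupil", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()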

 

 

This way of writing it is rather crude, but I will package it up later.

Right now two separate windows pop up to show the output images. Let's close those and line the two streams up side by side!

Use the cv2.hconcat() function in OpenCV to horizontally merge two video frames together, and use the cv2.imshow() function to display the merged video frame.

It's very simple: the result is a single horizontally merged video. But there is a problem with the processing flow: each eye should be processed separately first, and the results merged at the end.
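A minimal single-threaded sketch of that merge, assuming two cameras at indices 0 and 1 (cv2.hconcat requires frames of matching height, so the right frame is resized to match the left):

import cv2

left_cap = cv2.VideoCapture(0)   # assumed camera indices
right_cap = cv2.VideoCapture(1)
while True:
    ok_l, left = left_cap.read()
    ok_r, right = right_cap.read()
    if not (ok_l and ok_r):
        break
    # hconcat needs identical heights and types; resize to be safe
    right = cv2.resize(right, (left.shape[1], left.shape[0]))
    cv2.imshow("both eyes", cv2.hconcat([left, right]))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
left_cap.release()
right_cap.release()
cv2.destroyAllWindows()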

For real engineering use, the above code is still too slow. Let me add a little multithreading magic!

Design two threads to process the reading and merging of left and right eye video frames respectively:

 

This starts the threads, followed by a loop that merges frames non-stop.

You can also add logging; just set it up at the top of the file.
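For example, a minimal setup with Python's standard logging module; the file name and format string are arbitrary choices:

import logging

# Write to a file and include a timestamp in every record
logging.basicConfig(
    filename="eye_tracker.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("capture started")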

The current program has no frills at all; adding some on-screen text or similar would make it even better!

The video stream should be encapsulated into a class, with the frame reading running on its own thread:
import cv2
import threading

class VideoStream:
    def __init__(self, src=0, width=640, height=480, fps=30):
        self.stream = cv2.VideoCapture(src)
        self.stream.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        self.stream.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
        self.stream.set(cv2.CAP_PROP_FPS, fps)
        self.width = int(self.stream.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height = int(self.stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
        self.fps = int(self.stream.get(cv2.CAP_PROP_FPS))
        self.status = False
        self.frame = None

    def start(self):
        if self.status:
            return None
        self.status = True
        threading.Thread(target=self.update, args=()).start()

    def update(self):
        while self.status:
            _, self.frame = self.stream.read()

    def read(self):
        return self.frame

    def stop(self):
        self.status = False

def main():
    # Create two VideoStream objects to capture the left- and right-eye streams
    left_cam = VideoStream(0)
    right_cam = VideoStream(1)

    # Start capturing
    left_cam.start()
    right_cam.start()

    # Create an OpenCV window to display the streams
    cv2.namedWindow("Video Stream", cv2.WINDOW_NORMAL)

    while True:
        # Read the left- and right-eye frames
        left_frame = left_cam.read()
        right_frame = right_cam.read()

        # The reader threads may not have produced a frame yet
        if left_frame is None or right_frame is None:
            continue

        # Overlay the frame rate on each frame
        left_fps_text = f"FPS: {left_cam.fps}"
        right_fps_text = f"FPS: {right_cam.fps}"
        cv2.putText(left_frame, left_fps_text, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.putText(right_frame, right_fps_text, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

        # Merge the left and right frames side by side and display them
        merged_frame = cv2.hconcat([left_frame, right_frame])
        cv2.imshow("Video Stream", merged_frame)

        # Press 'q' to quit
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    # Stop capturing
    left_cam.stop()
    right_cam.stop()

    # Close the OpenCV window
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

 

In the code, the putText function adds the frame rate information to the upper-left corner of each video frame. cv2.FONT_HERSHEY_SIMPLEX specifies the font type, 1 the font scale, (255, 255, 255) the font color, and 2 the line width.

The display handles visualization; we also need to save the raw eye movement data for post-processing. A function can be added to the program to extract the coordinates of the circle and save them to a file.

eye_data is a list of eye movement records; each element is a tuple with the eye coordinates. In the loop, write each element to the file, separating the coordinates with commas and appending a newline at the end of each line.

Assuming the circle has radius r and center (x, y), OpenCV's circle function can draw it. While drawing the circle, save the center coordinates and radius in a list at the same time:

Each time a circle is drawn, append the center coordinates and radius to the eye_data list. Finally, save the contents of eye_data to a text file:
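A minimal sketch of that bookkeeping; frame, x, y, and r are assumed to come from the detection step, and the file name eye_data.txt is an arbitrary choice:

import cv2

eye_data = []  # collected (x, y, r) records

def draw_and_record(frame, x, y, r):
    # Draw the circle and remember its center and radius
    cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)
    eye_data.append((int(x), int(y), int(r)))

def save_eye_data(path="eye_data.txt"):
    with open(path, "w") as f:
        for x, y, r in eye_data:
            f.write(f"{x},{y},{r}\n")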

 

Now we have a more complete function.

Let me encapsulate it again:

Draw the circular eye frame on the video frame and return the circle's coordinate information:
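As a minimal sketch, that wrapper could look like this:

import cv2

def draw_eye(frame, x, y, r):
    """Draw the circular eye frame and return its coordinate info."""
    cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)
    return int(x), int(y), int(r)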

 

I put the complete code on GitHub.

We have the saved data and want to display it again. Assume the eye movement data file is a text file in which each line contains two numbers, the coordinates of the left and right eyes respectively.

Let's implement another function!

When playing, click the mouse to capture the currently playing data and mark the time stamp on the picture.

The program should be written like this:

1. Read the eye movement data text file and store the data in a list.

2. Open the video file and read the first frame.

3. Display the first frame image on the window.

4. Enter the loop and read each record in the eye movement data list in turn.

5. When the user clicks the mouse, record the current timestamp and draw a circle or other mark on the image to indicate it.

6. Display the marked image on the window.

It is awkward to write these as separate functions, so here they are written together.
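A minimal sketch of that combined routine, with hypothetical file names eye_data.txt and eyes.avi; cv2.setMouseCallback delivers the clicks and cv2.CAP_PROP_POS_MSEC supplies the video timestamp:

import cv2

# 1. Read the eye movement data file into a list
with open("eye_data.txt") as f:
    eye_data = [tuple(map(int, line.split(","))) for line in f]

cap = cv2.VideoCapture("eyes.avi")  # assumed video file
cv2.namedWindow("playback")

clicked = {"stamp": None}

def on_mouse(event, x, y, flags, param):
    # 5. On a left click, record the current video timestamp
    if event == cv2.EVENT_LBUTTONDOWN:
        clicked["stamp"] = cap.get(cv2.CAP_PROP_POS_MSEC)

cv2.setMouseCallback("playback", on_mouse)

for data in eye_data:
    ret, frame = cap.read()
    if not ret:
        break
    # Overlay the current eye-data record
    cv2.putText(frame, str(data), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    if clicked["stamp"] is not None:
        # 6. Mark the captured timestamp on the image
        cv2.putText(frame, f"t = {clicked['stamp']:.0f} ms", (10, 70),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("playback", frame)
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()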

 

 


