Trypophobia warning: the world's largest human eye image dataset, with over 20 million images, is now open source

Researchers at the University of Tübingen in Germany have created TEyeD, the world's largest public dataset of human eye images, with annotations covering 2D and 3D feature points, semantic segmentation, 3D eyeball parameters, gaze vectors, and eye movement types.

Report from Almost Human. Author: Turre

Image-based eye tracking is becoming increasingly important, because eye movements have the potential to change the way we interact with the computer systems around us. Moreover, eye movements can to some extent reveal, and even predict, our actions and intentions, so eye movement analysis can enable new applications, especially when combined with modern display technologies such as VR and AR. For example, gaze signals enable hands-free human-computer interaction, allowing people with disabilities to interact with their environment through specially designed assistive devices. In surgical microscopy, where the surgeon must perform a variety of control actions, the gaze signal can be used for autofocus. Gaze behavior can also be used to help diagnose diseases such as schizophrenia, autism, Alzheimer's disease, and glaucoma. In VR and AR games, the gaze signal can be used to reduce rendering computation (foveated rendering).

Beyond gaze information, observing the eye offers further sources of information. For example, the frequency of eye closure can be used to measure fatigue, an effective safety feature in driving and aviation scenarios. Another important signal is pupil size, which can serve as a basis for estimating a person's cognitive load in a given task, so that content (such as media-based learning material) can be adapted to their mental state. Finally, iris characteristics and individual gaze behavior make eye-related information usable for biometric identification.
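To make the eye-closure idea above concrete, here is a minimal sketch that computes blink frequency and a PERCLOS-style closure ratio from a per-frame eye-openness signal. The signal, the closure threshold, and the frame rate are our assumptions for illustration; they are not part of TEyeD.

```python
import numpy as np

def blink_metrics(openness, fps=30.0, closed_thresh=0.2):
    """Estimate fatigue-related metrics from a per-frame eye-openness
    signal in [0, 1] (1 = fully open). Threshold and fps are assumed."""
    openness = np.asarray(openness, dtype=float)
    closed = openness < closed_thresh            # True on "closed" frames
    # A blink is a rising edge into a run of closed frames.
    edges = np.diff(closed.astype(int))
    n_blinks = int(np.sum(edges == 1)) + int(closed[0])
    duration_s = len(openness) / fps
    return {
        "blinks_per_minute": 60.0 * n_blinks / duration_s,
        "perclos": float(np.mean(closed)),       # fraction of time closed
    }

# Toy usage: a mostly-open signal with two brief closures.
sig = [1.0] * 20 + [0.05] * 3 + [1.0] * 20 + [0.1] * 4 + [1.0] * 20
print(blink_metrics(sig))
```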

Recently, researchers from the University of Tübingen in Germany created TEyeD, the world's largest unified public dataset of human eye images. All images were captured with head-mounted devices: seven different head-mounted eye trackers were used, two of which were combined with VR or AR headsets. The images in TEyeD were recorded during a variety of tasks, including car rides, simulated flight, outdoor sports, and everyday indoor activities.

The images in the dataset are annotated with 2D and 3D feature points, semantic segmentation, 3D eyeball parameters, gaze vectors (GV), and eye movement types. Feature points and semantic segmentation are provided for the pupil, iris, and eyelids, and video lengths range from a few minutes to several hours. With more than 20 million carefully annotated human eye images, TEyeD offers a unique, consistent resource and a solid foundation for advancing research in computer vision, eye tracking, and gaze estimation for modern VR and AR applications.
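The exact annotation file layout is defined by the TEyeD release itself; purely to illustrate what the paper says each frame carries, here is a hypothetical per-frame record. All field names are ours, not the dataset's.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np

@dataclass
class EyeFrameAnnotation:
    """Hypothetical per-frame record mirroring the annotation types the
    paper lists; field names are illustrative, not TEyeD's own schema."""
    landmarks_2d: Dict[str, List[Tuple[float, float]]]         # pupil / iris / eyelid
    landmarks_3d: Dict[str, List[Tuple[float, float, float]]]
    segmentation: np.ndarray    # HxW label map: background / sclera / iris / pupil
    eyeball_center: Tuple[float, float, float]
    eyeball_radius: float
    gaze_vector: Tuple[float, float, float]     # unit 3D gaze direction
    eye_movement: str           # "fixation" | "saccade" | "smooth pursuit" | "blink"
```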

Paper address: https://arxiv.org/pdf/2102.02115.pdf

Comparison with existing datasets

Table 1 below lists existing datasets containing close-up images of the human eye. Each dataset addresses a specific problem; for example, the Casia and Ubiris datasets target iris-based identification of individuals, while NNVEC directly estimates the optical vector and eye position to compensate for the displacement of a head-mounted eye tracker.

TEyeD combines and extends previously published datasets: it was collected with seven eye trackers of different resolutions, merges all annotations available in existing datasets, and extends them with 3D segmentations and feature points. Specifically, TEyeD integrates the NNGaze, LPW, GIW, ElSe, ExCuSe, and PNET datasets. In addition, the complete data from the study [69] has been carefully annotated.

In total, TEyeD contains more than 20 million images, making it the world's largest image dataset captured with head-mounted eye trackers.

Dataset details

Figure 1 below shows example images from the TEyeD dataset. Columns 1 and 5 contain the input images; columns 2 and 6 overlay the segmentation of the sclera, iris, and pupil; columns 3 and 7 show the feature points on the input images, with the eyelid in red, the iris in green, and the pupil in white; columns 4 and 8 show the computed eyeball, eyeball center, and gaze vector.
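As a hedged illustration of how such an overlay can be produced, the OpenCV sketch below draws landmarks in the caption's color convention (eyelid red, iris green, pupil white) and projects the gaze vector onto the image plane by simply dropping its z component. The argument layout and the projection are simplifying assumptions, not the authors' visualization code.

```python
import cv2
import numpy as np

def draw_overlay(img, eyelid_pts, iris_pts, pupil_pts, eye_center, gaze_vec, scale=60):
    """Draw landmarks and a projected gaze vector on a BGR eye image.
    Colors follow the Figure 1 caption: red eyelid, green iris, white pupil."""
    out = img.copy()
    for pts, color in ((eyelid_pts, (0, 0, 255)),     # red (BGR)
                       (iris_pts, (0, 255, 0)),       # green
                       (pupil_pts, (255, 255, 255))): # white
        for x, y in pts:
            cv2.circle(out, (int(x), int(y)), 2, color, -1)
    # Project the 3D gaze vector to 2D by dropping z (a simplification).
    cx, cy = int(eye_center[0]), int(eye_center[1])
    tip = (int(cx + scale * gaze_vec[0]), int(cy + scale * gaze_vec[1]))
    cv2.arrowedLine(out, (cx, cy), tip, (255, 0, 0), 2)
    return out
```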

Figure 2 below shows the logarithmic distribution of the feature points of the pupil (left), iris (middle), and eyelid (right):

Figure 3 below shows box plots of the pupil, iris, and sclera distributions (left) and the logarithmic distribution of the gaze vectors (right):

Figure 4 below shows the distribution of eyeball positions (x, y) and a box plot of eyeball radii (in pixels), both mapped to a fixed resolution of 192×144:
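To compare eyeball positions across trackers with different sensor sizes, coordinates must first be rescaled to a common reference resolution. A minimal sketch of such a mapping to 192×144 follows; the exact convention the authors use is an assumption here.

```python
def to_fixed_resolution(x, y, src_w, src_h, dst_w=192, dst_h=144):
    """Rescale an eyeball-center coordinate from the source image size to a
    fixed reference resolution, so positions from different eye trackers
    become comparable. The authors' exact convention is assumed."""
    return x * dst_w / src_w, y * dst_h / src_h

# Toy usage: a point in a 384x288 image maps to half-scale coordinates.
print(to_fixed_resolution(100.0, 50.0, 384, 288))   # -> (50.0, 25.0)
```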

Annotation process

For the feature point and semantic segmentation annotations in TEyeD, the researchers used a semi-supervised approach based on the multi-annotation maturation (MAM) algorithm. Unlike the original algorithm, they used convolutional neural networks (CNNs) rather than SVMs with HOG features. They limited the number of iterations to five and used two competing models: one based on ResNet50, trained for feature point regression with the loss function from [36], and the other a U-Net with residual blocks, trained for semantic segmentation.

Initially, the researchers annotated 20,000 images with feature points and converted these annotations into semantic segmentations. They then trained the CNNs and refined the annotations iteratively with the MAM algorithm. After five iterations, the ResNet50 feature points were converted into semantic segmentations and compared with the U-Net results.
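Schematically, the loop described above might look as follows. This is a control-flow sketch only: the training, mask-conversion, and IoU callables are placeholders supplied by the caller, not real APIs from the paper.

```python
def refine_annotations(images, seed_labels, train_landmarks, train_segmenter,
                       landmarks_to_mask, iou, n_iters=5, agree_thresh=0.9):
    """Control-flow sketch of MAM-style iterative refinement: train two
    competing models on the current labels, then accept an unlabeled frame
    only when the landmark-derived mask and the predicted segmentation agree."""
    labeled = dict(seed_labels)                     # frame_id -> landmark annotation
    for _ in range(n_iters):                        # the paper caps this at 5
        landmark_model = train_landmarks(labeled)   # e.g. a ResNet50 regressor
        seg_model = train_segmenter(labeled)        # e.g. a U-Net with residual blocks
        for frame_id, img in images.items():
            if frame_id in labeled:
                continue
            lm = landmark_model(img)
            seg = seg_model(img)
            # Competing-model check: convert the landmarks to a mask and
            # keep the frame only if both predictions agree strongly.
            if iou(landmarks_to_mask(lm, img.shape), seg) > agree_thresh:
                labeled[frame_id] = lm
    return labeled
```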

The 3D eyeball and the optical vector were annotated based on the method in [30]. However, instead of the pupil ellipse, the researchers used the iris ellipse, because the latter is only partially affected by corneal refraction.

By combining the 2D feature points, the segmentations, and the 3D eyeball model, the researchers computed 3D feature points and 3D segmentations geometrically. Since the pupil is always located at the center of the iris, they considered two different 3D segmentations and sets of 3D feature points.
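As a toy version of this geometry (not the fitting procedure of [30]), the sketch below derives a unit gaze vector as the ray from an assumed 3D eyeball center through the 3D iris center.

```python
import numpy as np

def gaze_from_eyeball(eyeball_center, iris_center_3d):
    """Toy geometric gaze estimate: the gaze vector is the unit ray from
    the 3D eyeball center through the 3D iris center. This illustrates
    the general idea only, not the method of [30]."""
    v = np.asarray(iris_center_3d, float) - np.asarray(eyeball_center, float)
    return v / np.linalg.norm(v)

# Toy usage: eyeball at the origin, iris slightly forward and to the left.
print(gaze_from_eyeball((0.0, 0.0, 0.0), (-2.0, 0.5, 10.0)))
```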

Eye movements are annotated as fixations (the eye is still), saccades (rapid movements between fixations), smooth pursuits (slow tracking movements), and blinks.
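A common baseline for producing such labels, though not necessarily the scheme used for TEyeD, is a velocity-threshold (I-VT style) classifier over the gaze signal. The thresholds and sampling rate below are assumptions, and blink handling (signal loss) is omitted for brevity.

```python
import numpy as np

def classify_eye_movement(gaze_xy, fps=200.0, sacc_deg_s=100.0, purs_deg_s=5.0):
    """Velocity-threshold (I-VT style) labeling of gaze samples in degrees.
    Above sacc_deg_s -> saccade, between the thresholds -> smooth pursuit,
    below purs_deg_s -> fixation. Thresholds and fps are assumptions."""
    g = np.asarray(gaze_xy, float)
    vel = np.linalg.norm(np.diff(g, axis=0), axis=1) * fps   # deg/s per step
    labels = np.where(vel > sacc_deg_s, "saccade",
                      np.where(vel > purs_deg_s, "smooth pursuit", "fixation"))
    return labels  # one label per consecutive sample pair
```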

Benchmarking

For the experiments, the researchers divided the data into a training set and a validation set. To prevent the same subjects from appearing in both, each complete recording was assigned either to the training set or to the validation set.
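One hedged way to implement such a split, assuming every frame is tagged with a recording ID, is scikit-learn's GroupShuffleSplit, which keeps all frames of a recording on the same side of the split:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 10 frames from 4 recordings; a recording never straddles the split.
X = np.arange(10).reshape(-1, 1)
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])   # recording IDs
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, groups=groups))
print("train recordings:", set(groups[train_idx]))
print("val recordings:  ", set(groups[val_idx]))
```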

For evaluation, the researchers implemented the neural network models in a C++-based cuDNN framework. The test hardware consisted of a 4-core Intel i5-4570 CPU with 16 GB of DDR4 memory and an NVIDIA GTX 1050 Ti with 4 GB of memory.

Table 3 below shows the feature point regression results. As expected, larger models perform better on the regression task.
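For context on how feature point regression is typically scored, here is a generic mean Euclidean landmark error; the paper's exact metric and any normalization may differ.

```python
import numpy as np

def mean_landmark_error(pred, gt):
    """Mean Euclidean distance in pixels between predicted and ground-truth
    landmarks of shape (N, K, 2). A generic metric; the paper's exact
    evaluation protocol may differ."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy usage: two images, three landmarks each, all off by one pixel in x.
gt = np.zeros((2, 3, 2))
pred = gt + [1.0, 0.0]
print(mean_landmark_error(pred, gt))   # -> 1.0
```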

Table 4 below, which shows the results of eyeball parameter estimation, supports the same conclusion:

As Tables 3 and 4 show, TEyeD has clear advantages over existing, smaller datasets. The results also confirm that, as expected, generalizing across eye trackers on images captured in real-world scenes is a challenging task, but one that can be handled by combining TEyeD with more complex architectures. Consequently, when a new eye tracking device is introduced, the cross-tracker generalization task can be tackled without creating and annotating new data.

Figure 5 below shows the result of semantic segmentation:

Table 6 below shows the eye movement recognition results. The gaze vector is more effective for eye movement classification because it compensates for the displacement of the eye tracker.
