Problem definition
Anomaly, outlier, novelty, exception
Different approaches use different terms for this type of problem.
Application
Two categories
If only normal data is available and the range of possible abnormal data is very wide (it cannot be enumerated exhaustively), binary classification is hard to apply. In addition, abnormal data is hard to collect.
Classification
Each picture is labeled, so you can train a classifier that recognizes the members of the Simpson family.
Anomaly detection based on a classifier.
Anomalies are detected from the classifier's confidence score: if the score is greater than a threshold, the input is treated as normal, and if it is less than the threshold, it is treated as abnormal. The maximum class score in the classifier's output can be used as the confidence score.
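The thresholding rule above can be sketched in a few lines. This is a minimal illustration, assuming the classifier produces raw scores (logits) that are turned into probabilities with a softmax; the function names, the example logits, and the threshold of 0.5 are all hypothetical and not taken from the notes.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits):
    """Confidence score: the largest class probability."""
    return max(softmax(logits))

def is_anomaly(logits, threshold=0.5):
    """Flag the input as abnormal when the confidence is below the threshold.

    The threshold value is an assumption; in practice it would be tuned
    on a dev set.
    """
    return confidence(logits) < threshold

# Hypothetical classifier outputs for four family members.
confident_logits = [8.0, 0.5, 0.2, 0.1]   # one class clearly dominates
uncertain_logits = [1.0, 0.9, 1.1, 1.0]   # no class stands out

print(is_anomaly(confident_logits))  # treated as normal
print(is_anomaly(uncertain_logits))  # treated as abnormal
```

Other confidence measures, such as the negative entropy of the output distribution, could be swapped in for `confidence` without changing the rest of the rule.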
Confidence Score Estimation
Train the network to output the confidence score directly: besides the classification output C, it also outputs a confidence score P.
Train and Eval
100 pictures of the Simpsons, 5 anomalous pictures
- Normal images (shown in blue) are misclassified as abnormal
- Abnormal images (shown in red) are misclassified as normal
At this point, use the dev set to evaluate the system; this is a binary classification problem.
The proportions of normal and abnormal data are very imbalanced. A system can reach a high accuracy while doing nothing useful: with 100 normal and 5 anomalous pictures, a system that labels everything as normal is about 95% accurate but detects no anomaly. Accuracy (ACC) is therefore not a meaningful metric here.
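To make the imbalance concrete, here is a small sketch that evaluates a system which labels every picture as normal on a dev set of 100 normal and 5 anomalous pictures. The label encoding (1 for normal, 0 for anomaly) and the helper name are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Assumed encoding: 1 = normal, 0 = anomaly.
y_true = [1] * 100 + [0] * 5

# A system that "does nothing": it calls every picture normal.
always_normal = [1] * len(y_true)

acc = accuracy(y_true, always_normal)
detected = sum(1 for t, p in zip(y_true, always_normal) if t == 0 and p == 0)

print(f"accuracy = {acc:.3f}, anomalies detected = {detected}")
```

The accuracy is 100/105 even though not a single anomaly is found, which is why a different metric is needed.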
Use the confusion matrix:
Define a cost table that assigns a cost to each kind of mistake (false alarm, missed anomaly), then calculate a score from it:
Set the cost table according to your own task. There are also other measures, such as AUC (the area under the ROC curve).
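As a rough sketch of both measures, the code below counts the two kinds of errors at a given threshold, weights them with a cost table, and also computes AUC as the probability that a random normal sample receives a higher confidence score than a random anomaly. The scores, the threshold, and the cost values are made up for illustration; only the idea of a task-specific cost table comes from the notes.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (false_alarms, missed) for a confidence threshold.

    Assumed encoding: 1 = normal, 0 = anomaly. A sample is flagged as
    an anomaly when its confidence score is below the threshold.
    """
    false_alarms = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    missed = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return false_alarms, missed

def total_cost(false_alarms, missed, cost_false_alarm, cost_missed):
    """Weighted error count; lower is better."""
    return false_alarms * cost_false_alarm + missed * cost_missed

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    normal = [s for t, s in zip(y_true, scores) if t == 1]
    anomalous = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if n > a else 0.5 if n == a else 0.0
        for n in normal
        for a in anomalous
    )
    return wins / (len(normal) * len(anomalous))

# Hypothetical dev-set labels and confidence scores.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.45, 0.3, 0.6]

false_alarms, missed = confusion_counts(y_true, scores, threshold=0.5)

# Two illustrative cost tables: one punishes missed anomalies,
# the other punishes false alarms.
cost_a = total_cost(false_alarms, missed, cost_false_alarm=1, cost_missed=100)
cost_b = total_cost(false_alarms, missed, cost_false_alarm=100, cost_missed=1)

print(false_alarms, missed, cost_a, cost_b, auc(y_true, scores))
```

The same predictions get very different scores under the two tables, so the choice of table should reflect which mistake is more harmful in the task at hand. AUC, by contrast, does not depend on any single threshold.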
Problems
If the face is yellow, the system gives a higher score, which suggests that the classifier has not learned to recognize people but only whether the face is yellow.
If some abnormal data can be collected, the network can learn to classify and to output an anomaly score at the same time, but this kind of data is hard to collect. One option is to use a GAN to generate anomalous data.
The case without labels
Normal players and abnormal players ("Xiaobai", i.e. trolls)
Problem definition
A numerical method is needed to give each player a score $f(sta)$, obtained from a probability density estimate.
Gaussian distribution
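A minimal sketch of density-based scoring with a Gaussian follows. It assumes each player is described by a single numeric feature, fits the mean and variance by maximum likelihood, and flags a player as abnormal when the estimated density falls below a threshold. The one-dimensional simplification, the feature values, and the threshold are assumptions for illustration; with several features one would fit a multivariate Gaussian instead.

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood estimates of mean and variance (divides by N)."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical feature values for players assumed to be mostly normal.
features = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50]
mu, var = fit_gaussian(features)

threshold = 1.0  # assumed cut-off on the density, to be tuned on a dev set

def is_abnormal(x):
    """A player is abnormal when the density at their feature is low."""
    return gaussian_pdf(x, mu, var) < threshold

print(is_abnormal(0.50))  # close to the mean
print(is_abnormal(0.90))  # far from the training data
```

A player near the bulk of the training data gets a high density and is treated as normal, while a player far from it gets a density close to zero.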