Machine Learning Metrics: F1 Score


Introduction to F1 score

In this article [1], you will learn about the F1 score. The F1 score is a machine learning metric that can be used to evaluate classification models. Although many metrics exist for classification models, this article explains how the F1 score is calculated and when it adds value.

The F1 score was proposed as an improvement over two simpler performance metrics, so before diving into its details, let's briefly review the metrics behind it.

Accuracy

Accuracy is a metric for classification models that measures the number of correct predictions as a percentage of the total number of predictions made. For example, if your predictions are correct 90% of the time, then you are 90% accurate.

Accuracy = (number of correct predictions) / (total number of predictions)
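For instance, here is a minimal sketch of computing accuracy in Python with scikit-learn (the labels below are made up for illustration):

```python
from sklearn.metrics import accuracy_score

# Hypothetical ground-truth labels and model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

# Accuracy = correct predictions / total predictions
print(accuracy_score(y_true, y_pred))  # 0.8 -> 8 out of 10 predictions are correct
```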

Accuracy is only a useful metric if the classes in your classification problem are evenly distributed. This means that if you have a use case where you observe far more data points for one class than for another, accuracy is no longer a useful metric. Let's look at an example to illustrate this:

Imbalanced Data Example

Let's say you're working with sales data for your website. You know that 99% of website visitors don't buy and only 1% of visitors buy. You are building a classification model to predict which website visitors are buyers and which are just browsers.

Now imagine a model that doesn't perform well at all. It predicts that 100% of visitors are just browsers and 0% are buyers. This is clearly a very wrong and useless model.


What happens if we apply the accuracy formula to this model? The model is wrong for only 1% of the predictions: all buyers are misclassified as browsers. The percentage of correct predictions is therefore 99%. The problem is that 99% accuracy sounds good, while the model actually performs terribly. In summary: accuracy is not a good metric when you have class imbalance.
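To make this concrete, here is a small sketch (with made-up data) reproducing the example above: a model that labels every visitor as a browser still scores 99% accuracy:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 990 browsers (0), 10 buyers (1)
y_true = np.array([0] * 990 + [1] * 10)

# A useless model that predicts "browser" for every single visitor
y_pred = np.zeros_like(y_true)

# Accuracy looks great even though the model never finds a buyer
print(accuracy_score(y_true, y_pred))  # 0.99
```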

  • Address imbalanced data by resampling

One way to deal with the class imbalance problem is to work on your samples. Using a specific sampling method, you can resample a dataset in such a way that the data is no longer imbalanced. Then you can use accuracy as a metric again.
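As an illustration, here is one possible way to oversample the minority class using scikit-learn's resample utility (the dataset is made up, and other resampling strategies exist):

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced dataset: 990 browsers (0), 10 buyers (1)
X = np.random.rand(1000, 3)
y = np.array([0] * 990 + [1] * 10)

# Oversample the buyers with replacement until the classes are balanced
X_minority, y_minority = X[y == 1], y[y == 1]
X_upsampled, y_upsampled = resample(X_minority, y_minority,
                                    replace=True, n_samples=990,
                                    random_state=42)

# Recombine into a balanced dataset
X_balanced = np.vstack([X[y == 0], X_upsampled])
y_balanced = np.concatenate([y[y == 0], y_upsampled])
print(np.bincount(y_balanced))  # [990 990]
```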

  • Address imbalanced data with metrics

Another way to address the class imbalance problem is to use a better evaluation metric, such as the F1 score, which takes into account not only the number of prediction errors your model makes, but also the type of errors that are made.
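As a preview, here is a minimal sketch showing how scikit-learn's f1_score exposes the degenerate model from the example above:

```python
import numpy as np
from sklearn.metrics import f1_score

# Same degenerate model as before: every visitor predicted as a browser
y_true = np.array([0] * 990 + [1] * 10)
y_pred = np.zeros_like(y_true)

# The buyer class (1) is treated as the positive class
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 -- no buyers found
```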

Basis of F1 Score

Precision and Recall are the two most common metrics that take class imbalance into account. They are also the building blocks of the F1 score! Let's take a closer look at precision and recall before combining them into the F1 score in the next section.

Precision

Precision is the first component of the F1 score, although it can also be used as a standalone machine learning metric. Its formula looks like this:

Precision = TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.

You can interpret this formula as follows. Among everything predicted to be positive, precision calculates the percent correct:

  • An imprecise model may find many positives, but its selection method is noisy: it also falsely flags many negatives as positives.
  • A precise model is very "pure": it may not find all the positives, but the ones it does classify as positive are very likely to be correct.
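As a small illustration, here is a sketch computing precision with scikit-learn on made-up labels:

```python
from sklearn.metrics import precision_score

# Hypothetical labels: the model predicts 4 positives, 3 of them correct
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# Precision = TP / (TP + FP) = 3 / (3 + 1)
print(precision_score(y_true, y_pred))  # 0.75
```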

Recall

Recall is the second component of the F1 score, although recall can also be used as a separate machine learning metric. The recall formula is as follows:

Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

You can interpret this formula as follows. Of all the actual positives, recall calculates the percentage the model managed to find:

  • Models with high recall are good at finding all positive examples in the data, even though they may also incorrectly identify some negative examples as positive.
  • A model with low recall fails to find all (or most) of the positive cases in the data.
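Again as a small illustration, here is a sketch computing recall on made-up labels:

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 5 actual positives, of which the model finds 3
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

# Recall = TP / (TP + FN) = 3 / (3 + 2)
print(recall_score(y_true, y_pred))  # 0.6
```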

Reference

[1] Source: https://towardsdatascience.com/the-f1-score-bec2bbc38aa6

