Data Analysis Thinking (1)|Reliability and Validity Thinking

Reliability and Validity Thinking

1. Concept

Reliability and validity thinking is typically used in data analysis to choose which indicators (metrics) are worth tracking.

Reliability: how dependable the indicator is, covering consistency and stability (whether it is calculated under the same definition everywhere, and whether it fluctuates erratically).

Validity: how well the indicator captures what it is meant to measure. A valid indicator moves when the underlying thing changes, so a change in the indicator can be read as a change in the thing itself.

Summary: reliability reflects how stable and tightly clustered the data are, while validity reflects how accurate they are.
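As a rough illustration of the two ideas, the sketch below scores stability with the coefficient of variation of repeated readings, and scores how well an indicator tracks its target with a Pearson correlation. Both choices are assumptions made for this example, not measures prescribed by the article, and all the numbers are hypothetical. A high correlation is only a hint of validity, not proof of it.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (lower means more stable)."""
    return statistics.pstdev(values) / statistics.fmean(values)

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical: the same indicator read repeatedly under unchanged conditions.
repeated_readings = [0.050, 0.051, 0.049, 0.050, 0.052]

# Hypothetical: indicator values next to the outcome the indicator should track.
indicator = [0.02, 0.03, 0.04, 0.05, 0.06]
outcome = [110, 150, 205, 240, 310]

stability = coefficient_of_variation(repeated_readings)  # reliability proxy
tracking = pearson(indicator, outcome)                    # validity proxy

print(f"coefficient of variation: {stability:.3f}")
print(f"correlation with outcome: {tracking:.3f}")
```

A small coefficient of variation suggests the indicator is stable, and a correlation close to 1 suggests that it moves together with the thing being measured.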

2. An example

Suppose we want to measure the effectiveness of company A's advertising and choose the number of ad clicks as the indicator. Across delivery channels, the same number of clicks can correspond to very different numbers of exposures, so the indicator's consistency is very poor. From another angle, if more exposure simply brings more clicks, does that prove the ad has become more effective? Clearly not. The number of clicks is therefore not a good indicator here. The click-through rate (clicks divided by exposures) has better reliability and validity, so we use it instead.
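The comparison between click counts and click-through rate can be sketched in a few lines of Python. The channel names and all figures below are made up for illustration, and `click_through_rate` is a hypothetical helper, not part of any library.

```python
# Hypothetical delivery data: (channel, exposures, clicks). All numbers are invented.
campaigns = [
    ("search", 10_000, 500),
    ("social", 50_000, 500),
    ("display", 100_000, 500),
]

def click_through_rate(clicks, exposures):
    """Clicks per exposure; returns 0.0 when there were no exposures."""
    return clicks / exposures if exposures else 0.0

# Same click count on every channel, but very different click-through rates.
ctr_by_channel = {
    name: click_through_rate(clicks, exposures)
    for name, exposures, clicks in campaigns
}

for name, exposures, clicks in campaigns:
    print(f"{name:8s} clicks={clicks:5d} exposures={exposures:7d} "
          f"CTR={ctr_by_channel[name]:.2%}")

# Doubling exposure doubles the clicks, yet the rate is unchanged:
# the raw click count rose without the ad becoming more effective.
before = click_through_rate(clicks=500, exposures=10_000)
after = click_through_rate(clicks=1_000, exposures=20_000)
print(f"CTR before={before:.2%}, after={after:.2%}")
```

In this toy data the click count is identical for every channel and so hides the differences between them, while the click-through rate separates the channels and stays flat when clicks grow only because exposure grew.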

3. The influence of reliability and validity

Examples like this are common in data analysis. How do reliability and validity affect our analysis results? The following figure illustrates it:

[Figure: four targets labeled A to D; image missing from the source]

Suppose we need to select a shooter (the indicator) to take part in a competition (measuring the thing). The relationship between the distribution of bullet holes (the indicator data) and the bullseye (the actual result) is shown in the figure above.

  • A: The bullet holes are tightly grouped on the bullseye, so both reliability and validity are very good.
  • B: The bullet holes are tightly grouped but far from the bullseye. Stability is very good but accuracy is poor, that is, reliability is high and validity is low.
  • C: The bullet holes are scattered, but they are distributed around the bullseye. Stability is poor but accuracy is acceptable, that is, reliability is low and validity is high. (This type is uncommon; low reliability usually leads to low validity.)
  • D: The bullet holes are scattered and far from the bullseye, so both reliability and validity are very low.
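The four cases can be reproduced with a small simulation. This is a minimal sketch under assumed settings: shots are drawn from a Gaussian around an aim point, "spread" (mean distance of shots from their own centroid) stands in for reliability, and "offset" (distance from the centroid to the bullseye) stands in for validity. The aim points, scatter values, and helper names are all hypothetical.

```python
import math
import random
import statistics

def shoot(center, scatter, n=200, seed=0):
    """Simulate n shots scattered around an aim point (Gaussian noise)."""
    rng = random.Random(seed)
    return [(rng.gauss(center[0], scatter), rng.gauss(center[1], scatter))
            for _ in range(n)]

def summarize(shots, bullseye=(0.0, 0.0)):
    """Return (spread, offset): low spread ~ reliable, low offset ~ valid."""
    cx = statistics.fmean(x for x, _ in shots)
    cy = statistics.fmean(y for _, y in shots)
    spread = statistics.fmean(math.hypot(x - cx, y - cy) for x, y in shots)
    offset = math.hypot(cx - bullseye[0], cy - bullseye[1])
    return spread, offset

# Assumed settings for the four shooters; the bullseye is at (0, 0).
shooters = {
    "A": shoot(center=(0.0, 0.0), scatter=0.5, seed=1),  # tight, on target
    "B": shoot(center=(5.0, 5.0), scatter=0.5, seed=2),  # tight, off target
    "C": shoot(center=(0.0, 0.0), scatter=3.0, seed=3),  # scattered, on target
    "D": shoot(center=(5.0, 5.0), scatter=3.0, seed=4),  # scattered, off target
}

results = {name: summarize(shots) for name, shots in shooters.items()}

for name, (spread, offset) in results.items():
    print(f"{name}: spread={spread:.2f} offset={offset:.2f}")
```

Shooters A and B come out with a small spread, and shooters A and C with a small offset, matching the pattern described in the list above.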

Origin blog.csdn.net/qq_35164554/article/details/128189433