Z test | t test | Using the sample standard deviation s in place of the population standard deviation σ

The Z test is a hypothesis test based on the standard normal distribution. It is usually used when the sample is large (sample size greater than 30) and the population standard deviation is known, to determine whether the difference between the sample mean and the population mean is significant.

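The basic idea of the Z test is to compute the difference between the sample mean and a reference value (or another sample mean), standardize that difference into a z score on the standard normal distribution, and then use the z score to compute the p value that decides whether the difference is significant. For a one-sample test against a hypothesized mean μ0, with known population standard deviation σ and sample size n, the statistic is z = (x̄ − μ0) / (σ / √n).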
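A minimal sketch of these steps in Python, using only the standard library. The sample, the reference mean mu0 = 85 and the "known" population standard deviation sigma = 3 are hypothetical values chosen for illustration, and z_test_one_sample is an illustrative helper rather than a library function. The two-sided p value uses the identity P(|Z| ≥ |z|) = erfc(|z| / √2) for a standard normal Z.

```python
import math
import statistics

def z_test_one_sample(sample, mu0, sigma):
    """Two-sided one-sample Z test with a known population SD sigma."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    se = sigma / math.sqrt(n)               # standard error of the mean
    z = (x_bar - mu0) / se                  # standardized difference
    p = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|) for Z ~ N(0, 1)
    return z, p

# Hypothetical data and a hypothetical known sigma
sample = [86, 88, 84, 87, 90, 85, 89, 86, 88, 87]
z, p = z_test_one_sample(sample, mu0=85, sigma=3)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The p value is then compared with the chosen significance level (for example 0.05) exactly as in the statsmodels example later in this post.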

The Z score (also called the standard score or standardized score) describes the position of a data point in a standard normal distribution: it tells us how many standard deviations the data point lies above or below the mean, z = (x − μ) / σ.
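For illustration, the hypothetical helper z_scores below applies z = (x − μ) / σ to each value. The exam scores and the population mean 80 and standard deviation 5 are made-up numbers.

```python
def z_scores(data, mu, sigma):
    """Return the standard score (x - mu) / sigma of every value in data."""
    return [(x - mu) / sigma for x in data]

# Hypothetical exam scores; assumed population mean 80 and SD 5
scores = [70, 80, 85, 92.5]
print(z_scores(scores, mu=80, sigma=5))  # [-2.0, 0.0, 1.0, 2.5]
```

A score of 92.5 therefore sits 2.5 standard deviations above the mean, while 70 sits 2 standard deviations below it.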

The Z test usually requires the following two important prerequisites:

  1. Large sample size: The sample size usually needs to be greater than 30 so that, by the central limit theorem, the sampling distribution of the sample mean is approximately normal.

  2. Known population standard deviation: The Z test assumes that the population standard deviation is already known. If it is not known, the Z test in its strict form cannot be performed, and in practice the t test should be considered instead.
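The large-sample condition can be checked with a quick simulation, sketched here under assumptions of our own: a right-skewed exponential population with mean 1 and standard deviation 1, samples of size n = 36, 20,000 repetitions and a fixed NumPy seed. If the central limit theorem applies, the sample means should have a standard deviation close to 1/√n, about 95% of them should fall within 1.96 standard errors of the population mean, and their skewness should be much smaller than the population's skewness of 2.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Assumed population: exponential with mean 1 and SD 1 (strongly right-skewed)
mu, sigma = 1.0, 1.0
n, reps = 36, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)  # one sample mean per repetition

def skewness(x):
    """Moment-based sample skewness."""
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

se = sigma / np.sqrt(n)  # standard error of the mean predicted by the CLT
coverage = np.mean(np.abs(means - mu) <= 1.96 * se)

print(f"SD of sample means: {means.std(ddof=1):.3f} (CLT: {se:.3f})")
print(f"share within 1.96 SE of mu: {coverage:.3f} (normal: 0.950)")
print(f"skewness of raw data: {skewness(samples.ravel()):.2f} (population: 2)")
print(f"skewness of sample means: {skewness(means):.2f} (theory: {2 / np.sqrt(n):.2f})")
```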

These two prerequisites ensure the reliability and accuracy of the Z test. If the sample size is small or the population standard deviation is unknown, the t test may be a more appropriate choice: it does not require σ to be known and, through the t distribution, accounts for the extra uncertainty of estimating σ from the sample (for small samples it does assume that the population is approximately normal).

In general, it is very important to choose an appropriate statistical test method based on the actual situation and the characteristics of the sample data.

Whether a Z test can be performed, and whether the sample standard deviation s may stand in for the population standard deviation σ, usually depends on the following situations:

  1. The population standard deviation is known:

    • When you already know the population standard deviation σ, use it directly in the Z test. In this case there is no need to estimate σ from the sample.
  2. Large sample size:

    • When the sample is large enough (usually n > 30), it is reasonable to use the sample standard deviation s in place of σ in the Z test. By the central limit theorem, the sampling distribution of the sample mean is then approximately normal; in addition, with a large sample s is a reliable estimate of σ, and the t distribution with n − 1 degrees of freedom is very close to the standard normal, so the Z test and the t test give nearly the same result.
  3. The population is normally distributed or the sample size is large enough:

    • If the population is normally distributed, the sample mean is exactly normal for any sample size; if the population is not normal but the sample is large enough, the central limit theorem makes the sampling distribution of the sample mean approximately normal. In either case normal-based inference on the mean is justified, but a Z test that plugs in s instead of σ is only a good approximation when the sample is also large; for a small sample from a normal population, the t test is the exact choice.
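The role of sample size when s replaces σ can be seen in a small simulation, sketched here with an assumed standard normal population, a fixed seed and 20,000 repetitions per sample size; z_with_s_rejection_rate is a hypothetical helper. Every replicate is drawn under a true null hypothesis, so a Z test that plugs in s should reject about 5% of the time. For a normal population this statistic actually follows a t distribution with n − 1 degrees of freedom, so the exact rejection rate can also be computed with scipy.stats.t.sf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)    # fixed seed for reproducibility
z_crit = stats.norm.ppf(0.975)    # two-sided 5% critical value, about 1.96

def z_with_s_rejection_rate(n, reps=20_000, mu=0.0):
    """Share of samples from N(mu, 1) rejected by a Z test that uses s for sigma.

    The null hypothesis (mean = mu) is true, so the ideal rate is 0.05.
    """
    x = rng.normal(loc=mu, scale=1.0, size=(reps, n))
    s = x.std(axis=1, ddof=1)                     # sample SD replaces sigma
    z = (x.mean(axis=1) - mu) / (s / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)

for n in (5, 30, 200):
    simulated = z_with_s_rejection_rate(n)
    exact = 2 * stats.t.sf(z_crit, df=n - 1)      # statistic is t with n - 1 df
    print(f"n = {n:>3}: simulated {simulated:.3f}, exact {exact:.3f}, nominal 0.050")
```

With n = 5 the rejection rate is well above 5%, so plugging s into a Z test would overstate significance; by n = 200 the gap from 5% is negligible.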

Note that in practice you have to judge from the specific circumstances whether the population standard deviation is really known and whether the sample is large enough. If there is uncertainty, a sensitivity analysis (for example, running both a Z test and a t test on the same data and comparing the conclusions) can show how much the result depends on these assumptions.
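A sketch of such a sensitivity check, assuming statsmodels and SciPy are installed. It reuses the ten-value sample from the t test code in this post with a hypothetical reference mean of 9.3, chosen so that the two conclusions differ at the 5% level. Both tests compute the same statistic (x̄ − μ0) / (s / √n); they differ only in the reference distribution (standard normal versus t with 9 degrees of freedom).

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

sample = np.array([10, 12, 11, 9, 8, 10, 11, 12, 9, 10])
mu0 = 9.3  # hypothetical reference mean

# Z test that plugs in s (statsmodels uses ddof=1 by default)
z_stat, p_z = ztest(sample, value=mu0)
# t test on the same data (t distribution with n - 1 = 9 df)
t_stat, p_t = stats.ttest_1samp(sample, popmean=mu0)

print(f"statistic: z = {z_stat:.3f}, t = {t_stat:.3f}")
print(f"p value:   Z test = {p_z:.4f}, t test = {p_t:.4f}")
```

Here the Z test reports a significant difference while the t test does not; with only 10 observations and σ unknown, the t test is the appropriate reference.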

In short, when the conditions above are met (in particular, a large sample), using the sample standard deviation s in place of the population standard deviation σ in a Z test is usually a reasonable approximation.

When the population standard deviation is unknown and the sample is small (fewer than 30 observations), we still use the sample standard deviation s in place of σ, but the inference is then based on the t test rather than the Z test.

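In this case we use the t distribution rather than the standard normal distribution, because estimating σ by s introduces additional uncertainty. The t distribution has heavier tails than the standard normal and, with n − 1 degrees of freedom, accounts for this uncertainty, so (for an approximately normal population) it gives accurate results even for small samples; as n grows it approaches the standard normal.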
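To see how much the heavier tails matter, the sketch below (assuming SciPy) compares the two-sided 5% critical value of the standard normal with that of the t distribution for several degrees of freedom; the list of degrees of freedom is an arbitrary choice.

```python
from scipy import stats

# Two-sided 5% critical values: standard normal vs. t with various df
z_crit = stats.norm.ppf(0.975)
t_crit = {df: stats.t.ppf(0.975, df) for df in (2, 5, 9, 29, 100, 1000)}

print(f"standard normal: {z_crit:.3f}")
for df, value in t_crit.items():
    print(f"t with df = {df:>4}: {value:.3f}")
```

The t critical values shrink toward the normal value of about 1.96 as the degrees of freedom grow, which is why the distinction matters mainly for small samples.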

So in this case we perform the hypothesis test by computing the t statistic, t = (x̄ − μ0) / (s / √n), and then obtain the corresponding p value from a t distribution table or with statistical software. This lets us draw statistical conclusions, such as whether a sample mean differs significantly from a hypothesized value, or whether two means differ significantly.
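The formula above can be computed by hand and checked against SciPy. In this sketch, t_test_one_sample is a hypothetical helper; it uses statistics.stdev (divisor n − 1) for s and scipy.stats.t.sf for the upper tail of the t distribution, and the data are the ten values from the t test example in this post.

```python
import math
import statistics
from scipy import stats

def t_test_one_sample(sample, mu0):
    """Two-sided one-sample t test computed from the formula."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)              # sample SD with divisor n - 1
    t = (x_bar - mu0) / (s / math.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)      # both tails of t(n - 1)
    return t, p

data = [10, 12, 11, 9, 8, 10, 11, 12, 9, 10]
t, p = t_test_one_sample(data, mu0=10)
ref = stats.ttest_1samp(data, popmean=10)

print(f"by hand: t = {t:.4f}, p = {p:.4f}")
print(f"SciPy:   t = {ref.statistic:.4f}, p = {ref.pvalue:.4f}")
```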

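# One-sample Z test with statsmodels. ztest estimates sigma by the sample
# standard deviation s (ddof=1), which is reasonable here because n > 30.
import statsmodels.stats.weightstats as sm

# One sample with more than 30 observations (n = 31)
group1 = [85, 88, 84, 82, 91, 95, 89, 90, 84, 87, 86, 82, 88, 89, 90, 85, 83, 87, 91, 92, 86, 87, 88, 89, 82, 85, 86, 87, 88, 84, 90]

# Hypothesized population mean
population_mean = 85

# Run the one-sample Z test (two-sided by default)
z_statistic, p_value = sm.ztest(group1, value=population_mean)

# Display the results
print(f"Z statistic: {z_statistic}")
print(f"P value: {p_value}")

if p_value < 0.05:
    print("At the 95% confidence level, the sample mean differs significantly from the hypothesized population mean")
else:
    print("At the 95% confidence level, there is not enough evidence that the sample mean differs significantly from the hypothesized population mean")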
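
# One-sample t test with SciPy (small sample, population SD unknown)
from scipy import stats

# Sample data (n = 10)
sample_data = [10, 12, 11, 9, 8, 10, 11, 12, 9, 10]

# Hypothesized population mean
population_mean = 10

# Run the one-sample t test (two-sided by default)
t_statistic, p_value = stats.ttest_1samp(sample_data, population_mean)

# Display the results
print(f"t statistic: {t_statistic}")
print(f"P value: {p_value}")

if p_value < 0.05:
    print("At the 95% confidence level, the sample mean differs significantly from the hypothesized population mean")
else:
    print("At the 95% confidence level, there is not enough evidence that the sample mean differs significantly from the hypothesized population mean")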
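The same workflow extends to comparing the means of two independent groups. A minimal sketch, assuming SciPy: the scores for group_a and group_b are hypothetical, and equal_var=False selects Welch's t test, which does not assume that the two populations have equal variances (a choice made here, not one prescribed by the examples above).

```python
from scipy import stats

# Hypothetical scores for two independent groups
group_a = [85, 88, 84, 82, 91, 95, 89, 90, 84, 87]
group_b = [80, 83, 79, 85, 82, 78, 84, 81, 80, 83]

# Welch's t test: does not assume equal population variances
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"t statistic: {t_stat:.3f}")
print(f"P value: {p_value:.4f}")
if p_value < 0.05:
    print("At the 95% confidence level, the two group means differ significantly")
else:
    print("At the 95% confidence level, there is not enough evidence that the two group means differ")
```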

Origin blog.csdn.net/book_dw5189/article/details/132785337