Python statistics: independent-samples t-test (testing the difference between means)

The distinction between independent samples and paired samples:

Independent samples: the two samples come from sources that do not influence each other, such as two different groups of people or two separate sets of experimental data.

Paired samples: the observations in the two samples come in pairs, such as measurements of the same group of people at different times or under different conditions.

Choosing the test method:

Independent samples: a z-test for large samples and a t-test for small samples;

Paired samples: a t-test is used in both cases.
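
Why sample size matters can be seen by comparing critical values. The following is a minimal sketch, assuming SciPy is available; the degrees of freedom are illustrative choices, not values from this post. As the degrees of freedom grow, the two-sided 5% critical value of the t distribution approaches that of the standard normal, which is why a z-test is considered acceptable for large samples.

```python
from scipy import stats

# Two-sided 5% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# Corresponding t critical values for a few illustrative degrees of freedom
t_crits = {df: stats.t.ppf(0.975, df) for df in (9, 29, 999)}

print(f"z critical value: {z_crit:.3f}")
for df, t_crit in t_crits.items():
    print(f"t critical value (df={df}): {t_crit:.3f}")
```

For small samples the t critical value is noticeably larger, so the t-test demands stronger evidence before rejecting the null hypothesis.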

The difference when calculating the p-value:

Independent-samples t-test: compares the means of the two groups directly;

Paired-samples t-test: first compute the difference within each pair, then run a one-sample t-test on those differences.
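
The equivalence between a paired test and a one-sample test on the differences can be checked directly. This is a small sketch using SciPy's `ttest_rel` and `ttest_1samp`; the `before` and `after` scores are hypothetical values made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same five people before and after a treatment
before = np.array([72.0, 80.0, 65.0, 90.0, 77.0])
after = np.array([75.0, 82.0, 70.0, 89.0, 83.0])

# Paired-samples t-test on the two measurements
paired = stats.ttest_rel(after, before)

# One-sample t-test on the pairwise differences against a mean of 0
one_sample = stats.ttest_1samp(after - before, 0.0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```

Both calls should print the same t-value and p-value.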

For example:

Test of the mean difference for independent samples: to test whether boys and girls differ in their mathematics scores, a study randomly selected 20 students. Because the sample size is less than 30, this is a small-sample hypothesis test, and because the boys' and girls' scores are independent samples, the t-test is the appropriate test of the mean difference.

Prepare the sample data:

import numpy as np
import pandas as pd

# Load the data
students_data = {
    'name': ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T'],
    'gender': ['male','male','female','male','female','male','male','female','female','female','male','female','male','male','female','male','female','male','female','male'],
    'grades': [76,82,86,82,35,77,76,91,76,91,96,88,96,78,86,78,67,91,76,89]}
# Convert to a DataFrame
students_test_data = pd.DataFrame(students_data, columns=['name', 'gender', 'grades'])
# Select the grades of each group
male = students_test_data.query('gender == "male"')["grades"]
female = students_test_data.query('gender == "female"')["grades"]
# Convert to NumPy arrays
male = np.array(male)
female = np.array(female)
print(male)
print(female)

Code walkthrough:

The data is entered as a dictionary, which is written with braces {}. A dictionary stores key-value pairs, with each key separated from its value by a colon: {"name": "Tom", "age": 18}.

pd.DataFrame() converts the data to a DataFrame. The columns parameter specifies the column names; for dictionary input it selects which keys to keep and fixes their order.

query() selects the rows of a DataFrame that satisfy a condition, and 'gender == "male"' keeps the rows whose gender value is male. Indexing the result with ["grades"] then takes the grades column of those rows.
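
For comparison, the same selection can be written with a boolean mask and `.loc`. This is a brief sketch on a small hypothetical table (the four rows below are made up for illustration and only mirror the shape of the example data).

```python
import pandas as pd

# A small hypothetical table in the same shape as the example data
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "gender": ["male", "female", "male", "female"],
    "grades": [76, 86, 82, 35],
})

# Filter rows with query(), then pick the grades column
via_query = df.query('gender == "male"')["grades"]

# Same selection with a boolean mask and .loc
via_mask = df.loc[df["gender"] == "male", "grades"]

print(via_query.tolist())
print(via_mask.tolist())
```

Both forms return the same Series; which one to use is largely a matter of readability.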

Compute the t-value and p-value:

from scipy import stats
stats.ttest_ind(male, female, equal_var=False)

The calculated p-value of about 0.34 is greater than 0.05, so the null hypothesis of equal means cannot be rejected. The data show no statistically significant difference between the boys' and girls' scores.
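
Passing `equal_var=False` makes `ttest_ind` perform Welch's t-test, which does not assume equal variances in the two groups. As a rough sketch of what that computes, the statistic can be reproduced by hand using the standard Welch formulas; the arrays below repeat the grades from the example above, and the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Grades from the example above, split by gender
male = np.array([76, 82, 82, 77, 76, 96, 96, 78, 78, 91, 89])
female = np.array([86, 35, 91, 76, 91, 88, 86, 67, 76])

# Per-group squared standard error: sample variance (ddof=1) over sample size
se2_m = male.var(ddof=1) / male.size
se2_f = female.var(ddof=1) / female.size

# Welch's t statistic
t_stat = (male.mean() - female.mean()) / np.sqrt(se2_m + se2_f)

# Welch-Satterthwaite degrees of freedom
df = (se2_m + se2_f) ** 2 / (
    se2_m ** 2 / (male.size - 1) + se2_f ** 2 / (female.size - 1)
)

# Two-sided p-value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), df)

result = stats.ttest_ind(male, female, equal_var=False)
print(t_stat, p_value)
print(result.statistic, result.pvalue)
```

The hand-computed values should agree with the ones returned by `ttest_ind`.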

Origin blog.csdn.net/Sukey666666/article/details/130353849