Statistics (Jia Junping): Thinking Questions, Chapter 9, Categorical Data Analysis

1. Briefly describe the structure and distribution of the contingency table.

Answer: A contingency table is a frequency distribution table that cross-classifies two or more variables.

The distribution of a contingency table can be viewed from two aspects. One is the distribution of observed values, also called the conditional distribution, in which each observed value is a conditional frequency; the other is the distribution of expected values.

2. Using an example from a newspaper, a magazine, or your own surroundings, construct a contingency table, explain the relationship between the two categorical variables in the survey, and pose a question to be tested.

Answer: Learning machines supplied by three manufacturers (A, B, and C) are inspected for quality on three performance characteristics (also labeled A, B, and C). The question is whether the type of defect is related to the manufacturer. A random sample of 450 defective learning machines was inspected and the results were arranged into a 3×3 contingency table, shown in Table 9-1.
[Table 9-1: observed frequencies of defect type by manufacturer (image not available)]

Using the spot-check data, test whether the type of defect is unrelated to the manufacturer (that is, whether the two variables are independent of each other).

Set up the hypotheses: H0: the type of defect is independent of the manufacturer; H1: the type of defect is not independent of the manufacturer.

The expected value of each cell can be calculated, as shown in Table 9-2 (the values in brackets in the table are expected values).
[Table 9-2: observed frequencies with expected values in brackets (image not available)]
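
The text does not show how the expected values are obtained. As a sketch of the usual calculation under the independence hypothesis, with $RT$ and $CT$ (symbols introduced here for illustration, not taken from the text) denoting the row total and column total of a cell and $n$ the sample size, each expected value would be:

$$f_e = \frac{RT \times CT}{n}$$

Here $n = 450$; the row and column totals would be read from Table 9-1.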

So $\chi^2 = \frac{(20-17)^2}{17} + \frac{(40-33)^2}{33} + \cdots + \frac{(70-58)^2}{58} = 9.821$.

The degrees of freedom are $(R-1)(C-1) = (3-1)\times(3-1) = 4$. Testing at the 0.01 significance level, the $\chi^2$ distribution table gives $\chi^2_{0.01}(4) = 13.277$. Since $\chi^2 = 9.821 < \chi^2_{0.01}(4) = 13.277$, the null hypothesis H0 cannot be rejected; that is, the sample gives no evidence that the type of defect depends on the manufacturer.
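
The table lookup can be cross-checked with a p-value. The sketch below is one possible way to do this, not the textbook's method: it relies on the closed form of the chi-square upper tail that holds when the degrees of freedom are even, and the function name `chi2_sf_even_df` is made up for this illustration.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


chi2_stat = 9.821   # statistic reported in the text
df = 4              # (3 - 1) * (3 - 1)
alpha = 0.01        # significance level used in the text

p_value = chi2_sf_even_df(chi2_stat, df)
reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing the p-value with the significance level is equivalent to comparing the statistic with the critical value, so both routes lead to the same decision about H0.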

3. Explain the steps for calculating the $\chi^2$ statistic.

Answer: The steps for calculating the $\chi^2$ statistic are:

(1) Subtract the expected value $f_e$ from the observed value $f_0$;

(2) Square the difference $(f_0 - f_e)$;

(3) Divide the squared result $(f_0 - f_e)^2$ by $f_e$;

(4) Sum the results of step (3) over all cells to get:

$$\chi^2=\sum \frac{(f_0-f_e)^2}{f_e}$$
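
The four steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names are hypothetical, the expected values are assumed to come from the usual row total × column total / n rule, and the 2×2 counts are made up for the example (they are not the data from Table 9-1).

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_statistic(observed):
    """Apply steps (1) to (4) cell by cell and return the chi-square statistic."""
    expected = expected_frequencies(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for f_0, f_e in zip(obs_row, exp_row):
            diff = f_0 - f_e          # step (1): observed minus expected
            squared = diff ** 2       # step (2): square the difference
            total += squared / f_e    # steps (3) and (4): divide by f_e, then sum
    return total


# Hypothetical 2x2 counts, made up purely for illustration (not from the text).
example_table = [[10, 20], [30, 40]]
chi2_example = chi_square_statistic(example_table)
print(round(chi2_example, 4))
```

For this made-up table the expected counts are 12, 18, 28, and 42, so every cell differs from its expected value by 2 in absolute terms.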

4. Briefly describe the respective characteristics of φ coefficient, c coefficient and V coefficient.

Answer: (1) The φ coefficient is the most commonly used measure of the degree of association in 2×2 contingency table data. Its formula is:

$$\varphi=\sqrt{\chi^2/n}$$

where

$$\chi^2=\sum \frac{(f_0-f_e)^2}{f_e}$$

and n is the total sample size. For a 2×2 table, the φ coefficient lies in the range 0 to 1.

(2) The contingency correlation coefficient, also called the contingency coefficient or c coefficient for short, is mainly used for contingency tables larger than 2×2. Its formula is:

$$c=\sqrt{\frac{\chi^2}{\chi^2+n}}$$

When the two variables in the contingency table are independent of each other, c = 0; c is always less than 1. A characteristic of the c coefficient is that its maximum possible value depends on the number of rows and columns of the contingency table, and increases as R and C increase.

(3) Cramér proposed the V coefficient. Its formula is:

$$V=\sqrt{\frac{\chi^2}{n\times\min[(R-1),(C-1)]}}$$

When the two variables are independent of each other, V=0; when the two variables are completely correlated, V=1. So the value of V is between 0 and 1. If one dimension in the contingency table is 2, that is, min[(R-1),(C-1)]=1, then the value of V is equal to the value of φ.
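
As a rough illustration of the three formulas, the sketch below plugs in the values from question 2 (χ² = 9.821, n = 450, a 3×3 table). The function names are hypothetical, and the 2×5 table used in the last check is assumed only to show the case where V equals φ.

```python
import math


def phi_coefficient(chi2, n):
    """phi = sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)


def contingency_coefficient(chi2, n):
    """c = sqrt(chi2 / (chi2 + n)); always below 1."""
    return math.sqrt(chi2 / (chi2 + n))


def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


chi2_stat, n = 9.821, 450          # values reported in question 2
c = contingency_coefficient(chi2_stat, n)
v = cramers_v(chi2_stat, n, rows=3, cols=3)
print(f"c = {c:.4f}, V = {v:.4f}")

# When one dimension is 2, min(R - 1, C - 1) = 1 and V reduces to phi.
# The 2x5 shape here is an assumed example, not a table from the text.
v_equals_phi = math.isclose(cramers_v(chi2_stat, n, 2, 5),
                            phi_coefficient(chi2_stat, n))
print(v_equals_phi)
```

Both coefficients come out small for this example, which agrees with the test's failure to reject independence.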

5. Construct contingency tables of the following dimensions and give the degrees of freedom of the $\chi^2$ test.

a. 2 rows and 5 columns

b. 4 rows and 6 columns

c. 3 rows and 4 columns

Answer: A contingency table with i rows and j columns is shown in Table 9-3.
[Table 9-3: general contingency table with i rows and j columns (image not available)]

The degrees of freedom of the $\chi^2$ test = (number of rows - 1)(number of columns - 1), so

a. When i=2, j=5, Table 9-3 is a contingency table with 2 rows and 5 columns, and the degrees of freedom of the $\chi^2$ test = $(2-1)\times(5-1) = 4$;

b. When i=4, j=6, Table 9-3 is a contingency table with 4 rows and 6 columns, and the degrees of freedom of the $\chi^2$ test = $(4-1)\times(6-1) = 15$;

c. When i=3, j=4, Table 9-3 is a contingency table with 3 rows and 4 columns, and the degrees of freedom of the $\chi^2$ test = $(3-1)\times(4-1) = 6$.
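
The three cases can be checked with a short helper. This is only a sketch of the (rows - 1)(columns - 1) rule; the function name `chi2_degrees_of_freedom` is made up for the example.

```python
def chi2_degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom of the chi-square test for a rows x cols table."""
    if rows < 2 or cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


# The three table shapes from the question: (rows, columns).
cases = {"a": (2, 5), "b": (4, 6), "c": (3, 4)}
results = {label: chi2_degrees_of_freedom(r, c) for label, (r, c) in cases.items()}

for label, df in results.items():
    rows, cols = cases[label]
    print(f"{label}. {rows} rows x {cols} columns: df = {df}")
```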

Origin blog.csdn.net/J__aries/article/details/130857883