Hypothesis Testing

Definition of hypothesis testing

Hypothesis testing: first propose a hypothesis about a population parameter, then use sample data to judge whether the hypothesis holds. Logically, hypothesis testing works like a proof by contradiction: a hypothesis is proposed first, and then appropriate statistical methods are used to show that the hypothesis is essentially untenable. (I say "essentially" because the result comes from a random sample, so the conclusion cannot be absolute; we can only make a relative judgment on the basis of a certain probability.)

 

Hypothesis testing is based on the idea of the small-probability event: an event with a small probability is unlikely to occur in a single trial. That is, when the probability of obtaining the observed result under the assumed hypothesis falls below a pre-set standard, we reject the hypothesis; otherwise we say there is not enough evidence to reject it.

 

If the sample data lead us to reject the hypothesis, we say the test result is statistically significant. A result being statistically "significant" means that the difference between the sample and the population is not caused by sampling error or chance.

 

Hypothesis testing terminology

Null hypothesis (null hypothesis): the hypothesis that the test seeks to gather evidence against, also called the original hypothesis, usually denoted H0.

For example: the null hypothesis is that the mean of the indicator for the test version is less than or equal to the mean of the indicator for the original version.

 

Alternative hypothesis (Alternative hypothesis): the hypothesis that the test seeks to gather evidence to support, usually denoted H1 or HA.

For example: the alternative hypothesis is that the mean of the indicator for the test version is greater than the mean of the indicator for the original version.

 

Two-tailed test (Two-tailed Test): if the alternative hypothesis has no specific direction and contains the symbol "≠", the test is called a two-tailed test.

For example: the null hypothesis is that the mean of the indicator for the test version equals the mean of the indicator for the original version; the alternative hypothesis is that the mean of the indicator for the test version is not equal to the mean of the indicator for the original version.

 

One-tailed test (One-tailed Test): if the alternative hypothesis has a specific direction and contains the symbol ">" or "<", the test is called a one-tailed test. One-tailed tests are divided into left-tailed (lower-tail) and right-tailed (upper-tail) tests.

For example: the null hypothesis is that the mean of the indicator for the test version is less than or equal to the mean for the original version, and the alternative hypothesis is that the mean of the indicator for the test version is greater than the mean for the original version.

 

Indicator (Indicator): the quantity used as the standard for comparison.

For example: the indicator is the average time a user spends on a page in a single day.

 

Test statistic (Test statistic): the quantity computed from the sample that is used for the test; in effect it corresponds to a quantile on the probability density distribution. Quantiles are somewhat troublesome to compute in practice, because obtaining them requires integrating the density function of the data's distribution.

For example: the Z value, t value, F value, and chi-square value.

 

Significance level (Significance level): the threshold probability of wrongly rejecting a true null hypothesis, i.e. the maximum probability of making a Type I error, denoted α.

For example: at the 5% significance level, the sample data reject the null hypothesis.

 

Confidence level (Confidence level): the probability of correctly accepting a true null hypothesis, i.e. 1 - α.

For example: a 95% confidence level means we are 95% certain that the measured sample mean is very close to the population expectation.

 

Statistical power (Power): the probability of correctly rejecting a false null hypothesis, i.e. 1 - β. When the test result is that there is not enough evidence to reject the null hypothesis, people care more about the statistical power: the greater the power, the less likely the conclusion is mistaken.

 

Critical value (Critical value): the specific value against which the value of the test statistic is compared.

 

Critical region (Critical region): the range of test statistic values for which the null hypothesis is rejected, also called the rejection region. The region is bounded by the critical value. If the test statistic falls in the rejection region, we reject the null hypothesis.


 

Confidence interval (Confidence interval): a random interval that contains the population parameter. We use samples to estimate the population; if only a single value is given, it is called a point estimate. But each random sample gives a different result, so a point estimate is not necessarily accurate; estimating the population with a range is more reliable.

For example: a 95% confidence level indicates 95% certainty that the confidence interval contains the population parameter (if we sample 100 times, roughly 95 of the computed confidence intervals will contain the population parameter).
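As a minimal sketch of computing such an interval (assuming the sample comes from a roughly normal population; the data values are made up for illustration), a 95% confidence interval for a mean can be obtained with scipy:

import numpy as np
from scipy import stats

# Hypothetical sample of daily time-on-page values (minutes); made up for illustration
sample = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3, 5.6, 4.7])

n = len(sample)
mean = sample.mean()
sem = stats.sem(sample)          # standard error of the mean, s / sqrt(n)

# 95% confidence interval for the population mean, based on the t distribution
lower, upper = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")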

 

P value (P-value): the probability, when the null hypothesis is true, of obtaining the observed sample result or a more extreme result.

For a left-tailed test, the P value is the probability that the statistic x is less than the sample test statistic C, i.e. P = P(x < C).
For a right-tailed test, the P value is the probability that the statistic x is greater than the sample test statistic C, i.e. P = P(x > C).
For a two-tailed test, the P value is twice the tail probability beyond the sample test statistic C, i.e. P = 2P(x > C) (when C lies in the right tail of the distribution curve) or P = 2P(x < C) (when C lies in the left tail). If X follows a t distribution or a normal distribution, the distribution curve is symmetric about the vertical axis, so the P value can be written as P = P(|X| > |C|).
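As a minimal sketch (assuming the test statistic follows a standard normal distribution under H0, as in a Z test; the observed value is made up), the three P values can be computed with scipy:

from scipy import stats

C = 1.8  # hypothetical observed value of the test statistic

p_left  = stats.norm.cdf(C)            # left-tailed:  P(x < C)
p_right = stats.norm.sf(C)             # right-tailed: P(x > C)
p_two   = 2 * stats.norm.sf(abs(C))    # two-tailed:   P(|X| > |C|)

print(p_left, p_right, p_two)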

 

The two types of errors in hypothesis testing

Type I error (rejecting a true hypothesis): the error of rejecting the null hypothesis when it is actually true. The maximum probability of a Type I error is denoted α (alpha).

Type II error (accepting a false hypothesis): the error of accepting the null hypothesis when it is actually false. The maximum probability of a Type II error is denoted β (beta).


 

In hypothesis testing we may make either of these two types of errors when making a decision. In general, for a given sample, no decision rule can avoid both types of errors at the same time: avoiding Type I errors increases the probability of a Type II error, and avoiding Type II errors increases the probability of a Type I error.

 

Of these two types of errors, people pay more attention to the Type I error. Therefore, in most cases, people control the probability of a Type I error, and the value of α should be as small as possible. In a hypothesis test, the probability of a Type I error is controlled by setting the significance level α in advance; commonly used values of α are 0.01, 0.05, and 0.1.

 

Steps of a hypothesis test

1. Define the population.
2. State the null hypothesis and the alternative hypothesis.
3. Select the test statistic (i.e. determine the type of hypothesis test).
4. Select the significance level.
5. Draw a sample of a certain size from the population.
6. Compute the specific value of the test statistic from the sample data.
7. Based on the sampling distribution of the test statistic, determine the critical value and the rejection region.
8. Compare the value of the test statistic with the critical value; if the test statistic falls in the rejection region, reject the null hypothesis.

(A worked example of these steps is sketched below.)
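As a minimal sketch of these steps (the data and the hypothesized population mean are made up for illustration), a one-sample t test using the critical-value approach could look like this in Python:

import numpy as np
from scipy import stats

# Step 2: H0: population mean mu <= 5.0, H1: mu > 5.0 (right-tailed test)
mu0 = 5.0
# Step 4: significance level
alpha = 0.05
# Step 5: hypothetical sample data
sample = np.array([5.4, 5.1, 5.8, 5.6, 4.9, 5.7, 5.3, 5.5, 5.2, 5.9])

# Step 6: compute the t statistic, t = (x_bar - mu0) / (s / sqrt(n))
n = len(sample)
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# Step 7: critical value for a right-tailed test with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha, df=n - 1)

# Step 8: compare the statistic with the critical value
if t_stat > t_crit:
    print(f"t = {t_stat:.3f} > {t_crit:.3f}: reject H0")
else:
    print(f"t = {t_stat:.3f} <= {t_crit:.3f}: not enough evidence to reject H0")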

 

Hypothesis testing decision criteria

Because this method controls the probability of making an error through a significance level given in advance, for two hypothesis tests with fairly similar data we cannot tell which hypothesis is more likely to be mistaken: we only know that, with this method, the maximum probability of error for this one sample is α, and we cannot know exactly how large the probability of error actually is. Computing the P value solves this problem. The P value is a probability computed from the sampling distribution, based on the value of the test statistic. By directly comparing the P value with the given significance level α we can tell whether to reject the hypothesis, which replaces comparing the test statistic with the critical value. Moreover, in this way we know the exact probability of committing a Type I error: it is the P value itself, which is smaller than α. If P = 0.03 < α (0.05), then we reject the hypothesis, and the probability that this decision is wrong is 0.03. Note that if P > α, the hypothesis is not rejected, and in that case a Type I error cannot occur.

 

Steps 6, 7, and 8 of the hypothesis test can therefore be changed to: 6. compute the specific value of the test statistic and the corresponding P value from the sample data; 7. compare the given significance level α with the P value; 8. draw the conclusion: if the P value < α, reject the null hypothesis at significance level α. A sketch of this P-value approach follows.
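A minimal sketch of the same one-sample test using the P-value approach (same made-up data and hypothesized mean as above):

import numpy as np
from scipy import stats

mu0, alpha = 5.0, 0.05
sample = np.array([5.4, 5.1, 5.8, 5.6, 4.9, 5.7, 5.3, 5.5, 5.2, 5.9])

n = len(sample)
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
p_value = stats.t.sf(t_stat, df=n - 1)   # right-tailed P value: P(T > t_stat)

print("reject H0" if p_value < alpha else "do not reject H0", p_value)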

 

Types of hypothesis tests

These include: the Z test, the t test, the chi-square test, and the F test.

 

Let us look at each of these four tests:

 

Z-test (Z test)

The Z test is used to test whether a sample mean differs from the population mean, or whether the means of two populations differ. It requires that the population variance is known in advance and that the sample size is large enough. The test statistic, the z value, follows a normal distribution.

 

Since the t test applies to both small and large samples (a sample size above 30 can be regarded as a large sample; otherwise it is a small sample), we skip the Z test here and focus on the t test.

 

t-test (t test)

The t test is divided into the one-sample t test, the paired t test, and the independent-samples t test.

 

One-sample t-test (One Sample T-Test): compares the sample mean with the population mean, to test for a difference between the sample and the population.

t = (\overline{x} - \mu_0) / (s / \sqrt{n})

(\overline{x} is the sample mean, \mu_0 is the population mean, s is the sample standard deviation, n is the number of observations in the sample; degrees of freedom: n - 1)
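A minimal one-sample t test sketch with scipy (the data and hypothesized population mean are made up for illustration):

import numpy as np
from scipy import stats

sample = np.array([5.4, 5.1, 5.8, 5.6, 4.9, 5.7, 5.3, 5.5, 5.2, 5.9])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)  # two-sided by default
print(t_stat, p_value)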

 

Paired t-test (Paired Sample T-Test): compares the sample mean of the differences with the population mean difference, to test for a difference in the following situations: 1. two homogeneous subjects each receive one of two different treatments; 2. the same subject receives two different treatments; 3. the same subject is measured before and after a treatment.

If there are two homogeneous subjects, the two samples are related and each subject is measured once, giving paired observations; if it is the same subject, that one sample is measured twice, which also gives paired observations. The paired t test essentially first computes the difference within each pair of observations and then performs a one-sample t test on those differences.

t = \overline{d} / s_{\overline{d}}, where s_{\overline{d}} = s_d / \sqrt{n}

(d is the difference within each pair of observations, \overline{d} is the sample mean of the differences, s_{\overline{d}} is the standard error of the mean difference, s_d is the sample standard deviation of the differences, n is the number of paired observations; degrees of freedom: n - 1)
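A minimal paired t test sketch with scipy (the before/after measurements are made up for illustration):

import numpy as np
from scipy import stats

before = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3])
after  = np.array([5.6, 5.0, 6.5, 5.9, 5.2, 6.1, 6.3, 5.4])

# Equivalent to a one-sample t test on the differences (after - before) against 0
t_stat, p_value = stats.ttest_rel(after, before)
print(t_stat, p_value)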

 

Independent-samples t-test (Independent Samples T-Test): compares the means of samples drawn from two different populations, to test for a difference between the two populations. It is further divided into the equal-variance case and the unequal-variance case.

 

Equal variance (Equal Variance T-Test, also called the pooled T-Test): used when the two groups have the same number of observations, or when the variances of the two groups differ little.

 

 

Unequal variance (Unequal Variance T-Test): used when the two groups have different numbers of observations and the variances of the two groups differ considerably. This hypothesis test is also known as Welch's T-Test.
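A minimal independent-samples t test sketch with scipy (two made-up groups; the equal_var flag switches between the pooled test and Welch's test):

import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3])
group_b = np.array([5.9, 6.1, 6.4, 5.7, 6.3, 6.0, 6.6, 5.8])

# Pooled (equal-variance) t test
t_pooled, p_pooled = stats.ttest_ind(group_a, group_b, equal_var=True)

# Welch's (unequal-variance) t test
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(p_pooled, p_welch)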

 

Premises of the t test: for the one-sample t test, the sample must be drawn from a normally distributed population; for the paired t test, the two samples must be related and both drawn from normally distributed populations; for the independent-samples t test, the two samples must be independent of each other and both drawn from normally distributed populations, and in addition homogeneity of variance between the two populations generally needs to be checked with an F test.

 

Chi-square test (chi-square test)

The chi-square test is divided into the goodness-of-fit test and the test of independence.

 

Goodness-of-fit test (Goodness-of-Fit Test): compares the observed values in each category of the sample with the expected values, to test the difference between the actual results and the expected results. (It is divided into the one-sample test, which tests the difference between a sample and a population, and the two-sample test, which tests the difference between two populations.)

 

The H0 of the goodness-of-fit test is: there is no difference between the observed frequencies and the expected frequencies.

Set up a fourfold (contingency) table and fill in the corresponding observed frequencies and expected frequencies.

Compute the χ2 value: χ2 = \sum (O - E)^2 / E (O is the observed frequency, E is the expected frequency). If the value of the statistic (χ2) is small, the difference between the observed frequencies and the expected frequencies is not significant; the larger the statistic, the more significant the difference.

Based on the χ2 distribution and the degrees of freedom, determine the probability P of obtaining the current statistic or a more extreme value when H0 is true. If the P value is small, the observed values deviate too much from the theoretical values and the null hypothesis should be rejected; otherwise the null hypothesis cannot be rejected.
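A minimal goodness-of-fit sketch with scipy (the observed counts are made up for illustration, e.g. testing whether a die is fair):

from scipy import stats

observed = [18, 22, 16, 14, 12, 18]   # counts of each face in 100 rolls (made up)
expected = [100 / 6] * 6              # expected counts under a fair die

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p_value)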

 

Test of independence (Independence Test): compares the observed values of two categorical variables in the sample with their expected values, to test whether the two categorical variables are independent of each other.

 

The H0 of the test of independence is: there is no association between the two categorical variables.

Set up a contingency table, with one variable as the rows and the other variable as the columns. For example:

 
207 282
231 242

(The table contains the numbers of men and women who like cats or dogs respectively; it is used to test whether gender is associated with the preferred animal.)

Compute the expected frequencies.

Compute the χ2 value: χ2 = \sum (O - E)^2 / E, with df = (number of rows − 1) * (number of columns − 1).

Based on the χ2 distribution and the degrees of freedom, determine the probability P of obtaining the current statistic or a more extreme value when H0 is true. If the P value is small, there is an association between the two categorical variables and the null hypothesis should be rejected.
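A minimal test-of-independence sketch with scipy, using the 2×2 table above (which dimension corresponds to gender and which to cat/dog is an assumption made for illustration):

import numpy as np
from scipy import stats

# Rows: male / female; columns: likes cats / likes dogs (assumed labeling)
table = np.array([[207, 282],
                  [231, 242]])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2, p_value, dof)
print(expected)   # expected frequencies under independence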

 

Premise of the chi-square test: the chi-square test is a non-parametric test; it does not involve specific population parameters and does not require the assumption that the population follows a normal distribution.

 

F-test (F test)

The F test is divided into the test for homogeneity of variances and analysis of variance.

 

Test for homogeneity of variances (F-Test for Equality of Variances): compares the variances of samples drawn from two different populations, to test whether the two population variances are the same.

F = s^2_1 / s^2_2

(s^2 is the sample variance: s^2 = \sum(x - \overline{x})^2 / (n - 1))

If the variances of the two populations from which the samples are drawn are about the same, the F value will be close to 1; conversely, if the F value is very large, it indicates that the two populations differ considerably.

 

Premise of the test for homogeneity of variances: both samples are drawn from normally distributed populations. (Note: because the F test is very sensitive to the normality of the data, the more robust Levene test is better than the F test for checking homogeneity of variance. The Levene test can also be used to compare the variances of more than two samples.)
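A minimal sketch comparing the two approaches on made-up data: the F ratio computed directly from the sample variances (with a rough two-sided P value) and scipy's Levene test:

import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3])
group_b = np.array([5.9, 6.1, 6.4, 5.7, 6.3, 6.0, 6.6, 5.8])

# F ratio of the two sample variances
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)
F = var_a / var_b
dfn, dfd = len(group_a) - 1, len(group_b) - 1
p_f = 2 * min(stats.f.cdf(F, dfn, dfd), stats.f.sf(F, dfn, dfd))  # two-sided

# Levene test, more robust to departures from normality
levene_stat, p_levene = stats.levene(group_a, group_b)
print(F, p_f, levene_stat, p_levene)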

 

Analysis of variance (Analysis of Variance, ANOVA): compares the means of samples drawn from two or more different populations (whose variances are roughly equal), to test for differences among multiple populations. It is further divided into one-way ANOVA and multi-way analysis of variance.

 

Here we mainly discuss one-way ANOVA: the F statistic is the variance between the sample means (the between-group mean square) divided by the variance within the samples (the within-group mean square):

F = MS_between / MS_within, where MS_between = \sum_i n_i (\overline{x}_i - \overline{x})^2 / (k - 1) and MS_within = \sum_i \sum_j (x_{ij} - \overline{x}_i)^2 / (N - k)

(\overline{x} is the overall mean, \overline{x}_i is the mean of the i-th sample, k is the number of samples, N is the total number of observations across the k samples)
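A minimal one-way ANOVA sketch with scipy (three made-up groups):

import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 6.2, 5.5, 4.9])
group_b = np.array([5.9, 6.1, 6.4, 5.7, 6.3])
group_c = np.array([5.0, 5.4, 5.2, 5.6, 5.1])

# One-way ANOVA: F = between-group mean square / within-group mean square
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)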

 

Premises of analysis of variance: the same as for the pooled (equal-variance) independent-samples t test, i.e. normality, homogeneity of variance, and independent samples are generally required.

 

Note: if homogeneity of variance is not satisfied, you can use Welch's ANOVA; for details see: http://www.real-statistics.com/one-way-analysis-of-variance-anova/welchs-procedure/ .

 

Source: www.cnblogs.com/HuZihu/p/9692828.html