Estimating the population from a sample

1. Population, individual, and sample

In statistical analysis, the population is the full set of objects under study;
an individual is each single object that makes up the population;
a sample is a set of individuals drawn from the population X according to some rule, written $X_1, X_2, \dots, X_n$;
the sample size is the number of individuals contained in the sample, written $n$.

For example, to study the average height of a class:
the heights of all students in the class are the population;
one student's height is an individual;
the heights of 20 students drawn from the class according to some rule are a sample;
20 is the sample size, i.e. n = 20.
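
The class example can be sketched in a few lines of Python. The class size of 50 and the simulated heights are assumptions made up for illustration; only the sample size n = 20 comes from the text.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of all 50 students in the class.
population = [round(random.gauss(165, 8), 1) for _ in range(50)]

# Sample: the heights of 20 students drawn without replacement.
n = 20
sample = random.sample(population, n)

print("population size N =", len(population))
print("sample size n =", len(sample))
print("one individual:", sample[0])
```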

2. How do we estimate the population from a sample?

(1) Choose the right sampling method
Often we cannot survey every individual in the population, so we draw a sample from the population and survey that instead.

Sampling methods: simple random sampling, stratified sampling, cluster sampling, systematic sampling, and so on.
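
As a rough illustration of how these four methods differ, here is a minimal Python sketch. The function names, the 60 student IDs, and the three groups A, B, C are all hypothetical; real surveys usually need more care (unequal strata, weighting, and so on).

```python
import random

def simple_random_sample(items, n, rng):
    """Every individual has the same chance of being drawn."""
    return rng.sample(items, n)

def systematic_sample(items, n, rng):
    """Pick a random start, then take every k-th individual."""
    k = len(items) // n
    start = rng.randrange(k)
    return [items[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction from each stratum (group)."""
    picked = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        picked.extend(rng.sample(members, size))
    return picked

def cluster_sample(clusters, n_clusters, rng):
    """Draw whole clusters at random and keep every member of them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

rng = random.Random(0)
students = list(range(1, 61))  # hypothetical student IDs 1..60
groups = {"A": students[:20], "B": students[20:40], "C": students[40:]}

print(simple_random_sample(students, 6, rng))
print(systematic_sample(students, 6, rng))
print(stratified_sample(groups, 0.1, rng))
print(cluster_sample(groups, 1, rng))
```

Stratified sampling draws from every group, while cluster sampling keeps only the groups that were drawn but surveys them completely.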

(2) The sample mean is approximately equal to the population mean (central limit theorem)

(3) Use the sample standard deviation to estimate the population standard deviation
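
Points (2) and (3) can be tried out with the standard library. This is only a sketch on an assumed, simulated population of 10,000 heights; the population parameters and the sample size of 200 are made up for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 heights (cm).
population = [rng.gauss(165, 8) for _ in range(10_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

sample = rng.sample(population, 200)
x_bar = statistics.fmean(sample)       # sample mean, used to estimate mu
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```

The estimates should land close to the population values without matching them exactly.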

3. Sample mean and population mean

(1) Different definitions

The sample mean is the mean of the sample data drawn from the population. The population mean, also called the mathematical expectation or simply the expectation, is a numerical characteristic that describes the average value of a random variable. It covers both the mean of a discrete random variable and the mean of a continuous random variable.

(2) Different basis of calculation

The sample mean is computed over the number of individuals in the sample; the population mean is computed over the number of individuals in the population. The sample size is generally smaller than the population size.

(3) Different meaning

The sample mean describes the central tendency of the sample that was drawn, while the population mean describes the central tendency of all individuals. The sample comes from the population but is only part of it, so the two means are generally not exactly equal.
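
In symbols, for a finite population of $N$ values and a sample of $n$ values drawn from it, the two means can be written as follows. The notation $\bar{X}$ for the sample mean is an assumption; the text above does not name it.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} X_i
\qquad\text{and}\qquad
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad \text{usually with } n < N .
```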

4. Variance

In descriptive statistics, variance measures how far each value of a variable (each observation) lies from the population mean. Raw deviations from the mean sum to zero, and the sum of squared deviations grows with the sample size, so the average of the squared deviations is used to describe how much the variable varies.

Formula for the population variance:
$$\sigma^2=\frac{\sum(X-\mu)^2}{N}$$
where $\sigma^2$ is the population variance, $X$ is the variable, $\mu$ is the population mean, and $N$ is the number of individuals in the population.

Question:
Why subtract the mean from each value and then square the result? Would taking the absolute value of the difference not work?
Answer:
Take one data set: 7.5, 7.5, 10, 10, 10
and another data set: 6, 9, 10, 10, 10
The mean of both data sets is 9,
and the sum of absolute deviations from the mean is 6 for both.
However, the sum of squared deviations is 7.5 for the first set (variance 1.5) and 12 for the second (variance 2.4).
They are not equal: squaring magnifies large deviations, so a difference in spread that the absolute values hide shows up in the variance.
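
The arithmetic for the two data sets can be checked with a short script. The helper names are made up for this sketch; it prints the mean, the sum of absolute deviations, the sum of squared deviations, and the population variance (the squared sum divided by N).

```python
def mean(values):
    return sum(values) / len(values)

def sum_abs_dev(values):
    """Sum of absolute deviations from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values)

def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

def population_variance(values):
    """Sum of squared deviations divided by N."""
    return sum_sq_dev(values) / len(values)

a = [7.5, 7.5, 10, 10, 10]
b = [6, 9, 10, 10, 10]
for name, data in (("first", a), ("second", b)):
    print(name, mean(data), sum_abs_dev(data), sum_sq_dev(data), population_variance(data))
# first 9.0 6.0 7.5 1.5
# second 9.0 6.0 12.0 2.4
```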

5. Standard deviation (Std Dev)

The standard deviation is the square root of the variance and is written $\sigma$.

$$\sigma=\sqrt{\frac{\sum(X-\mu)^2}{N}}$$

Variance and standard deviation both measure how much a data set fluctuates: the larger the variance or standard deviation, the more the data fluctuate.
So the question is: if variance already describes how far a variable deviates from the mean, why introduce the standard deviation as well?
The reason: variance is not in the same units as the data. It describes the degree of deviation from the mean well, but the result does not match our intuition.
For example: a class has 60 students, the mean score is 70 points, the standard deviation is 9 and the variance is 81, and the scores follow a normal distribution. From the variance we cannot directly tell how many points students typically deviate from the mean, whereas the standard deviation tells us at a glance that the probability of a score falling in the range [61, 79] is 0.6827, i.e. about 2 × 34.13%.
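
The class example can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch that takes the normality assumption from the text at face value; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=9)  # mean 70 points, standard deviation 9 points
variance = scores.variance           # 81, in squared points

p = scores.cdf(79) - scores.cdf(61)  # P(61 <= score <= 79), one sigma either side
print(f"variance = {variance:.0f}, P(61..79) = {p:.4f}")
print(f"expected number of the 60 students in that range: {60 * p:.0f}")
```

The standard deviation is in points, the same unit as the scores, which is why the interval [61, 79] can be read off directly as 70 ± 9.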

3σ criterion:
In a normal distribution, $\sigma$ is the standard deviation, $\mu$ is the mean, and $x=\mu$ is the axis of symmetry of the curve.
The probability that a value falls in $(\mu-\sigma, \mu+\sigma)$ is 0.6827.
The probability that a value falls in $(\mu-2\sigma, \mu+2\sigma)$ is 0.9545.
The probability that a value falls in $(\mu-3\sigma, \mu+3\sigma)$ is 0.9973.
So the values of $Y$ can be regarded as almost entirely concentrated in the interval $(\mu-3\sigma, \mu+3\sigma)$; the probability of falling outside this range is less than 0.3%.
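
These three probabilities do not depend on $\mu$ or $\sigma$. For a normal variable, the probability of landing within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$, which can be checked with a short sketch (the function name `prob_within` is made up here):

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```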

6. Sample variance and population variance

A population may be finite or infinite, and it has true parameters of its own: its mean is a real, true value, so the population variance is computed by dividing by N. A sample is a part of the population drawn at random and used to estimate the population (which is generally hard to know in full); many kinds of statistics can be computed from a sample.
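
For reference, the usual textbook definitions of the sample variance and sample standard deviation are given below. The symbols $S^2$, $S$ and $\bar{X}$ (the sample mean) are assumptions, since the text does not spell them out.

```latex
S^2 = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}{n-1},
\qquad
S = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}{n-1}}
```

Compared with the population formula, the population mean $\mu$ is replaced by the sample mean $\bar{X}$ and the divisor $N$ by $n-1$.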

Question: Why is the sample variance divided by (n-1) instead of n?
Answer: The sample variance is divided by (n-1) because the estimator that divides by n is not an unbiased estimator of the population variance.
1. Unbiased estimate
An unbiased estimate is an estimate of a population parameter by a sample statistic whose mathematical expectation equals the true value of the parameter being estimated. Unbiasedness is one criterion for judging the quality of an estimator. Its meaning is that, over many repeated samples, the average of the estimates is close to the true value of the parameter.
2. Biased estimate
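
The difference between dividing by n and by (n-1) can be seen in a small simulation. This is a sketch under assumed settings (a normal population with variance 4, samples of size 5, 20,000 repetitions), not a proof: it averages both estimators over many samples and compares the averages with the true variance.

```python
import random
import statistics

rng = random.Random(42)
true_var = 4.0         # variance of the assumed N(0, 2^2) population
n, trials = 5, 20_000  # small samples make the bias easy to see

div_n, div_n_minus_1 = [], []
for _ in range(trials):
    sample = [rng.gauss(0, 2) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    div_n.append(ss / n)                        # biased estimator
    div_n_minus_1.append(ss / (n - 1))          # unbiased estimator

print("true variance        :", true_var)
print("average of SS/n      :", round(statistics.fmean(div_n), 3))
print("average of SS/(n - 1):", round(statistics.fmean(div_n_minus_1), 3))
```

Under these assumptions the average of SS/n should settle near (n-1)/n × 4 = 3.2, below the true value, while the average of SS/(n-1) should settle near 4.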


Origin blog.csdn.net/YPP0229/article/details/94594306