Statistical foundations: parameter estimation

Table of contents:

First, point estimation

  1, Method of moments

  2, Order statistics method

  3, Maximum likelihood method

  4, Least squares method

Second, interval estimation

  1, Interval estimation of the parameters of one population:

  • Interval estimation of the population mean
  • Interval estimation of the population proportion
  • Interval estimation of the population variance

  2, Interval estimation of the parameters of two populations:

  • Interval estimation of the difference between two population means
  • Interval estimation of the difference between two population proportions
  • Interval estimation of the ratio of two population variances

Third, determining the sample size

  1, Sample size for estimating the population mean

  2, Sample size for estimating the population proportion


First, point estimation

Point estimation uses a sample statistic to estimate a parameter. Because the value of the statistic is a single point on the number line, the estimate is also a single value, hence the name point estimation.

Point estimation and interval estimation are both forms of population parameter estimation. Given a set of data obtained from a sample, the problem of using it to infer characteristics of the population, that is, inferring the whole from a part, is called population parameter estimation.

The value obtained by using sample data to estimate an unknown parameter of the population distribution is called an estimate. The accuracy of a point estimate can be expressed with a confidence interval.
When the properties of the population are unknown, we use estimates computed from the sample to learn about them. For example, the population mean μ is estimated by the sample mean. When a single specific value, i.e., one point on the number line, is used to estimate a population parameter, this is called point estimation.
The goal of point estimation is to use a sample X = (X1, X2, ..., Xn) to estimate an unknown parameter θ of the population distribution, or a function g(θ) of it. θ or g(θ) is usually a numerical characteristic of the population, such as the mathematical expectation, the variance, or a correlation coefficient.

1, Method of moments

The method of moments uses sample moments to estimate the corresponding population parameters. First, derive equations that relate the population moments (that is, the expected values of powers of the random variable under consideration) to the parameters of interest. Next, draw a sample and estimate the population moments from it. Then substitute the sample moments for the (unknown) population moments and solve the equations for the parameters of interest. This yields estimates of those parameters.

The simplest moment estimators use the first-order sample origin moment (the sample mean) to estimate the population expectation and the second-order sample central moment to estimate the population variance. The method has limitations. If the population distribution has no moments of the required order, as with the Cauchy distribution, it cannot be used at all. Moreover, it relies only on a few numerical characteristics of the population and does not use the distribution itself, so a moment estimator captures only part of the information about the population and often reflects the nature of the distribution poorly. Its advantages only show when the sample size n is large, so in theory the method of moments is intended for large samples.

If there are k unknown parameters, estimate the first k population moments by the first k sample moments, then use the functional relationship between the unknown parameters and the population moments to solve for the parameter estimators.
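To make the k-parameter recipe concrete, here is a minimal sketch for a gamma distribution with shape α and scale β; the distribution is our choice of illustration and does not come from the text. Its population moments satisfy $E(X) = \alpha\beta$ and $\mathrm{Var}(X) = \alpha\beta^2$, so replacing $E(X)$ by the sample mean $\bar{x}$ and $\mathrm{Var}(X)$ by the second-order sample central moment $m_2$ and solving gives $\hat{\alpha} = \bar{x}^2/m_2$ and $\hat{\beta} = m_2/\bar{x}$. The simulated parameters (3.0, 2.0) are arbitrary.

```python
import random
from statistics import fmean


def gamma_moment_estimates(sample):
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    Population moments: E(X) = shape * scale, Var(X) = shape * scale**2.
    They are replaced by the sample mean and the second-order sample
    central moment m2 (divisor n), and the two equations are solved.
    """
    n = len(sample)
    mean = fmean(sample)                            # first-order origin moment
    m2 = sum((x - mean) ** 2 for x in sample) / n   # second-order central moment
    shape = mean ** 2 / m2
    scale = m2 / mean
    return shape, scale


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative simulated data: true shape 3.0, true scale 2.0.
    data = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]
    shape_hat, scale_hat = gamma_moment_estimates(data)
    print(f"shape estimate {shape_hat:.2f}, scale estimate {scale_hat:.2f}")
```

With a large simulated sample both estimates land close to the true values, which matches the text's point that moment estimators are meant for large samples.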

2, Order statistics method

Let $X_1, X_2, \dots, X_n$ be a sample from a population X. Arranging them from smallest to largest gives $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$, which are called the order statistics of the sample. For an observed sample, arranging the observations $x_1, x_2, \dots, x_n$ from smallest to largest gives

$$x_{(1)} \le x_{(2)} \le \dots \le x_{(n)},$$

where $x_{(1)}$ is the smallest observation and $x_{(n)}$ is the largest. For example, for the sample values 3.15, 2.98, 3.16, 3.05, 2.90, the order statistics are 2.90, 2.98, 3.05, 3.15, 3.16.
Estimation by order statistics is a simple and intuitive method, often used to estimate the population mathematical expectation and standard deviation.
Let $X_{(1)} \le \dots \le X_{(n)}$ be the order statistics of a sample from population X. The sample median is

$$\tilde{X} = \begin{cases} X_{(k+1)}, & n = 2k+1, \\ \frac{1}{2}\left(X_{(k)} + X_{(k+1)}\right), & n = 2k. \end{cases}$$

Its observed value is found as follows: arrange the sample observations from smallest to largest; when n is odd (n = 2k + 1), take the middle value $x_{(k+1)}$; when n is even (n = 2k), take the average of the two middle values, $\frac{1}{2}\left(x_{(k)} + x_{(k+1)}\right)$.
As the middle value, the median carries information about the average level of X, so it is suitable for estimating the mathematical expectation of X. Using the sample median to estimate E(X) is called the order statistics method for estimating E(X); it yields both an estimator (the statistic $\tilde{X}$) and an estimate (its observed value).
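The median rule can be checked on the five sample values from the example. This is a small sketch; the function names are ours, and the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median


def order_statistics(sample):
    """Return the order statistics X(1) <= X(2) <= ... <= X(n)."""
    return sorted(sample)


def sample_median(sample):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    xs = order_statistics(sample)
    n = len(xs)
    k = n // 2
    if n % 2 == 1:                   # n = 2k + 1: X(k+1) is index k (0-based)
        return xs[k]
    return (xs[k - 1] + xs[k]) / 2   # n = 2k: (X(k) + X(k+1)) / 2


if __name__ == "__main__":
    data = [3.15, 2.98, 3.16, 3.05, 2.90]   # sample values from the text
    print(order_statistics(data))  # [2.9, 2.98, 3.05, 3.15, 3.16]
    print(sample_median(data))     # 3.05, the order-statistics estimate of E(X)
    assert sample_median(data) == median(data)
```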

3, Maximum likelihood method

  Given a probability distribution D whose probability density function (continuous case) or probability mass function (discrete case) is $f_D$, with distribution parameter θ, we can draw a sample of n values $X_1, X_2, \dots, X_n$ from this distribution and, using $f_D$, compute the probability (density) of the observed data:

$$P(x_1, x_2, \dots, x_n) = f_D(x_1, x_2, \dots, x_n \mid \theta).$$

However, we may not know the value of θ, even though we know the data come from D. How can θ be estimated? A natural idea is to draw n sample values $X_1, X_2, \dots, X_n$ from the distribution and use them to estimate θ. Maximum likelihood estimation looks for the most likely value of θ, that is, among all possible values of θ, the one that maximizes the "likelihood" of the observed sample. This differs from some other estimation methods, such as unbiased estimators of θ: an unbiased estimator does not necessarily output the most likely value of θ, but on average it neither overestimates nor underestimates θ. To carry out maximum likelihood estimation mathematically, first define the likelihood:

$$L(\theta) = f_D(x_1, x_2, \dots, x_n \mid \theta),$$

and find the value of θ, over all possible values, that maximizes this function. The value $\hat{\theta}$ that maximizes the likelihood is called the maximum likelihood estimate of θ.
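A minimal sketch, assuming an i.i.d. sample from an exponential distribution with rate λ (our choice; the text keeps D general). For i.i.d. data the likelihood factorizes as $L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}$, so $\log L(\lambda) = n \log\lambda - \lambda \sum x_i$, whose maximizer is $\hat{\lambda} = 1/\bar{x}$. The code maximizes $\log L$ by a crude grid search (the grid bounds are arbitrary assumptions) and compares the result with that closed form.

```python
import math
import random


def exp_log_likelihood(rate, n, total):
    """log L(rate) for n i.i.d. exponential observations summing to total."""
    return n * math.log(rate) - rate * total


def mle_grid(sample, lo=0.01, hi=10.0, steps=100_000):
    """Maximize the log-likelihood over evenly spaced rates in [lo, hi]."""
    n, total = len(sample), sum(sample)
    best_rate, best_ll = lo, -math.inf
    for i in range(steps + 1):
        rate = lo + (hi - lo) * i / steps
        ll = exp_log_likelihood(rate, n, total)
        if ll > best_ll:
            best_rate, best_ll = rate, ll
    return best_rate


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.expovariate(2.5) for _ in range(5000)]   # simulated, true rate 2.5
    closed_form = len(data) / sum(data)                  # lambda_hat = 1 / x_bar
    print(f"grid search MLE: {mle_grid(data):.4f}")
    print(f"closed form 1/x_bar: {closed_form:.4f}")
```

The grid search is only there to show "maximize the likelihood" literally; in practice one differentiates $\log L$ or uses a numerical optimizer.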

4, Least squares method

The least squares method has the form

$$\text{objective function} = \sum \left(\text{observed value} - \text{theoretical value}\right)^2.$$

The observed values are our samples, and the theoretical values come from our assumed fitting function. The objective function is what machine learning usually calls the loss function; our goal is to find the fitting function that minimizes it. Take the simplest case, simple linear regression: suppose we have m samples, each with a single feature:

$$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)}).$$

The samples are fitted with

$$h_\theta(x) = \theta_0 + \theta_1 x.$$

Each sample has one feature x, and the fitting function has two parameters, θ0 and θ1, that must be determined.

The objective function is:

$$J(\theta_0, \theta_1) = \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2 = \sum_{i=1}^{m} \left(y^{(i)} - \theta_0 - \theta_1 x^{(i)}\right)^2.$$

Least squares chooses θ0 and θ1 to minimize J(θ0, θ1); the values of θ0 and θ1 at which J attains its minimum give the fitting function.

Reference: https://www.cnblogs.com/pinard/p/5976811.html
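For the simple linear model $h_\theta(x) = \theta_0 + \theta_1 x$ with objective $J(\theta_0, \theta_1) = \sum_i (y^{(i)} - \theta_0 - \theta_1 x^{(i)})^2$, setting both partial derivatives of J to zero gives a closed-form minimizer: $\theta_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $\theta_0 = \bar{y} - \theta_1 \bar{x}$. The sketch below implements that solution; the data points are made up for illustration.

```python
from statistics import fmean


def fit_simple_linear(xs, ys):
    """Least-squares (theta0, theta1) for h(x) = theta0 + theta1 * x.

    Solving dJ/dtheta0 = dJ/dtheta1 = 0 gives
    theta1 = Sxy / Sxx and theta0 = y_bar - theta1 * x_bar.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; theta1 is not identifiable")
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


def objective(theta0, theta1, xs, ys):
    """J(theta0, theta1): sum of squared residuals."""
    return sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # illustrative data
    t0, t1 = fit_simple_linear(xs, ys)
    print(f"theta0 = {t0:.3f}, theta1 = {t1:.3f}, J = {objective(t0, t1, xs, ys):.4f}")
```

For this data the fit is θ0 = 0.05 and θ1 = 1.99; nudging either parameter away from these values increases J.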


Second, interval estimation

Interval estimation builds on point estimation: it gives a range in which a population parameter is estimated to lie, usually obtained by adding and subtracting an estimation error to and from the sample statistic. Unlike a point estimate, an interval estimate uses the sampling distribution of the statistic to give a probabilistic measure of how close the sample statistic is to the population parameter.

1, Interval estimation of the parameters of one population (reposted from: https://blog.csdn.net/liangzuojiayi/article/details/78043658)

  • Interval estimation of the population mean
  • Interval estimation of the population proportion
  • Interval estimation of the population variance
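The formulas behind these three items did not survive in this copy. As a hedged sketch, here are the standard textbook forms (our assumption of what the source showed) at confidence level 1 − α: for the mean with σ known or a large sample, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (s replaces σ when σ is unknown and n is large); for the proportion with a large sample, $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$; for the variance of a normal population, $\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)}\right]$, where $\chi^2_a$ is the upper-a critical value. Python's standard library has no χ² quantile function, so the sketch takes those critical values as arguments; the sample data are illustrative.

```python
import math
from statistics import NormalDist, fmean, stdev, variance


def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(sample, confidence=0.95, sigma=None):
    """Interval x_bar +/- z * sigma / sqrt(n).

    If sigma is None, the sample standard deviation s stands in for sigma,
    which is only appropriate for large n.
    """
    n = len(sample)
    sd = sigma if sigma is not None else stdev(sample)
    margin = z_critical(confidence) * sd / math.sqrt(n)
    x_bar = fmean(sample)
    return x_bar - margin, x_bar + margin


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    margin = z_critical(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def variance_ci(sample, chi2_upper, chi2_lower):
    """Interval for sigma^2 of a normal population.

    chi2_upper = chi^2_{alpha/2}(n-1), the larger critical value;
    chi2_lower = chi^2_{1-alpha/2}(n-1), the smaller one.
    """
    n = len(sample)
    ss = (n - 1) * variance(sample)
    return ss / chi2_upper, ss / chi2_lower


if __name__ == "__main__":
    print(proportion_ci(120, 400))   # p = 0.30
    data = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.6, 10.0]
    # Table values for 9 degrees of freedom at 95% confidence.
    print(variance_ci(data, chi2_upper=19.023, chi2_lower=2.700))
```

Note that the variance interval is not symmetric around $s^2$, because the χ² distribution is skewed.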

2, Interval estimation of the parameters of two populations (reposted from: https://blog.csdn.net/liangzuojiayi/article/details/78044718)

  • Interval estimation of the difference between two population means
      • Large samples
      • Small samples
  • Interval estimation of the difference between two population proportions
  • Interval estimation of the ratio of two population variances
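Again the formulas were lost, so this sketch uses the standard textbook forms, which are our assumption of what the source showed, for independent samples at confidence level 1 − α. Difference of means, large samples: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$. Small samples from normal populations with equal variances: pooled $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1+n_2-2)\, s_p\sqrt{1/n_1 + 1/n_2}$. Difference of proportions: $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$. Variance ratio: $\left[\frac{s_1^2/s_2^2}{F_{\alpha/2}},\ \frac{s_1^2/s_2^2}{F_{1-\alpha/2}}\right]$ with $F(n_1-1, n_2-1)$ critical values. The t and F critical values are passed in because the standard library cannot compute them.

```python
import math
from statistics import NormalDist, fmean, variance


def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def diff_means_large(s1, s2, confidence=0.95):
    """Large samples: (x1_bar - x2_bar) +/- z * sqrt(s1^2/n1 + s2^2/n2)."""
    d = fmean(s1) - fmean(s2)
    se = math.sqrt(variance(s1) / len(s1) + variance(s2) / len(s2))
    m = z_critical(confidence) * se
    return d - m, d + m


def diff_means_pooled(s1, s2, t_crit):
    """Small samples, normal populations, equal variances.

    t_crit = t_{alpha/2}(n1 + n2 - 2), taken from a table.
    """
    n1, n2 = len(s1), len(s2)
    sp2 = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2)) / (n1 + n2 - 2)
    d = fmean(s1) - fmean(s2)
    m = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return d - m, d + m


def diff_proportions(x1, n1, x2, n2, confidence=0.95):
    """Large samples: (p1 - p2) +/- z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    m = z_critical(confidence) * se
    return (p1 - p2) - m, (p1 - p2) + m


def variance_ratio_ci(s1, s2, f_upper, f_lower):
    """Interval for sigma1^2 / sigma2^2 (normal populations).

    f_upper = F_{alpha/2}(n1-1, n2-1); f_lower = F_{1-alpha/2}(n1-1, n2-1).
    """
    ratio = variance(s1) / variance(s2)
    return ratio / f_upper, ratio / f_lower
```

For two samples of 10 observations each at 95% confidence, the table values would be $t_{0.025}(18) \approx 2.101$ for `t_crit`, $F_{0.025}(9, 9) \approx 4.03$ for `f_upper`, and `f_lower` = 1/4.03, using $F_{1-\alpha/2}(n_1-1, n_2-1) = 1/F_{\alpha/2}(n_2-1, n_1-1)$.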


Third, determining the sample size (reposted from: https://blog.csdn.net/rosa_zz/article/details/79562794)

• Sample size:

The number of individuals or units that make up a sample drawn from the population.

• Necessary sample size:

Also called the necessary number of sample units: the minimum number of sample units that must be selected to meet the goals of the survey.

1, Sample size for estimating the population mean

1. Sampling with replacement

Once the confidence level 1 − α is fixed, the critical value $z_{\alpha/2}$ is determined. For a given population standard deviation σ, we can then find the sample size needed to achieve any desired allowable error. Let E denote the desired allowable error; then

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}.$$

Solving for n gives the sample size:

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}.$$

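2. Sampling without replacement (population of size N)

$$n = \frac{N\,z_{\alpha/2}^2\,\sigma^2}{N E^2 + z_{\alpha/2}^2\,\sigma^2}$$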

• The sample size n is proportional to the population variance σ²,
• inversely proportional to the square of the allowable error E,
• and proportional to $z_{\alpha/2}^2$, so it grows with the confidence level.

Example: the standard deviation of MBA graduates' annual salaries is about 4,000 yuan. Suppose you want a 95% confidence interval for the mean annual salary with an allowable error of 10,000 yuan. How large a sample should be drawn?
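A minimal sketch of the sample-size rule for a mean: $n = z_{\alpha/2}^2\sigma^2/E^2$ with replacement, and $n = N z_{\alpha/2}^2\sigma^2 / (N E^2 + z_{\alpha/2}^2\sigma^2)$ without replacement from a population of size N. The result is rounded up, because the sample must be at least that large. The figures in the usage below (σ = 2,000, E = 400, N = 1,000) are hypothetical and not taken from the example.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, error, confidence=0.95, population=None):
    """Required n to estimate a mean within allowable error E.

    population=None means sampling with replacement (or an infinite
    population); otherwise the without-replacement formula is used.
    """
    z2s2 = z_critical(confidence) ** 2 * sigma ** 2
    if population is None:
        n = z2s2 / error ** 2
    else:
        n = population * z2s2 / (population * error ** 2 + z2s2)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: sigma = 2000, E = 400.
    print(sample_size_mean(2000, 400))                   # 97
    print(sample_size_mean(2000, 400, population=1000))  # 88
```

Sampling without replacement from a finite population always needs a sample no larger than the with-replacement figure.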

2, Sample size for estimating the population proportion

1. Sampling with replacement

Once the confidence level 1 − α is fixed, $z_{\alpha/2}$ is determined. Since the population proportion π is a fixed value, the allowable error is determined by the sample size: the larger the sample, the smaller the allowable error and the more precise the estimate. Thus, for a given π, we can determine the sample size needed for any desired allowable error. Let E denote the desired allowable error; then

$$E = z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}.$$

From this, the sample size for sampling with replacement (or from an infinite population) is

$$n = \frac{z_{\alpha/2}^2\,\pi(1-\pi)}{E^2}.$$

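2. Sampling without replacement (population of size N)

$$n = \frac{N\,z_{\alpha/2}^2\,\pi(1-\pi)}{N E^2 + z_{\alpha/2}^2\,\pi(1-\pi)}$$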

• The allowable error d (that is, E) is generally less than 0.1.
• If π is unknown, replace it with the sample proportion p.
• If both π and p are unknown, take 0.5; this is a conservative choice, since π(1 − π) is largest at π = 0.5.

Example: a sample survey in a community aims to estimate the proportion of residents who take part in sports activities. If the allowable error is set at 5% and the parameter is estimated at the 95% confidence level, how large should the sample be?
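A minimal sketch of the proportion rule $n = z_{\alpha/2}^2\pi(1-\pi)/E^2$ (with replacement) and its without-replacement form $n = N z_{\alpha/2}^2\pi(1-\pi)/(N E^2 + z_{\alpha/2}^2\pi(1-\pi))$, applied to the community survey example. Since π is unknown there, the code defaults to the conservative π = 0.5; the function name is ours.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_proportion(error, confidence=0.95, pi=0.5, population=None):
    """Required n to estimate a proportion within allowable error E.

    pi defaults to 0.5, the conservative choice when neither pi nor p is known.
    population=None means sampling with replacement.
    """
    z2pq = z_critical(confidence) ** 2 * pi * (1 - pi)
    if population is None:
        n = z2pq / error ** 2
    else:
        n = population * z2pq / (population * error ** 2 + z2pq)
    return math.ceil(n)


if __name__ == "__main__":
    # Community survey: E = 5%, 95% confidence, pi unknown so 0.5 is used.
    print(sample_size_proportion(0.05))  # 385
```

Here $1.96^2 \times 0.25 / 0.05^2 \approx 384.1$, which rounds up to 385 residents.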

 

 Notes on determining the sample size

First, in practice sampling is usually done without replacement, but the with-replacement formulas are used instead;

Second, if the population parameter (σ or π) is unknown, it can be handled as follows:

        1. use data from the past in its place;

        2. use sample data in its place;

        3. for a proportion, take π = 0.5 or the value closest to 0.5;

Third, if for the same population both a sample size for the mean (n_x) and a sample size for the proportion (n_p) are computed, take the larger of the two as the necessary sample size, so that both estimates meet their requirements;

Fourth, in practice, the formulas for simple random sampling with replacement are commonly used.


Original post: www.cnblogs.com/zym-yc/p/11484974.html