Sampling Methods

Probability sampling:

1. Random sampling: simple random sampling from a finite population, or random sampling from an infinite population. Every unit has the same chance of being selected.

Specific implementations: (a) drawing lots; (b) the random number method.

 

2. Stratified sampling: the units of the population are divided into different layers (strata) according to some characteristic or rule, and a number of units are then drawn at random from each stratum to form the sample. If the units within a stratum are homogeneous, a good estimate for that stratum can be obtained from a relatively small sample.

 

3. Cluster sampling: the population is divided into several groups (clusters), whole clusters are selected at random, and all units in the selected clusters form the sample. Ideally, each cluster is a small-scale representation of the entire population.

 

4. Systematic sampling: all sampling units in the population are arranged in some order and divided into n equal parts. One unit is drawn at random from the first part, and then one unit is taken at the same position (that is, at equal intervals) from each of the remaining parts; together these units form the sample.
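The four probability sampling methods above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 integer IDs, the stratum rule (`u // 25`), the cluster size, and all function names are assumptions made for the example, not part of the original text.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, rng):
    """Draw k distinct units; every unit has the same chance of selection."""
    return rng.sample(population, k)


def stratified_sample(population, stratum_of, k_per_stratum, rng):
    """Group units into strata, then draw k_per_stratum units from each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], k_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]


def systematic_sample(population, n, rng):
    """Random start within the first part, then one unit every `step` positions."""
    step = len(population) // n  # if len is not a multiple of n, the tail is never drawn
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


rng = random.Random(0)           # fixed seed, only so the run is repeatable
population = list(range(100))    # hypothetical population of 100 unit IDs

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, lambda u: u // 25, 3, rng)  # 4 strata of 25 units
clusters = [population[i:i + 10] for i in range(0, 100, 10)]      # 10 clusters of 10 units
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)

print(srs, strat, clus, syst, sep="\n")
```

Note how the stratified sample always contains units from every stratum, while the cluster sample contains every unit of only a few clusters.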

 

Note: random sampling can be carried out in two ways: sampling without replacement and sampling with replacement.

Note: probability sampling means that each individual in the population has a known probability of being selected into the sample.
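As a small illustration of the two modes, Python's standard `random` module offers `sample` (without replacement) and `choices` (with replacement). The five-letter population and the seed below are made up for the example.

```python
import random

rng = random.Random(42)                  # fixed seed, only so the run is repeatable
population = ["A", "B", "C", "D", "E"]   # hypothetical population of five units

# Without replacement: a drawn unit is not put back, so no unit can repeat
# and the sample size cannot exceed the population size.
without_replacement = rng.sample(population, 3)

# With replacement: each draw is made from the full population, so units
# may repeat and the sample size may exceed the population size.
with_replacement = rng.choices(population, k=8)

print(without_replacement)
print(with_replacement)
```

Because 8 draws are taken from only 5 units, the second sample necessarily contains repeated units.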

 

Non-probability sampling methods:

Convenience sampling: the units that are easiest to obtain from the population are taken as the sample.

 

Judgment sampling: a researcher who knows the population well selects, based on subjective judgment, the individuals he or she considers most representative as the sample.

 

Sampling steps:

1. Determine the target population.

2. Determine the sampling units.

3. Determine the sampling frame: how each sampling unit is listed and identified.

 

Sampling bias: when some individuals are more likely to be selected than others, the sample does not represent the population and the resulting estimates are biased.

 

Example:

In the 1948 US presidential election, the Democratic candidate was Truman and the Republican candidate was Dewey. A newspaper conducted a telephone poll to estimate from the sample who would win. After a large number of calls, the tally showed more respondents voting for Dewey than for Truman, so before the election results were announced the newspaper confidently ran the front-page headline "Dewey Defeats Truman", convinced that Dewey had won.

In fact, however, the winner was Truman! The reversal was not caused by an editing mistake or by bad luck. Telephones were expensive at the time, so the people in the sample were mostly wealthy, and wealthy voters happened to be a stronghold of support for Dewey. The sample was therefore skewed toward the wealthy and was not broadly representative, which created the illusion that Dewey had more support.
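The effect can be reproduced with a toy simulation. All numbers below are invented for illustration (30% of voters own a phone, phone owners back candidate D with probability 0.65, everyone else with probability 0.40); they are not historical figures.

```python
import random

rng = random.Random(1948)  # fixed seed, only so the run is repeatable


def make_voter():
    """Return (has_phone, votes_for_d) for one hypothetical voter."""
    has_phone = rng.random() < 0.30               # assumed phone ownership rate
    p_votes_d = 0.65 if has_phone else 0.40       # assumed support for D in each group
    return has_phone, rng.random() < p_votes_d


def share_for_d(voters):
    """Fraction of the given voters who vote for candidate D."""
    return sum(votes_d for _, votes_d in voters) / len(voters)


electorate = [make_voter() for _ in range(100_000)]
true_share = share_for_d(electorate)

# Biased poll: only phone owners can end up in the sample.
phone_owners = [v for v in electorate if v[0]]
phone_poll = share_for_d(rng.sample(phone_owners, 2000))

# Unbiased poll: simple random sample from the whole electorate.
random_poll = share_for_d(rng.sample(electorate, 2000))

print(f"true share for D:  {true_share:.3f}")
print(f"phone-only poll:   {phone_poll:.3f}")
print(f"random poll:       {random_poll:.3f}")
```

Under these assumptions the phone-only poll reports a majority for D even though D loses in the full electorate, while the random poll of the same size stays close to the true share. Enlarging the phone-only sample does not help, because the error comes from who can be sampled, not from how many.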

 

For machine learning, if the training data are sampled with bias, the learned result will be biased as well. It is therefore important to understand the test environment and make the training environment match it as closely as possible.

 

Origin www.cnblogs.com/HuZihu/p/11145521.html