Inferential statistics: sampling distributions and how to avoid bias

What are the population and the sample?

Random numbers: the random module

# Import the random (random number) module
import random
'''
The randint() function of the random module generates a random integer.
Syntax: random.randint(a, b)
The function returns an integer N
with a <= N <= b, so both a and b are included.
The example below generates a random integer between 0 and 9;
each run may return a different number (0-9).
'''
a = random.randint(0, 9)
print(a)
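As a quick check of the inclusive bounds of randint, the sketch below draws many values and looks at which ones appear. The seed value and the number of draws are arbitrary choices made for this illustration.

```python
import random

# Fix the seed so the run is repeatable (42 is an arbitrary choice)
random.seed(42)

# Draw 1000 integers with randint(0, 9); both endpoints can occur
draws = [random.randint(0, 9) for _ in range(1000)]

print(min(draws), max(draws))   # smallest and largest value drawn
print(sorted(set(draws)))       # distinct values that appeared
```

Removing the random.seed call gives a different sequence on every run.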

Example: lottery draw

The range() function produces a sequence of integers and is commonly used in for loops. In Python 3 it returns a range object rather than a list, so wrap it in list() to see the values.

Syntax:

range(start, stop[, step])

Parameters:

start: the value to start counting from. The default is 0. For example, range(5) is equivalent to range(0, 5).

stop: counting ends before stop, so stop itself is not included. For example, list(range(0, 5)) is [0, 1, 2, 3, 4], without 5.

step: the step size. The default is 1. For example, range(0, 5) is equivalent to range(0, 5, 1).
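The three parameters can be checked directly. This is a minimal sketch; the step of 2 in the last call is just an illustrative value.

```python
# range() is lazy in Python 3: wrap it in list() to see the values
print(list(range(5)))          # start defaults to 0
print(list(range(0, 5)))       # stop (5) is excluded
print(list(range(0, 5, 1)))    # step defaults to 1
print(list(range(0, 10, 2)))   # a step of 2 gives the even numbers below 10

# All three spellings describe the same sequence
same = list(range(5)) == list(range(0, 5)) == list(range(0, 5, 1))
print(same)
```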
'''
Lottery draw: generate several random numbers
Application: randomly select 10 winners from 395 users
'''
for i in range(10):
    # randint includes both ends, and the same id can be drawn more than once
    userId = random.randint(0, 395)
    # %s is a string format placeholder
    print('Winner no. %s has user id %s' % (i, userId))
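Calling randint in a loop can return the same id twice. If each user should win at most once, one option is random.sample, which draws without replacement. The sketch below assumes the 395 users have ids 0 through 394; that numbering is an assumption for illustration, not something the example above specifies.

```python
import random

# Assumption for this sketch: the 395 users have ids 0, 1, ..., 394
user_ids = range(395)

# random.sample draws without replacement, so no id can win twice
winners = random.sample(user_ids, 10)

for i, user_id in enumerate(winners, start=1):
    print('Winner no. %s has user id %s' % (i, user_id))
```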

Sampling from a pandas DataFrame

import numpy as np
import pandas as pd
'''
arange: generates a one-dimensional array of 5 * 4 = 20 elements
reshape: turns it into a two-dimensional array with 5 rows and 4 columns
'''
df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))
df
# Randomly select a subset of n rows
sample1 = df.sample(n=3)
sample1
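DataFrame.sample accepts a few other parameters that are often useful. The sketch below shows random_state, frac and replace; the specific values are arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))

# random_state fixes the random draw so the same rows come back each run
sample_fixed = df.sample(n=3, random_state=0)

# frac selects a fraction of the rows instead of an absolute count
sample_frac = df.sample(frac=0.4, random_state=0)   # 40% of 5 rows = 2 rows

# replace=True samples with replacement, so n may exceed the number of rows
sample_boot = df.sample(n=8, replace=True, random_state=0)

print(sample_fixed)
print(len(sample_frac), len(sample_boot))
```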

What is the central limit theorem?

In probability theory, the central limit theorem refers to a class of theorems stating that the distribution of the (suitably normalized) sum of a sequence of random variables converges to a normal distribution.

This set of theorems is the theoretical basis of mathematical statistics and error analysis; it gives the conditions under which the sum of a large number of random variables is approximately normally distributed.

1. The sample mean is approximately equal to the population mean.

2. Whatever the distribution of the population, the means of sufficiently large samples cluster around the population mean and are approximately normally distributed.

3. Use the sample to estimate the population.

In general, a small sample may miss the extreme values, so the sample standard deviation tends to be smaller than the population standard deviation.
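The first two points can be illustrated with a small simulation. This is a minimal sketch using only the standard library; the exponential population, the sample size of 50 and the 2,000 repetitions are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only to make the run repeatable

# A clearly non-normal population: 100,000 draws from an exponential distribution
population = [random.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_std = statistics.pstdev(population)

# Draw 2,000 samples of 50 values each and record every sample mean
n = 50
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(2_000)
]

print('population mean      :', round(pop_mean, 3))
print('mean of sample means :', round(statistics.fmean(sample_means), 3))
print('std of sample means  :', round(statistics.stdev(sample_means), 3))
print('pop std / sqrt(n)    :', round(pop_std / n ** 0.5, 3))
```

The sample means should centre on the population mean, and their spread should be close to the population standard deviation divided by the square root of the sample size. Plotting a histogram of sample_means would show the bell shape even though the population itself is strongly skewed.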

 

 

Goal: use the sample standard deviation to estimate the population standard deviation.
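To see why small samples tend to understate the spread, the sketch below compares two estimators on many small samples: statistics.pstdev, which divides by n, and statistics.stdev, which divides by n - 1. The normal population with mean 100 and standard deviation 15, and the sample size of 5, are hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed for repeatability

# Hypothetical population: 50,000 values from a normal distribution
population = [random.gauss(100, 15) for _ in range(50_000)]
pop_std = statistics.pstdev(population)      # divides by N

# Many small samples (n = 5): compare the two estimators on average
n = 5
uncorrected, corrected = [], []
for _ in range(5_000):
    sample = random.sample(population, n)
    uncorrected.append(statistics.pstdev(sample))  # divides by n
    corrected.append(statistics.stdev(sample))     # divides by n - 1

print('population std            :', round(pop_std, 2))
print('average std, divide by n  :', round(statistics.fmean(uncorrected), 2))
print('average std, divide by n-1:', round(statistics.fmean(corrected), 2))
```

Dividing by n - 1 moves the estimate closer to the population value. For very small samples it still falls slightly short on average, because the correction makes the variance unbiased, not the standard deviation.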

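How does bias arise? How can it be avoided?

1. Sample bias: drawing conclusions from very little data, generalizing from a part to the whole.

2. Survivorship bias: paying attention only to the samples that are readily visible and overlooking those that never had the chance to appear.

3. Probability bias: the gap between the probability one assumes (subjective probability) and the actual probability.

4. Information cocoon: personalized recommendation.

The information cocoon is a rather frightening phenomenon in today's society. Taken literally, it is a metaphor for information being sealed off as if inside an insect's cocoon.

It reflects a social problem that has emerged as personalized recommendation has become widespread.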

Origin www.cnblogs.com/foremostxl/p/12112967.html