Descriptive statistical processing of data

Descriptive statistics of data

In the examples below, the dataset being processed is called nums; it consists of multiple values of the same type.

Central tendency of the data:

  • Mode

Definition: the value that occurs most often in the dataset (not necessarily unique)

(Note: scipy.stats.mode returns only a single value even when several values are tied for most frequent, so I extended it with the code below, which is closer to the definition of the mode.)

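#!/usr/bin/python
# coding:utf-8
from scipy import stats
from collections import Counter

nums = [3, 0, 1, 2, 0, 1, 2, 0, 1, 2]
counts = dict(Counter(nums))            # dictionary mapping each value to its frequency
modes = []                              # values whose frequency equals the modal frequency

# stats.mode returns (mode, count); index 1 is how often the mode occurs
count = stats.mode(nums)[1]

for key, value in counts.items():
    if value == count:
        modes.append(key)

# If every distinct value occurs equally often, the dataset has no mode;
# otherwise report all modes (there may be more than one)
if len(modes) == len(counts):
    print("This dataset has no mode")
else:
    print("The modes of this dataset are: {}".format(modes))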

  • Median

Definition: sort the values in the dataset; if the number of values is odd, the median is the value in the middle; if it is even, the median is the arithmetic mean of the two middle values
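
The odd/even rule above can be written out directly in plain Python. This is only a minimal sketch, assuming a non-empty list of numbers; the function name median_by_definition is my own and is not part of any library.

```python
def median_by_definition(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # the rule applies to sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: arithmetic mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2.0


print(median_by_definition([1, 2, 3, 4, 5, 6]))
print(median_by_definition([5, 1, 3]))
```

For everyday use the NumPy call is shorter; the sketch is only meant to make the definition concrete.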

import numpy as np

nums=[1,2,3,4,5,6]

print("Median: {}".format(np.median(nums)))

  • Mean
  • Arithmetic mean

Definition: the sum of all values divided by the number of values in the dataset


import numpy as np

nums=[1,2,3,4,5.5]

print("Arithmetic mean: {}".format(np.mean(nums)))

  • Weighted average

Definition: the sum of each value multiplied by its weight, divided by the sum of the weights
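
To make the formula concrete, here is a small sketch that computes the weighted average by hand for the same values and weights used in the NumPy example. The variable names weighted_sum and weighted_mean are illustrative only.

```python
nums = [2, 3, 4]
weights = [1, 2, 3]

# Sum of value * weight, divided by the sum of the weights
weighted_sum = sum(x * w for x, w in zip(nums, weights))
weighted_mean = weighted_sum / float(sum(weights))

print(weighted_mean)
```

Here the weighted sum is 2*1 + 3*2 + 4*3 = 20 and the weights add up to 6, so the result is 20/6.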

import numpy as np

nums=[2,3,4]
weights=[1,2,3]

print("Weighted average: {}".format(np.average(nums,weights=weights)))

  • Geometric mean

Definition: the n-th root of the product of the n values; it is typically used to average ratios such as growth rates
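
As a rough sketch of what the geometric mean computes, the snippet below evaluates it in two equivalent ways: as the n-th root of the product, and as the exponential of the mean of the logarithms. Both forms assume all values are positive; the variable names are my own.

```python
import math

nums = [2, 3, 1]

# n-th root of the product of the n values
product = 1.0
for x in nums:
    product *= x
gmean_root = product ** (1.0 / len(nums))

# Equivalent form: exponential of the mean of the logarithms
gmean_log = math.exp(sum(math.log(x) for x in nums) / len(nums))

print(gmean_root, gmean_log)
```

The logarithmic form is usually the safer choice for long lists, because the running product can overflow or underflow.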

from scipy import stats as sta

nums=[2,3,1]

print("Geometric mean: {}".format(sta.gmean(nums)))

  • Quantile

Definition: quantiles are the cut points that divide the probability distribution of a random variable into several contiguous intervals of equal probability. The number of cut points is one less than the number of intervals. (Quartiles are used as the example.)
```
import numpy as np

nums=[2,3,4,5,6,7,8,1]

print("Quartiles: {} {} {}".format(np.percentile(nums,25),np.percentile(nums,50),np.percentile(nums,75)))
```
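
The quartiles returned above are generally not elements of the dataset, because np.percentile interpolates between neighbouring sorted values (linear interpolation is, as far as I know, its default). The sketch below reproduces that rule by hand; percentile_linear is a hypothetical helper name, not a NumPy function.

```python
import math
import numpy as np


def percentile_linear(values, q):
    """q-th percentile (0 <= q <= 100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted data
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = pos - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


nums = [2, 3, 4, 5, 6, 7, 8, 1]
quartiles = [percentile_linear(nums, q) for q in (25, 50, 75)]

print(quartiles)
print(np.percentile(nums, [25, 50, 75]))   # a list of percentiles in one call
```

For the 25th percentile the fractional index is 7 * 0.25 = 1.75, so the result lies three quarters of the way from the second sorted value (2) to the third (3).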

Dispersion of the data:

Numerical data

  • Variance

Definition: the arithmetic mean of the squared differences between each value and the mean of the dataset

  • Standard deviation

Definition: the square root of the variance

  • Range

Definition: the difference between the largest and smallest values in the dataset

  • Mean deviation

Definition: the arithmetic mean of the absolute differences between each value and the mean of the dataset

import numpy as np

nums=[2,3,4,5,6,7,8,9]

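print("Range: {}".format(np.ptp(nums)))
print("Variance: {}".format(np.var(nums)))              # population variance (ddof=0)
print("Standard deviation: {}".format(np.std(nums)))    # population standard deviation (ddof=0)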
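
The mean deviation has no dedicated call in the snippet above, so here is a minimal sketch that computes it from its definition, next to the variance written out the same way. I also show the sample variance (dividing by n - 1) through the ddof parameter of np.var, since the default divides by n.

```python
import numpy as np

nums = np.array([2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
mean = nums.mean()

# Mean deviation: average absolute distance from the mean
mean_deviation = np.mean(np.abs(nums - mean))

# Variance by definition (population form, dividing by n)
variance = np.mean((nums - mean) ** 2)

# Sample variance divides by n - 1 instead; NumPy exposes this through ddof=1
sample_variance = np.var(nums, ddof=1)

print(mean_deviation, variance, sample_variance)
```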

Ordinal data

  • Interquartile range
  • Variation ratio
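
Neither measure has an example here, so the following is a minimal sketch under the usual definitions: the interquartile range is the distance between the 75th and 25th percentiles, and the variation ratio is the share of observations that do not fall in the modal category. It reuses the nums list from the mode example; the variable names are my own.

```python
from collections import Counter
import numpy as np

nums = [3, 0, 1, 2, 0, 1, 2, 0, 1, 2]

# Interquartile range: distance between the 75th and 25th percentiles
q1, q3 = np.percentile(nums, [25, 75])
iqr = q3 - q1

# Variation ratio: share of observations outside the modal category
modal_count = max(Counter(nums).values())
variation_ratio = 1.0 - modal_count / float(len(nums))

print(iqr, variation_ratio)
```

When several values tie for the mode, as they do in this list, the variation ratio still uses the frequency of a single modal value.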

Relative dispersion:

  • Coefficient of variation
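
As a minimal sketch, the coefficient of variation is the standard deviation divided by the mean, which makes dispersion comparable across datasets with different units or scales. The snippet assumes a non-zero mean and uses the population standard deviation (NumPy's default, ddof=0).

```python
import numpy as np

nums = [2, 3, 4, 5, 6, 7, 8, 9]

# Coefficient of variation: standard deviation relative to the mean
cv = np.std(nums) / np.mean(nums)

print(cv)
```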

Distribution shape:

  • Skewness
  • Kurtosis coefficient
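
Here is a rough sketch of both shape measures, computed from standardized central moments and compared with scipy.stats.skew and scipy.stats.kurtosis. The dataset is made up for illustration (one large value to create a long right tail), and the kurtosis shown is the excess form, i.e. 3 is subtracted so that a normal distribution scores 0.

```python
import numpy as np
from scipy import stats

# Illustrative data with one large value, so the right tail is long
nums = np.array([2, 3, 4, 5, 6, 7, 8, 30], dtype=float)

# Standardized central moments computed from their definitions
deviations = nums - nums.mean()
m2 = np.mean(deviations ** 2)
skewness = np.mean(deviations ** 3) / m2 ** 1.5
excess_kurtosis = np.mean(deviations ** 4) / m2 ** 2 - 3.0

print(skewness, stats.skew(nums))
print(excess_kurtosis, stats.kurtosis(nums))
```

A positive skewness indicates a longer right tail, a negative one a longer left tail.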

Origin www.cnblogs.com/S031602219/p/11220654.html