Descriptive statistics
In the examples below, the data set being processed is called nums; it contains multiple values of the same type.
Central tendency of the data:
- Mode
Definition: the value that occurs most often in the data set (it is not necessarily unique).
(Note: when several values tie for the highest frequency, scipy.stats.mode returns only one of them (the smallest), so I rewrote the calculation below to follow the definition of the mode more closely.)
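#!/usr/bin/env python3
from collections import Counter
nums = [3, 0, 1, 2, 0, 1, 2, 0, 1, 2]
datas = dict(Counter(nums))  # frequency table: value -> number of occurrences
keys = []  # values whose frequency equals the highest frequency (the modes)
matched = 0  # how many distinct values reach the highest frequency
count = max(datas.values())  # highest frequency in the data set
for key, value in datas.items():
    if value == count:
        keys.append(key)
        matched += 1
# If every distinct value occurs equally often, there is no mode;
# otherwise print all modes (there may be more than one).
if matched == len(datas):
    print("This data set has no mode")
else:
    print("The mode(s) of this data set:", keys)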
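On Python 3.8 or later, the standard library's `statistics.multimode` already returns every value that ties for the highest frequency. The sketch below wraps it with the same "all values equally frequent means no mode" rule used above; the wrapper name `find_modes` is hypothetical.

```python
from collections import Counter
from statistics import multimode  # available since Python 3.8


def find_modes(nums):
    """Return all modes of nums, or an empty list if every value is equally frequent."""
    counts = Counter(nums)
    # If every distinct value has the same frequency, treat the data set as having no mode.
    if len(set(counts.values())) <= 1:
        return []
    # multimode returns the most common values in order of first appearance.
    return multimode(nums)


print(find_modes([3, 0, 1, 2, 0, 1, 2, 0, 1, 2]))  # [0, 1, 2]
print(find_modes([1, 2, 3]))                        # []
```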
- Median
Definition: sort the values in the data set; if their number is odd, the median is the middle value; if it is even, the median is the arithmetic mean of the two middle values.
import numpy as np
nums=[1,2,3,4,5,6]
print("Median:", np.median(nums))
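To make the odd/even rule concrete, here is a minimal pure-Python sketch of the same definition (the function name `median` is just for illustration); for the list above it should agree with `np.median`.

```python
def median(nums):
    """Median computed directly from the definition: sort, then take the middle."""
    values = sorted(nums)
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return values[mid]
    # Even count: arithmetic mean of the two middle values.
    return (values[mid - 1] + values[mid]) / 2


print(median([1, 2, 3, 4, 5, 6]))  # 3.5
print(median([5, 1, 3]))           # 3
```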
- Mean
- Arithmetic mean
Definition: the sum of all values divided by the number of values in the data set.
import numpy as np
nums=[1,2,3,4,5.5]
print("Arithmetic mean:", np.mean(nums))
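The definition translates directly into plain Python; this sketch (hypothetical function name `arithmetic_mean`) is equivalent to `np.mean` for a non-empty list.

```python
def arithmetic_mean(nums):
    """Sum of all values divided by how many values there are."""
    if not nums:
        raise ValueError("mean of an empty data set is undefined")
    return sum(nums) / len(nums)


print(arithmetic_mean([1, 2, 3, 4, 5.5]))  # 15.5 / 5 = 3.1
```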
- Weighted average
Definition: the sum of the products of each value and its weight, divided by the sum of the weights.
import numpy as np
nums=[2,3,4]
weights=[1,2,3]
print("Weighted average:", np.average(nums, weights=weights))
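Spelling out what `np.average(nums, weights=weights)` computes helps show why the weights are divided out at the end. A minimal sketch, with the hypothetical name `weighted_average`:

```python
def weighted_average(nums, weights):
    """Sum of value*weight products divided by the sum of the weights."""
    if len(nums) != len(weights):
        raise ValueError("nums and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    return sum(x * w for x, w in zip(nums, weights)) / total_weight


# (2*1 + 3*2 + 4*3) / (1 + 2 + 3) = 20 / 6
print(weighted_average([2, 3, 4], [1, 2, 3]))
```

With equal weights the result reduces to the ordinary arithmetic mean.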
- Geometric mean
Definition: the n-th root of the product of n values; it is used to average ratios between values (for example, growth rates).
from scipy import stats as sta
nums=[2,3,1]
print("Geometric mean:", sta.gmean(nums))
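A sketch of the same calculation without SciPy, assuming all values are positive. It averages logarithms instead of multiplying directly, which gives the same result as the n-th root of the product but does not overflow on long lists; the name `geometric_mean` is illustrative.

```python
import math


def geometric_mean(nums):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not nums or any(x <= 0 for x in nums):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    # exp(mean(log x)) equals (x1 * x2 * ... * xn) ** (1/n).
    return math.exp(sum(math.log(x) for x in nums) / len(nums))


print(geometric_mean([2, 3, 1]))  # cube root of 6, about 1.817
```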
- Quantile
Definition: the points that divide the probability distribution of a random variable into several consecutive intervals of equal probability; there is one fewer division point than there are intervals. (Example: the three quartiles split the data into four parts.)
```
import numpy as np
nums = [2, 3, 4, 5, 6, 7, 8, 1]
print("Quartiles:", np.percentile(nums, 25), np.percentile(nums, 50), np.percentile(nums, 75))
```
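By default `np.percentile` uses linear interpolation between the sorted values. The sketch below reproduces that default for a single percentile; other interpolation methods exist and give slightly different quartiles. The name `percentile_linear` is hypothetical.

```python
import math


def percentile_linear(nums, q):
    """q-th percentile (0 <= q <= 100) by linear interpolation between sorted values."""
    values = sorted(nums)
    if not values:
        raise ValueError("percentile of an empty data set is undefined")
    pos = (len(values) - 1) * q / 100  # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(values) - 1)
    frac = pos - lo
    return values[lo] + (values[hi] - values[lo]) * frac


nums = [2, 3, 4, 5, 6, 7, 8, 1]
print([percentile_linear(nums, q) for q in (25, 50, 75)])  # [2.75, 4.5, 6.25]
```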
Dispersion of the data:
Numerical data
- Variance
Definition: the arithmetic mean of the squared differences between each value and the mean of the data set.
- Standard deviation
Definition: the square root of the variance.
- Range
Definition: the difference between the largest and smallest values in the data set.
- Mean absolute deviation
Definition: the arithmetic mean of the absolute differences between each value and the mean of the data set.
import numpy as np
nums=[2,3,4,5,6,7,8,9]
print("Range:", np.ptp(nums))
print("Variance:", np.var(nums))  # population variance: divides by n
print("Standard deviation:", np.std(nums))
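The NumPy snippet does not compute the mean absolute deviation, so this sketch derives all four measures from their definitions. `np.var` and `np.std` divide by n by default (`ddof=0`), matching the definition of variance given here; passing `ddof=1` gives the sample versions that divide by n - 1. The function name `dispersion` is hypothetical.

```python
import math


def dispersion(nums):
    """Range, population variance, standard deviation and mean absolute deviation."""
    n = len(nums)
    mean = sum(nums) / n
    value_range = max(nums) - min(nums)                # same as np.ptp
    variance = sum((x - mean) ** 2 for x in nums) / n  # same as np.var (ddof=0)
    std = math.sqrt(variance)                          # same as np.std (ddof=0)
    mad = sum(abs(x - mean) for x in nums) / n         # mean absolute deviation
    return value_range, variance, std, mad


# Range 7, variance 5.25, mean absolute deviation 2.0
print(dispersion([2, 3, 4, 5, 6, 7, 8, 9]))
```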
Ordinal data
- Interquartile range
- Variation ratio
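Neither measure for ordinal data has an example, so here is a hedged sketch. The interquartile range is Q3 - Q1, the spread of the middle half of the data (quartiles here use the same linear interpolation as NumPy's default). The variation ratio is the share of observations outside the modal category, 1 - f_mode / n. The function names and the rating data are hypothetical.

```python
import math
from collections import Counter


def interquartile_range(nums):
    """Q3 - Q1, with quartiles computed by linear interpolation."""
    values = sorted(nums)

    def quantile(q):
        pos = (len(values) - 1) * q
        lo = math.floor(pos)
        hi = min(lo + 1, len(values) - 1)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    return quantile(0.75) - quantile(0.25)


def variation_ratio(data):
    """Share of observations NOT in the modal category: 1 - f_mode / n."""
    counts = Counter(data)
    f_mode = max(counts.values())
    return 1 - f_mode / len(data)


# Hypothetical survey answers coded 1 (poor) to 5 (excellent).
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
print(interquartile_range(ratings))  # Q3 = 4, Q1 = 2.25, so 1.75
print(variation_ratio(ratings))      # mode 3 occurs 3 times out of 10, about 0.7
```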
Relative dispersion:
- Coefficient of variation
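The coefficient of variation is the standard deviation divided by the mean, a dimensionless number that lets you compare the spread of data sets with different means or units. A minimal sketch using the population standard deviation (SciPy's `scipy.stats.variation` computes the same ratio with its default `ddof=0`); the function name is illustrative.

```python
import math


def coefficient_of_variation(nums):
    """Population standard deviation divided by the mean."""
    n = len(nums)
    mean = sum(nums) / n
    if mean == 0:
        raise ZeroDivisionError("coefficient of variation is undefined when the mean is 0")
    std = math.sqrt(sum((x - mean) ** 2 for x in nums) / n)
    return std / mean


# Same absolute spread (std = 1) but different means give very different relative dispersion.
print(coefficient_of_variation([9, 11]))    # 0.1
print(coefficient_of_variation([99, 101]))  # 0.01
```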
Distribution shape:
- Skewness
- Kurtosis
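A common way to compute both shape measures is from central moments: skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3, where mk is the mean of the k-th power of the deviations from the mean. As far as I know, these match `scipy.stats.skew` and `scipy.stats.kurtosis` with their defaults (`bias=True`, and `fisher=True` for kurtosis, so a normal distribution scores 0). The sketch below assumes these moment-based definitions; other estimators exist.

```python
def shape_measures(nums):
    """Moment-based skewness and excess kurtosis (population moments)."""
    n = len(nums)
    mean = sum(nums) / n
    m2 = sum((x - mean) ** 2 for x in nums) / n
    m3 = sum((x - mean) ** 3 for x in nums) / n
    m4 = sum((x - mean) ** 4 for x in nums) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    skewness = m3 / m2 ** 1.5            # > 0: long right tail, < 0: long left tail
    excess_kurtosis = m4 / m2 ** 2 - 3   # 0 for a normal distribution
    return skewness, excess_kurtosis


print(shape_measures([1, 2, 3, 4, 5]))   # symmetric data: skewness 0.0
print(shape_measures([1, 1, 1, 2, 10]))  # long right tail: positive skewness
```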