Python scientific computing: fast data processing with NumPy
NumPy is one of the most important third-party libraries for Python.
It is the foundation for data analysis in Python.
Standard Python stores arrays of values in lists. Because the elements of a list can be any object, what the list actually stores is pointers to objects.
Saving even a simple array such as [0,1,2] therefore needs three pointers and three integer objects, which wastes both memory and computation time.
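The overhead described above can be checked roughly with a short sketch. The variable names are illustrative, the exact list figure depends on the Python interpreter, and nbytes counts only the array's data buffer, not the small fixed overhead of the ndarray object itself.

```python
import sys
import numpy as np

values = [0, 1, 2]
arr = np.array(values, dtype=np.int64)

# A list holds one pointer per element, and each int is a separate Python object
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# The ndarray keeps the raw 8-byte integers in one contiguous buffer
array_data_bytes = arr.nbytes  # 3 elements * 8 bytes each

print(list_bytes, array_data_bytes)
```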
Beyond simply using NumPy, a few habits help save memory and make better use of computing resources. One rule is to avoid implicit copies and prefer in-place operations. For example, to double the values in x, write x *= 2 directly rather than y = x * 2.
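A small sketch of the difference between the two forms, using np.shares_memory to check whether a new buffer was allocated. The names alias and y are only for illustration.

```python
import numpy as np

x = np.arange(5)        # [0 1 2 3 4]
alias = x               # second name bound to the same array object

x *= 2                  # in-place: the existing buffer is overwritten
still_same_object = alias is x

y = x * 2               # allocates a new array to hold the result
y_shares_buffer = np.shares_memory(x, y)

print(still_same_object, y_shares_buffer)
```

The in-place form reuses the memory of x, while x * 2 has to allocate a second array of the same size.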
Structured arrays
Suppose you want to record the name, age, and Chinese, math and English scores of each student in a class. How would you do it?
In C you would define a struct type and create an array of structs. NumPy offers the equivalent through dtype:
import numpy as np
# Record type: a 32-character string, three integers and one float
persontype = np.dtype({
    'names':['name','age','chinese','math','english'],
    'formats':['U32','i','i','i','f']})
peoples = np.array([("ZhangFei",32,75,100,90),("GuanYu",24,85,43,87.5),("ZhaoYun",28,85,92,96.5),("HuangZhong",29,85,93,34)],dtype=persontype)
# Select one field (column) at a time
ages = peoples[:]['age']
chineses = peoples[:]['chinese']
maths = peoples[:]['math']
englishs = peoples[:]['english']
print(np.mean(ages))
print(np.mean(chineses))
print(np.mean(maths))
print(np.mean(englishs))
NumPy uses dtype to define the structure type. Passing dtype=persontype when creating the array makes it an array of that custom type, and you can then access its fields freely. To get everyone's Chinese scores, use chineses = peoples[:]['chinese'],
and to calculate the average, use np.mean.
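The per-field averages can also be computed in one pass by looping over the field names stored in the dtype. This is a minimal sketch using the same data as above. The 'U32' string format and the means dictionary are choices made for this illustration.

```python
import numpy as np

persontype = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['U32', 'i', 'i', 'i', 'f']})
peoples = np.array([("ZhangFei", 32, 75, 100, 90), ("GuanYu", 24, 85, 43, 87.5),
                    ("ZhaoYun", 28, 85, 92, 96.5), ("HuangZhong", 29, 85, 93, 34)],
                   dtype=persontype)

# persontype.names is a tuple of field names; skip the non-numeric 'name' field
means = {field: float(np.mean(peoples[field])) for field in persontype.names[1:]}
print(means)
```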
ufunc operations
A ufunc (universal function) operates on every element of an array
Creating evenly spaced arrays
x1 = np.arange(1,11,2)
x2 = np.linspace(1,9,5)
arange and linspace both create arithmetic sequences; x1 and x2 both contain 1, 3, 5, 7, 9 (x1 as integers, x2 as floats).
arange() is similar to the built-in function range(): you specify the start value, the end value and the step size to create a one-dimensional arithmetic sequence. By default the end value is not included.
linspace stands for a linearly spaced vector: you specify the start value, the end value and the number of elements. By default the end value is included.
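The endpoint behaviour of the two functions can be seen side by side in a short sketch. The third array, x3, is added here only to show linspace's endpoint parameter.

```python
import numpy as np

x1 = np.arange(1, 11, 2)                    # stop value 11 is excluded
x2 = np.linspace(1, 9, 5)                   # stop value 9 is included
x3 = np.linspace(1, 9, 4, endpoint=False)   # stop value 9 is excluded

print(x1, x1.dtype.kind)   # integer values
print(x2, x2.dtype.kind)   # floating-point values
print(x3)
```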
Arithmetic
Addition, subtraction, multiplication, division, n-th power and remainder
x1 = np.arange(1,11,2)
x2 = np.linspace(1,9,5)
print(np.add(x1,x2))
print(np.subtract(x1,x2))
print(np.multiply(x1,x2))
print(np.divide(x1,x2))
print(np.power(x1,x2))
print(np.remainder(x1,x2))
For the n-th power, the elements of x2 are the exponents and the elements of x1 are the bases.
For the remainder you can use np.remainder(x1, x2); np.mod(x1, x2) gives the same result.
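Because x1 and x2 hold the same values, their remainders are all zero, so a sketch with two different arrays makes the element-wise behaviour easier to see. The arrays a and b are made up for this illustration.

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 3, 4])

powers = np.power(a, b)          # 7**2, 8**3, 9**4
remainders = np.remainder(a, b)  # 7 % 2, 8 % 3, 9 % 4

# The arithmetic operators give the same results as the named functions
same_power = np.array_equal(powers, a ** b)
same_mod = (np.array_equal(remainders, np.mod(a, b))
            and np.array_equal(remainders, a % b))

print(powers, remainders, same_power, same_mod)
```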
Statistical Functions
Descriptive statistics are often needed to understand data, for example the maximum, the minimum and the average, and whether the data follow a normal distribution.
Maximum amax() and minimum amin() of an array
amin() computes the minimum of the array elements along a specified axis. Take a = np.array([[1,2,3],[4,5,6],[7,8,9]]). amin(a) is the minimum of all elements in the array, which is 1. amin(a, 0) is the minimum along axis = 0: this axis groups the elements as [1,4,7], [2,5,8], [3,6,9], one group per column, so the minimum is [1,2,3]. amin(a, 1) is the minimum along axis = 1: this axis groups the elements as [1,2,3], [4,5,6], [7,8,9], one group per row, so the minimum is [1,4,7].
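A minimal sketch of amin() and amax(), assuming a is the 3x3 array holding 1 through 9 that the description above implies.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.amin(a))      # minimum of all elements
print(np.amin(a, 0))   # minimum of each column
print(np.amin(a, 1))   # minimum of each row
print(np.amax(a))      # maximum of all elements
print(np.amax(a, 0))   # maximum of each column
print(np.amax(a, 1))   # maximum of each row
```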
Difference between the maximum and the minimum: ptp()
np.ptp(a) computes the difference between the maximum and minimum values of the array, i.e. 9-1 = 8. ptp(a, 0) computes the difference along axis = 0, i.e. 7-1 = 6 (likewise 8-2 and 9-3), giving [6,6,6]. ptp(a, 1) computes the difference along axis = 1, i.e. 3-1 = 2 (likewise 6-4 and 9-7), giving [2,2,2].
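The same numbers can be reproduced with a short sketch, again assuming a is the 3x3 array holding 1 through 9. The last line checks that ptp is simply the maximum minus the minimum.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

overall = np.ptp(a)       # 9 - 1
per_column = np.ptp(a, 0) # 7-1, 8-2, 9-3
per_row = np.ptp(a, 1)    # 3-1, 6-4, 9-7

# ptp is the maximum minus the minimum along the same axis
matches = np.array_equal(per_column, np.amax(a, 0) - np.amin(a, 0))
print(overall, per_column, per_row, matches)
```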
Percentiles of an array: percentile()
percentile() computes the p-th percentile, where p ranges from 0 to 100. p = 0 gives the minimum, p = 50 gives the median, and p = 100 gives the maximum. The p-th percentile can also be computed along axis = 0 or axis = 1.
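A brief sketch of percentile(), assuming the same 3x3 array a holding 1 through 9 as in the earlier sections.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

p0 = np.percentile(a, 0)      # same as the minimum
p50 = np.percentile(a, 50)    # same as the median
p100 = np.percentile(a, 100)  # same as the maximum

p50_columns = np.percentile(a, 50, axis=0)  # median of each column
p50_rows = np.percentile(a, 50, axis=1)     # median of each row

print(p0, p50, p100, p50_columns, p50_rows)
```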
Median median() and average mean() of an array
median() computes the median of an array and mean() computes its average.
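The median and the mean coincide for evenly spaced data but differ when an outlier is present. The sketch below shows both cases. The array skewed is made up for this illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
skewed = np.array([1, 2, 3, 100])   # one large outlier

print(np.median(a), np.mean(a))            # equal for evenly spaced data
print(np.median(a, axis=0))                # median of each column
print(np.mean(a, axis=1))                  # mean of each row
print(np.median(skewed), np.mean(skewed))  # the outlier pulls the mean up
```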
Weighted average of an array: average()
a = np.array([1,2,3,4])
wts = np.array([1,2,3,4])
print(np.average(a))
print(np.average(a,weights=wts))
The average() function computes a weighted average, meaning that each element can be given its own weight. By default every element has the same weight, so np.average(a) = (1 + 2 + 3 + 4) / 4 = 2.5. With the weight array wts = [1,2,3,4], the weighted average is np.average(a, weights=wts) = (1*1+2*2+3*3+4*4)/(1+2+3+4) = 3.0
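The weighted average can be checked against the formula written out by hand. A minimal sketch, with manual as an illustrative name:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
wts = np.array([1, 2, 3, 4])

plain = np.average(a)                   # equal weights
weighted = np.average(a, weights=wts)   # note the keyword is "weights"
manual = np.sum(a * wts) / np.sum(wts)  # (1*1+2*2+3*3+4*4) / (1+2+3+4)

print(plain, weighted, manual)
```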
Standard deviation std() and variance var() of an array
The variance is the average of the squared differences between each value and the mean, i.e. mean((x - x.mean())**2). The standard deviation is the square root of the variance. Both describe how far the elements of an array are dispersed from their mean.
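The definition can be verified directly. This sketch uses the array [1,2,3,4] from the weighted-average example and relies on the NumPy default ddof=0, which divides by the number of elements.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

manual_var = np.mean((x - x.mean()) ** 2)  # average squared distance from the mean
manual_std = np.sqrt(manual_var)           # square root of the variance

print(manual_var, np.var(x))
print(manual_std, np.std(x))
```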
Sorting with NumPy
Use the sort function: sort(a, axis=-1, kind='quicksort', order=None). The default algorithm is quicksort; kind can be quicksort, mergesort or heapsort, meaning quick sort, merge sort and heap sort. axis=-1 sorts along the last axis of the array. You can sort along a different axis, or pass axis=None to flatten the array and sort it as a vector. For structured arrays, order specifies the field to sort by.
a = np.array([[4,3,2],[2,4,1]])
print(np.sort(a))
print(np.sort(a,axis=None))
print(np.sort(a,axis=0))
print(np.sort(a,axis=1))
[[2 3 4],[1 2 4]]
[1 2 2 3 4 4]
[[2 3 1],[4 4 2]]
[[2 3 4],[1 2 4]]
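The order parameter applies to structured arrays. The sketch below sorts a small record array by one field. The scoretype dtype and the three records are hypothetical data made up for this illustration.

```python
import numpy as np

# Hypothetical record type and data, for illustration only
scoretype = np.dtype([('name', 'U16'), ('score', 'i4')])
records = np.array([('ZhangFei', 75), ('GuanYu', 85), ('ZhaoYun', 80)],
                   dtype=scoretype)

by_score = np.sort(records, order='score')  # ascending by the 'score' field
names_ascending = by_score['name'].tolist()
names_descending = by_score[::-1]['name'].tolist()

print(names_ascending)
print(names_descending)
```

np.sort always sorts in ascending order, so reversing the result with [::-1] is one way to get a descending ranking.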
Exercise: class score statistics
Suppose a team has five members with the scores shown below. Use NumPy to compute the average, minimum, maximum, variance and standard deviation of the scores in Chinese, English and mathematics, then sort the members by total score and output the ranking.
Name | Chinese | English | Math |
---|---|---|---|
Zhang Fei | 66 | 65 | 30 |
Guan Yu | 95 | 85 | 98 |
Zhao Yun | 93 | 92 | 96 |
Huang Zhong | 90 | 88 | 77 |
Dian Wei | 90 | 90 | 90 |
# -*- coding: utf-8 -*-
import numpy as np
# Record type matching the table above: a name plus three subject scores
persontype = np.dtype({
    'names':['name','chinese','english','math'],
    'formats':['U32','i','i','i']})
peoples = np.array([("ZhangFei",66,65,30),("GuanYu",95,85,98),("ZhaoYun",93,92,96),("HuangZhong",90,88,77),("DianWei",90,90,90)],dtype=persontype)
name = peoples[:]["name"]
chineses = peoples[:]["chinese"]
english = peoples[:]['english']
math = peoples[:]['math']
# Define a function that prints one row of statistics
def show(name, cj):
    print('{}|{}|{}|{}|{}|{}'.format(name,np.mean(cj),np.min(cj),np.max(cj),np.var(cj),np.std(cj)))
print("Subject|Mean|Min|Max|Variance|Std dev")
show("Chinese",chineses)
show("English",english)
show("Math",math)
print("Ranking:")
# Sort by total score (fields 1 to 3) with the built-in sorted function
ranking = sorted(peoples,key=lambda x:x[1]+x[2]+x[3],reverse=True)
print(ranking)
Using sorted() with a lambda as the key function, the records are sorted by the sum of the three subject scores, and reverse=True gives descending order.
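The ranking can also be computed with NumPy alone. This is an alternative sketch, not the approach used in the solution above: it keeps the scores from the exercise table in a plain 2-D array and uses np.argsort on the row totals. The names scores, totals and order are chosen for this illustration.

```python
import numpy as np

names = np.array(["ZhangFei", "GuanYu", "ZhaoYun", "HuangZhong", "DianWei"])
# Columns: Chinese, English, math (values from the exercise table)
scores = np.array([[66, 65, 30],
                   [95, 85, 98],
                   [93, 92, 96],
                   [90, 88, 77],
                   [90, 90, 90]])

totals = scores.sum(axis=1)    # total score per person
order = np.argsort(-totals)    # indices from highest to lowest total
for rank, i in enumerate(order, start=1):
    print(rank, names[i], totals[i])
```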