Python scientific computing: fast data processing with numpy

NumPy: a very important third-party library

It is the foundation for data analysis in Python.

Standard Python stores arrays of values in lists. Because the elements of a list can be any object, what the list actually stores is pointers to objects.

So to hold a simple array such as [0,1,2], a Python list needs three pointers and three integer objects. This is uneconomical: it wastes both memory and computation time.
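To make the memory argument concrete, here is a rough sketch that compares the two layouts. The exact byte counts reported by sys.getsizeof depend on the CPython build, so treat the printed numbers as illustrative only; the names list_bytes and array_data_bytes are made up for this example.

```python
import sys
import numpy as np

values = [0, 1, 2]
arr = np.array(values, dtype=np.int64)

# The list stores pointers; each element is a separate Python int object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# The array stores the three 8-byte integers contiguously in one buffer.
array_data_bytes = arr.nbytes

print(list_bytes, array_data_bytes)
```

The array object has some fixed overhead of its own, so the saving matters most for large arrays.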

Beyond using NumPy itself, a few techniques help reduce memory use and make better use of computing resources. One rule is: avoid implicit copies and prefer in-place operations. For example, to double the values in x, write x *= 2 directly rather than y = x * 2.
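A small sketch of the difference, using a throwaway array (the name alias is only there to show that no new object is created):

```python
import numpy as np

x = np.arange(5)
alias = x                        # second name for the same array object

x *= 2                           # in-place: the existing buffer is overwritten
print(alias is x)                # True, still the same object

y = x * 2                        # allocates a new array for the result
print(np.shares_memory(x, y))    # False, y has its own buffer
```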

Structured arrays

How would you record, for each student in a class, the name, the age, and the scores in Chinese, math, and English?

In C you can define a struct type and then declare an array of that struct. NumPy offers the same thing:

import numpy as np
# Structured dtype: one record per student ('U32' is a string of up to 32 characters)
persontype = np.dtype({
	'names':['name','age','chinese','math','english'],
	'formats':['U32','i','i','i','f']})
peoples = np.array([("ZhangFei",32,75,100,90),("GuanYu",24,85,43,87.5),("ZhaoYun",28,85,92,96.5),("HuangZhong",29,85,93,34)],dtype=persontype)
# Selecting a field name returns that column for all records
ages = peoples[:]['age']
chineses = peoples[:]['chinese']
maths = peoples[:]['math']
englishs = peoples[:]['english']
print(np.mean(ages))
print(np.mean(chineses))
print(np.mean(maths))
print(np.mean(englishs))

NumPy uses dtype to define the structure type. Passing dtype=persontype when creating the array makes it a structured array, and the custom persontype can be reused freely. To get everyone's Chinese scores, use chineses = peoples[:]['chinese'], then compute the average with np.mean.

ufunc operations

A ufunc (universal function) is a function that operates on every element of an array.

Creating evenly spaced arrays
x1 = np.arange(1,11,2)
x2 = np.linspace(1,9,5)

arange and linspace both create arithmetic sequences; here x1 and x2 both contain the values [1,3,5,7,9].

arange() is similar to the built-in range(): given a start value, a stop value, and a step, it creates a one-dimensional arithmetic sequence. By default the stop value is not included.

linspace stands for "linearly spaced vector": given a start value, a stop value, and the number of elements, it creates an evenly spaced sequence. By default the stop value is included.
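The two calls can be compared side by side. This is a minimal sketch; the endpoint=False call is an addition here to show how linspace can be made to exclude the stop value the way arange does.

```python
import numpy as np

x1 = np.arange(1, 11, 2)     # start, stop (excluded), step
x2 = np.linspace(1, 9, 5)    # start, stop (included), number of points

print(x1, x1.dtype)          # integer dtype, because all arguments are ints
print(x2, x2.dtype)          # float64 by default
print(np.array_equal(x1, x2))

# With endpoint=False, linspace leaves out the stop value, like arange
x3 = np.linspace(1, 11, 5, endpoint=False)
print(x3)
```

Note that the values match but the dtypes differ: arange with integer arguments yields integers, while linspace yields floats.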

Arithmetic

Addition, subtraction, multiplication, division, raising to the n-th power, and taking the remainder:

x1 = np.arange(1,11,2)
x2 = np.linspace(1,9,5)
print(np.add(x1,x2))
print(np.subtract(x1,x2))
print(np.multiply(x1,x2))
print(np.divide(x1,x2))
print(np.power(x1,x2))
print(np.remainder(x1,x2))

For the n-th power, the elements of x2 are the exponents and the elements of x1 are the bases.

For the remainder you can use np.remainder(x1, x2); np.mod(x1, x2) gives the same result.
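As a quick check of the statements above, the sketch below confirms that these ufuncs are what the ordinary operators call on arrays, and that np.mod matches np.remainder:

```python
import numpy as np

x1 = np.arange(1, 11, 2)
x2 = np.linspace(1, 9, 5)

# The arithmetic operators on arrays dispatch to the same ufuncs
print(np.array_equal(np.add(x1, x2), x1 + x2))
print(np.array_equal(np.power(x1, x2), x1 ** x2))

# np.mod and np.remainder return the same values
print(np.array_equal(np.remainder(x1, x2), np.mod(x1, x2)))

# x1 and x2 hold the same values here, so every remainder is 0
print(np.remainder(x1, x2))
```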

Statistical functions

Descriptive statistics are often needed when analyzing data, for example the maximum, minimum, and mean of the data, or whether it follows a normal distribution.

Maximum amax() and minimum amin() of an array

amin() computes the minimum of the array elements along a specified axis. Take a to be the 3x3 array [[1,2,3],[4,5,6],[7,8,9]]. amin(a) is the minimum of all elements in the array, which is 1. amin(a, 0) is the minimum along axis=0: along that axis the elements are grouped as [1,4,7], [2,5,8], [3,6,9], so the result is [1,2,3], one value per column. amin(a, 1) is the minimum along axis=1: along that axis the elements are grouped as [1,2,3], [4,5,6], [7,8,9], so the result is [1,4,7].
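The axis behaviour is easy to verify on the 3x3 array implied by the numbers in the text. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.amin(a))       # 1, minimum over all elements
print(np.amin(a, 0))    # [1 2 3], minimum of each column
print(np.amin(a, 1))    # [1 4 7], minimum of each row

print(np.amax(a))       # 9
print(np.amax(a, 0))    # [7 8 9], maximum of each column
print(np.amax(a, 1))    # [3 6 9], maximum of each row
```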

Difference between maximum and minimum: ptp()

For the same 3x3 array a = [[1,2,3],[4,5,6],[7,8,9]], np.ptp(a) is the difference between the maximum and minimum of the whole array, i.e. 9-1 = 8. ptp(a, 0) is the difference between the maximum and minimum along axis=0, i.e. 7-1 = 6 for each column (likewise 8-2 and 9-3). ptp(a, 1) is the difference along axis=1, i.e. 3-1 = 2 for each row (likewise 6-4 and 9-7).
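The same numbers, reproduced with a short sketch on the 3x3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.ptp(a))       # 8, i.e. 9 - 1
print(np.ptp(a, 0))    # [6 6 6], max - min of each column
print(np.ptp(a, 1))    # [2 2 2], max - min of each row

# ptp is simply max minus min along the chosen axis
print(np.array_equal(np.ptp(a, 0), np.amax(a, 0) - np.amin(a, 0)))
```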

Percentiles of an array: percentile()

percentile(a, p) returns the p-th percentile, where p ranges from 0 to 100. p = 0 gives the minimum, p = 50 gives the median, and p = 100 gives the maximum. The p-th percentile can also be computed along axis=0 or axis=1.
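A short sketch of these three special cases and the axis argument, again on the 3x3 array from the earlier sections:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.percentile(a, 0))            # 1.0, the minimum
print(np.percentile(a, 50))           # 5.0, the median
print(np.percentile(a, 100))          # 9.0, the maximum

print(np.percentile(a, 50, axis=0))   # [4. 5. 6.], median of each column
print(np.percentile(a, 50, axis=1))   # [2. 5. 8.], median of each row
```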

Median and mean of an array: median(), mean()

median() returns the median of the array and mean() returns its average value.
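The two can differ a lot when the data contains an outlier. The array b below is made-up illustration data, not taken from the article:

```python
import numpy as np

b = np.array([1, 2, 3, 4, 100])   # hypothetical data with one outlier

print(np.median(b))   # 3.0, the middle value
print(np.mean(b))     # 22.0, pulled up by the outlier

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(np.median(a, axis=0))   # [4. 5. 6.]
print(np.mean(a, axis=1))     # [2. 5. 8.]
```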

Weighted average of an array: average()
a = np.array([1,2,3,4])
wts = np.array([1,2,3,4])
print(np.average(a))
# The keyword is weights (plural)
print(np.average(a,weights=wts))

average() computes a weighted average, meaning each element can be given its own weight. By default all elements have the same weight, so np.average(a) = (1+2+3+4)/4 = 2.5. A weight array can also be specified, here wts = [1,2,3,4], which gives the weighted average np.average(a, weights=wts) = (1*1+2*2+3*3+4*4)/(1+2+3+4) = 3.0
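The formula can be written out by hand and compared with the library call. A minimal sketch (the name manual is just for this example):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
wts = np.array([1, 2, 3, 4])

# Weighted average by hand: sum(a_i * w_i) / sum(w_i)
manual = (a * wts).sum() / wts.sum()

print(manual)                       # 3.0
print(np.average(a, weights=wts))   # 3.0
print(np.average(a))                # 2.5, same as np.mean(a)
```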

Standard deviation std() and variance var() of an array

The variance is the average of the squared differences between each value and the mean, i.e. mean((x - x.mean())**2). The standard deviation is the square root of the variance. Both describe how far the elements of the array are spread out from the mean.
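The definition above can be checked directly against np.var and np.std. A small sketch on the array [1, 2, 3, 4]:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Variance from the definition: mean of squared deviations from the mean
manual_var = np.mean((x - x.mean()) ** 2)

print(manual_var)    # 1.25
print(np.var(x))     # 1.25
print(np.std(x))     # square root of the variance
print(np.isclose(np.std(x), np.sqrt(np.var(x))))
```

Both functions divide by the number of elements by default, which matches the formula in the text.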

Sorting in NumPy

Sorting uses the sort function, sort(a, axis=-1, kind='quicksort', order=None). The default algorithm is quicksort; kind can be quicksort, mergesort, or heapsort, standing for quick sort, merge sort, and heap sort. axis=-1, the default, sorts along the last axis of the array. Other axes can be given, and axis=None flattens the array and sorts it as a vector. For a structured array, order names the field to sort by.

a = np.array([[4,3,2],[2,4,1]])
print(np.sort(a))
print(np.sort(a,axis=None))
print(np.sort(a,axis=0))
print(np.sort(a,axis=1))

Output:

[[2 3 4]
 [1 2 4]]
[1 2 2 3 4 4]
[[2 3 1]
 [4 4 2]]
[[2 3 4]
 [1 2 4]]
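The order parameter is described above but not exercised by the example. Here is a small sketch of it on a structured array, reusing three names and ages from the earlier student records; the dtype and variable names are chosen just for this illustration.

```python
import numpy as np

# Two-field structured dtype (illustrative; smaller than persontype)
dt = np.dtype([('name', 'U16'), ('age', 'i4')])
people = np.array([("ZhangFei", 32), ("GuanYu", 24), ("ZhaoYun", 28)], dtype=dt)

# Sort the records by the 'age' field, ascending
by_age = np.sort(people, order='age')

print(by_age['name'])   # ['GuanYu' 'ZhaoYun' 'ZhangFei']
print(by_age['age'])    # [24 28 32]
```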

Exercise: class score statistics

Suppose a team has five members with the scores shown below. Use NumPy to compute the average, minimum, and maximum score, the variance, and the standard deviation for Chinese, English, and mathematics. Then sort the members by total score and output the ranking.

Name         Chinese  English  Math
Zhang Fei    66       65       30
Guan Yu      95       85       98
Zhao Yun     93       92       96
Huang Zhong  90       88       77
Dian Wei     90       90       90

# -*- coding: utf-8 -*-
import numpy as np

# One record per team member, using the scores from the exercise table
persontype = np.dtype({
	'names':['name','chinese','english','math'],
	'formats':['U32','i','i','i']})
peoples = np.array([("ZhangFei",66,65,30),("GuanYu",95,85,98),("ZhaoYun",93,92,96),("HuangZhong",90,88,77),("DianWei",90,90,90)],dtype=persontype)

name = peoples[:]["name"]
chineses = peoples[:]["chinese"]
english = peoples[:]['english']
math = peoples[:]['math']
# Define a function that prints one row of statistics
def show(name , cj):
	print('{}|{}|{}|{}|{}|{}'.format(name,np.mean(cj),np.min(cj),np.max(cj),np.var(cj),np.std(cj)))
	
print("Subject|Mean|Min|Max|Variance|Std dev")
show("Chinese",chineses)
show("English",english)
show("Math",math)

print("Ranking:")
# Sort by total score with the built-in sorted function
# x[1], x[2], x[3] are the chinese, english, and math fields
ranking = sorted(peoples,key=lambda x:x[1]+x[2]+x[3],reverse=True)
print(ranking)

sorted() with a lambda as the key sorts the records by the sum of the three subject scores, and reverse=True puts them in descending order.
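The same ranking can also be produced with NumPy alone. The sketch below is an alternative to sorted(), not the approach used above: it keeps the scores from the exercise table in a plain 2-D array and ranks with np.argsort. The array names are chosen for this example, and the member names follow the spelling used in the code.

```python
import numpy as np

names = np.array(["ZhangFei", "GuanYu", "ZhaoYun", "HuangZhong", "DianWei"])
# Columns: Chinese, English, math (values from the exercise table)
scores = np.array([[66, 65, 30],
                   [95, 85, 98],
                   [93, 92, 96],
                   [90, 88, 77],
                   [90, 90, 90]])

totals = scores.sum(axis=1)    # total score per member
order = np.argsort(-totals)    # negate so the highest total comes first

for rank, i in enumerate(order, start=1):
    print(rank, names[i], totals[i])
```

With these scores, ZhaoYun (281) ranks first and ZhangFei (161) last.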

Origin blog.csdn.net/ywangjiyl/article/details/104719775