Using the numpy and pandas modules for Python data analysis and data mining (2)

One: numpy operations

1 Array creation: numpy.array([["element 1", "element 2"], ["element 1", "element 2"], ["element 1", "element 2"]])


  Generating arrays: arange, zeros, ones
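
The creation functions above can be sketched as follows. The element values and shapes are illustrative assumptions, not taken from the original post.

```python
import numpy as np

# A nested list becomes a 2-D array: 3 rows, 2 columns (values are illustrative).
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a.shape)            # (3, 2)

r = np.arange(0, 10, 2)   # start, stop (exclusive), step -> [0 2 4 6 8]
z = np.zeros((2, 3))      # 2x3 array filled with 0.0
o = np.ones(4)            # 1-D array of four 1.0 values
```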


bool type:


(1) Vectorized operations: an operation between arrays of the same shape is applied element by element


(2) Array and scalar operations: the scalar is "broadcast" to every element
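
A minimal sketch of both cases, with made-up values. It also shows that a comparison is vectorized in the same way and produces an array of bool type.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

added = x + y      # element by element -> [11 22 33]
scaled = x * 2     # the scalar 2 is broadcast to every element -> [2 4 6]
mask = x > 1       # comparison is vectorized too -> [False  True  True]
print(mask.dtype)  # bool
```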


2 Sorting arrays: the sort method sorts the array in place


3 Taking the maximum and minimum values: y1 = y.max(), y2 = y.min()
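
Sorting and the two extremes might look like this. The array contents are an assumption; y, y1 and y2 follow the names used above.

```python
import numpy as np

y = np.array([3, 1, 4, 1, 5])

sorted_copy = np.sort(y)  # returns a sorted copy and leaves y unchanged
y.sort()                  # the sort method sorts y itself, in place

y1 = y.max()              # 5
y2 = y.min()              # 1
```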


4 Slicing and indexing one-dimensional arrays: elements are selected by subscript. To take the elements between two positions, write array[start subscript : end subscript + 1], because the end of a slice is exclusive.
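
A short sketch of the slicing rule on an illustrative array:

```python
import numpy as np

arr = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

first = arr[0]        # a single element -> 0
segment = arr[2:5]    # subscripts 2, 3, 4 -> [2 3 4]; subscript 5 is excluded
tail = arr[-3:]       # the last three elements -> [7 8 9]
```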



5. Indexing and slicing multidimensional arrays:

arr[r1:r2, c1:c2]: r1:r2 is the row slice and c1:c2 is the column slice.

arr[1, 1] is equivalent to arr[1][1]
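
As a sketch, on a 3x4 array whose contents are chosen only for illustration:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

block = arr[0:2, 1:3]          # rows 0-1, columns 1-2 -> [[1 2] [5 6]]
same = arr[1, 1] == arr[1][1]  # both forms select the element 5
first_col = arr[:, 0]          # all rows, column 0 -> [0 4 8]
```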



Changing the dimensions of an array: first flatten the array into one dimension, then set the new dimensions (for example with reshape).
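
One way to sketch this uses flatten followed by reshape. This is an assumption about the calls the original used; reshape can also be applied directly without flattening first.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # [[0 1 2] [3 4 5]]

flat = arr.flatten()              # 1-D copy -> [0 1 2 3 4 5]
reshaped = flat.reshape(3, 2)     # [[0 1] [2 3] [4 5]]
direct = arr.reshape(3, 2)        # same result without the explicit flatten
```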



6. Conditional indexing: arr[condition] selects elements using a Boolean array of the same shape. Several conditions can be combined; use & and | rather than and and or, and wrap each condition in parentheses.

(1) Single condition: first generate random floating-point numbers in [0, 1) arranged in 3 rows and 3 columns, then index with one condition.


(2) Multiple conditions:
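
Both cases can be sketched as below. The seeded generator and the thresholds 0.2, 0.5 and 0.8 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed used only to make the run repeatable
arr = rng.random((3, 3))         # floats in [0, 1), 3 rows x 3 columns

# Single condition: the result is a 1-D array of the matching elements.
high = arr[arr > 0.5]

# Multiple conditions: combine with & / | and parenthesize each condition.
middle = arr[(arr > 0.2) & (arr < 0.8)]
extreme = arr[(arr < 0.2) | (arr > 0.8)]
```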


7: Transpose: transpose. To transpose a higher-dimensional array, specify the new order of the axis numbers (0, 1, 2, ...)
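
A sketch with a 2-D and a 3-D array; the shapes are illustrative:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(m.T.shape)                # (3, 2)

t = np.arange(24).reshape(2, 3, 4)
swapped = t.transpose(1, 0, 2)  # new axis order: old axis 1, then 0, then 2
print(swapped.shape)            # (3, 2, 4)
```

With this axis order, swapped[i, j, k] is the same element as t[j, i, k].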


8: Universal functions (ufuncs): element-wise operations.

ceil: rounds up to the nearest integer

floor: rounds down to the nearest integer

rint: rounds to the nearest integer

isnan: determines whether each element is NaN (not a number)

multiply: multiplies elements

divide: divides elements
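
The functions listed above can be tried on small arrays such as these (the values are illustrative):

```python
import numpy as np

v = np.array([1.2, 2.5, -0.7, np.nan])

up = np.ceil(v)        # 1.2 -> 2.0, 2.5 -> 3.0; NaN stays NaN
down = np.floor(v)     # 1.2 -> 1.0, 2.5 -> 2.0, -0.7 -> -1.0
rounded = np.rint(v)   # nearest integer; the tie 2.5 goes to the even value 2.0
nan_mask = np.isnan(v) # [False False False  True]

a = np.array([2.0, 4.0, 6.0])
b = np.array([1.0, 2.0, 3.0])
product = np.multiply(a, b)   # [ 2.  8. 18.]
quotient = np.divide(a, b)    # [2. 2. 2.]
```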


9: The vectorized version of the ternary expression:

     numpy.where(condition, x, y)   :   x if condition else y, applied element by element
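
A minimal sketch with made-up inputs. x and y may be arrays or scalars.

```python
import numpy as np

cond = np.array([True, False, True])
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

picked = np.where(cond, x, y)        # [ 1 20  3]

arr = np.array([-2, 5, -1, 3])
clipped = np.where(arr < 0, 0, arr)  # negatives replaced by 0 -> [0 5 0 3]
```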


10: Commonly used statistical methods: for a multidimensional array, specify the axis to compute along; otherwise the statistic is computed over all elements by default. (axis=0 gives one result per column, axis=1 gives one result per row)

mean: arithmetic mean

sum: sum

max: maximum

min: minimum

std: standard deviation

where:

argmax: index of the maximum value

argmin: index of the minimum value

cumsum: cumulative sum

cumprod: cumulative product

all: True if every element meets the condition

any: True if at least one element meets the condition

unique: finds the unique values and returns them sorted
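
The effect of the axis argument, and a few of the methods listed above, can be sketched on a small 2x3 array (values are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()              # no axis: all elements -> 21
col_sums = arr.sum(axis=0)     # one result per column -> [5 7 9]
row_sums = arr.sum(axis=1)     # one result per row -> [ 6 15]
col_means = arr.mean(axis=0)   # [2.5 3.5 4.5]

biggest_at = arr.argmax()      # position in the flattened array -> 5
running = arr.cumsum()         # [ 1  3  6 10 15 21]

all_positive = (arr > 0).all() # True: every element is > 0
any_large = (arr > 5).any()    # True: at least one element is > 5

uniq = np.unique(np.array([3, 1, 3, 2, 1]))  # sorted unique values -> [1 2 3]
```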


Two: pandas operations

1 Series: an ordered, one-dimensional sequence of values, comparable to a single row or column. It gets a default integer index (starting from 0).
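
A short sketch of a Series with the default index and with custom labels. The values and labels are assumptions for illustration.

```python
import pandas as pd

s = pd.Series([8, 9, 2, 1])
print(s.index.tolist())   # [0, 1, 2, 3]

# The index can also be given explicitly.
named = pd.Series([8, 9, 2, 1], index=["a", "b", "c", "d"])
print(named["b"])         # 9
```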


2 DataFrame: tabular data organized in rows and columns, similar to a table

3 Creating a DataFrame from a dictionary: each key becomes a column name and each value holds that column's data
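
A minimal sketch of both ways of building a DataFrame. The column names and values are hypothetical; the name e matches the variable used in the items that follow.

```python
import pandas as pd

# From a dictionary: keys are column names, values are the column data.
e = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],   # hypothetical data
    "age": [23, 35, 41],
})
print(e.shape)             # (3, 2)
print(e.columns.tolist())  # ['name', 'age']

# From a nested list, naming the columns explicitly.
from_rows = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "y"])
```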


4 Head of the data: e.head() returns the first 5 rows by default

5 Tail of the data: e.tail() returns the last 5 rows by default


6 e.describe() computes summary statistics column by column.

count is the number of non-null values in the column, mean is the average, std is the standard deviation, min is the minimum, and max is the maximum.

The rows labelled with % (25%, 50%, 75%) give the corresponding quantiles of each column.
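
head, tail and describe can be tried together on a small numeric frame. The frame below is an assumption built only for illustration.

```python
import numpy as np
import pandas as pd

e = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 2.0})

top = e.head()         # first 5 rows by default (index 0 to 4)
bottom = e.tail(3)     # a row count can be passed: last 3 rows (index 7 to 9)

stats = e.describe()   # one column of statistics per numeric column
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc["mean", "a"])   # 4.5
```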


7 Transposing the data (rows become columns, columns become rows): e.T
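
As a sketch, with a hypothetical two-column frame:

```python
import pandas as pd

e = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

t = e.T
print(t.shape)           # (2, 3)
print(t.index.tolist())  # ['x', 'y']: the old column names are now row labels
```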



