These are my personal study notes. All of the content comes from Chapter 4 of the book Python for Data Analysis.

Here is an example that shows the performance difference. Consider a NumPy array containing 1 million integers and a Python list with the same contents:

import numpy as np

my_arr = np.arange(1000000)

my_list = list(range(1000000))

# time the computation
%time for _ in range(10):my_arr2 = my_arr*2

Wall time: 21 ms

# time the computation
%time for _ in range(10):my_list2=[x*2 for x in my_list]

Wall time: 948 ms

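In this run the NumPy version is roughly 45 times faster. NumPy-based algorithms are generally 10 to 100 times faster than their pure-Python counterparts and use significantly less memory.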
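The %time magic only works inside IPython or Jupyter. As a rough sketch, a similar comparison can be run in a plain Python script with the standard timeit module. The smaller array size and the variable names below are illustrative choices, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the 1 million above so the sketch runs quickly
my_arr = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each approach
numpy_seconds = timeit.timeit(lambda: my_arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {numpy_seconds:.4f} s, list comprehension: {list_seconds:.4f} s")

# Both approaches compute the same values
same_values = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print(same_values)  # True
```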

4.1 NumPy ndarray: a multidimensional array object

One of the core features of NumPy is its N-dimensional array object, the ndarray.

# import numpy
import numpy as np
# generate an array of random values
data = np.random.randn(2,3)

data

array([[ 0.53526407,  1.42752699, -0.68798613],
       [-0.45544835, -1.35615318, -1.6924118 ]])

# mathematical operations
data*10

array([[  5.3526407 ,  14.27526989,  -6.87986133],
       [ -4.55448354, -13.56153181, -16.92411803]])

data+data

array([[ 1.07052814,  2.85505398, -1.37597227],
       [-0.91089671, -2.71230636, -3.38482361]])

# shape (size of each dimension)
data.shape

(2, 3)

# data type
data.dtype

dtype('float64')

4.1.1 Creating ndarrays

Converting a list:

data1 = [6,7.5,8,0,1]

arr1 = np.array(data1)

arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, such as a list of equal-length lists, are automatically converted to a multidimensional array

data2 = [[1,2,3,4],[5,6,7,8]]

arr2 = np.array(data2)

arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

arr2.ndim

2

arr2.shape

(2, 4)

arr1.dtype

dtype('float64')

arr2.dtype

dtype('int32')

Given a length or a shape, zeros creates an array of all 0s and ones creates an array of all 1s. empty creates an array without initializing its values

np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

np.empty((2,3,2))

array([[[1.05075542e-311, 2.86558075e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.05699242e-307, 8.60952352e-072]],

       [[4.26976457e-090, 2.00497183e-052],
        [1.26141762e-076, 9.91606475e+164],
        [6.48224660e+170, 5.82471487e+257]]])

np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

It is not safe to assume that np.empty returns an array of all zeros. It allocates memory without initializing it, so the array may contain arbitrary garbage values, as in the np.empty output above.

arange is an array-valued version of Python's built-in range function

np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

4.1.2 ndarray data types

The data type of an array is called its dtype

arr1 = np.array([1,2,3],dtype=np.float64)
arr2 = np.array([1,2,3],dtype=np.int32)

arr1.dtype

dtype('float64')

arr2.dtype

dtype('int32')

Use the astype method to explicitly convert the data type of the array

arr = np.array([1,2,3,4,5])

arr.dtype

dtype('int32')

Convert integers to floating point:

float_arr = arr.astype(np.float64)

float_arr.dtype

dtype('float64')

arr = np.array([3.7,2.5,4.3,5.0])

arr

array([3.7, 2.5, 4.3, 5. ])

When floating-point numbers are cast to an integer dtype, the decimal part is truncated

arr.astype(np.int32)

array([3, 2, 4, 5])
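To make the truncation rule concrete, here is a small sketch with negative values included. It shows that astype drops the fractional part (moving toward zero) rather than rounding. The np.rint step is one possible way to round first and is my own addition, not something from the book.

```python
import numpy as np

arr = np.array([3.7, -3.7, 2.5, -2.5])

truncated = arr.astype(np.int32)         # drops the fractional part
rounded = np.rint(arr).astype(np.int32)  # rounds first (halves go to the even integer)

print(truncated)  # [ 3 -3  2 -2]
print(rounded)    # [ 4 -4  2 -2]
```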

Strings that represent numbers can be converted to numeric form with astype.

Be careful when using the numpy.string_ type: string data in NumPy has a fixed size, so input may be truncated without warning. Note that np.string_ was removed in NumPy 2.0, where np.bytes_ is the equivalent name. pandas has more intuitive out-of-the-box behavior for non-numeric data.

numeric_strings = np.array(['1.25','-3.4','4.0'],dtype=np.string_)

numeric_strings

array([b'1.25', b'-3.4', b'4.0'], dtype='|S4')

numeric_strings.astype(float)

array([ 1.25, -3.4 ,  4.  ])
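The warning about fixed-size strings can be seen in a short sketch. The array contents here are made up for illustration. A value that is longer than the array's string width is cut to fit, with no error or warning.

```python
import numpy as np

names = np.array(["ab", "cd"])  # fixed-width string dtype: 2 characters per item
names[0] = "xyz"                # silently truncated to fit the fixed width

print(names)  # ['xy' 'cd']

# Casting to a wider string dtype first avoids the truncation
wide = names.astype("U10")
wide[1] = "longer"
print(wide)   # ['xy' 'longer']
```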

Use the dtype attribute of another array

int_array = np.arange(10)

int_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

calibers = np.array([.22,.270,.345,.234],dtype=np.float64)

calibers

array([0.22 , 0.27 , 0.345, 0.234])

int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

You can also refer to a dtype by its shorthand type code:

empty_uint32 = np.empty(8,dtype='u4')

empty_uint32

array([3264175145, 1070344437,  343597384, 1070679982, 3779571220,
       1070994554, 1168231105, 1070461878], dtype=uint32)

4.1.3 NumPy array arithmetic

arr = np.array([[1.,2.,3.],[4.,5.,6.]])

arr

array([[1., 2., 3.],
       [4., 5., 6.]])

arr + arr  # addition

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

arr - arr  # subtraction

array([[0., 0., 0.],
       [0., 0., 0.]])

arr * arr  # element-wise multiplication

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

arr / arr  # element-wise division

array([[1., 1., 1.],
       [1., 1., 1.]])

1 / arr  # reciprocal

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

arr ** 0.5  # square root

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparison between arrays of the same size will produce an array of boolean values

arr2 = np.array([[0.,4.,1.],[7.,4.,23.]])

arr2

array([[ 0.,  4.,  1.],
       [ 7.,  4., 23.]])

arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

4.1.4 Basic indexing and slicing

arr = np.arange(10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[5]

5

arr[5:8]

array([5, 6, 7])

arr[5:8] = 12

arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

array_slice = arr[5:8]

array_slice

array([12, 12, 12])

Changing a value in array_slice also changes the original array, because a slice of an array is a view of the original data rather than a copy

array_slice[1] = 123456

array_slice

array([    12, 123456,     12])

arr

array([     0,      1,      2,      3,      4,     12, 123456,     12,
            8,      9])

If you want a copy of the slice instead of a view, use arr[5:8].copy()

array_copy = arr[2:5].copy()

array_copy

array([2, 3, 4])

array_copy[1] = 12345

array_copy

array([    2, 12345,     4])

arr

array([     0,      1,      2,      3,      4,     12, 123456,     12,
            8,      9])
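The view-versus-copy distinction above can also be checked directly. The following sketch uses np.shares_memory and the base attribute to confirm which arrays are backed by the same memory. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]         # basic slicing returns a view
copy = arr[5:8].copy()  # an explicit copy owns its own data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True: the view is backed by arr
```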

A bare slice [:] refers to all of the values in the array, so assigning to it sets every element

array_slice[:] = 64

arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

Two-dimensional array

arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

arr2d[2]

array([7, 8, 9])

Select a single element; the two forms below are equivalent:

arr2d[0][2]

3

arr2d[0,2]

3

Three-dimensional array

arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])

arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

arr3d[0]  # a 2x3 array

array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to arr3d[0]

old_values = arr3d[0].copy()

old_values

array([[1, 2, 3],
       [4, 5, 6]])

arr3d[0] = 42

arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

arr3d[0] = old_values

arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1,0] returns a one-dimensional array:

arr3d[1,0]

array([7, 8, 9])

The same result is obtained by indexing in two steps:

x = arr3d[1]

x

array([[ 7,  8,  9],
       [10, 11, 12]])

x[0]

array([7, 8, 9])

Note: The arrays returned in the subset selection above are views

4.1.4.1 Slice indexing of arrays

arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

arr[1:6]

array([ 1,  2,  3,  4, 64])

Two-dimensional array

arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

arr2d[:2]  # rows

array([[1, 2, 3],
       [4, 5, 6]])

You can pass multiple slices, just as you can pass multiple indexes

arr2d[:2,1:]

array([[2, 3],
       [5, 6]])

Select the first two columns of the second row

arr2d[1,:2]

array([4, 5])

Select the first two rows of the third column

arr2d[:2,2]

array([3, 6])

arr2d[:,:1]

array([[1],
       [4],
       [7]])

Assigning to a slice expression assigns to the whole selection

arr2d[:2,1:] = 0

arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

4.1.5 Boolean indexing

names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

data = np.random.randn(7,4)

names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

data

array([[-0.16858164, -0.33108982,  0.68263748, -0.0983769 ],
       [-0.14467573, -1.73207863, -0.20321916,  0.75697117],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [-0.20164359,  0.81705695, -0.51739626, -1.16344194],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538],
       [ 0.13222977, -1.45147521,  0.54796917,  1.19053359],
       [-1.02140787,  0.9426649 , -0.75485246,  0.20162042]])

names == 'Bob'

array([ True, False, False,  True, False, False, False])

data[names == 'Bob']

array([[-0.16858164, -0.33108982,  0.68263748, -0.0983769 ],
       [-0.20164359,  0.81705695, -0.51739626, -1.16344194]])

Note: The boolean array must have the same length as the axis it indexes. Older NumPy versions did not always report an error when the length was wrong, so be careful when using it; recent versions raise an IndexError in that case.
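One way to guard against a mask of the wrong length, whatever the NumPy version, is to check the length explicitly before indexing. This is only a sketch: select_rows is a hypothetical helper written for these notes, not a NumPy function, and the small arrays are made up.

```python
import numpy as np


def select_rows(data, mask):
    """Hypothetical helper: boolean row selection with an explicit length check."""
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 1 or mask.shape[0] != data.shape[0]:
        raise ValueError(
            f"mask has shape {mask.shape}, expected ({data.shape[0]},)"
        )
    return data[mask]


names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(8).reshape(4, 2)

print(select_rows(data, names == 'Bob'))
# [[0 1]
#  [6 7]]
```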

data[names == 'Bob',2:]

array([[ 0.68263748, -0.0983769 ],
       [-0.51739626, -1.16344194]])

data[names == 'Bob',3]

array([-0.0983769 , -1.16344194])

You can negate the condition with either != or ~

names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

data[~(names == 'Bob')]

array([[-0.14467573, -1.73207863, -0.20321916,  0.75697117],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538],
       [ 0.13222977, -1.45147521,  0.54796917,  1.19053359],
       [-1.02140787,  0.9426649 , -0.75485246,  0.20162042]])

cond = names == 'Bob'

data[~cond]

array([[-0.14467573, -1.73207863, -0.20321916,  0.75697117],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538],
       [ 0.13222977, -1.45147521,  0.54796917,  1.19053359],
       [-1.02140787,  0.9426649 , -0.75485246,  0.20162042]])

mask = (names == 'Bob') | (names == 'Will')

mask

array([ True, False,  True,  True,  True, False, False])

data[mask]

array([[-0.16858164, -0.33108982,  0.68263748, -0.0983769 ],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [-0.20164359,  0.81705695, -0.51739626, -1.16344194],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538]])

Note: The Python keywords and and or do not work with boolean arrays; use & and | instead
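A short sketch of why the keywords fail: and and or ask for a single truth value of a whole array, which NumPy refuses to guess, while & and | work element by element. The names array is a shortened, made-up version of the one above.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])

# Element-wise OR and AND on boolean arrays; the parentheses are required
# because & and | bind more tightly than == and !=
either = (names == 'Bob') | (names == 'Will')
neither = (names != 'Bob') & (names != 'Will')
print(either)   # [ True False  True  True]
print(neither)  # [False  True False False]

# The keyword form needs one truth value for the whole array and fails
try:
    (names == 'Bob') or (names == 'Will')
    keyword_failed = False
except ValueError as exc:
    keyword_failed = True
    print("ValueError:", exc)
```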

data[data < 0]=0

data

array([[0.        , 0.        , 0.68263748, 0.        ],
       [0.        , 0.        , 0.        , 0.75697117],
       [1.38042424, 0.        , 2.10397966, 1.98598204],
       [0.        , 0.81705695, 0.        , 0.        ],
       [0.07882572, 0.        , 0.59073925, 1.49971538],
       [0.13222977, 0.        , 0.54796917, 1.19053359],
       [0.        , 0.9426649 , 0.        , 0.20162042]])

names != 'Joe'

array([ True, False,  True,  True,  True, False, False])

data[names != 'Joe']=7

data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 0.        , 0.75697117],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.13222977, 0.        , 0.54796917, 1.19053359],
       [0.        , 0.9426649 , 0.        , 0.20162042]])

4.1.6 Fancy indexing

arr = np.empty((8,4))

for i in range(8):
    arr[i]=i

arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

Select a subset in a specific order

arr[[4,3,0,6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

If a negative index is used, selection will be done from the tail

arr[[-3,-5,-7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

arr = np.arange(32).reshape((8,4))

arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

arr[[1,5,7,2],[0,3,1,2]]

array([ 4, 23, 29, 10])

arr[[1,5,7,2]][:,[0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Unlike slicing, this kind of indexing with integer arrays always copies the data into a new array
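The copy behavior of indexing with integer arrays can be verified with a small sketch. It also shows np.ix_, which is one way to select the same rectangular region in a single indexing step. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[1, 5, 7, 2]]            # integer-array indexing copies the rows
picked[0, 0] = -1                     # modifying the copy does not touch arr
print(arr[1, 0])                      # 4
print(np.shares_memory(arr, picked))  # False

# np.ix_ builds an index for the rectangular region in one step
rect = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(np.array_equal(rect, arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]))  # True
```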

4.1.7 Transposing arrays and swapping axes

Transpose is a special way of reorganizing data that returns a view of the underlying data without duplicating anything. Arrays have a transpose method and also have a special T property.

arr = np.arange(15).reshape((3,5))

arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

Use np.dot to compute the inner matrix product:

arr = np.random.randn(6,3)

arr

array([[-0.23144783, -1.53102926, -0.2230637 ],
       [ 1.65451328, -0.74725816, -0.64295544],
       [ 1.78178001,  0.19446786, -1.34621907],
       [ 0.12343761,  1.37570397, -0.92405543],
       [ 1.12624911, -1.76795706, -1.18655746],
       [ 0.92947622,  2.64016736, -1.06539457]])

np.dot(arr.T,arr)

array([[ 8.11332223,  0.09713011, -5.85149832],
       [ 0.09713011, 14.92898035, -1.42608964],
       [-5.85149832, -1.42608964,  5.67231753]])

For higher dimensional arrays, the transpose method can accept a tuple containing the axis number, which is used to permute the axis

arr = np.arange(16).reshape((2,2,4))

arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

arr.transpose(1,0,2)

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

Here, the axes have been reordered so that what was originally the second axis becomes the first, the first becomes the second, and the last axis has not changed

ndarray has a swapaxes method that takes a pair of axes numbers as arguments and adjusts the axes for reorganizing the data

arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

arr.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

swapaxes also returns a view of the data without making a copy
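As a quick check of the view behavior, the sketch below confirms that swapaxes(1, 2) matches transpose(0, 2, 1) for this array and that writing through the result changes the original array.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)
print(swapped.shape)                                    # (2, 4, 2)
print(np.array_equal(swapped, arr.transpose(0, 2, 1)))  # True
print(np.shares_memory(arr, swapped))                   # True

swapped[0, 0, 0] = 100  # writing through the view changes arr as well
print(arr[0, 0, 0])     # 100
```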

4.2 Universal Functions: Fast Element-wise Array Functions

arr = np.arange(10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# square root
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

# square
np.square(arr)

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)

# exponential e**x
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

Binary universal functions take two arrays and return a single array:

x = np.random.randn(8)

y = np.random.randn(8)

x

array([ 0.43774471,  0.30353109, -0.4385476 , -0.07085461, -0.41682892,
        1.74171657,  0.22694261,  0.48012626])

y

array([ 0.38091604,  0.7351168 ,  0.04363922,  0.39276555, -0.11270609,
       -0.68831551, -0.64187507,  0.2514712 ])

# element-wise maximum of x and y
np.maximum(x,y)

array([ 0.43774471,  0.7351168 ,  0.04363922,  0.39276555, -0.11270609,
        1.74171657,  0.22694261,  0.48012626])

Some universal functions return multiple arrays. For example, modf is a vectorized version of Python's built-in divmod: it returns the fractional and integral parts of a floating-point array

arr = np.random.randn(7)*5

arr

array([  0.69713224,  -0.39436563,  -1.4239261 ,  10.89444784,
         8.31602522,  -0.52237816, -10.31292285])

remainder, whole_part = np.modf(arr)

remainder

array([ 0.69713224, -0.39436563, -0.4239261 ,  0.89444784,  0.31602522,
       -0.52237816, -0.31292285])

whole_part

array([  0.,  -0.,  -1.,  10.,   8.,  -0., -10.])

arr

array([  0.69713224,  -0.39436563,  -1.4239261 ,  10.89444784,
         8.31602522,  -0.52237816, -10.31292285])

np.sqrt(arr)

<ipython-input-85-b58949107b3d>:1: RuntimeWarning: invalid value encountered in sqrt
  np.sqrt(arr)

array([0.83494446,        nan,        nan, 3.30067385, 2.88375193,
              nan,        nan])

np.sqrt(arr,arr)

<ipython-input-86-e3ca18b15869>:1: RuntimeWarning: invalid value encountered in sqrt
  np.sqrt(arr,arr)

array([0.83494446,        nan,        nan, 3.30067385, 2.88375193,
              nan,        nan])

arr

array([0.83494446,        nan,        nan, 3.30067385, 2.88375193,
              nan,        nan])
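The second positional argument in np.sqrt(arr,arr) is the optional out array of the universal function, which is why arr itself was overwritten above. The sketch below repeats the idea with non-negative values (my own example data) so that no nan values appear.

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0, 16.0])

result = np.sqrt(arr)  # returns a new array; arr is unchanged
print(arr)             # [ 1.  4.  9. 16.]

returned = np.sqrt(arr, out=arr)  # writes the result into arr itself
print(arr)                        # [1. 2. 3. 4.]
print(returned is arr)            # True
```

Writing into an existing array avoids allocating a temporary, which can matter for very large arrays.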

4.3 Array-Oriented Programming with Arrays

Suppose we want to evaluate the function sqrt(x^2 + y^2) on a regular grid of values. The np.meshgrid function takes two one-dimensional arrays and produces two two-dimensional matrices corresponding to all (x, y) pairs in the two arrays.

# 1000 equally spaced points
points = np.arange(-5,5,0.01)

# build the two-dimensional coordinate matrices
xs, ys = np.meshgrid(points,points)

ys

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

xs

array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

# compute z from the formula
z = np.sqrt(xs ** 2 + ys ** 2)

z

array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])
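With 1000 points per axis the matrices are too large to inspect by eye. A tiny sketch with made-up inputs shows how np.meshgrid lays out the two coordinate matrices.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

xs, ys = np.meshgrid(x, y)
print(xs)
# [[0 1 2]
#  [0 1 2]]
print(ys)
# [[10 10 10]
#  [20 20 20]]

# Element [i, j] of any expression in xs and ys is evaluated at (x[j], y[i])
total = xs + ys
print(total[1, 2])  # 22
```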

Generate visualizations of two-dimensional arrays using matplotlib

import matplotlib.pyplot as plt

plt.imshow(z,cmap=plt.cm.gray)
plt.colorbar()
# set the title
plt.title('sqrt(x^2+y^2)')

Text(0.5, 1.0, 'sqrt(x^2+y^2)')

[Figure: grayscale image plot of sqrt(x^2+y^2) evaluated on the grid]

4.3.1 Expressing conditional logic as array operations

The np.where function is a vectorized version of the ternary expression x if condition else y

xarr = np.array([1.1,1.2,1.3,1.4,1.5])

yarr = np.array([2.1,2.2,2.3,2.4,2.5])

cond = np.array([True,False,True,True,False])

result = [(x if c else y)for x,y,c in zip(xarr,yarr,cond)]

result

[1.1, 2.2, 1.3, 1.4, 2.5]

This list comprehension is slow for large arrays because all the work is done in interpreted Python code, and it does not work with multidimensional arrays. With np.where the same result can be written concisely:

result = np.where(cond,xarr,yarr)  # the second and third arguments need not be arrays; either can be a scalar

result

array([1.1, 2.2, 1.3, 1.4, 2.5])

arr = np.random.randn(4,4)

arr

array([[ 1.45673658,  0.97095783, -0.90075114, -0.86810283],
       [ 0.7691019 , -1.44098307,  1.23655136, -0.0863179 ],
       [-0.26002458, -0.44007831, -0.64002542,  0.58748434],
       [ 1.23704204, -1.42979856,  1.10834965,  0.50134018]])

arr>0

array([[ True,  True, False, False],
       [ True, False,  True, False],
       [False, False, False,  True],
       [ True, False,  True,  True]])

# replace all positive values with 2 and all negative values with -2
np.where(arr>0,2,-2)

array([[ 2,  2, -2, -2],
       [ 2, -2,  2, -2],
       [-2, -2, -2,  2],
       [ 2, -2,  2,  2]])

# replace only the positive values with 2
np.where(arr>0,2,arr)

array([[ 2.        ,  2.        , -0.90075114, -0.86810283],
       [ 2.        , -1.44098307,  2.        , -0.0863179 ],
       [-0.26002458, -0.44007831, -0.64002542,  2.        ],
       [ 2.        , -1.42979856,  2.        ,  2.        ]])

4.3.2 Mathematical and statistical methods

# generate some data
arr = np.random.randn(5,4)

arr

array([[-0.24008142, -0.08617688,  0.42879457, -1.05699554],
       [-0.86102647, -0.01481326, -0.49326453, -0.51728933],
       [-1.04369519, -0.07668856,  0.12641113, -0.34170659],
       [-0.34358427, -1.19146826,  0.79855649, -0.56526347],
       [ 0.34119469,  0.60338427,  0.23612535,  1.70667616]])

# mean
arr.mean()

-0.1295455547355409

np.mean(arr)

-0.1295455547355409

# sum
arr.sum()

-2.5909110947108185

# mean of each row (computed across the columns)
arr.mean(axis=1)

array([-0.23861482, -0.47159839, -0.3339198 , -0.32543988,  0.72184512])

# sum of each column (computed down the rows)
arr.sum(axis=0)

array([-2.14719266, -0.76576269,  1.09662301, -0.77457876])
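The axis argument is easy to mix up, so here is a small sketch on a made-up 2x3 array where the results can be checked by hand. axis=0 collapses the rows and leaves one value per column; axis=1 collapses the columns and leaves one value per row.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))
# [[0 1 2]
#  [3 4 5]]

print(arr.sum(axis=0))   # [3 5 7]   one value per column
print(arr.sum(axis=1))   # [ 3 12]   one value per row
print(arr.mean(axis=1))  # [1. 4.]
```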

arr = np.array([0,1,2,3,4,5,6,7])

# cumulative sum of the elements, starting from 0
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

arr = np.array([[0,1,2],[3,4,5],[6,7,8]])

arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

arr.cumsum(axis=0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

# cumulative product of the elements, starting from 1
arr.cumprod(axis=1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)

4.3.3 Methods for Arrays of Boolean Values

arr = np.random.randn(100)

# count the positive values
(arr>0).sum()

bools = np.array([False,False,True,False])

bools.any()  # is at least one value True?

True

bools.all()  # are all values True?

False

4.3.4 Sorting

arr = np.random.randn(6)

arr

array([-0.28600425,  0.20138334,  0.61513703, -1.54104191,  0.71169457,
        1.28541225])

arr.sort()  # sort in place

arr

array([-1.54104191, -0.28600425,  0.20138334,  0.61513703,  0.71169457,
        1.28541225])

arr = np.random.randn(5,3)

arr

array([[ 0.44551524,  0.22691436, -1.49874737],
       [ 0.36256785,  1.19204608,  0.31673416],
       [ 0.07827487,  0.64557507, -1.31371171],
       [-1.01458161, -0.82770194, -0.06353473],
       [-0.40078359,  2.48821946, -0.50991488]])

arr.sort(1)

arr

array([[-1.49874737,  0.22691436,  0.44551524],
       [ 0.31673416,  0.36256785,  1.19204608],
       [-1.31371171,  0.07827487,  0.64557507],
       [-1.01458161, -0.82770194, -0.06353473],
       [-0.50991488, -0.40078359,  2.48821946]])

# A simple way to get a quantile: sort the array and pick the value at the matching rank (here the 5% quantile)
large_arr = np.random.randn(1000)

large_arr.sort()

large_arr[int(0.05*len(large_arr))]

-1.7200330679547906
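For comparison, NumPy also provides np.quantile. As a sketch, the code below computes the 5% quantile both ways. The seed is an arbitrary choice for reproducibility. The two results are close but usually not identical, because np.quantile interpolates linearly between neighbouring ranks by default.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed, only for reproducibility
large_arr = rng.randn(1000)

sorted_arr = np.sort(large_arr)  # np.sort returns a sorted copy
by_rank = sorted_arr[int(0.05 * len(sorted_arr))]

# np.quantile interpolates linearly between neighbouring ranks by default
by_quantile = np.quantile(large_arr, 0.05)

print(by_rank, by_quantile)
```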

4.3.5 Unique Values and Other Set Logic

np.unique returns the sorted unique values in an array

names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

ints = np.array([3,3,3,2,2,1,1,4,4])

np.unique(ints)

array([1, 2, 3, 4])

The pure-Python equivalent of np.unique:

sorted(set(names))

['Bob', 'Joe', 'Will']

np.in1d tests whether each value in one array is also present in another and returns a boolean array (recent NumPy versions recommend np.isin for the same task)

values = np.array([6,0,0,3,2,5,6])

np.in1d(values,[2,3,6])

array([ True, False, False,  True,  True, False,  True])
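The membership test is one of several set-style functions. As a brief sketch on the same values, np.isin gives the same mask, and np.intersect1d, np.union1d and np.setdiff1d return sorted, unique results. The name targets is my own.

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])
targets = [2, 3, 6]

print(np.isin(values, targets))
# [ True False False  True  True False  True]
print(np.intersect1d(values, targets))  # [2 3 6]
print(np.union1d(values, targets))      # [0 2 3 5 6]
print(np.setdiff1d(values, targets))    # [0 5]
```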

4.4 Using arrays for file input and output

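np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with the file extension .npy. If the file path does not already end in .npy, the extension is appended.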

arr = np.arange(10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.save('some_array',arr)

np.load('some_array.npy')

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.savez saves multiple arrays in a single uncompressed archive; pass the arrays as keyword arguments.

np.savez('array_archive.npz',a=arr,b=arr)

arch = np.load('array_archive.npz')

arch['a']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arch['b']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If your data compresses well, you can use np.savez_compressed instead.

np.savez_compressed('arrays_compressed.npz',a=arr,b=arr)

arch1 = np.load('arrays_compressed.npz')

arch1['a']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
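The examples above leave files in the working directory. The following sketch does the same round trip inside a temporary directory and closes the archive explicitly. The file name and the second array are illustrative.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "array_archive.npz")
    np.savez(path, a=arr, b=arr * 2)

    # The archive loads arrays lazily; the with block closes the file afterwards
    with np.load(path) as arch:
        names = sorted(arch.files)
        b = arch["b"]

print(names)  # ['a', 'b']
print(b)      # [ 0  2  4  6  8 10 12 14 16 18]
```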

4.5 Linear Algebra

x = np.array([[1.,2.,3.],[4.,5.,6.]])

y = np.array([[6.,23.],[-1,7],[8,9]])

x

array([[1., 2., 3.],
       [4., 5., 6.]])

y

array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])

x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

x.dot(y) is equivalent to np.dot(x,y)

np.dot(x,y)

array([[ 28.,  64.],
       [ 67., 181.]])

np.dot(x,np.ones(3))

array([ 6., 15.])

The @ symbol also works as an infix operator that performs matrix multiplication

x @ np.ones(3)

array([ 6., 15.])

numpy.linalg has a standard set of matrix decompositions, as well as other commonly used functions such as inverse and determinant

from numpy.linalg import inv, qr

X = np.random.randn(5,5)

mat = X.T.dot(X)

mat

array([[ 6.88097643, -0.40153042, -0.11773682,  4.82061317, -0.00948514],
       [-0.40153042,  2.93777143,  2.28436549, -3.33712964,  0.27895677],
       [-0.11773682,  2.28436549,  2.34334495, -1.8758072 ,  0.8700664 ],
       [ 4.82061317, -3.33712964, -1.8758072 ,  8.08801733, -1.40096259],
       [-0.00948514,  0.27895677,  0.8700664 , -1.40096259,  5.84629622]])

# compute the inverse
inv(mat)

array([[  1.6344894 ,  -5.30599418,   3.59242482,  -2.48157756,
         -0.87347595],
       [ -5.30599418,  20.96406817, -14.76492044,   8.96595671,
          3.3369904 ],
       [  3.59242482, -14.76492044,  11.00957592,  -6.09349382,
         -2.38834458],
       [ -2.48157756,   8.96595671,  -6.09349382,   4.14309849,
          1.46783835],
       [ -0.87347595,   3.3369904 ,  -2.38834458,   1.46783835,
          0.71759003]])

mat.dot(inv(mat))

array([[ 1.00000000e+00, -3.62276349e-15,  2.97708758e-15,
         1.00365641e-15, -1.28563628e-15],
       [ 1.16986541e-15,  1.00000000e+00, -1.04489259e-15,
        -1.43234162e-15,  1.85031636e-16],
       [-9.17892019e-16,  5.90211635e-15,  1.00000000e+00,
         1.54913164e-15,  7.28854924e-16],
       [ 1.67430797e-15, -6.24584441e-15,  2.02034005e-15,
         1.00000000e+00, -1.38074650e-15],
       [ 6.79495103e-16, -4.09884887e-15,  3.43563276e-15,
        -1.61484598e-15,  1.00000000e+00]])

# compute the QR decomposition
q,r = qr(mat)

r

array([[-8.41197519,  2.41336148,  1.31408829, -8.76534057,  0.8426876 ],
       [ 0.        , -4.40454207, -3.50602341,  5.0520601 , -1.60816204],
       [ 0.        ,  0.        , -0.98999659, -0.25669779, -3.68304671],
       [ 0.        ,  0.        ,  0.        , -1.68870316,  4.4795456 ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.22210084]])
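A decomposition is easier to trust once it has been checked numerically. The sketch below, with an arbitrary seed for reproducibility, verifies with np.allclose that the QR factors reproduce the matrix, that q is orthogonal, and that r is upper triangular.

```python
import numpy as np
from numpy.linalg import qr

rng = np.random.RandomState(0)  # arbitrary seed, only for reproducibility
X = rng.randn(5, 5)
mat = X.T.dot(X)

q, r = qr(mat)

print(np.allclose(q @ r, mat))          # True: the factors reproduce mat
print(np.allclose(q.T @ q, np.eye(5)))  # True: q is orthogonal
print(np.allclose(r, np.triu(r)))       # True: r is upper triangular
```

Comparisons of floating-point matrices should use np.allclose rather than ==, since results such as mat.dot(inv(mat)) above are only equal to the identity up to rounding error.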

4.6 Pseudo-random number generation

Use normal to draw a 4x4 array of samples from the standard normal distribution

samples = np.random.normal(size=(4,4))

samples

array([[-1.08982894, -0.38664288,  0.08795078, -0.58766288],
       [-0.55362143,  0.53318817, -1.24544404, -0.28009587],
       [-0.62227897, -0.96513278,  0.94540138, -0.1743617 ],
       [-1.02020369,  0.44070475,  0.16880846,  1.32297271]])

For generating large samples, numpy.random is well over an order of magnitude faster than Python's built-in random module

from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0,1) for _ in range(N)]

966 ms ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.random.normal(size=N)

29.7 ms ± 517 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

np.random.seed(1234)  # set the global random seed

# To avoid global state, use numpy.random.RandomState to create a random number generator isolated from other random state
rng = np.random.RandomState(1234)

rng.randn(10)

array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
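To illustrate what the isolated generator gives you, this sketch creates two RandomState objects with the same seed and confirms that they produce the same numbers. It also shows np.random.default_rng, the Generator API available in newer NumPy versions, as an aside that is not part of the book's text.

```python
import numpy as np

rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

# Same seed, same stream of numbers, independent of np.random's global state
draws_a = rng_a.randn(5)
draws_b = rng_b.randn(5)
print(np.array_equal(draws_a, draws_b))  # True

# Newer NumPy versions also offer the Generator API
gen = np.random.default_rng(1234)
sample = gen.standard_normal(5)
print(sample.shape)  # (5,)
```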

4.7 Random walks

# a random walk of 1000 steps
import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0,1) else -1
    position += step
    walk.append(position)

plt.plot(walk[:100])

[Figure: line plot of the first 100 steps of the random walk]

# results of 1000 random coin flips, each mapped to 1 or -1
nsteps = 1000
draws = np.random.randint(0,2,size=nsteps)
steps = np.where(draws>0,1,-1)
walk = steps.cumsum()

walk.min()

-9

walk.max()

plt.plot(walk[:100])

[<matplotlib.lines.Line2D at 0x2225fe86e20>]

[Figure: line plot of the first 100 steps of the random walk]

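# np.abs(walk)>=10 marks the steps where the walk is at least 10 away from the origin in either direction; argmax() returns the first index of the maximum value in the boolean array (True is the maximum)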
(np.abs(walk)>=10).argmax()
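A tiny hand-checkable walk (made-up values) shows how argmax finds the first crossing, along with one caveat: if the threshold is never reached, argmax still returns 0, so the result is only meaningful after checking any().

```python
import numpy as np

walk = np.array([1, 2, 1, 0, -1, -2, -3, -2])

crossed = np.abs(walk) >= 3
print(crossed.argmax())  # 6: index of the first True

# If the threshold is never reached, argmax still returns 0,
# so check any() before trusting the result
never = np.abs(walk) >= 10
print(never.any())     # False
print(never.argmax())  # 0
```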

4.7.1 Simulating multiple random walks at once

# compute the cumulative sums for all 5000 random walks at once, across the rows
nwalks = 5000
nsteps = 1000
draws = np.random.randint(0,2,size=(nwalks,nsteps))  # 0 or 1
steps = np.where(draws>0,1,-1)
walks = steps.cumsum(1)

walks

array([[  1,   2,   3, ...,  46,  47,  46],
       [  1,   0,   1, ...,  40,  41,  42],
       [  1,   2,   3, ..., -26, -27, -28],
       ...,
       [  1,   0,   1, ...,  64,  65,  66],
       [  1,   2,   1, ...,   2,   1,   0],
       [ -1,  -2,  -3, ...,  32,  33,  34]], dtype=int32)

plt.plot(walk[:100])

[<matplotlib.lines.Line2D at 0x2225fe82fd0>]

[Figure: line plot of the first 100 steps of the random walk]

walks.max()

walks.min()

-128

# compute the minimum crossing time to 30 or -30
# use the any method to check which walks got there
hits30 = (np.abs(walks)>=30).any(1)

hits30

array([ True,  True,  True, ...,  True, False,  True])

hits30.sum()  # number of walks that reach 30 or -30

# select the rows of walks that actually cross the absolute 30 level, then use argmax along axis 1 to get the crossing times
crossing_times = (np.abs(walks[hits30])>=30).argmax(1)

crossing_times.mean()

501.89283489096573
