Learning data analysis with the book Python for Data Analysis: NumPy basics

These are my own study and practice notes; all of the content comes from Chapter 4 of the book Python for Data Analysis.

An example shows the performance difference. Consider a NumPy array of one million integers and a Python list with the same contents:

import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))
# time the NumPy computation
%time for _ in range(10):my_arr2 = my_arr*2
Wall time: 21 ms
Compiler : 341 ms
# time the equivalent pure-Python computation
%time for _ in range(10):my_list2=[x*2 for x in my_list]
Wall time: 948 ms

NumPy-based operations are generally 10 to 100 times faster than their pure-Python counterparts (about 45 times faster in the run above) and use significantly less memory.
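The %time magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The repeat count mirrors the loop above, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each approach (same workload as the %time lines).
arr_time = timeit.timeit(lambda: my_arr * 2, number=10)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array: {arr_time * 1000:.1f} ms")
print(f"Python list: {list_time * 1000:.1f} ms")
```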

4.1 NumPy ndarray: a multidimensional array object

One of the core features of NumPy is its N-dimensional array object, the ndarray.

# import numpy
import numpy as np
# generate an array of random data
data = np.random.randn(2,3)
data
array([[ 0.53526407,  1.42752699, -0.68798613],
       [-0.45544835, -1.35615318, -1.6924118 ]])
# mathematical operations
data*10
array([[  5.3526407 ,  14.27526989,  -6.87986133],
       [ -4.55448354, -13.56153181, -16.92411803]])
data+data
array([[ 1.07052814,  2.85505398, -1.37597227],
       [-0.91089671, -2.71230636, -3.38482361]])
# shape (the size of each dimension)
data.shape
(2, 3)
# data type
data.dtype
dtype('float64')

4.1.1 Creating ndarrays

Converting a list

data1 = [6,7.5,8,0,1]
arr1 = np.array(data1)
arr1
array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, such as a list of equal-length lists, are converted into a multidimensional array

data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
arr2.ndim
2
arr2.shape
(2, 4)
arr1.dtype
dtype('float64')
arr2.dtype
dtype('int32')

Given a length or a shape tuple, zeros creates an array of all 0s and ones creates an array of all 1s in one step. empty creates an array without initializing its values

np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros((3,6))
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
np.empty((2,3,2))
array([[[1.05075542e-311, 2.86558075e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.05699242e-307, 8.60952352e-072]],

       [[4.26976457e-090, 2.00497183e-052],
        [1.26141762e-076, 9.91606475e+164],
        [6.48224660e+170, 5.82471487e+257]]])
np.ones((2,3))
array([[1., 1., 1.],
       [1., 1., 1.]])

It is not safe to assume that np.empty returns an array of all zeros; in many cases it returns uninitialized garbage values
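A minimal sketch of the safer alternatives: np.zeros and np.full guarantee their contents, while a buffer from np.empty should be filled explicitly before it is read. The shapes and the fill value 7.0 are arbitrary choices for illustration.

```python
import numpy as np

# np.empty only allocates memory; its contents are arbitrary until assigned.
buf = np.empty((2, 3))
buf[:] = 0.0  # explicit initialization makes the values well defined

zeros = np.zeros((2, 3))        # guaranteed to be all 0.0
sevens = np.full((2, 3), 7.0)   # guaranteed to be all 7.0

print(np.array_equal(buf, zeros))
print(sevens)
```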

arange is an array-valued version of Python's built-in range function

np.arange(15)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

4.1.2 ndarray data types

The data type, i.e. the dtype, can be specified when the array is created

arr1 = np.array([1,2,3],dtype=np.float64)
arr2 = np.array([1,2,3],dtype=np.int32)
arr1.dtype
dtype('float64')
arr2.dtype
dtype('int32')

Use the astype method to explicitly convert the data type of the array

arr = np.array([1,2,3,4,5])
arr.dtype
dtype('int32')

Convert integers to floating point

float_arr = arr.astype(np.float64)
float_arr.dtype
dtype('float64')
arr = np.array([3.7,2.5,4.3,5.0])
arr
array([3.7, 2.5, 4.3, 5. ])

When floating-point numbers are cast to an integer dtype, the fractional part is truncated

arr.astype(np.int32)
array([3, 2, 4, 5])
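Truncation drops the fractional part, which for negative numbers means rounding toward zero rather than down. A small sketch with made-up values contrasts astype with np.floor and np.rint:

```python
import numpy as np

arr = np.array([3.7, -2.6, -0.9])

truncated = arr.astype(np.int32)           # drops the fractional part (toward zero)
floored = np.floor(arr).astype(np.int32)   # rounds toward negative infinity
rounded = np.rint(arr).astype(np.int32)    # rounds to the nearest integer

print(truncated)  # [ 3 -2  0]
print(floored)    # [ 3 -3 -1]
print(rounded)    # [ 4 -3 -1]
```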

Convert an array of strings that represent numbers to numeric form

Be careful when using the numpy.string_ type: string data in NumPy is fixed-size, so input may be truncated without warning. Note also that the np.string_ alias was removed in NumPy 2.0, where np.bytes_ is the equivalent name. pandas offers more intuitive out-of-the-box behavior for non-numeric data

numeric_strings = np.array(['1.25','-3.4','4.0'],dtype=np.string_)
numeric_strings
array([b'1.25', b'-3.4', b'4.0'], dtype='|S4')
numeric_strings.astype(float)
array([ 1.25, -3.4 ,  4.  ])

Use the dtype attribute of another array

int_array = np.arange(10)
int_array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
calibers = np.array([.22,.270,.345,.234],dtype=np.float64)
calibers
array([0.22 , 0.27 , 0.345, 0.234])
int_array.astype(calibers.dtype)
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

Use type codes to pass in data types

empty_uint32 = np.empty(8,dtype='u4')
empty_uint32
array([3264175145, 1070344437,  343597384, 1070679982, 3779571220,
       1070994554, 1168231105, 1070461878], dtype=uint32)

4.1.3 NumPy array arithmetic

arr = np.array([[1.,2.,3.],[4.,5.,6.]])
arr
array([[1., 2., 3.],
       [4., 5., 6.]])
arr + arr  # addition
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])
arr - arr  # subtraction
array([[0., 0., 0.],
       [0., 0., 0.]])
arr * arr  # element-wise multiplication
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])
arr / arr  # division
array([[1., 1., 1.],
       [1., 1., 1.]])
1 / arr  # reciprocal
array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])
arr ** 0.5  # square root
array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparison between arrays of the same size will produce an array of boolean values

arr2 = np.array([[0.,4.,1.],[7.,4.,23.]])
arr2
array([[ 0.,  4.,  1.],
       [ 7.,  4., 23.]])
arr2 > arr
array([[False,  True, False],
       [ True, False,  True]])

4.1.4 Basic indexing and slicing

arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[5]
5
arr[5:8]
array([5, 6, 7])
arr[5:8] = 12
arr
array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
array_slice = arr[5:8]
array_slice
array([12, 12, 12])

Changing a value in array_slice also changes the original array, because a slice of an array is a view of the original array, not a copy

array_slice[1] = 123456
array_slice
array([    12, 123456,     12])
arr
array([     0,      1,      2,      3,      4,     12, 123456,     12,
            8,      9])

If you want a copy of the slice instead of a view, use arr[5:8].copy()

array_copy = arr[2:5].copy()
array_copy
array([2, 3, 4])
array_copy[1] = 12345
array_copy
array([    2, 12345,     4])
arr
array([     0,      1,      2,      3,      4,     12, 123456,     12,
            8,      9])
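One way to check whether two arrays refer to the same underlying data is np.shares_memory. The following sketch repeats the view and copy experiment on a fresh array:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # basic slicing gives a view
copy = arr[5:8].copy()   # an explicit copy owns its own data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

view[0] = 99   # writes through to arr
copy[1] = -1   # leaves arr untouched
print(arr)     # [ 0  1  2  3  4 99  6  7  8  9]
```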

A bare slice [:] refers to all values in the array, so assigning to it sets every element of the view

array_slice[:] = 64
arr
array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

Two-dimensional array

arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
arr2d[2]
array([7, 8, 9])

Selecting a single element: the two forms below are equivalent

arr2d[0][2]
3
arr2d[0,2]
3

Three-dimensional array

arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
arr3d
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
arr3d[0]  # a 2 x 3 array
array([[1, 2, 3],
       [4, 5, 6]])

Both scalars and arrays can be assigned to arr3d[0]

old_values = arr3d[0].copy()
old_values
array([[1, 2, 3],
       [4, 5, 6]])
arr3d[0] = 42
arr3d
array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
arr3d[0] = old_values
arr3d
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1,0] returns a one-dimensional array:

arr3d[1,0]
array([7, 8, 9])

This is the same as indexing in two steps

x = arr3d[1]
x
array([[ 7,  8,  9],
       [10, 11, 12]])
x[0]
array([7, 8, 9])

Note: The arrays returned in the subset selection above are views

4.1.4.1 Slice indexing of arrays

arr
array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
arr[1:6]
array([ 1,  2,  3,  4, 64])

Two-dimensional array

arr2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
arr2d[:2]  # the first two rows
array([[1, 2, 3],
       [4, 5, 6]])

Multiple slices can be passed, just as multiple indices can

arr2d[:2,1:]
array([[2, 3],
       [5, 6]])

Select the first two columns of the second row

arr2d[1,:2]
array([4, 5])

Select the first two rows of the third column

arr2d[:2,2]
array([3, 6])
arr2d[:,:1]
array([[1],
       [4],
       [7]])
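Mixing an integer index with a slice drops one dimension, while using slices on both axes keeps the result two-dimensional. A short sketch on the same 3 x 3 values makes the shapes explicit:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_1d = arr2d[1, :2]     # integer index drops the row axis
row_2d = arr2d[1:2, :2]   # a slice keeps the row axis

print(row_1d, row_1d.shape)   # [4 5] (2,)
print(row_2d, row_2d.shape)   # [[4 5]] (1, 2)
print(arr2d[:, :1].shape)     # (3, 1)
```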

Assigning to a slice expression assigns to the whole selection

arr2d[:2,1:] = 0
arr2d
array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

4.1.5 Boolean indexing

names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
data = np.random.randn(7,4)
names
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')
data
array([[-0.16858164, -0.33108982,  0.68263748, -0.0983769 ],
       [-0.14467573, -1.73207863, -0.20321916,  0.75697117],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [-0.20164359,  0.81705695, -0.51739626, -1.16344194],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538],
       [ 0.13222977, -1.45147521,  0.54796917,  1.19053359],
       [-1.02140787,  0.9426649 , -0.75485246,  0.20162042]])
names == 'Bob'
array([ True, False, False,  True, False, False, False])
data[names == 'Bob']
array([[-0.16858164, -0.33108982,  0.68263748, -0.0983769 ],
       [-0.20164359,  0.81705695, -0.51739626, -1.16344194]])

Note: The boolean array must have the same length as the axis it indexes. Older NumPy versions did not report an error when the length was wrong, while recent versions are expected to raise an IndexError, so take care either way

data[names == 'Bob',2:]
array([[ 0.68263748, -0.0983769 ],
       [-0.51739626, -1.16344194]])
data[names == 'Bob',3]
array([-0.0983769 , -1.16344194])
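On the question of mask length: in recent NumPy releases, as far as I am aware, a boolean mask whose length does not match the indexed axis raises an IndexError instead of passing silently. The sketch below uses a hypothetical 7 x 4 array of zeros and a deliberately short mask to probe this; older releases may behave differently.

```python
import numpy as np

data = np.zeros((7, 4))
short_mask = np.array([True, False, True])  # length 3, but data has 7 rows

try:
    data[short_mask]
    outcome = "no error"
except IndexError as exc:
    outcome = f"IndexError: {exc}"

print(outcome)

# A mask of the right length selects rows as expected.
good_mask = np.array([True, False, False, True, False, False, False])
print(data[good_mask].shape)  # (2, 4)
```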

To select everything except 'Bob', use != or negate the condition with ~

names != 'Bob'
array([False,  True,  True, False,  True,  True,  True])
data[~(names == 'Bob')]
array([[-0.14467573, -1.73207863, -0.20321916,  0.75697117],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538],
       [ 0.13222977, -1.45147521,  0.54796917,  1.19053359],
       [-1.02140787,  0.9426649 , -0.75485246,  0.20162042]])
cond = names == 'Bob'
data[~cond]
array([[-0.14467573, -1.73207863, -0.20321916,  0.75697117],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538],
       [ 0.13222977, -1.45147521,  0.54796917,  1.19053359],
       [-1.02140787,  0.9426649 , -0.75485246,  0.20162042]])
mask = (names == 'Bob') | (names == 'Will')
mask
array([ True, False,  True,  True,  True, False, False])
data[mask]
array([[-0.16858164, -0.33108982,  0.68263748, -0.0983769 ],
       [ 1.38042424, -1.31551497,  2.10397966,  1.98598204],
       [-0.20164359,  0.81705695, -0.51739626, -1.16344194],
       [ 0.07882572, -0.68212957,  0.59073925,  1.49971538]])

Note: The Python keywords and and or do not work with boolean arrays; use & and | instead
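A brief sketch of why the keywords fail: Python's or asks each operand for a single truth value, which a multi-element array cannot provide, so NumPy raises a ValueError. The element-wise operator | and the function np.logical_or both give the intended mask.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

try:
    mask = (names == 'Bob') or (names == 'Will')
    failed = False
except ValueError:
    # Python's "or" needs a single truth value, which an array does not have.
    failed = True

mask = (names == 'Bob') | (names == 'Will')             # element-wise OR
same = np.logical_or(names == 'Bob', names == 'Will')   # function form

print(failed)                       # True
print(np.array_equal(mask, same))   # True
```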

data[data < 0]=0
data
array([[0.        , 0.        , 0.68263748, 0.        ],
       [0.        , 0.        , 0.        , 0.75697117],
       [1.38042424, 0.        , 2.10397966, 1.98598204],
       [0.        , 0.81705695, 0.        , 0.        ],
       [0.07882572, 0.        , 0.59073925, 1.49971538],
       [0.13222977, 0.        , 0.54796917, 1.19053359],
       [0.        , 0.9426649 , 0.        , 0.20162042]])
names != 'Joe'
array([ True, False,  True,  True,  True, False, False])
data[names != 'Joe']=7
data
array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 0.        , 0.75697117],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.13222977, 0.        , 0.54796917, 1.19053359],
       [0.        , 0.9426649 , 0.        , 0.20162042]])

4.1.6 Fancy indexing

arr = np.empty((8,4))
for i in range(8):
    arr[i]=i
arr
array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

To select a subset of rows in a particular order, pass a list or array of integers

arr[[4,3,0,6]]
array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

If a negative index is used, selection will be done from the tail

arr[[-3,-5,-7]]
array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])
arr = np.arange(32).reshape((8,4))
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
arr[[1,5,7,2],[0,3,1,2]]
array([ 4, 23, 29, 10])
arr[[1,5,7,2]][:,[0,3,1,2]]
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Fancy indexing, unlike slicing, always copies the data into a new array
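Passing two index lists selects individual elements, not a rectangular block. As a sketch, the rectangular selection can be written either in two steps, as above, or in one step with np.ix_; np.shares_memory confirms that the result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = arr[rows, cols]              # elements (1,0), (5,3), (7,1), (2,2)
block_a = arr[rows][:, cols]         # two-step rectangular selection
block_b = arr[np.ix_(rows, cols)]    # same rectangle in one step

print(pairs)                               # [ 4 23 29 10]
print(np.array_equal(block_a, block_b))    # True
print(np.shares_memory(arr, block_b))      # False: fancy indexing copies
```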

4.1.7 Transposing arrays and swapping axes

Transpose is a special way of reorganizing data that returns a view of the underlying data without duplicating anything. Arrays have a transpose method and also have a special T property.

arr = np.arange(15).reshape((3,5))
arr
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
arr.T
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

The matrix inner product is computed with np.dot

arr = np.random.randn(6,3)
arr
array([[-0.23144783, -1.53102926, -0.2230637 ],
       [ 1.65451328, -0.74725816, -0.64295544],
       [ 1.78178001,  0.19446786, -1.34621907],
       [ 0.12343761,  1.37570397, -0.92405543],
       [ 1.12624911, -1.76795706, -1.18655746],
       [ 0.92947622,  2.64016736, -1.06539457]])
np.dot(arr.T,arr)
array([[ 8.11332223,  0.09713011, -5.85149832],
       [ 0.09713011, 14.92898035, -1.42608964],
       [-5.85149832, -1.42608964,  5.67231753]])

For higher-dimensional arrays, the transpose method accepts a tuple of axis numbers, which is used to permute the axes

arr = np.arange(16).reshape((2,2,4))
arr
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
arr.transpose(1,0,2)
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

Here, the axes have been reordered so that what was originally the second axis becomes the first, the first becomes the second, and the last axis has not changed

ndarray also has a swapaxes method, which takes a pair of axis numbers and swaps the indicated axes to rearrange the data

arr
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
arr.swapaxes(1,2)
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

swapaxes likewise returns a view of the data without making a copy
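The view behavior can be checked directly. In this sketch, a write through the swapped array shows up in the original, and transpose(1, 0, 2) is compared with the equivalent swapaxes(0, 1):

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(1, 2)

print(swapped.shape)                     # (2, 4, 2)
print(np.shares_memory(arr, swapped))    # True

swapped[0, 3, 1] = -1    # same element as arr[0, 1, 3]
print(arr[0, 1, 3])      # -1

# transpose(1, 0, 2) is the same reordering as swapaxes(0, 1).
print(np.array_equal(arr.transpose(1, 0, 2), arr.swapaxes(0, 1)))  # True
```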

4.2 Universal Functions: Fast Element-wise Array Functions

arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# square root
np.sqrt(arr)
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])
# square
np.square(arr)
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)
# exponential (e**x)
np.exp(arr)
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

Binary universal functions take two arrays and return a single array

x = np.random.randn(8)
y = np.random.randn(8)
x
array([ 0.43774471,  0.30353109, -0.4385476 , -0.07085461, -0.41682892,
        1.74171657,  0.22694261,  0.48012626])
y
array([ 0.38091604,  0.7351168 ,  0.04363922,  0.39276555, -0.11270609,
       -0.68831551, -0.64187507,  0.2514712 ])
# compute the element-wise maximum of x and y
np.maximum(x,y)
array([ 0.43774471,  0.7351168 ,  0.04363922,  0.39276555, -0.11270609,
        1.74171657,  0.22694261,  0.48012626])

Some universal functions return multiple arrays. For example, modf is a vectorized version of Python's built-in divmod: it returns the fractional and integral parts of a floating-point array

arr = np.random.randn(7)*5
arr
array([  0.69713224,  -0.39436563,  -1.4239261 ,  10.89444784,
         8.31602522,  -0.52237816, -10.31292285])
remainder, whole_part = np.modf(arr)
remainder
array([ 0.69713224, -0.39436563, -0.4239261 ,  0.89444784,  0.31602522,
       -0.52237816, -0.31292285])
whole_part
array([  0.,  -0.,  -1.,  10.,   8.,  -0., -10.])
arr
array([  0.69713224,  -0.39436563,  -1.4239261 ,  10.89444784,
         8.31602522,  -0.52237816, -10.31292285])
np.sqrt(arr)
<ipython-input-85-b58949107b3d>:1: RuntimeWarning: invalid value encountered in sqrt
  np.sqrt(arr)
array([0.83494446,        nan,        nan, 3.30067385, 2.88375193,
              nan,        nan])
np.sqrt(arr,arr)
<ipython-input-86-e3ca18b15869>:1: RuntimeWarning: invalid value encountered in sqrt
  np.sqrt(arr,arr)
array([0.83494446,        nan,        nan, 3.30067385, 2.88375193,
              nan,        nan])
arr
array([0.83494446,        nan,        nan, 3.30067385, 2.88375193,
              nan,        nan])
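The call np.sqrt(arr, arr) passes arr as the optional output array, so the result overwrites arr in place. A small sketch of the same idea uses the out keyword and np.errstate to silence the warning for negative inputs. The sample values are made up for illustration.

```python
import numpy as np

arr = np.array([4.0, -1.0, 9.0])

# invalid='ignore' suppresses the RuntimeWarning for the negative entry.
with np.errstate(invalid='ignore'):
    result = np.sqrt(arr, out=arr)   # writes the result into arr itself

print(result is arr)   # True
print(arr)             # [ 2. nan  3.]
```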

4.3 Array-Oriented Programming with Arrays

Suppose we want to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two one-dimensional arrays and produces two two-dimensional matrices corresponding to all (x, y) pairs of the two arrays.

# 1000 equally spaced points from -5 up to (but not including) 5
points = np.arange(-5,5,0.01)
# build the two-dimensional coordinate matrices
xs, ys = np.meshgrid(points,points)
ys
array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])
xs
array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])
# evaluate z from the formula
z = np.sqrt(xs ** 2 + ys ** 2)
z
array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])
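To see what meshgrid produces, it helps to try it on a tiny grid. The sketch below uses three hand-picked points and also shows that broadcasting a row against a column gives the same z without building the coordinate matrices. The broadcasting variant is an alternative offered here for comparison, not something taken from the book chapter.

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

# xs varies along columns, ys varies along rows.
print(xs[0])      # [-1.  0.  1.]
print(ys[:, 0])   # [-1.  0.  1.]

z = np.sqrt(xs ** 2 + ys ** 2)

# Broadcasting a row against a column gives the same grid without meshgrid.
z_bcast = np.sqrt(points[np.newaxis, :] ** 2 + points[:, np.newaxis] ** 2)
print(np.allclose(z, z_bcast))   # True
```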

Generate visualizations of two-dimensional arrays using matplotlib

import matplotlib.pyplot as plt
plt.imshow(z,cmap=plt.cm.gray)
plt.colorbar()
# set the title
plt.title('sqrt(x^2+y^2)')
Text(0.5, 1.0, 'sqrt(x^2+y^2)')

[Figure: grayscale image plot of z = sqrt(x^2+y^2) over the grid, with a colorbar]

4.3.1 Expressing conditional logic as array operations

The np.where function is a vectorized version of the ternary expression x if condition else y

xarr = np.array([1.1,1.2,1.3,1.4,1.5])
yarr = np.array([2.1,2.2,2.3,2.4,2.5])
cond = np.array([True,False,True,True,False])
result = [(x if c else y)for x,y,c in zip(xarr,yarr,cond)]
result
[1.1, 2.2, 1.3, 1.4, 2.5]

This approach has two problems: it is slow for large arrays, because all the work is done in interpreted Python code, and it does not work with multidimensional arrays. With np.where the same thing can be written very concisely

result = np.where(cond,xarr,yarr)  # the second and third arguments need not be arrays; either can be a scalar
result
array([1.1, 2.2, 1.3, 1.4, 2.5])
arr = np.random.randn(4,4)
arr
array([[ 1.45673658,  0.97095783, -0.90075114, -0.86810283],
       [ 0.7691019 , -1.44098307,  1.23655136, -0.0863179 ],
       [-0.26002458, -0.44007831, -0.64002542,  0.58748434],
       [ 1.23704204, -1.42979856,  1.10834965,  0.50134018]])
arr>0
array([[ True,  True, False, False],
       [ True, False,  True, False],
       [False, False, False,  True],
       [ True, False,  True,  True]])
# replace all positive values with 2 and all other values with -2
np.where(arr>0,2,-2)
array([[ 2,  2, -2, -2],
       [ 2, -2,  2, -2],
       [-2, -2, -2,  2],
       [ 2, -2,  2,  2]])
# replace only the positive values with 2
np.where(arr>0,2,arr)
array([[ 2.        ,  2.        , -0.90075114, -0.86810283],
       [ 2.        , -1.44098307,  2.        , -0.0863179 ],
       [-0.26002458, -0.44007831, -0.64002542,  2.        ],
       [ 2.        , -1.42979856,  2.        ,  2.        ]])
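Note that np.where(arr>0,2,-2) also maps exact zeros to -2. When three outcomes are needed, np.where calls can be nested. The following is a minimal sketch on a small hand-written array:

```python
import numpy as np

arr = np.array([[1.5, -0.5], [0.0, -2.0]])

# Positive -> 2, negative -> -2, exactly zero -> 0.
signs = np.where(arr > 0, 2, np.where(arr < 0, -2, 0))
print(signs)
```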

4.3.2 Mathematical and statistical methods

# generate data
arr = np.random.randn(5,4)
arr
array([[-0.24008142, -0.08617688,  0.42879457, -1.05699554],
       [-0.86102647, -0.01481326, -0.49326453, -0.51728933],
       [-1.04369519, -0.07668856,  0.12641113, -0.34170659],
       [-0.34358427, -1.19146826,  0.79855649, -0.56526347],
       [ 0.34119469,  0.60338427,  0.23612535,  1.70667616]])
# mean
arr.mean()
-0.1295455547355409
np.mean(arr)
-0.1295455547355409
# sum
arr.sum()
-2.5909110947108185
# mean of each row (axis=1 computes across the columns)
arr.mean(axis=1)
array([-0.23861482, -0.47159839, -0.3339198 , -0.32543988,  0.72184512])
# sum of each column (axis=0 computes down the rows)
arr.sum(axis=0)
array([-2.14719266, -0.76576269,  1.09662301, -0.77457876])
arr = np.array([0,1,2,3,4,5,6,7])
# cumulative sum of the elements, accumulating from 0
arr.cumsum()
array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)
arr = np.array([[0,1,2],[3,4,5],[6,7,8]])
arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
arr.cumsum(axis=0)
array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)
# cumulative product of the elements along axis 1, accumulating from 1
arr.cumprod(axis=1)
array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)
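The axis argument names the axis that is collapsed, which is easy to mix up. A small 2 x 3 example, chosen so the results can be checked by hand, may help:

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

col_means = arr.mean(axis=0)   # collapses the rows: one value per column
row_means = arr.mean(axis=1)   # collapses the columns: one value per row

print(col_means)            # [1.5 2.5 3.5]
print(row_means)            # [1. 4.]
print(arr.cumsum(axis=0))   # running totals down each column
```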

4.3.3 Methods for Arrays of Boolean Values

arr = np.random.randn(100)
# count the positive values
(arr>0).sum()
51
bools = np.array([False,False,True,False])
bools.any()  # is at least one value True?
True
bools.all()  # are all values True?
False

4.3.4 Sorting

arr = np.random.randn(6)
arr
array([-0.28600425,  0.20138334,  0.61513703, -1.54104191,  0.71169457,
        1.28541225])
arr.sort()  # sort in place
arr
array([-1.54104191, -0.28600425,  0.20138334,  0.61513703,  0.71169457,
        1.28541225])
arr = np.random.randn(5,3)
arr
array([[ 0.44551524,  0.22691436, -1.49874737],
       [ 0.36256785,  1.19204608,  0.31673416],
       [ 0.07827487,  0.64557507, -1.31371171],
       [-1.01458161, -0.82770194, -0.06353473],
       [-0.40078359,  2.48821946, -0.50991488]])
arr.sort(1)
arr
array([[-1.49874737,  0.22691436,  0.44551524],
       [ 0.31673416,  0.36256785,  1.19204608],
       [-1.31371171,  0.07827487,  0.64557507],
       [-1.01458161, -0.82770194, -0.06353473],
       [-0.50991488, -0.40078359,  2.48821946]])
# a quick way to get a quantile: sort the array and pick the value at that rank (here the 5% quantile)
large_arr = np.random.randn(1000)
large_arr.sort()
large_arr[int(0.05*len(large_arr))]
-1.7200330679547906
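The sort-and-index approach can be compared with np.quantile, which interpolates between neighboring values by default, so the two answers are close but not necessarily identical. This sketch uses a shuffled range of 100 numbers (an assumption made here so the result is easy to reason about) and np.sort, which returns a sorted copy instead of sorting in place.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.permutation(100).astype(float)   # the numbers 0..99 in random order

ordered = np.sort(values)                      # np.sort returns a sorted copy
by_rank = ordered[int(0.05 * len(ordered))]    # value at the 5% rank
by_quantile = np.quantile(values, 0.05)        # interpolates between neighbors

print(by_rank)       # 5.0
print(by_quantile)
```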

4.3.5 Unique Values and Other Set Logic

np.unique returns the sorted unique values in an array

names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
np.unique(names)
array(['Bob', 'Joe', 'Will'], dtype='<U4')
ints = np.array([3,3,3,2,2,1,1,4,4])
np.unique(ints)
array([1, 2, 3, 4])

The pure-Python equivalent of np.unique

sorted(set(names))
['Bob', 'Joe', 'Will']

np.in1d tests whether each value in one array is contained in another array and returns a boolean array (newer NumPy versions recommend np.isin for the same task)

values = np.array([6,0,0,3,2,5,6])
np.in1d(values,[2,3,6])
array([ True, False, False,  True,  True, False,  True])
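Besides membership tests, NumPy has a few other set operations for one-dimensional arrays. A short sketch on the same values, using np.isin for membership:

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])
targets = np.array([2, 3, 6])

print(np.isin(values, targets))          # membership test, element by element
print(np.intersect1d(values, targets))   # [2 3 6]
print(np.union1d(values, targets))       # [0 2 3 5 6]
print(np.setdiff1d(values, targets))     # [0 5]
```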

4.4 Using arrays for file input and output

np.save and np.load are the two workhorse functions for saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with the file extension .npy.

arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.save('some_array',arr)
np.load('some_array.npy')
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.savez saves multiple arrays in a single uncompressed archive; the arrays are passed to it as keyword arguments.

np.savez('array_archive.npz',a=arr,b=arr)
arch = np.load('array_archive.npz')
arch['a']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arch['b']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If the data compresses well, you can use np.savez_compressed instead.

np.savez_compressed('arrays_compressed.npz',a=arr,b=arr)
arch1 = np.load('arrays_compressed.npz')
arch1['a']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
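The examples above write files into the current directory. As a sketch, a round trip can be done in a temporary directory so nothing is left behind. The file name and the second array (arr * 2) are illustrative choices only.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'arrays_compressed.npz')
    np.savez_compressed(path, a=arr, b=arr * 2)

    # np.load on an .npz file returns a lazy, dict-like archive; the
    # with-block closes the underlying file when we are done.
    with np.load(path) as archive:
        names = sorted(archive.files)
        a_loaded = archive['a']
        b_loaded = archive['b']

print(names)                            # ['a', 'b']
print(np.array_equal(a_loaded, arr))    # True
```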

4.5 Linear Algebra

x = np.array([[1.,2.,3.],[4.,5.,6.]])
y = np.array([[6.,23.],[-1,7],[8,9]])
x
array([[1., 2., 3.],
       [4., 5., 6.]])
y
array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])
x.dot(y)
array([[ 28.,  64.],
       [ 67., 181.]])

x.dot(y) is equivalent to np.dot(x,y)

np.dot(x,y)
array([[ 28.,  64.],
       [ 67., 181.]])
np.dot(x,np.ones(3))
array([ 6., 15.])

The @ symbol also works as an infix operator that performs matrix multiplication

x @ np.ones(3)
array([ 6., 15.])

numpy.linalg has a standard set of matrix decompositions, as well as other commonly used functions such as the matrix inverse and determinant

from numpy.linalg import inv, qr
X = np.random.randn(5,5)
mat = X.T.dot(X)
mat
array([[ 6.88097643, -0.40153042, -0.11773682,  4.82061317, -0.00948514],
       [-0.40153042,  2.93777143,  2.28436549, -3.33712964,  0.27895677],
       [-0.11773682,  2.28436549,  2.34334495, -1.8758072 ,  0.8700664 ],
       [ 4.82061317, -3.33712964, -1.8758072 ,  8.08801733, -1.40096259],
       [-0.00948514,  0.27895677,  0.8700664 , -1.40096259,  5.84629622]])
# compute the inverse
inv(mat)
array([[  1.6344894 ,  -5.30599418,   3.59242482,  -2.48157756,
         -0.87347595],
       [ -5.30599418,  20.96406817, -14.76492044,   8.96595671,
          3.3369904 ],
       [  3.59242482, -14.76492044,  11.00957592,  -6.09349382,
         -2.38834458],
       [ -2.48157756,   8.96595671,  -6.09349382,   4.14309849,
          1.46783835],
       [ -0.87347595,   3.3369904 ,  -2.38834458,   1.46783835,
          0.71759003]])
mat.dot(inv(mat))
array([[ 1.00000000e+00, -3.62276349e-15,  2.97708758e-15,
         1.00365641e-15, -1.28563628e-15],
       [ 1.16986541e-15,  1.00000000e+00, -1.04489259e-15,
        -1.43234162e-15,  1.85031636e-16],
       [-9.17892019e-16,  5.90211635e-15,  1.00000000e+00,
         1.54913164e-15,  7.28854924e-16],
       [ 1.67430797e-15, -6.24584441e-15,  2.02034005e-15,
         1.00000000e+00, -1.38074650e-15],
       [ 6.79495103e-16, -4.09884887e-15,  3.43563276e-15,
        -1.61484598e-15,  1.00000000e+00]])
# compute the QR decomposition
q,r = qr(mat)
r
array([[-8.41197519,  2.41336148,  1.31408829, -8.76534057,  0.8426876 ],
       [ 0.        , -4.40454207, -3.50602341,  5.0520601 , -1.60816204],
       [ 0.        ,  0.        , -0.98999659, -0.25669779, -3.68304671],
       [ 0.        ,  0.        ,  0.        , -1.68870316,  4.4795456 ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.22210084]])
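Because of floating-point rounding, mat.dot(inv(mat)) is only approximately the identity, so results like these are best checked with np.allclose. The sketch below uses a small, hand-picked symmetric positive-definite matrix (an assumption for illustration, not the random matrix above) and verifies the basic properties of the inverse and of the QR factors.

```python
import numpy as np
from numpy.linalg import inv, qr

# A small symmetric positive-definite matrix chosen for illustration.
mat = np.array([[4.0, 1.0, 2.0],
                [1.0, 3.0, 0.0],
                [2.0, 0.0, 5.0]])

identity_ok = np.allclose(mat @ inv(mat), np.eye(3))

q, r = qr(mat)
reconstructs = np.allclose(q @ r, mat)          # Q R reproduces mat
orthonormal = np.allclose(q.T @ q, np.eye(3))   # columns of Q are orthonormal
upper = np.allclose(r, np.triu(r))              # R is upper triangular

print(identity_ok, reconstructs, orthonormal, upper)
```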

4.6 Pseudo-random number generation

Use normal to draw a 4 x 4 array of samples from the standard normal distribution

samples = np.random.normal(size=(4,4))
samples
array([[-1.08982894, -0.38664288,  0.08795078, -0.58766288],
       [-0.55362143,  0.53318817, -1.24544404, -0.28009587],
       [-0.62227897, -0.96513278,  0.94540138, -0.1743617 ],
       [-1.02020369,  0.44070475,  0.16880846,  1.32297271]])

For generating large samples, numpy.random is more than an order of magnitude faster than Python's built-in random module

from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0,1) for _ in range(N)]
966 ms ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Compiler time: 0.15 s
%timeit np.random.normal(size=N)
29.7 ms ± 517 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
np.random.seed(1234)  # set the random seed
# to avoid global state, use numpy.random.RandomState to create a random number generator that is isolated from other random state
rng = np.random.RandomState(1234)
rng.randn(10)
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
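Newer NumPy versions (1.17 and later, to my knowledge) also provide np.random.default_rng, which returns a Generator object that serves the same purpose as RandomState: an independent stream with no global state. This is an addition to the notes and is not covered in the text above, so treat it as a sketch.

```python
import numpy as np

rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

draws_a = rng_a.standard_normal(5)
draws_b = rng_b.standard_normal(5)

# Same seed, same stream: the two generators agree and share no global state.
print(np.array_equal(draws_a, draws_b))   # True
print(rng_a.integers(0, 2, size=10))      # ten draws, each 0 or 1
```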

4.7 Random walks

# a random walk of 1000 steps in pure Python
import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0,1) else -1
    position += step
    walk.append(position)
plt.plot(walk[:100])

[Figure: line plot of the first 100 values of the random walk]

# draw 1000 coin flips at once and map each one to a step of 1 or -1
nsteps = 1000
draws = np.random.randint(0,2,size=nsteps)
steps = np.where(draws>0,1,-1)
walk = steps.cumsum()
walk.min()
-9
walk.max()
60
plt.plot(walk[:100])
[<matplotlib.lines.Line2D at 0x2225fe86e20>]

[Figure: line plot of the first 100 values of the NumPy random walk]

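# np.abs(walk)>=10 marks the steps at which the walk is at least 10 steps away from the origin in either direction; argmax() returns the first index of the maximum value in the boolean array (True is the maximum)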
(np.abs(walk)>=10).argmax()
297
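One caveat with this argmax trick: if the level is never reached, the boolean array is all False and argmax still returns 0. A small sketch with a hand-written six-step walk shows the issue and a guard using any():

```python
import numpy as np

steps = np.array([1, 1, -1, 1, 1, 1])
walk = steps.cumsum()                 # [1 2 1 2 3 4]

reached_3 = np.abs(walk) >= 3
reached_10 = np.abs(walk) >= 10

print(reached_3.argmax())    # 4: first index where |walk| >= 3
print(reached_10.argmax())   # 0, even though the level is never reached

# Guard with any() before trusting the argmax result.
first_10 = int(reached_10.argmax()) if reached_10.any() else None
print(first_10)              # None
```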

4.7.1 Simulating multiple random walks at once

# compute the cumulative sums of all 5000 random walks at once, across each row
nwalks = 5000
nsteps = 1000
draws = np.random.randint(0,2,size=(nwalks,nsteps))  # 0 or 1
steps = np.where(draws>0,1,-1)
walks = steps.cumsum(1)
walks
array([[  1,   2,   3, ...,  46,  47,  46],
       [  1,   0,   1, ...,  40,  41,  42],
       [  1,   2,   3, ..., -26, -27, -28],
       ...,
       [  1,   0,   1, ...,  64,  65,  66],
       [  1,   2,   1, ...,   2,   1,   0],
       [ -1,  -2,  -3, ...,  32,  33,  34]], dtype=int32)
plt.plot(walk[:100])
[<matplotlib.lines.Line2D at 0x2225fe82fd0>]

[Figure: line plot of the first 100 values of walk]

walks.max()
122
walks.min()
-128
# compute the minimum crossing time to 30 or -30
# use the any method to check which walks got there
hits30 = (np.abs(walks)>30).any(1)
hits30
array([ True,  True,  True, ...,  True, False,  True])
hits30.sum()  # number of walks that crossed 30 or -30
3210
# select the rows of walks that crossed the absolute level of 30 and use argmax along axis 1 to get the crossing times
crossing_times = (np.abs(walks[hits30])>=30).argmax(1)
crossing_times.mean()
501.89283489096573
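The same any/argmax pattern can be traced by hand on a tiny example. This sketch uses three hand-written walks of four steps each and a hypothetical level of 3 so every intermediate result is easy to verify:

```python
import numpy as np

# Three hand-written walks of four steps each (one walk per row).
steps = np.array([[ 1,  1,  1, -1],
                  [ 1, -1,  1, -1],
                  [-1, -1, -1, -1]])
walks = steps.cumsum(axis=1)

level = 3
hits = (np.abs(walks) >= level).any(axis=1)   # which walks reach the level
crossing_times = (np.abs(walks[hits]) >= level).argmax(axis=1)

print(hits)             # [ True False  True]
print(crossing_times)   # [2 2]
```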

Origin blog.csdn.net/weixin_43155435/article/details/126726705