ndarray
- ndarray is an N-dimensional array object, a fast and flexible container for large data sets
- You can use an ndarray to perform mathematical operations on an entire block of data at once
Import the NumPy library
import numpy as np
Generate some random data
data = np.random.randn(2, 3)
- numpy.random.randn()
- The randn function returns a sample, or an array of samples, drawn from the standard normal distribution
- The standard normal distribution (also called the z-distribution) is a normal distribution with mean 0 and standard deviation 1, denoted N(0, 1)
- The arguments d0, d1, ..., dn give the size of each dimension; randn(2, 3) returns an array with 2 rows and 3 columns
Arrays can perform mathematical operations
- Multiply every element in the array by 10
data * 10
- Add two arrays; corresponding elements are added
data + data
An ndarray is a general-purpose multidimensional container for homogeneous data: all of its elements must be of the same type.
Each array has
- shape (a tuple representing the size of each dimension)
- dtype (an object used to describe the data type of an array)
Take a look at the shape and dtype of the data array
data.shape
It can be seen that data is an array with 2 rows and 3 columns
data.dtype
It can be seen that the data type of the data array is 'float64'
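The steps so far can be collected into one small sketch. It uses a seeded generator (np.random.default_rng, which does not appear in these notes and is an assumption here) only so that repeated runs give the same numbers; np.random.randn(2, 3) draws from the same distribution.

```python
import numpy as np

# Assumption: a seeded Generator stands in for np.random.randn(2, 3)
# so that the example is reproducible.
rng = np.random.default_rng(0)
data = rng.standard_normal((2, 3))

scaled = data * 10      # every element multiplied by 10
doubled = data + data   # corresponding elements added

print(data.shape)   # size of each dimension
print(data.dtype)   # element type shared by the whole array
```

Both results keep the shape of data, because the operations are applied to each element.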
Create ndarray
The array function creates an ndarray. It accepts any sequence-like object (including other arrays).
Take the conversion of a list as an example:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1
You can see that the list data1 is converted to the array arr1
If the list is composed of a set of equal-length lists, the array function will convert it into a multi-dimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
It can be seen that the list consisting of two lists is converted into a 2-dimensional array
We can use the attributes ndim and shape to verify
arr2.ndim
- ndim returns a single integer: the number of dimensions (axes) of the array
arr2.shape
- Unless told otherwise, np.array infers the most suitable data type for the array it creates
- arr1 and arr2 above show this: if the list contains any decimals, the resulting dtype is floating point ('float64')
- If the list contains only integers, the resulting dtype is the platform's default integer (typically 'int64'; 'int32' on Windows with NumPy versions before 2.0)
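A short sketch of the type inference described above. The integer check is written so that it holds on any platform; the name explicit is only illustrative.

```python
import numpy as np

arr1 = np.array([6, 7.5, 8, 0, 1])                  # contains a decimal
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])       # integers only

print(arr1.dtype)              # float64
print(arr2.ndim, arr2.shape)   # 2 (2, 4)
print(arr2.dtype)              # platform default integer

# Passing dtype overrides the inference.
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit.dtype)          # float32
```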
- np.zeros() creates an array of all 0s with a specified length or shape.
If the number 10 is passed in, a one-dimensional array of 10 zeros is created
np.zeros(10)
The data type of the created array is floating point
np.zeros(10).dtype
Pass in a tuple representing the shape to create a multi-dimensional array,
such as a two-dimensional array of zeros with 3 rows and 6 columns
np.zeros((3, 6))
- np.empty() creates an uninitialized array of the specified length or shape; you can pass in a tuple to create a multi-dimensional array,
such as a three-dimensional uninitialized array (2 blocks of 3 rows and 2 columns)
np.empty((2, 3, 2))
Note: It is not safe to assume that np.empty returns an array of all 0s. In many cases it returns uninitialized garbage values.
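A minimal sketch contrasting zeros and empty. The variable names are illustrative; the point is that an array from np.empty should be filled before it is read.

```python
import numpy as np

zeros_1d = np.zeros(10)        # ten 0.0 values
zeros_2d = np.zeros((3, 6))    # 3 rows, 6 columns

scratch = np.empty((2, 3, 2))  # contents are arbitrary at this point
scratch[...] = 0.0             # fill every element before using it

print(zeros_1d.dtype)   # float64
print(scratch.shape)    # (2, 3, 2)
```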
np.arange()
- arange in NumPy is the array version of the Python built-in function range.
Passing in a parameter N generates an integer array from 0 to N-1
For example, create an integer array of 0 to 14
np.arange(15)
np.arange(15).dtype
Note: NumPy focuses on numerical computing. If not specified, the data type is 'float64' in many cases (zeros, ones, and empty, for example), although arange with integer arguments returns an integer array
The following are some commonly used array creation functions
- array: Convert the input data to an ndarray; if dtype is not specified, the most suitable data type is inferred from the source data
- asarray: Convert the input to an ndarray. Unlike array, it does not copy when the input is already an ndarray of a matching dtype, whereas array copies by default
- arange: Similar to the Python built-in function range, but returns an ndarray (in Python 3 the built-in range returns a lazy range object)
- ones: Create an array of all 1s with the specified shape and dtype; the default dtype is 'float64'
- ones_like: Take another array as a parameter and create an array of all 1s with the same shape and dtype
- zeros, zeros_like: similar to ones and ones_like, but create arrays of all 0s
- empty, empty_like: similar to ones and ones_like, but only allocate memory without filling in any values (the contents are uninitialized garbage)
- full: Create an array of the specified shape and dtype with every element set to fill_value
- full_like: Create an array with the same shape and dtype as another array, with every element set to fill_value
- eye: Given N, create an N * N identity matrix (1s on the diagonal, 0s elsewhere); the default dtype is floating point
- identity: Create an N * N identity matrix, like np.eye(N)
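A few of the functions in this list, sketched on a small array. The variable names are illustrative only.

```python
import numpy as np

source = np.arange(6).reshape((2, 3))

copied = np.array(source)      # array copies by default
aliased = np.asarray(source)   # asarray returns the same object here

ones = np.ones_like(source)    # same shape and dtype as source
sevens = np.full((2, 2), 7)    # every element set to the fill value
eye3 = np.eye(3)               # 3 * 3 identity matrix, floating point

print(aliased is source)                  # True
print(np.shares_memory(copied, source))   # False
```

Changing copied leaves source untouched, while changing aliased changes source, because they are the same array.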
dtype
- A dtype contains the information needed to interpret a chunk of an ndarray's memory as a specific data type
Common Numpy data types
- int8: Type code: i1 --> signed 8-bit (1 byte) integer
- uint8: Type code: u1 --> unsigned 8-bit (1 byte) integer
- int16: Type code: i2 --> signed 16-bit (2 bytes) integer
- uint16: Type code: u2 --> unsigned 16-bit (2 bytes) integer
- int32: Type code: i4 --> signed 32-bit (4 bytes) integer
- uint32: Type code: u4 --> unsigned 32-bit (4 bytes) integer
- int64: Type code: i8 --> signed 64-bit (8 bytes) integer
- uint64: Type code: u8 --> unsigned 64-bit (8 bytes) integer
- float16: Type code: f2 --> half-precision floating-point number
- float32: Type code: f4 or f --> standard single-precision floating-point number (compatible with C float)
- float64: Type code: f8 or d --> standard double-precision floating-point number (compatible with C double and Python float objects)
- float128: Type code: f16 or g --> extended-precision floating-point number (not available on every platform)
- complex64: Type code: c8 --> a complex number represented by two 32-bit floating-point numbers
- complex128: Type code: c16 --> a complex number represented by two 64-bit floating-point numbers
- complex256: Type code: c32 --> a complex number represented by two 128-bit floating-point numbers (not available on every platform)
- bool: Type code: ? --> Boolean type storing True and False
- object: Type code: O --> Python object type
- string_: Type code: S --> fixed-length byte string type (1 byte per character); for example, a string of length 10 uses S10 (named np.bytes_ in NumPy 2.0 and later, which removed the np.string_ alias)
- unicode_: Type code: U --> fixed-length unicode type (4 bytes per character; named np.str_ in NumPy 2.0 and later, which removed the np.unicode_ alias)
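The type codes in this table can be checked with np.dtype, which accepts a code string. This is only a sketch covering codes that are available on every platform.

```python
import numpy as np

print(np.dtype('i4') == np.int32)     # True
print(np.dtype('f8') == np.float64)   # True
print(np.dtype('?') == np.bool_)      # True

# itemsize is the number of bytes used by one element.
print(np.dtype('u1').itemsize)    # 1
print(np.dtype('c16').itemsize)   # 16
print(np.dtype('S10').itemsize)   # 10
```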
- You can explicitly convert an array from one dtype to another with the astype method of ndarray.
Suppose there is an integer array arr
arr = np.array([1, 2, 3, 4, 5])
Convert arr to floating point
float_arr = arr.astype(np.float64)
If you convert floating-point numbers to integers, the fractional part is truncated.
For example, there is a floating-point array arr2
arr2 = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
Convert to integer
int_arr2 = arr2.astype(np.int32)
You can see that the new array int_arr2 drops the fractional part of each element of the original array
You can also use astype to convert a string array into a numeric form.
Suppose there is a string array numeric_strings
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype = np.bytes_)
- Note: np.bytes_ is the fixed-length byte string type; before NumPy 2.0 it was also available as np.string_, an alias that has since been removed. Pay attention to the string length, because NumPy string data has a fixed size and input may be truncated without warning
- If a string cannot be converted to a number during the conversion (for example, "one" cannot be converted to the value 1), a ValueError is raised
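The conversions described above, sketched in one place. The string array here uses NumPy's default string dtype rather than an explicit byte string dtype, which is a simplification made for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
float_arr = arr.astype(np.float64)    # integers to floating point

arr2 = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
int_arr2 = arr2.astype(np.int32)      # fractional part is dropped

# Strings that look like numbers can be converted too.
numeric_strings = np.array(['1.25', '-9.6', '42'])
as_floats = numeric_strings.astype(np.float64)

# A string that is not a number raises ValueError.
try:
    np.array(['one']).astype(np.float64)
    failed = False
except ValueError:
    failed = True

print(int_arr2.tolist())   # [3, -1, -2, 0, 12, 10]
print(failed)              # True
```

Note that astype always returns a new array, even when the new dtype is the same as the old one.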
- You can also pass the dtype of another array to the astype method.
Suppose there is an integer array int_array and a floating-point array calibers
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype = np.float64)
To convert int_array to the same floating-point dtype as calibers, pass calibers.dtype to the astype method
int_float = int_array.astype(calibers.dtype)
int_float.dtype
- You can also use a type code string to specify a dtype
empty_uint32 = np.empty(8, dtype = 'u4')
Numpy array operations
- Any arithmetic operation between arrays of equal size is applied element by element
Create a two-dimensional array arr with 2 rows and 3 columns
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr.shape
Multiplying arrays of equal size multiplies corresponding elements
arr * arr
Subtracting arrays of equal size subtracts corresponding elements
arr - arr
Arithmetic operations between an array and a scalar apply the scalar to every element
1 / arr
Comparisons between arrays of the same size produce a boolean array.
Let us create a two-dimensional array arr2 with the same size as arr
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
- Compare arr and arr2
arr2 > arr
You can see that a Boolean array of the same size is generated; each element is the result of comparing the corresponding elements of arr2 and arr.
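The operations in this section, written out with their results so the element-by-element behaviour can be checked by hand.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

squared = arr * arr     # [[1, 4, 9], [16, 25, 36]]
zeros = arr - arr       # all zeros
reciprocal = 1 / arr    # the scalar 1 is divided by each element
greater = arr2 > arr    # [[False, True, False], [True, False, True]]

print(squared.tolist())
print(greater.tolist())
```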
Basic indexing and slicing
One-dimensional array slice
- On the surface, one-dimensional array slicing is similar to Python's list slicing function
Create a 0-9 integer array arr
arr = np.arange(10)
- Take the sixth element in arr
arr[5]
- Take the 6th to 8th elements in arr; the slice is half-open (start included, stop excluded)
arr[5: 8]
- You can assign a value to a slice, and the source array is modified in place.
For example, assign 12 to the 6th through 8th elements of arr
arr[5: 8] = 12
As you can see, arr is modified in place
- So the most important difference between array slices and Python list slices is that an array slice is a view of the original data: the data is not copied, and any modification of the view is reflected directly in the source array
For example, create a slice of arr_slice
arr_slice = arr[5: 8]
- Note: If we modify the value of arr_slice, the change will be reflected in the original array arr
For example, assign 12345 to the second element of arr_slice
arr_slice[1] = 12345
As you can see, the data of the source array is also modified
- The slice [:] selects all the values in the array. For example, assign 64 to every element of arr_slice
arr_slice[:] = 64
It can be seen that each element in arr_slice is assigned the value 64
- Slicing behaves differently from native Python slicing because NumPy is designed to handle large data sets. Copying the data on every slice would put a lot of pressure on performance and memory.
- Of course, if you want a copy of a slice of the ndarray instead of a view, you can call the copy() method on the slice
For example, take a copy of the 6th to 8th elements of arr and assign 6 to every element of the copy
arr3 = arr[5: 8].copy()
arr3[:] = 6
As you can see, operating on the copy of the array slice does not affect the source array
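A compact sketch of the view and copy behaviour described above. np.shares_memory is used here only to make the difference visible; the name seen_through_arr is illustrative.

```python
import numpy as np

arr = np.arange(10)

arr_slice = arr[5:8]             # a view, no data copied
arr_slice[1] = 12345
seen_through_arr = int(arr[6])   # the change is visible in arr

arr_slice[:] = 64                # assign to every element of the view

arr3 = arr[5:8].copy()           # an independent copy
arr3[:] = 6                      # arr is not affected

print(seen_through_arr)                  # 12345
print(arr.tolist())                      # [0, 1, 2, 3, 4, 64, 64, 64, 8, 9]
print(np.shares_memory(arr, arr_slice))  # True
print(np.shares_memory(arr, arr3))       # False
```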
Two-dimensional array
- In a two-dimensional array, the element at each index is no longer a scalar, as in a one-dimensional array, but a one-dimensional array
For example, create a two-dimensional array arr2d
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Take the third row of arr2d (index 2)
arr2d[2]
The result is a one-dimensional array
- To get a single scalar element, index once per axis. For example, to get the third element of the first row of the two-dimensional array
arr2d[0, 2]
This is equivalent to the chained form arr2d[0][2], which first selects the row and then the element within it
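A small sketch of the two ways to reach a single element of a two-dimensional array.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[2]          # third row, a one-dimensional array
comma = arr2d[0, 2]     # row 0, column 2
chained = arr2d[0][2]   # same element, selected in two steps

print(row.tolist())     # [7, 8, 9]
print(comma, chained)   # 3 3
```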
Multidimensional Arrays
- Multi-dimensional arrays follow the same principle as two-dimensional arrays
We create a 2 * 2 * 3 three-dimensional array arr3d
arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
- If you index arr3d with a single scalar, the result is a two-dimensional array, one dimension lower than the source
For example, to get the first two-dimensional array of arr3d
arr3d[0]
As you can see, what is returned is a 2 * 3 two-dimensional array.
Each additional index lowers the dimension of the result by one, until a scalar value is returned.
- Both scalar values and arrays can be assigned to arr3d[0]
Before showing the assignment, we first save a copy of arr3d[0] as old_values so that the source array can be restored
old_values = arr3d[0].copy()
Then, we assign 42 to the first two-dimensional array of arr3d
arr3d[0] = 42
As you can see, each element in the first two-dimensional array obtained by slicing is assigned a value of 42
We use the original copy to restore the source array
arr3d[0] = old_values
- Note that unless you copy explicitly, a slice returns a view of the source array, and modifying the slice changes the values of the source array
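The three-dimensional example above, put together as one sketch. The name all_42 is illustrative.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])

old_values = arr3d[0].copy()             # saved so the array can be restored
arr3d[0] = 42                            # scalar assigned to a whole 2 * 3 block
all_42 = bool((arr3d[0] == 42).all())
arr3d[0] = old_values                    # an array can be assigned as well

print(arr3d[0].shape)        # (2, 3)
print(arr3d[1, 0].tolist())  # [7, 8, 9]
print(arr3d[1, 0, 2])        # 9
```

One index gives a two-dimensional array, two indexes give a one-dimensional array, and three indexes give a scalar.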
Slice index
- The slicing syntax of ndarray is similar to that of one-dimensional objects such as Python lists.
Taking the earlier one-dimensional array arr, we select its second through sixth elements
arr[1:6]
For the 3 * 3 two-dimensional array arr2d, a slice selects elements along an axis: axis 0 is the rows and axis 1 is the columns.
We select the first two rows of the two-dimensional array
arr2d[:2]
Here arr2d[:2] is shorthand for arr2d[0:2]: it selects the first two rows of arr2d (start included, stop excluded)
- You can also pass in multiple slices at once. For example, select the second column onward from the first two rows of arr2d
arr2d[ :2, 1: ]
- By mixing integer indexes and slices, you get lower-dimensional slices.
For example, select the first two columns of the second row.
arr2d[1, :2]
The result is a one-dimensional array
- Note: A single colon selects the entire axis
- Assigning to a slice broadcasts the value to the entire selection; because the slice is a view of the source array, the source array is changed as well
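The slicing forms in this section, sketched on arr2d. Results are converted to lists before the final assignment, because the slices themselves are views and would change along with arr2d.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

top = arr2d[:2].tolist()            # [[1, 2, 3], [4, 5, 6]]
block = arr2d[:2, 1:].tolist()      # [[2, 3], [5, 6]]
row_part = arr2d[1, :2].tolist()    # [4, 5]
first_col_shape = arr2d[:, :1].shape   # (3, 1): a slice keeps the axis

arr2d[:2, 1:] = 0                   # assignment reaches the source array

print(arr2d.tolist())   # [[1, 0, 0], [4, 0, 0], [7, 8, 9]]
```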
Boolean index
- Suppose we have an array data holding values and an array names holding names (with duplicates)
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
We use the randn function in numpy.random to generate an array data of standard normal random values with 7 rows and 4 columns
data = np.random.randn(7, 4)
- Suppose each name in the names array corresponds to a row of data.
We want to select all rows corresponding to the name 'Bob'
- First, see which entries of names equal 'Bob'
names == 'Bob'
- Here you must use '==' rather than '='; a single '=' would rebind names to the string 'Bob' instead of comparing
- A one-dimensional Boolean array is produced: True where the element is 'Bob' and False elsewhere
- Next, we pass this boolean array as an index into data
data[names == 'Bob']
The rows of data corresponding to 'Bob' have been selected
- Note: The length of the Boolean array must match the length of the axis it indexes; otherwise an error is raised
- You can also index additional axes at the same time. For example, adding a column slice selects the columns from the third onward (index 2 and up) of the rows corresponding to 'Bob'
data[names == 'Bob', 2: ]
- To select everything except 'Bob', either use the inequality operator '!=' or negate the condition with '~'
data[names != 'Bob']
As a result, the remaining rows except the row corresponding to Bob are selected
- The '~' operator is useful for inverting a general condition. For example, first store the boolean array for 'Bob' in a variable cond
cond = names == 'Bob'
Then index data with the inverted condition ~cond
data[~cond]
As a result, the remaining rows except the row corresponding to Bob are also selected
- To combine multiple conditions, use Boolean arithmetic operators such as & (and) and | (or); the Python keywords and and or do not work with Boolean arrays. For example, select the rows of data corresponding to either 'Bob' or 'Will'
mask = (names == 'Bob') | (names == 'Will')
data[mask]
As a result, the rows corresponding to Bob and Will are selected
- Note: Selecting data from an array with a Boolean index always creates a copy of the data, even if the returned array is unchanged
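A sketch of the Boolean selections above. To keep the results predictable, data is a deterministic 7 * 4 array here instead of random values; that substitution is an assumption made only for this example.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# Stand-in for np.random.randn(7, 4): rows are [0..3], [4..7], and so on.
data = np.arange(28, dtype=np.float64).reshape((7, 4))

is_bob = names == 'Bob'
bob_rows = data[is_bob]        # rows 0 and 3
bob_tail = data[is_bob, 2:]    # same rows, columns from index 2 onward
not_bob = data[~is_bob]        # the other five rows

mask = (names == 'Bob') | (names == 'Will')
bob_or_will = data[mask]       # four rows

bob_rows[:] = -1               # changes the copy only
print(data[0].tolist())        # [0.0, 1.0, 2.0, 3.0]
```

Because the selection is a copy, writing to bob_rows leaves data unchanged, unlike writing to a slice.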
- We often set values through boolean arrays. For example, set all negative values in data to 0
data[data < 0] = 0
- You can also set entire rows or columns through a one-dimensional Boolean array
data[names != 'Joe'] = 7
As you can see, every row whose name is not 'Joe' is assigned the value 7
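The two assignments above, sketched with a deterministic data array (an assumption made for this example, so that the result can be checked by hand).

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# Stand-in for random data: the values -14 to 13 in 7 rows of 4.
data = np.arange(-14, 14, dtype=np.float64).reshape((7, 4))

data[data < 0] = 0          # every negative value becomes 0
data[names != 'Joe'] = 7    # every row that is not Joe's becomes 7

print(data[1].tolist())     # Joe's row: [0.0, 0.0, 0.0, 0.0]
print(data[5].tolist())     # Joe's row: [6.0, 7.0, 8.0, 9.0]
print(data[0].tolist())     # Bob's row: [7.0, 7.0, 7.0, 7.0]
```

Assignment through a Boolean index writes into the original array, even though selection with a Boolean index returns a copy.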
Fancy Index
- Fancy indexing refers to: indexing with integer arrays
- Suppose there is an 8 * 4 array arr
arr = np.empty((8, 4))
np.empty only allocates the space, so the array contains uninitialized garbage values
- Now we fill this array with a for loop
for i in range(8):
    arr[i] = i
To select a subset of rows in a particular order, pass in a list or ndarray of integers giving that order
arr[[4, 3, 0, 6]]
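A runnable sketch of the fancy indexing example. The negative indexes are an extension not shown in these notes; they count rows from the end.

```python
import numpy as np

arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i                 # row i is filled with the value i

picked = arr[[4, 3, 0, 6]]     # rows 4, 3, 0, 6 in that order
from_end = arr[[-3, -5, -7]]   # rows 5, 3, 1

print(picked[:, 0].tolist())     # [4.0, 3.0, 0.0, 6.0]
print(from_end[:, 0].tolist())   # [5.0, 3.0, 1.0]
```

Like Boolean indexing, fancy indexing copies the selected data into a new array.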
Array transposition and axis conversion
- Transposing is a special form of reshaping
- Transposing does not copy anything; it returns a view of the source data
- Arrays have a special attribute T for the transpose
Create a two-dimensional array with 3 rows and 5 columns
arr = np.arange(15).reshape((3, 5))
Transpose with attribute T
arr.T
- Matrix calculations often need the transpose
- To compute a matrix product, we can use the dot function in numpy
- More generally, dot() returns the dot product of two arrays; for two-dimensional arrays this is matrix multiplication
- We create a two-dimensional array and compute the product of it and its transpose
arr = np.random.randn(6, 3)
Calculate the product (a 6 * 6 matrix here)
np.dot(arr, arr.T)
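A sketch of T and dot on a small integer array, so that the result can be compared exactly. The @ operator is used only as a cross-check; it does not appear in these notes.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

t = arr.T                    # shape (5, 3), a view of arr
gram = np.dot(arr, arr.T)    # (3, 5) times (5, 3) gives (3, 3)

print(t.shape)                               # (5, 3)
print(np.shares_memory(arr, t))              # True
print(np.array_equal(gram, arr @ arr.T))     # True
```

The product of a matrix with its own transpose is always symmetric, which is a quick way to check the result.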
- For the transpose of a three-dimensional array, the transpose method accepts a tuple of axis numbers that gives the new axis order.
For example, create a 2 * 2 * 4 three-dimensional array
arr = np.arange(16).reshape((2, 2, 4))
- Number the three axes of the three-dimensional array 0, 1, 2; you can picture them as the length (0), width (1), and height (2) of a cuboid
- Transposing with (1, 0, 2) is like rotating the cuboid horizontally by 90 degrees so that length and width swap: the original length becomes the width and the original width becomes the length, so the axes of the transposed cuboid correspond to the original axes 1 (width), 0 (length), 2 (height)
arr.transpose((1, 0, 2))
- About the transpose function in numpy: its axes parameter lists the axis numbers of the source array in their new order
- For one-dimensional arrays, numpy.transpose() has no effect and returns the array unchanged, because there is only one axis
- For a two-dimensional array, it is the ordinary matrix transpose: the axis order changes from (0, 1) to (1, 0)
- For a three-dimensional array, transpose permutes the axes in whatever order the tuple specifies; with no argument it reverses them, from (0, 1, 2) to (2, 1, 0)
- ndarray also has a swapaxes method, which exchanges a pair of axes.
For example, exchange the second and third axes of the three-dimensional array
arr.swapaxes(1, 2)
- swapaxes is simply a convenient way to transpose two axes
- Like transpose, swapaxes does not copy anything; it returns a view of the source data
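A sketch of how the axis order affects shapes and element positions. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped01 = arr.transpose((1, 0, 2))   # axes 0 and 1 exchanged
swapped12 = arr.swapaxes(1, 2)         # axes 1 and 2 exchanged
reversed_axes = arr.transpose()        # no argument: axes reversed

print(swapped12.shape)        # (2, 4, 2)
print(reversed_axes.shape)    # (4, 2, 2)

# The element at [i, j, k] of the source moves to [j, i, k].
print(swapped01[1, 0, 2] == arr[0, 1, 2])    # True
print(np.shares_memory(arr, swapped12))      # True
```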