Before this, I had wanted to learn machine learning, but my data analysis knowledge wasn't enough to support it. After watching Prof. Li Hongyi's course in particular, I decided to write this column first. It is my second-pass summary of the MOOC taught by Prof. Song Tian of BIT, so you can bookmark it as a reference for whenever you forget something. If you are new to the topic, it should also work as decent learning material. One important suggestion: read it on a PC, where it displays much better!
Introduction to Content
- Dimension of data: one-dimensional, two-dimensional, multi-dimensional, high-dimensional.
- ndarray attributes, creation, and transformation
Attributes | Creation | Transformation |
---|---|---|
.ndim | np.arange(n) | .reshape(shape) |
.shape | np.ones(shape) | .resize(shape) |
.size | np.zeros(shape) | .swapaxes(ax1,ax2) |
.dtype | np.full(shape,val) | .flatten() |
.itemsize | np.eye(n) | |
 | np.ones_like(a) | |
 | np.zeros_like(a) | |
 | np.full_like(a,val) | |
- Indexing and Slicing of Arrays
- Array operations: unary functions, binary functions
dimension of data
-
From a single datum to a set of data
For example, 3.14 is a single datum that expresses one meaning, while 3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352 is a set of data that expresses one or more meanings.
-
Dimension: the way a set of data is organized
For example, the data above (3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352) can be organized in either of the following forms:
"""
1. 3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352
or
2. 3.1404, 3.1413, 3.1398
   3.1401, 3.1378, 3.1352
"""
-
one-dimensional data
One-dimensional data consists of ordered or unordered items that stand in an equal (peer) relationship to one another, organized linearly.
3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352
It corresponds to concepts such as lists, arrays, and sets.
-
Lists and arrays
Both are ordered structures holding a set of data.
The difference:
Array: all elements have the same data type
3.1413, 3.1398, 3.1404, 3.1401, 3.1349, 3.1376
List: elements can have different data types
3.1413, 'pi', 3.1404, [3.1401, 3.1349], '3.1376'
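A quick way to see this difference is to put mixed values into a list and into a NumPy array. This is a minimal sketch; the variable names are illustrative. The list keeps each element's own type, while NumPy coerces everything to one common dtype (here, strings).

```python
import numpy as np

# A Python list keeps each element's own type
mixed_list = [3.1413, 'pi', 3.1404]
print([type(v).__name__ for v in mixed_list])  # ['float', 'str', 'float']

# An ndarray forces one common dtype; the floats are converted to strings here
mixed_array = np.array([3.1413, 'pi', 3.1404])
print(mixed_array.dtype.kind)  # U (Unicode string)

# Homogeneous numeric data stays numeric
float_array = np.array([3.1413, 3.1398, 3.1404])
print(float_array.dtype)  # float64
```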
-
2D data
Two-dimensional data is made up of several one-dimensional data sets; it is a combination of one-dimensional data.
Tables are typical two-dimensional data, where the header is part of the two-dimensional data.
-
multidimensional data
Multidimensional data is formed by extending one- or two-dimensional data along a new dimension.
As shown in the figure, a single table looks two-dimensional, but combining the tables for different years adds a time dimension, which makes the data three-dimensional.
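The year-by-year example can be sketched in NumPy by stacking several 2D tables along a new axis. The tables and their values are hypothetical, and `np.stack` is just one convenient way to add the extra dimension; the text itself does not name a function.

```python
import numpy as np

# Two hypothetical yearly tables with the same layout (2 rows x 3 columns)
table_2019 = np.array([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
table_2020 = np.array([[1.5, 2.5, 3.5],
                       [4.5, 5.5, 6.5]])

# Stacking along a new first axis adds a "year" dimension
by_year = np.stack([table_2019, table_2020])
print(by_year.shape)     # (2, 2, 3): year, row, column
print(by_year[1, 0, 2])  # 3.5: year 2020, row 0, column 2
```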
-
high-dimensional data
High-dimensional data uses only the most basic binary relationships to display complex structures between data, such as key-value pairs.
{ "firstName": "Tian",
  "lastName": "Song",
  "address": { "streetAddr": "中关村南大街5号",
               "city": "北京市",
               "zipcode": "100081" },
  "prof": [ "Computer System", "Security" ] }
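In Python, a record like this is usually handled as a dictionary. A minimal sketch using the standard `json` module, assuming the record arrives as a JSON string; nested objects become nested dicts and arrays become lists.

```python
import json

# The record from the text, written as a JSON string
text = '''{"firstName": "Tian",
           "lastName": "Song",
           "address": {"streetAddr": "中关村南大街5号",
                       "city": "北京市",
                       "zipcode": "100081"},
           "prof": ["Computer System", "Security"]}'''

record = json.loads(text)         # JSON object -> Python dict
print(record["address"]["city"])  # nested key-value lookup: 北京市
print(record["prof"][0])          # list nested inside the dict: Computer System
```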
-
python representation of data dimensions
1D data: list and set types
[3.1398, 3.1349, 3.1376] (list: ordered)
{3.1398, 3.1349, 3.1376} (set: unordered)
2D data: list type
Multidimensional data: list type (nested lists)
[ [3.1398, 3.1349, 3.1376], [3.1413, 3.1404, 3.1401] ]
High-dimensional data: the dictionary type, or data representation formats such as JSON, XML, and YAML
d = {"firstName": "Tian", "lastName": "Song"}
NumPy array objects: ndarray
NumPy Introduction
NumPy is an open-source foundational library for scientific computing in Python. It provides:
- A powerful N-dimensional array object, ndarray
- Broadcasting functions
- Tools for integrating C/C++/Fortran code
- Linear algebra, Fourier transforms, random number generation, and other functions
NumPy is the foundation of data processing or scientific computing libraries such as SciPy and Pandas.
Python already has a list type, so why do we need an array object (type)?
I had the same doubt at first. Lists are still very useful, but they only suit small-scale programming. Compare these two pieces of code:
def pySum():
    a = [0, 1, 2, 3, 4]
    b = [9, 8, 7, 6, 5]
    c = []
    for i in range(len(a)):
        c.append(a[i] ** 2 + b[i] ** 3)
    return c

print(pySum())
Array objects eliminate the loops needed for element-wise operations, so a 1D vector can be handled like a single piece of data:
import numpy as np

def npSum():
    a = np.array([0, 1, 2, 3, 4])
    b = np.array([9, 8, 7, 6, 5])
    c = a ** 2 + b ** 3
    return c

print(npSum())
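Both compute the same values, 729, 513, 347, 225, 141: the list version prints [729, 513, 347, 225, 141], while the NumPy version prints [729 513 347 225 141].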
Why a dedicated array type is worthwhile:
- A dedicated array object can be optimized to speed up this kind of computation;
- In scientific computing, all data along one dimension usually share the same type;
- Using a single data type for the whole array saves computation time and storage space.
ndarray is a multidimensional array object consisting of two parts:
- actual data
- Metadata describing these data (data dimensions, data types, etc.)
ndarray arrays generally require all elements to have the same type, and array indices start at 0.
Next, let's get to the point:
-
The use of ndarray
In [14]: a = np.array([[0, 1, 2, 3, 4],
    ...:               [5, 6, 7, 8, 9]])
In [15]: a
Out[15]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [16]: print(a)
[[0 1 2 3 4]
 [5 6 7 8 9]]
When printed, an ndarray is shown inside [] with its elements separated by spaces. Axis: a dimension along which the data is stored; rank: the number of axes.
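The two display forms and the axis/rank relationship can be checked directly. A minimal sketch reusing the array from the session above: `repr()` gives the comma-separated form the REPL echoes, `print()` gives the space-separated form, and the rank always equals the length of `.shape`.

```python
import numpy as np

a = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])

print(repr(a))  # array([[0, 1, 2, 3, 4], ...]): what the REPL echoes
print(a)        # [[0 1 2 3 4] ...]: what print() shows, space-separated

# Rank is the number of axes; each axis has its own length
print(a.ndim, len(a.shape))    # 2 2
print(a.shape[0], a.shape[1])  # 2 rows along axis 0, 5 columns along axis 1
```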
-
Properties of ndarray objects
Attribute | Description |
---|---|
.ndim | Rank, i.e. the number of axes (dimensions) |
.shape | The size of the ndarray along each axis; for a matrix with n rows and m columns, (n, m) |
.size | The total number of elements, equal to n*m for the .shape above |
.dtype | The element type of the ndarray |
.itemsize | The size of each element, in bytes |
In [19]:a=np.array([[0,1,2,3,4],
... [5,6,7,8,9]])
In [20]:a.ndim
Out[20]: 2
In [21]:a.shape
Out[21]: (2, 5)
In [22]:a.size
Out[22]: 10
In [23]:a.dtype
Out[23]: dtype('int32')
In [24]:a.itemsize
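Out[24]: 4
(This output comes from a platform whose default integer type is int32, such as Windows with NumPy 1.x. On Linux and macOS, and on Windows since NumPy 2.0, a.dtype is int64 and a.itemsize is 8.)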
-
the element type of the ndarray array
Data type | Description (1) |
---|---|
bool | Boolean type, True or False |
intc | Same as C's int type, usually int32 or int64 |
intp | Integer used for indexing, same as C's ssize_t, int32 or int64 |
int8 | 8-bit (1-byte) integer, range: [-128, 127] |
int16 | 16-bit integer, range: [-32768, 32767] |
int32 | 32-bit integer, range: [-2**31, 2**31-1] |
int64 | 64-bit integer, range: [-2**63, 2**63-1] |
Data type | Description (2) |
---|---|
uint8 | 8-bit unsigned integer, range: [0, 255] |
uint16 | 16-bit unsigned integer, range: [0, 65535] |
uint32 | 32-bit unsigned integer, range: [0, 2**32-1] |
uint64 | 64-bit unsigned integer, range: [0, 2**64-1] |
float16 | 16-bit half-precision floating-point number: 1 sign bit, 5 exponent bits, 10 mantissa bits |
float32 | 32-bit single-precision floating-point number: 1 sign bit, 8 exponent bits, 23 mantissa bits |
float64 | 64-bit double-precision floating-point number: 1 sign bit, 11 exponent bits, 52 mantissa bits |
Data type | Description (3) |
---|---|
complex64 | Complex type; the real and imaginary parts are both 32-bit floating-point numbers |
complex128 | Complex type; the real and imaginary parts are both 64-bit floating-point numbers |
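The ranges and bit layouts in these tables can be queried at runtime. A minimal sketch using `np.iinfo` and `np.finfo` (not mentioned in the text, but standard NumPy); the final lines illustrate that a fixed-width type wraps around when a result leaves its range, which is one reason choosing the element type matters.

```python
import numpy as np

# Integer ranges from the tables above
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint8).max)                         # 255

# Bit layout of the floating-point types: exponent bits, mantissa bits
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.nexp, info.nmant)  # float16 5 10 / float32 8 23 / float64 11 52

# Element sizes in bytes
print(np.dtype(np.int8).itemsize, np.dtype(np.complex128).itemsize)  # 1 16

# Fixed-width array arithmetic wraps around instead of growing
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print(wrapped)  # [-128]
```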
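As shown above, ndarray offers many element types,
whereas Python's own syntax supports only three numeric types: integers, floats, and complex numbers. NumPy defines finer-grained types because:
- Scientific computing involves large amounts of data, so it places high demands on storage and performance;
- Precisely defined element types let NumPy use storage sensibly and optimize performance;
- Precisely defined element types help programmers make a reasonable estimate of their program's scale.
Here we should also mention non-homogeneous ndarray objects: they cannot take advantage of NumPy's strengths and should be avoided whenever possible.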
In [27]: x = np.array([[0, 1, 2, 3, 4],
    ...:               [5, 6, 7, 8]], dtype=object)  # ragged rows: dtype=object is required in NumPy >= 1.24
    ...:
In [28]: x.shape
Out[28]: (2,)
In [29]: x.dtype
Out[29]: dtype('O')
In [30]: x
Out[30]: array([list([0, 1, 2, 3, 4]), list([5, 6, 7, 8])], dtype=object)
In [31]: x.itemsize
Out[31]: 8
In [32]: x.size
Out[32]: 2
-
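Creating ndarray arrays
Ways to create them:
- From Python lists, tuples, and similar types
- With NumPy functions such as arange, ones, and zeros
- From raw bytes
- By reading files in specific formats
(1) Creating an ndarray from Python lists, tuples, and similar types
x = np.array([0, 1, 2, 3])                    # from a list
# or
x = np.array((0, 1, 2, 3), dtype=np.float32)  # from a tuple, with an explicit dtype
When np.array() is called without dtype, NumPy associates a dtype with the array based on the data.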
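A minimal sketch of dtype inference: NumPy picks the smallest common kind that holds every element, and an explicit `dtype` overrides that choice. The exact integer width (int32 or int64) depends on the platform, so only the kind is checked for integers.

```python
import numpy as np

# No dtype given: NumPy infers one from the data
ints = np.array([0, 1, 2, 3])   # integer kind ('i'); width is platform-dependent
floats = np.array([0, 1, 2.5])  # one float makes the whole array float64
cplx = np.array([1, 2 + 3j])    # one complex value makes it complex128

# An explicit dtype overrides inference
f32 = np.array([0, 1, 2, 3], dtype=np.float32)

print(ints.dtype.kind, floats.dtype, cplx.dtype, f32.dtype)
```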
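(2) Creating an ndarray with NumPy functions such as arange, ones, and zeros
Function | Description |
---|---|
np.arange(n) | Like range(), but returns an ndarray with elements 0 to n-1 |
np.ones(shape) | Create an all-ones array of the given shape; shape is a tuple |
np.zeros(shape) | Create an all-zeros array of the given shape; shape is a tuple |
np.full(shape,val) | Create an array of the given shape with every element equal to val |
np.eye(n) | Create an n*n identity matrix: 1 on the diagonal, 0 elsewhere |
np.ones_like(a) | Create an all-ones array with the same shape as array a |
np.zeros_like(a) | Create an all-zeros array with the same shape as array a |
np.full_like(a,val) | Create an array with the same shape as array a, every element equal to val |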
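A short sketch exercising each function in the table; the shapes and fill values are arbitrary. Note that `np.ones` and `np.zeros` default to float64, while the `*_like` functions copy both the shape and the dtype of the array they are given.

```python
import numpy as np

a = np.arange(5)                      # 0, 1, 2, 3, 4
o = np.ones((2, 3))                   # float64 ones, shape (2, 3)
z = np.zeros((3, 2), dtype=np.int32)  # int32 zeros, shape (3, 2)
f = np.full((2, 2), 25)               # every element is 25
e = np.eye(3)                         # 3x3 identity matrix

# The *_like functions copy the shape (and dtype) of an existing array
ol = np.ones_like(z)   # int32 ones, shape (3, 2)
zl = np.zeros_like(a)  # zeros, shape (5,)
fl = np.full_like(a, 7)  # every element is 7, shape (5,)
```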
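(3) Creating an ndarray with other NumPy functions
Function | Description |
---|---|
np.linspace() | Fill an array with evenly spaced values between a start and a stop value |
np.concatenate() | Join two or more arrays into a new array |
With endpoint=False, np.linspace() excludes the stop value: for example, in np.linspace(1, 10, 4, endpoint=False) the final value 10 is not included.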
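A minimal sketch of both functions. With the default `endpoint=True`, four points from 1 to 10 are spaced (10 - 1) / 3 = 3 apart; with `endpoint=False` the interval is split into four equal steps of 2.25 and 10 itself is dropped.

```python
import numpy as np

a = np.linspace(1, 10, 4)                  # 1, 4, 7, 10: stop value included
b = np.linspace(1, 10, 4, endpoint=False)  # 1, 3.25, 5.5, 7.75: stop value excluded
c = np.concatenate((a, b))                 # 8 elements: a followed by b
print(c)
```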
-
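Transforming ndarray arrays
After an ndarray has been created, you can change its dimensions and its element type.
(1) Changing the dimensions of an ndarray
Method | Description |
---|---|
.reshape(shape) | Return an array of the given shape with the same elements; the original array is unchanged |
.resize(shape) | Same as .reshape(), but modifies the original array in place |
.swapaxes(ax1,ax2) | Swap two of the array's n dimensions |
.flatten() | Collapse the array to one dimension and return the result; the original array is unchanged |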
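A sketch of the four methods on a (2, 3, 4) array. `.resize()` is applied to a separate array because it works in place and returns None; `refcheck=False` (a real `ndarray.resize` parameter) skips NumPy's reference-count safety check, which is only safe here because no other array refers to `c`.

```python
import numpy as np

a = np.ones((2, 3, 4), dtype=np.int32)

b = a.reshape((3, 8))  # array of shape (3, 8); a keeps shape (2, 3, 4)
s = a.swapaxes(0, 2)   # axes 0 and 2 exchanged -> shape (4, 3, 2)
flat = a.flatten()     # 1-D copy with 24 elements

# resize changes c itself and returns None
c = np.ones((2, 3, 4), dtype=np.int32)
result = c.resize((3, 8), refcheck=False)  # safe: nothing else references c

print(a.shape, b.shape, s.shape, flat.shape, c.shape, result)
```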
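(2) Changing the element type of an ndarray
a.astype(new_type) returns an array with the new element type. By default, astype() always creates a new array (a copy of the original data), even when the new type is the same as the old one.
In addition, a.tolist() converts an ndarray into a (nested) Python list, e.g. ls = a.tolist().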
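A minimal sketch confirming the copy behaviour: even when the target type equals the current one, `astype()` returns an independent array, so modifying it leaves the original untouched. `tolist()` returns plain Python ints rather than NumPy scalars.

```python
import numpy as np

a = np.array([[0, 1], [2, 3]], dtype=np.int32)

b = a.astype(np.float64)  # new array with float elements
c = a.astype(np.int32)    # same dtype, but still a copy by default
c[0, 0] = 99              # modifying the copy leaves a unchanged

ls = a.tolist()           # nested Python list of Python ints
print(b.dtype, a[0, 0], ls)
```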
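Operations on ndarray arrays
-
Indexing and slicing arrays
Indexing: getting the element at a particular position in an array
Slicing: getting a subset of an array's elements
- Indexing and slicing 1D arrays: similar to Python lists
- Indexing multidimensional arrays: one index per axis, separated by commas (e.g. a[1, 2, 3])
- Slicing multidimensional arrays: one slice (start:stop:step) per axis, separated by commas
When an axis is written as ":", no selection is made along that axis: it is kept in full.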
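A sketch of these rules on a 1D array and a (2, 3, 4) array built with `arange` and `reshape`; the arrays are illustrative. Each comma-separated position addresses one axis, and a bare `:` keeps that axis whole.

```python
import numpy as np

# 1-D: same rules as Python lists
v = np.array([9, 8, 7, 6, 5])
print(v[2], v[1:4:2])      # 7 [8 6]

# Multi-dimensional: one index per axis, separated by commas
a = np.arange(24).reshape((2, 3, 4))
print(a[1, 2, 3])          # 23
print(a[-1, -2, -3])       # 17 (negative indices count from the end)

# ':' keeps the whole axis; only the other axes are narrowed
print(a[:, 1, -3])         # [ 5 17]
print(a[:, 1:3, :].shape)  # (2, 2, 4)
print(a[:, :, ::2].shape)  # (2, 3, 2): every second column
```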
ndarray arithmetic
-
Operations between an array and a scalar
An operation between an array and a scalar is applied to every element of the array.
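A minimal sketch: the scalar is combined with each element in turn, with no explicit loop. Dividing by the array's own mean (a scalar, 2.5 here) is a common example.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))  # [[0 1 2] [3 4 5]]

print(a + 10)        # 10 added to every element
print(a * 2)         # every element doubled
print(a / a.mean())  # every element divided by the mean, 2.5
```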
-
Unary functions
Function | Description |
---|---|
np.abs(x) np.fabs(x) | Compute the absolute value of each element |
np.sqrt(x) | Compute the square root of each element |
np.square(x) | Compute the square of each element |
np.log(x) np.log10(x) np.log2(x) | Compute the natural, base-10, and base-2 logarithm of each element |
np.ceil(x) np.floor(x) | Compute the ceiling or floor of each element |
np.rint(x) | Round each element to the nearest integer (halfway values go to the nearest even integer) |
np.modf(x) | Return the fractional and integer parts of each element as two separate arrays |
np.cos(x) np.cosh(x) np.sin(x) np.sinh(x) np.tan(x) np.tanh(x) | Compute the ordinary and hyperbolic trigonometric functions of each element |
np.exp(x) | Compute the exponential of each element |
np.sign(x) | Compute the sign of each element: 1 (+), 0, -1 (-) |
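A sketch applying several of these functions to a small, illustrative float array. Each call returns a new array of the same shape; `np.modf` returns a tuple of two arrays, fractional part first.

```python
import numpy as np

x = np.array([-2.5, -1.0, 0.0, 1.5, 4.0])

absolute = np.abs(x)   # values: 2.5, 1.0, 0.0, 1.5, 4.0
signs = np.sign(x)     # values: -1, -1, 0, 1, 1
up, down = np.ceil(x), np.floor(x)

# Halfway cases round to the even neighbour
rounded = np.rint(np.array([0.5, 1.5, 2.5]))  # values: 0, 2, 2

# Fractional part first, then integer part; both keep the sign of the input
frac, whole = np.modf(np.array([3.75, -1.25]))  # 0.75, -0.25 and 3.0, -1.0

roots = np.sqrt(np.array([4.0, 9.0]))  # 2.0, 3.0
logs = np.log2(np.array([1.0, 8.0]))   # 0.0, 3.0
print(absolute, signs, up, down, rounded, frac, whole, roots, logs)
```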
-
Binary functions
Function | Description |
---|---|
+ - * / ** | Apply the corresponding operation to each pair of elements in two arrays |
np.maximum(x,y) np.fmax(x,y) np.minimum(x,y) np.fmin(x,y) | Element-wise maximum/minimum (fmax/fmin ignore NaN, maximum/minimum propagate it) |
np.mod(x,y) | Element-wise modulo |
np.copysign(x,y) | Give each element of x the sign of the corresponding element of y |
> < >= <= == != | Element-wise comparison, producing a Boolean array |
All of the functions above perform element-wise operations on the data in an ndarray.
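A sketch of the binary functions on two small, illustrative arrays. The NaN in `y` shows the difference between `np.maximum` (NaN propagates) and `np.fmax` (NaN is ignored); `np.mod` follows Python's convention that the result takes the sign of the divisor.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([-1.0, 5.0, np.nan])

total = x + y                  # element-wise sum; last element is nan
mx = np.maximum(x, y)          # NaN propagates: last element is nan
fm = np.fmax(x, y)             # NaN ignored: values 1.0, 5.0, 3.0
md = np.mod(np.array([7, -7]), 3)  # values 1, 2 (sign of the divisor)
cs = np.copysign(x, np.array([-1.0, 1.0, -0.5]))  # values -1.0, 2.0, -3.0
cmp = x > 0                    # Boolean array: True, False, True
print(total, mx, fm, md, cs, cmp)
```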