Data Analysis: Unit 1 Getting Started with the NumPy Library

Before this, I wanted to learn machine learning, but my data analysis knowledge was not enough to support it. After watching Mr. Li Hongyi's course, I decided to write this column first. The column is my second-pass summary of the MOOC taught by Mr. Song Tian of BIT, so you can bookmark it as a reference for whenever you forget something. If you are new to the topic, it is also decent learning material. One important suggestion: read it on a PC, where it displays better!

Introduction to Content

  • Dimension of data: one-dimensional, two-dimensional, multi-dimensional, high-dimensional. 
  • ndarray attributes, creation, and transformation

    Attributes     Creation               Transformation
    .ndim          np.arange(n)           .reshape(shape)
    .shape         np.ones(shape)         .resize(shape)
    .size          np.zeros(shape)        .swapaxes(ax1,ax2)
    .dtype         np.full(shape,val)     .flatten()
    .itemsize      np.eye(n)
                   np.ones_like(a)
                   np.zeros_like(a)
                   np.full_like(a,val)
  • Indexing and Slicing of Arrays
  • Array operations: unary functions, binary functions 

Dimensions of data

  • From a single datum to a set of data

For example, 3.14 is a single datum that expresses one meaning, while 3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352 is a set of data that expresses one or more meanings.

  • Dimension: the way a set of data is organized

The data above (3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352) can be organized in either of the following forms:

"""
1. 3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352
or
2. 3.1404, 3.1413, 3.1398
   3.1401, 3.1378, 3.1352
"""

  • One-dimensional data

One-dimensional data consists of ordered or unordered items of equal standing, arranged linearly.

3.1404, 3.1413, 3.1398, 3.1401, 3.1378, 3.1352

It corresponds to concepts such as lists, arrays, and sets.

  • Lists and arrays

Both are ordered structures holding a set of data.

The difference:

Array: all elements have the same data type

3.1413, 3.1398, 3.1404, 3.1401, 3.1349, 3.1376

List: elements can have different data types

3.1413, 'pi', 3.1404, [3.1401, 3.1349], '3.1376'

  • Two-dimensional data

Two-dimensional data is composed of multiple pieces of one-dimensional data; it is a combination of one-dimensional data.

Tables are typical two-dimensional data, and the table header is part of that two-dimensional data.


  • Multidimensional data

Multidimensional data is formed by extending one-dimensional or two-dimensional data along new dimensions.

(Figure: the same table for several different years)

 As shown in the figure, a single table is two-dimensional, but combining the tables from different years adds a time dimension, which makes the data three-dimensional.

  • High-dimensional data

High-dimensional data uses only the most basic binary relationship, the key-value pair, to express complex structures among data.

{ "firstName" : "Tian",
  "lastName"  : "Song",
  "address"   : { "streetAddr" : "No. 5 Zhongguancun South Street",
                  "city"       : "Beijing",
                  "zipcode"    : "100081" },
  "prof"      : [ "Computer System", "Security" ] }

  • Python representation of data dimensions

1D data: list and set types

        [3.1398, 3.1349, 3.1376] ordered

        {3.1398, 3.1349, 3.1376} unordered

2D data: list type

Multidimensional data: list type

        [ [3.1398, 3.1349, 3.1376], [3.1413, 3.1404, 3.1401] ]

High-dimensional data: dictionary type, or data representation formats such as JSON, XML, and YAML

        info = {"firstName": "Tian", "lastName": "Song"}
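As a small sketch (not part of the original course), the high-dimensional record above can be held in a Python dict and converted to and from JSON text with the standard-library json module. The variable names person, text, and restored are illustrative only.

```python
import json

# High-dimensional data: nested key-value pairs (a dict inside a dict, a list as a value)
person = {
    "firstName": "Tian",
    "lastName": "Song",
    "address": {
        "streetAddr": "No. 5 Zhongguancun South Street",
        "city": "Beijing",
        "zipcode": "100081",
    },
    "prof": ["Computer System", "Security"],
}

# Serialize the dict to a JSON string, then parse it back into a dict
text = json.dumps(person, indent=2)
restored = json.loads(text)

print(restored["address"]["city"])   # Beijing
print(restored["prof"][1])           # Security
```

Nested keys are reached by chaining lookups, which is how the key-value structure expresses depth.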


NumPy array objects: ndarray

NumPy Introduction

NumPy is an open-source foundational library for scientific computing in Python. It provides:

  • A powerful N-dimensional array object, ndarray
  • Broadcasting functions
  • Tools for integrating C/C++/Fortran code
  • Linear algebra, Fourier transform, random number generation, and other functions

NumPy is the foundation of data processing and scientific computing libraries such as SciPy and Pandas.

Python already has a list type, so why do we need an array object (type)?

I had the same question at first. Lists are still very useful, but they are only suited to programs that handle small amounts of data. Compare these two pieces of code:

def pySum():
    # Pure-Python version: an explicit loop over every element
    a = [0, 1, 2, 3, 4]
    b = [9, 8, 7, 6, 5]
    c = []
    for i in range(len(a)):
        c.append(a[i] ** 2 + b[i] ** 3)
    return c


print(pySum())

 Array objects remove the loops otherwise needed for element-by-element operations, so a 1D vector can be handled like a single piece of data:

import numpy as np


def npSum():
    # NumPy version: arithmetic applies to whole arrays, no explicit loop
    a = np.array([0, 1, 2, 3, 4])
    b = np.array([9, 8, 7, 6, 5])
    c = a ** 2 + b ** 3
    return c


print(npSum())

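Both produce the same values, 729, 513, 347, 225, 141 (the list version prints [729, 513, 347, 225, 141], the NumPy version prints [729 513 347 225 141]).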

Why introduce a separate array type?

  1. A dedicated array object can be optimized to speed up this kind of computation;
  2. In scientific computing, all data along one dimension usually share the same type;
  3. Using a single data type for an array object saves computation time and storage space.

ndarray is a multidimensional array object consisting of two parts:

  •  actual data
  •  Metadata describing these data (data dimensions, data types, etc.) 

ndarray arrays generally require that all elements have the same type, and the array subscript starts from 0.
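To see the same-type requirement in practice, here is a small sketch (not from the course): when the input list mixes integers and floats, np.array converts every element to one common type, here float64.

```python
import numpy as np

# Mixed int and float input: NumPy picks one common element type for the whole array
a = np.array([1, 2, 3.5])
print(a.dtype)   # float64

# Subscripts start at 0, as with Python lists
print(a[0])      # 1.0
```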

Next, let's get to the point:

  • The use of ndarray

In [14]: a = np.array([[0, 1, 2, 3, 4],
    ...:               [5, 6, 7, 8, 9]])

In [15]: a
Out[15]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [16]: print(a)
[[0 1 2 3 4]
 [5 6 7 8 9]]

When printed, an ndarray is enclosed in [] with its elements separated by spaces. An axis is one dimension of the stored data; the rank is the number of axes.
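A minimal sketch of what "axis" means in practice, using the same 2x5 array: many NumPy functions take an axis argument, and summing along an axis collapses that axis. The sum() calls here are a standard ndarray method, used only as an illustration.

```python
import numpy as np

a = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])

print(a.ndim)           # 2, the rank: this array has two axes
print(a.sum(axis=0))    # [ 5  7  9 11 13]  collapses axis 0 (adds the two rows)
print(a.sum(axis=1))    # [10 35]           collapses axis 1 (adds across each row)
```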

  • Properties of ndarray objects

Attribute     Description
.ndim         rank, i.e. the number of axes (dimensions)
.shape        the dimensions of the ndarray; for a matrix, n rows and m columns
.size         the number of elements, equal to n*m for the .shape above
.dtype        the element type of the ndarray
.itemsize     the size of each element, in bytes

In [19]:a=np.array([[0,1,2,3,4],
    ...            [5,6,7,8,9]])
In [20]:a.ndim
Out[20]: 2

In [21]:a.shape
Out[21]: (2, 5)

In [22]:a.size
Out[22]: 10

In [23]:a.dtype
Out[23]: dtype('int32')

In [24]:a.itemsize
Out[24]: 4
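The int32 and itemsize 4 shown above come from the author's machine; the default integer type depends on the platform and NumPy version (int64 with itemsize 8 is common on Linux and macOS). A sketch that fixes dtype explicitly so the attributes are predictable; nbytes is another standard attribute, added here for illustration.

```python
import numpy as np

# Pin the element type so .dtype and .itemsize do not depend on the platform default
a = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]], dtype=np.int32)

print(a.ndim, a.shape, a.size)   # 2 (2, 5) 10
print(a.dtype, a.itemsize)       # int32 4
# Total bytes used by the elements = size * itemsize
print(a.nbytes)                  # 40
```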

  • Element types of ndarray arrays

Type          Description
bool          Boolean type, True or False
intc          Same as int in C, generally int32 or int64
intp          Integer used for indexing, same as ssize_t in C, int32 or int64
int8          8-bit (1-byte) integer, range [-128, 127]
int16         16-bit integer, range [-32768, 32767]
int32         32-bit integer, range [-2**31, 2**31-1]
int64         64-bit integer, range [-2**63, 2**63-1]
uint8         8-bit unsigned integer, range [0, 255]
uint16        16-bit unsigned integer, range [0, 65535]
uint32        32-bit unsigned integer, range [0, 2**32-1]
uint64        64-bit unsigned integer, range [0, 2**64-1]
float16       16-bit half-precision float: 1 sign bit, 5 exponent bits, 10 mantissa bits
float32       32-bit single-precision float: 1 sign bit, 8 exponent bits, 23 mantissa bits
float64       64-bit double-precision float: 1 sign bit, 11 exponent bits, 52 mantissa bits
complex64     Complex type; the real and imaginary parts are both 32-bit floats
complex128    Complex type; the real and imaginary parts are both 64-bit floats

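As shown above, ndarray supports many element types, whereas Python's own syntax supports only three numeric types: integers, floating-point numbers, and complex numbers. Why does ndarray need so many?

  1. Scientific computing involves large amounts of data and places high demands on storage and performance;
  2. Fine-grained element types help NumPy use storage space sensibly and optimize performance;
  3. Fine-grained element types help programmers make a sound estimate of a program's scale.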
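A short sketch (not from the course) that checks a few ranges from the table with np.iinfo and np.finfo, and shows how the element type changes the storage used by the same number of elements.

```python
import numpy as np

# Value ranges and sizes from the table can be queried directly
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.uint16).max)                        # 65535
print(np.finfo(np.float32).bits)                      # 32

# A finer type choice means less storage for the same number of elements
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.float64)
print(small.nbytes, large.nbytes)                     # 1000 8000
```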

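 Here we should mention non-homogeneous ndarray objects: they cannot make effective use of NumPy's strengths and should be avoided whenever possible.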

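In [27]: # NumPy >= 1.24 raises ValueError for ragged input unless dtype=object is given
    ...: x = np.array([[0, 1, 2, 3, 4],
    ...:               [5, 6, 7, 8]], dtype=object)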
In [28]: x.shape
Out[28]: (2,)

In [29]: x.dtype
Out[29]: dtype('O')

In [30]: x
Out[30]: array([list([0, 1, 2, 3, 4]), list([5, 6, 7, 8])], dtype=object)

In [31]: x.itemsize
Out[31]: 8

In [32]: x.size
Out[32]: 2

  •  Creating ndarray arrays

Ways to create them:

  • From Python lists, tuples, and similar types
  • With NumPy functions such as arange, ones, and zeros
  • From raw bytes
  • By reading a file in a specific format

(1)  Creating ndarray arrays from Python lists, tuples, and similar types

x = np.array(list/tuple)
# or
x = np.array(list/tuple, dtype=np.float32)

 When dtype is not specified, np.array() infers a dtype from the data.
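A sketch of both forms; the sample values are illustrative. Note how a single float anywhere in the input makes the inferred dtype a float type for the whole array.

```python
import numpy as np

x = np.array([0, 1, 2, 3])                     # from a list: an integer dtype is inferred
y = np.array((4, 5, 6, 7))                     # from a tuple
z = np.array([[1, 2], [9, 8], (0.1, 0.2)])     # lists and a tuple mixed; the floats make it float64
w = np.array([0, 1, 2, 3], dtype=np.float32)   # dtype given explicitly

print(z.shape, z.dtype)   # (3, 2) float64
print(w.dtype)            # float32
```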


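 (2)  Creating ndarray arrays with NumPy functions such as arange, ones, and zeros

Function               Description
np.arange(n)           Like range(), but returns an ndarray with elements from 0 to n-1
np.ones(shape)         Creates an array of all ones with the given shape; shape is a tuple
np.zeros(shape)        Creates an array of all zeros with the given shape; shape is a tuple
np.full(shape,val)     Creates an array with the given shape in which every element is val
np.eye(n)              Creates a square n*n identity matrix: ones on the diagonal, zeros elsewhere
np.ones_like(a)        Creates an array of all ones with the same shape as array a
np.zeros_like(a)       Creates an array of all zeros with the same shape as array a
np.full_like(a,val)    Creates an array with the same shape as array a in which every element is val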
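A quick tour of the functions in the table; the shapes and fill values are arbitrary choices for illustration. By default np.ones and np.zeros produce float64, and the *_like functions also copy the dtype of their template array.

```python
import numpy as np

r = np.arange(5)                      # like range(5): elements 0 to 4
o = np.ones((2, 3))                   # all ones, default dtype float64
z = np.zeros((3, 2), dtype=np.int32)  # all zeros, element type given explicitly
f = np.full((2, 2), 7)                # every element is 7
i = np.eye(3)                         # 3x3 identity matrix

a = np.arange(6).reshape((2, 3))      # template array for the *_like functions
ol = np.ones_like(a)                  # same shape as a, all ones
zl = np.zeros_like(a)                 # same shape as a, all zeros
fl = np.full_like(a, 9)               # same shape as a, every element is 9

print(r)
print(i)
print(fl)
```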


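 (3)  Creating ndarray arrays with other NumPy functions

Function              Description
np.linspace()         Fills an array with evenly spaced values between a start and a stop value
np.concatenate()      Joins two or more arrays into one new array

For np.linspace(), passing endpoint=False excludes the stop value from the result (for example, a stop value of 10 is then not included).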
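A sketch of both functions with start 1, stop 10, and 4 values; these arguments are illustrative. With endpoint=False the interval [1, 10) is split into 4 equal steps of (10 - 1) / 4 = 2.25.

```python
import numpy as np

a = np.linspace(1, 10, 4)                   # 4 evenly spaced values, 10 included: 1, 4, 7, 10
b = np.linspace(1, 10, 4, endpoint=False)   # 10 excluded: 1, 3.25, 5.5, 7.75
c = np.concatenate((a, b))                  # one new 1D array holding all 8 values

print(a)
print(b)
print(c.shape)   # (8,)
```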

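  • Transforming ndarray arrays

After an ndarray is created, you can change its dimensions and its element type.

 (1)  Changing the dimensions of an ndarray

Method                Description
.reshape(shape)       Returns an array with the given shape without changing the elements; the original array is unchanged
.resize(shape)        Same as .reshape(), but modifies the original array in place
.swapaxes(ax1,ax2)    Swaps two of the array's n dimensions
.flatten()            Collapses the array and returns a 1D copy; the original array is unchanged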
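A sketch of the four methods on a 2x3x4 array of ones (an arbitrary example shape). .resize() is shown on a separate array: it refuses to run while other arrays, such as the views returned by reshape or swapaxes, refer to the data. Passing refcheck=False skips NumPy's reference-count check and is safe here only because nothing else refers to that array.

```python
import numpy as np

a = np.ones((2, 3, 4), dtype=np.int32)

# reshape: returns an array of the new shape; a itself keeps shape (2, 3, 4)
b = a.reshape((3, 8))
print(a.shape, b.shape)    # (2, 3, 4) (3, 8)

# swapaxes: exchange axis 0 and axis 2
c = a.swapaxes(0, 2)
print(c.shape)             # (4, 3, 2)

# flatten: returns a 1D copy; a is unchanged
d = a.flatten()
print(d.shape, a.shape)    # (24,) (2, 3, 4)

# resize: changes the array itself and returns None
e = np.ones((2, 3, 4), dtype=np.int32)
e.resize((3, 8), refcheck=False)   # safe: no other array refers to e
print(e.shape)             # (3, 8)
```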

 

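 (2)  Changing the element type of an ndarray

The astype() method, used as new_a = a.astype(new_type), always creates a new array (a copy of the original data), even when the new type is the same as the old one.

In addition, ls = a.tolist() converts an ndarray into a Python list.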
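A sketch showing that astype() copies even when the type does not change, and what tolist() returns; the array contents are illustrative.

```python
import numpy as np

a = np.ones((2, 3), dtype=np.int32)

b = a.astype(np.float64)   # new array with float elements
c = a.astype(np.int32)     # same type, but still a new copy
c[0, 0] = 100              # changing the copy leaves a untouched
print(b.dtype, a[0, 0])    # float64 1

ls = a.tolist()            # nested Python list of Python ints
print(ls)                  # [[1, 1, 1], [1, 1, 1]]
```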

 


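Manipulating ndarray arrays

  • Indexing and slicing arrays

Indexing: the process of getting the element at a specific position in an array

Slicing: the process of getting a subset of an array's elements

  •  Indexing and slicing 1D arrays: similar to Python lists

  •  Indexing multidimensional arrays: one index per axis, separated by commas

  •  Slicing multidimensional arrays: one slice per axis, separated by commas

A lone ":" in an axis position means that axis is not restricted: all of its elements are kept.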
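A sketch of all three cases on small example arrays (the values are arbitrary). For the 2x3x4 array built with np.arange(24), the element at [i, j, k] equals 12*i + 4*j + k, which makes the results easy to check by hand.

```python
import numpy as np

# 1D: works like a Python list
a = np.array([9, 8, 7, 6, 5])
print(a[2])          # 7
print(a[1:4:2])      # [8 6]  start:stop:step

# Multidimensional indexing: one index per axis; negative indices count from the end
b = np.arange(24).reshape((2, 3, 4))
print(b[1, 2, 3])    # 23
print(b[-1, -2, -3]) # 17

# Multidimensional slicing: ":" alone keeps the whole axis
print(b[:, 1, -3])             # [ 5 17]
print(b[:, 1:3, :].shape)      # (2, 2, 4)
print(b[:, :, ::2].shape)      # (2, 3, 2)
```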


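 Computing with ndarray arrays

  • Operations between an array and a scalar

An operation between an array and a scalar is applied to every element of the array.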
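A sketch of array-scalar arithmetic: dividing an array by its own mean, a single scalar, divides every element by that scalar. The arrays are illustrative.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
m = a.mean()         # 11.5, a single scalar
b = a / m            # every element divided by the scalar
print(b.shape)       # (2, 3, 4)

c = a * 2 + 1        # multiply and add also apply element by element
print(c[1, 2, 3])    # 47
```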

  • Unary functions

Function                                                     Description
np.abs(x)  np.fabs(x)                                        Absolute value of each element
np.sqrt(x)                                                   Square root of each element
np.square(x)                                                 Square of each element
np.log(x)  np.log10(x)  np.log2(x)                           Natural, base-10, and base-2 logarithm of each element
np.ceil(x)  np.floor(x)                                      Ceiling or floor of each element
np.rint(x)                                                   Each element rounded to the nearest integer
np.modf(x)                                                   Fractional and integer parts of each element, returned as two separate arrays
np.cos(x)  np.cosh(x)  np.sin(x)  np.sinh(x)  np.tan(x)  np.tanh(x)   Ordinary and hyperbolic trigonometric functions of each element
np.exp(x)                                                    Exponential of each element
np.sign(x)                                                   Sign of each element: 1 (+), 0, -1 (-)
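A sketch applying several unary functions to one illustrative array. One detail worth knowing: np.rint rounds halfway values to the nearest even integer, so -2.5 becomes -2.0 while 1.5 becomes 2.0.

```python
import numpy as np

x = np.array([-2.5, -1.0, 0.0, 1.5, 4.0])

print(np.abs(x))           # absolute values
print(np.sqrt(np.abs(x)))  # square roots (of the absolute values, to avoid NaN)
print(np.sign(x))          # -1, 0 or 1 for each element
print(np.rint(x))          # nearest integer; ties go to the even integer

frac, whole = np.modf(x)   # two arrays: fractional parts and integer parts
print(frac, whole)
```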

  •  Binary functions

Function                              Description
+ - * / **                            Perform the corresponding operation on each pair of elements of two arrays
np.maximum(x,y)  np.fmax(x,y)         Element-wise maximum
np.minimum(x,y)  np.fmin(x,y)         Element-wise minimum
np.mod(x,y)                           Element-wise modulo
np.copysign(x,y)                      Gives each element of x the sign of the corresponding element of y
> < >= <= == !=                       Element-wise comparison, producing a boolean array

np.fmax/np.fmin ignore NaN values, while np.maximum/np.minimum propagate them.

 All of the unary and binary functions above perform element-wise operations on the data in an ndarray.
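A sketch of several binary functions on two arrays of the same shape; a, b, and y are illustrative. Mixing an integer array with a float array yields a float result.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))    # [[0 1 2] [3 4 5]]
b = np.sqrt(a)                      # float array of the same shape
y = np.array([[1, -1, 1],
              [-1, 1, -1]])         # signs to copy onto a

print(np.maximum(a, b))   # element-wise larger value (float result)
print(a > b)              # boolean array, compared element by element
print(np.mod(a, 4))       # element-wise remainder
print(np.copysign(a, y))  # magnitudes of a with the signs of y
```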


Origin blog.csdn.net/m0_62919535/article/details/126895953