Data Analysis: Unit 2 NumPy Data Access and Functions

This unit covers NumPy data access (reading and writing files) and a range of functions that NumPy provides. Each function is described in a table. The code examples are provided as pictures, so you will need to type them in yourself.


Introduction to Content


Reading and writing data as CSV files

  • CSV file

CSV (Comma-Separated Values)

It is a common file format used to store bulk data.

 How to write and read a CSV file

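np.savetxt(frame, array, fmt='%.18e', delimiter=' ')
• frame : file or string (file name); may be a .gz or .bz2 compressed file
• array : the array to write to the file
• fmt : format used when writing, for example %d, %.2f, %.18e
• delimiter : string that separates the columns; the default is a single space
np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)
• frame : file, string (file name), or generator; may be a .gz or .bz2 compressed file
• dtype : data type of the resulting array, optional
• delimiter : string that separates the columns; the default is any whitespace
• unpack : if True, the columns that are read are returned separately so they can be assigned to different variables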

 Example 1
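Since the original example is a picture, here is a minimal sketch of what writing a CSV file could look like. The array contents and the file name a.csv are assumptions chosen for illustration, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: a 5 x 20 array holding the integers 0..99.
a = np.arange(100).reshape(5, 20)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")
    # fmt='%d' writes every element as an integer;
    # delimiter=',' separates the columns with commas.
    np.savetxt(path, a, fmt="%d", delimiter=",")

    # Read the raw text back to see what was written.
    with open(path) as f:
        lines = f.read().splitlines()

print(len(lines))  # one line of text per row of the array
print(lines[0])    # the first row, values separated by commas
```

Each row of the two-dimensional array becomes one line of the file, which is why the row and column structure survives in a CSV file.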

 Example 2 
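The reading side can be sketched in the same way. This is an assumed example, not the one from the original picture: it writes the same hypothetical array with one decimal place and then reads it back with np.loadtxt, once normally and once with unpack=True.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 20)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")
    # Write the values as floats with one decimal place.
    np.savetxt(path, a, fmt="%.1f", delimiter=",")

    # Read the file back; without a dtype argument the result is float.
    b = np.loadtxt(path, delimiter=",")

    # unpack=True transposes the result, so each entry of `cols`
    # is one column of the file.
    cols = np.loadtxt(path, delimiter=",", unpack=True)

print(b.dtype, b.shape)
print(cols.shape)
```

Note that the delimiter must be passed to np.loadtxt as well, because its default is to split on whitespace.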

  •  Limitations of CSV files

CSV can only store one-dimensional and two-dimensional data efficiently, so np.savetxt() and np.loadtxt() can only write and read one-dimensional and two-dimensional arrays.
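The limitation can be seen directly in a small sketch. The three-dimensional array below is a made-up example, and the in-memory io.StringIO buffers stand in for real files. The second half shows one possible workaround (flattening to two dimensions and restoring the shape by hand), which only works when the original shape is known.

```python
import io

import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)  # a three-dimensional array

# np.savetxt rejects arrays with more than two dimensions.
try:
    np.savetxt(io.StringIO(), a3)
    saved = True
except ValueError as err:
    saved = False
    print("savetxt refused the 3-D array:", err)

# Workaround sketch: store it as 2-D text and reshape after loading.
buf = io.StringIO()
np.savetxt(buf, a3.reshape(a3.shape[0], -1), fmt="%d", delimiter=",")
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=",").reshape(a3.shape)
```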


Access to multidimensional data

  • Access to data of any dimension

We have learned how to read and write one-dimensional and two-dimensional arrays. How do we read and write data of any dimension?

First, every ndarray in NumPy has a method for writing itself to a file:

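a.tofile(frame, sep='', format='%s')
• frame : file or string (file name)
• sep : string that separates the elements; if it is an empty string, the file is written in binary
• format : format of the written data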

Example 1

a.tofile(frame, sep=',',  format='%s')

Unlike a CSV file, this file contains no dimension information. It simply lists all the elements of the array one after another.
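A minimal sketch of the text form, assuming a hypothetical 5 x 10 x 2 array and a temporary file named b.dat:

```python
import os
import tempfile

import numpy as np

# Hypothetical three-dimensional sample data.
a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "b.dat")
    # A non-empty sep writes the elements as text, one after another.
    a.tofile(path, sep=",", format="%s")

    with open(path) as f:
        text = f.read()

print(text[:30])  # the beginning of the flat list of elements
```

The file is a single flat sequence of 100 values, so nothing in it tells a reader that the array once had the shape (5, 10, 2).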

 a.tofile(frame, sep='',  format='%d')

 With sep='', the generated file is in binary format, and opening it in a text editor shows content that cannot be read. A binary file often takes up less space than the equivalent text file, because each element is stored as its raw bytes (8 bytes for a 64-bit number) rather than as a string of digit characters. Since the contents of a binary file cannot be inspected directly, it is mainly useful as a form of data backup.
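To make the size comparison concrete, the following sketch writes the same array once as text and once as binary and compares the file sizes. The float data and the file names are assumptions for illustration. Text output of small integers can actually be shorter than binary, so floating-point values are used here.

```python
import os
import tempfile

import numpy as np

# 100 float64 values; most of them need many digits when printed.
a = np.linspace(0, 1, 100)

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "a_text.dat")
    bin_path = os.path.join(tmp, "a_bin.dat")

    a.tofile(text_path, sep=",", format="%s")  # text output
    a.tofile(bin_path, sep="")                 # empty sep: raw bytes, format is not used

    text_size = os.path.getsize(text_path)
    bin_size = os.path.getsize(bin_path)

print("text:", text_size, "bytes")
print("binary:", bin_size, "bytes")
```

The binary file holds exactly a.size * a.itemsize bytes, here 100 * 8.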

So how do we restore the data from such a text file or binary file? NumPy provides the following function:

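np.fromfile(frame, dtype=float, count=-1, sep='')
• frame : file or string (file name)
• dtype : data type of the elements to read
• count : number of elements to read; -1 means read the whole file
• sep : string that separates the elements; if it is an empty string, the file is read as binary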

Example 2

np.fromfile(frame, dtype=float, count=-1, sep='')

 With this method, the dimension information is lost once the array is written to the file. np.fromfile always returns a one-dimensional array, so we must already know the original shape when reading in order to restore the array correctly.
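A round trip might look like the sketch below. The array, its shape, and the file names are assumptions; the point is that the shape (5, 10, 2) and the element type have to be supplied again by hand when reading.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "b_text.dat")
    bin_path = os.path.join(tmp, "b_bin.dat")

    # Text round trip: the same sep must be used for writing and reading.
    a.tofile(text_path, sep=",", format="%d")
    flat_text = np.fromfile(text_path, dtype=int, sep=",")

    # Binary round trip: the dtype must match the one that was written.
    a.tofile(bin_path)
    flat_bin = np.fromfile(bin_path, dtype=a.dtype)

# Both results are one-dimensional; the shape is restored manually.
restored_text = flat_text.reshape(5, 10, 2)
restored_bin = flat_bin.reshape(5, 10, 2)
```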

  • Points to note

When reading, this method requires that you know the dimensions and element type that were used when the file was saved. a.tofile() and np.fromfile() need to be used together, and the additional information can be stored in a separate metadata file.
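One way such a metadata file could work is sketched below. The helper names save_with_meta and load_with_meta, the JSON layout, and the file names are all hypothetical; NumPy does not provide them. The idea is only to keep the shape and dtype next to the raw data.

```python
import json
import os
import tempfile

import numpy as np


def save_with_meta(a, data_path, meta_path):
    """Hypothetical helper: write raw bytes plus a JSON side file."""
    a.tofile(data_path)
    with open(meta_path, "w") as f:
        json.dump({"shape": list(a.shape), "dtype": a.dtype.str}, f)


def load_with_meta(data_path, meta_path):
    """Hypothetical helper: rebuild the array from data and metadata."""
    with open(meta_path) as f:
        meta = json.load(f)
    flat = np.fromfile(data_path, dtype=np.dtype(meta["dtype"]))
    return flat.reshape(meta["shape"])


a = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "a.dat")
    meta_path = os.path.join(tmp, "a.json")
    save_with_meta(a, data_path, meta_path)
    b = load_with_meta(data_path, meta_path)
```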

Convenient file access with NumPy

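np.save(fname, array) or np.savez(fname, array)
• fname : file name; np.save uses the .npy extension, np.savez writes an archive with the .npz extension
• array : the array variable

np.load(fname)
• fname : file name, with the .npy extension, or .npz for an archive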

  A .npy file begins with a header that records the meta information of the array, including its dimensions and data type. When NumPy's load function reads the file, it parses this header to learn the shape and element type of the stored data, so the array can be restored exactly. If a program needs to cache data in files, save and load are an effective and fast choice. If the file is meant for data exchange with other programs, consider a CSV file, or use the tofile method described earlier to produce a file that other programs can read. Which method to use for storing and reading files depends on the scenario.
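A short sketch of both functions, using an assumed float32 array and temporary file names. Unlike the tofile approach, neither the shape nor the dtype has to be supplied when loading.

```python
import os
import tempfile

import numpy as np

a = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp:
    # A single array in a .npy file.
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    b = np.load(npy_path)

    # Several named arrays in one .npz archive.
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, first=a, second=a * 2)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        second = archive["second"]

print(b.shape, b.dtype)  # shape and dtype come back without being specified
print(names)
```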


NumPy's random number functions

  • Random function of np.random (1)

Function : Description
• rand(d0, d1, ..., dn) : creates an array of random floats with shape (d0, ..., dn), uniformly distributed over [0, 1)
• randn(d0, d1, ..., dn) : creates an array of random numbers with shape (d0, ..., dn), drawn from the standard normal distribution
• randint(low[, high, size]) : creates a random integer, or an array of random integers whose shape is given by size, in the range [low, high)
• seed(s) : sets the random number seed; s is the given seed value

 Example 1
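The original example is a picture; the sketch below shows how these four functions might be used. The shapes, the range, and the seed value 10 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(10)                       # fix the seed for reproducibility

a = np.random.rand(3, 4)                 # uniform floats in [0, 1)
b = np.random.randn(3, 4)                # standard normal distribution
c = np.random.randint(100, 200, (3, 4))  # integers in [100, 200)

# Resetting the same seed reproduces the same sequence of numbers.
np.random.seed(10)
a_again = np.random.rand(3, 4)

print(a)
print(c)
```

Setting the seed is useful when results need to be repeatable, for example when debugging.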

 

  • Random function of np.random (2)

Function : Description
• shuffle(a) : shuffles array a along its first axis; the array a itself is changed
• permutation(a) : returns a new shuffled array based on the first axis of array a, without changing a
• choice(a[, size, replace, p]) : draws elements from the one-dimensional array a with probabilities p to form a new array whose shape is given by size; replace indicates whether an element can be drawn more than once, and the default is True

 Example 2
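A possible version of this example is sketched below, with made-up arrays. It contrasts permutation (returns a copy) with shuffle (works in place) and shows choice with and without replacement.

```python
import numpy as np

np.random.seed(0)

a = np.arange(12).reshape(4, 3)
original = a.copy()

p = np.random.permutation(a)              # new array; a is untouched
unchanged = np.array_equal(a, original)

np.random.shuffle(a)                      # reorders the rows of a in place

b = np.arange(5)
picked = np.random.choice(b, size=(2, 3))                # with replacement
unique_pick = np.random.choice(b, size=3, replace=False)  # no repeats
# p gives each element its own probability; here 0 can never be drawn.
weighted = np.random.choice(b, size=3, replace=False, p=b / b.sum())
```

Only whole rows move in shuffle and permutation, because both work along the first axis.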

 

  • Random function of np.random (3)

Function : Description
• uniform(low, high, size) : generates an array with a uniform distribution; low is the start value, high the end value, size the shape
• normal(loc, scale, size) : generates an array with a normal distribution; loc is the mean, scale the standard deviation, size the shape
• poisson(lam, size) : generates an array with a Poisson distribution; lam is the rate of the random event, size the shape
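These three functions can be tried out as follows. The parameters (range 0 to 10, mean 10 with standard deviation 5, rate 3) and the sample sizes are assumptions for illustration.

```python
import numpy as np

np.random.seed(0)

u = np.random.uniform(0, 10, (3, 4))  # uniform over [0, 10)
n = np.random.normal(10, 5, 10000)    # mean 10, standard deviation 5
k = np.random.poisson(3.0, 10000)     # event rate 3, integer counts

print(u)
print(n.mean(), n.std())  # close to 10 and 5 for a large sample
print(k.mean())           # close to the rate 3 for a large sample
```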


NumPy's statistical functions

  • Statistical functions (1)

Function : Description
• sum(a, axis=None) : computes the sum of the elements of array a along the given axis; axis is an integer or a tuple of integers
• mean(a, axis=None) : computes the arithmetic mean of the elements of array a along the given axis; axis is an integer or a tuple of integers
• average(a, axis=None, weights=None) : computes the weighted average of the elements of array a along the given axis
• std(a, axis=None) : computes the standard deviation of the elements of array a along the given axis
• var(a, axis=None) : computes the variance of the elements of array a along the given axis

axis=None is the default for all of these statistical functions; with it, the statistic is computed over all elements of the array.

  Example 1
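A small sketch of these functions on an assumed 3 x 5 array, showing the effect of the axis argument and of weights (the weights 10, 5, 1 are arbitrary):

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

total = np.sum(a)              # axis=None: sum over all elements
col_mean = np.mean(a, axis=0)  # mean of each column (collapses axis 0)
row_mean = np.mean(a, axis=1)  # mean of each row (collapses axis 1)

# Weighted average of each column; one weight per row.
weighted = np.average(a, axis=0, weights=[10, 5, 1])

std = np.std(a)                # standard deviation of all elements
var = np.var(a)                # variance of all elements

print(total, col_mean, row_mean)
print(weighted)
print(std, var)
```

For the first column the weighted average is (0*10 + 5*5 + 10*1) / (10 + 5 + 1) = 2.1875.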

  • Statistical functions (2)

Function : Description
• min(a) max(a) : compute the minimum and maximum values of the elements in array a
• argmin(a) argmax(a) : compute the index of the minimum and maximum elements of array a, counted in the flattened (one-dimensional) array
• unravel_index(index, shape) : converts the one-dimensional index into a multi-dimensional index according to shape
• ptp(a) : computes the difference between the maximum and minimum values of the elements in array a, that is, the range
• median(a) : computes the median of the elements in array a

  Example 2

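Usually, argmax is used together with unravel_index: argmax returns an index into the flattened array, and unravel_index converts it into a multi-dimensional index.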
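The combination can be sketched like this, using a small made-up 3 x 3 array:

```python
import numpy as np

b = np.array([[3, 9, 2],
              [7, 1, 12],
              [5, 8, 4]])

flat_idx = np.argmax(b)                   # index in the flattened array
pos = np.unravel_index(flat_idx, b.shape)  # (row, column) of the maximum

value_range = np.ptp(b)   # maximum minus minimum
middle = np.median(b)     # median of all elements

print(flat_idx, pos, b[pos])
print(value_range, middle)
```

Here the maximum 12 sits at flat index 5, which corresponds to row 1, column 2.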


NumPy's gradient function

Function : Description
• np.gradient(f) : computes the gradient of the elements of array f; when f is multi-dimensional, it returns the gradient along each dimension

Gradient: the rate of change between consecutive values, that is, the slope.

If a, b, c are the Y values at three consecutive X coordinates spaced one unit apart, the gradient at b is (c - a)/2. At the two ends of the array, where only one neighbor exists, the difference to that neighbor is used instead.

  Example 1
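As a sketch of the rule above, take the assumed values 1, 4, 9, 16, 25 with unit spacing. Interior points use (next - previous)/2, and the two end points use the difference to their single neighbor.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
g = np.gradient(a)
# ends:     4 - 1 = 3   and   25 - 16 = 9
# interior: (9 - 1)/2 = 4, (16 - 4)/2 = 6, (25 - 9)/2 = 8

# For a two-dimensional array, one gradient array is returned per axis.
c = np.arange(12.0).reshape(3, 4)
g_rows, g_cols = np.gradient(c)  # along axis 0, then along axis 1

print(g)
print(g_rows)
print(g_cols)
```

In the two-dimensional case each returned array has the same shape as the input.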

 


Summary of this unit

  • CSV file

        np.loadtxt()

        np.savetxt()

  • multidimensional data access

        a.tofile()

        np.fromfile()

        np.save()

        np.savez()

        np.load()

  • random function

    np.random.rand()      np.random.randn()

    np.random.randint()   np.random.seed()

    np.random.shuffle()  np.random.choice()

    np.random.permutation()

  • NumPy's statistical functions

        np.sum()                     np.mean()

        np.average()               np.std()

        np.var()                       np.median()

        np.min()                      np.max()

        np.argmin()                 np.argmax()

        np.unravel_index()      np.ptp()

  • NumPy's gradient function

        np.gradient()

Origin blog.csdn.net/m0_62919535/article/details/126912146