1. Data Analysis and Presentation, Unit 2: NumPy Data Access and Functions

1. Accessing data in CSV files

CSV file

CSV (Comma-Separated Values)

CSV is a common file format used to store bulk data

np.savetxt(frame, array, fmt='%.18e', delimiter=' ')

frame: file name or file object to write to; it can be a .gz or .bz2 compressed file
array: the array to store in the file
fmt: format string for each element, for example %d, %.2f, %.18e
delimiter: string separating the values; the default is a single space (use ',' for CSV)
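A minimal sketch of writing a small integer array as CSV. It writes into an in-memory io.StringIO buffer instead of a real file; that is an assumption made only to keep the example self-contained, and a file name such as "a.csv" (hypothetical) would be passed the same way.

```python
import io

import numpy as np

# A small 2-D integer array to store as CSV
a = np.arange(6).reshape(2, 3)

# Write into an in-memory text buffer; a file name would work the same way.
buf = io.StringIO()
np.savetxt(buf, a, fmt="%d", delimiter=",")  # %d: integers, "," between values

csv_text = buf.getvalue()
print(csv_text)
# 0,1,2
# 3,4,5
```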

np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)

frame: file, file name, or generator of lines; it can be a .gz or .bz2 compressed file
dtype: data type of the resulting array, optional (default float)
delimiter: string separating the values; the default (None) splits on any whitespace
unpack: if True, the result is transposed so that each column can be assigned to a separate variable
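A sketch of reading CSV text back, with and without unpack. The CSV content is made up for illustration and is read from io.StringIO rather than a file, again only to keep the example self-contained.

```python
import io

import numpy as np

# CSV text with two columns; in practice this would be the contents of a file
csv_text = "1,10\n2,20\n3,30\n"

# Default: a single 2-D array with one row per line
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)
print(data.shape)  # (3, 2)

# unpack=True transposes the result, so each column lands in its own variable
x, y = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, unpack=True)
print(x)  # [1 2 3]
print(y)  # [10 20 30]
```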

Limitations of CSV files

CSV can only efficiently store 1D and 2D arrays

np.savetxt() and np.loadtxt() can only handle one-dimensional and two-dimensional arrays effectively
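The limitation can be observed directly: in recent NumPy versions np.savetxt() raises a ValueError for a 3-D array. The sketch also shows one common workaround (reshape to 2-D before saving, reshape back after loading); that workaround is an illustration rather than something these notes prescribe, and it only works because the original shape is kept in the code.

```python
import io

import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # a 3-D array

rejected = False
try:
    np.savetxt(io.StringIO(), cube)
except ValueError as err:
    # savetxt accepts only 1-D or 2-D arrays
    rejected = True
    print("savetxt refused the 3-D array:", err)

# Workaround: flatten to 2-D before saving, reshape after loading.
# The original shape (2, 3, 4) has to be remembered separately.
buf = io.StringIO()
np.savetxt(buf, cube.reshape(2, -1), fmt="%d", delimiter=",")
buf.seek(0)
restored = np.loadtxt(buf, delimiter=",", dtype=int).reshape(2, 3, 4)
print(np.array_equal(restored, cube))  # True
```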

2. Access to multidimensional data

How can arrays of any dimension be stored and read back?

a.tofile(frame, sep='', format='%s')

frame: file or file name
sep: separator string between elements; if it is empty (the default), the file is written as binary
format: format string for each element when writing text

np.fromfile(frame, dtype=float, count=-1, sep='')

frame: file or file name
dtype: the data type to read
count: the number of elements to read; -1 means read the entire file
sep: separator string between elements; if it is empty (the default), the file is read as binary

This method does not store the array's shape: once the array is written to the file, its dimension information is lost, so the original shape must be known when reading the file in order to restore the array correctly.
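A sketch of the round trip in both text and binary mode. It writes into a temporary directory, an assumption made to keep the example self-contained; the file names a.txt and a.dat are hypothetical. np.fromfile() returns a flat 1-D array, and the reshape works only because (2, 3, 4) is known in advance.

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "a.txt")
    bin_path = os.path.join(tmp, "a.dat")

    # Text mode: non-empty sep, elements written as text separated by ","
    a.tofile(text_path, sep=",", format="%d")
    # Binary mode: empty sep (the default), raw bytes of the array
    a.tofile(bin_path)

    # Both come back flat; dtype must match what was written
    t = np.fromfile(text_path, dtype=a.dtype, sep=",")
    b = np.fromfile(bin_path, dtype=a.dtype)

print(t.shape, b.shape)  # (24,) (24,)
# Restore the shape using information known in advance
print(np.array_equal(t.reshape(2, 3, 4), a))  # True
print(np.array_equal(b.reshape(2, 3, 4), a))  # True
```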


Note:

When reading, this method requires knowing the shape and element type (dtype) the array had when it was saved

a.tofile() and np.fromfile() must be used as a pair

The extra information (shape and dtype) can be stored in a separate metadata file
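One way to keep that extra information, sketched under assumptions: the helpers save_with_meta and load_with_meta, the JSON sidecar format, and the ".json" suffix are hypothetical choices for illustration, not part of NumPy. Only a.tofile() and np.fromfile() are real NumPy calls here.

```python
import json
import os
import tempfile

import numpy as np


# Hypothetical helper: binary data via tofile, shape and dtype in a JSON sidecar
def save_with_meta(a, path):
    a.tofile(path)  # raw binary data, no shape information
    with open(path + ".json", "w") as f:
        json.dump({"shape": a.shape, "dtype": a.dtype.str}, f)


# Hypothetical helper: read the sidecar first, then restore dtype and shape
def load_with_meta(path):
    with open(path + ".json") as f:
        meta = json.load(f)
    flat = np.fromfile(path, dtype=np.dtype(meta["dtype"]))
    return flat.reshape(meta["shape"])


a = np.linspace(0, 1, 12).reshape(3, 4)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.dat")
    save_with_meta(a, path)
    b = load_with_meta(path)

print(b.shape)               # (3, 4)
print(np.array_equal(a, b))  # True
```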

Convenient file access for NumPy

np.save(fname, array) or np.savez(fname, array)

fname: file name; np.save() uses the .npy extension, while np.savez() writes a .npz archive that can hold several arrays (np.savez_compressed() writes a compressed .npz)
array: the array to save

np.load(fname)

fname: file name with the .npy or .npz extension; a .npz file is returned as a dictionary-like object keyed by array name
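A sketch comparing the two formats: a .npy file records shape and dtype in its header, so np.load() restores the array without extra information, and np.savez() bundles several arrays under keyword names (the names a and b here are arbitrary). Temporary files are used only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = np.eye(3)

with tempfile.TemporaryDirectory() as tmp:
    # .npy: one array; shape and dtype are stored in the file header
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    a2 = np.load(npy_path)

    # .npz: several arrays in one archive, retrieved by keyword name
    npz_path = os.path.join(tmp, "ab.npz")
    np.savez(npz_path, a=a, b=b)
    with np.load(npz_path) as archive:
        a3 = archive["a"]  # each array is read fully into memory here
        b3 = archive["b"]

print(a2.shape, a2.dtype == a.dtype)                  # (2, 3, 4) True
print(np.array_equal(a3, a), np.array_equal(b3, b))  # True True
```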

3. Random functions in NumPy

NumPy's sublibrary for random functions

NumPy's random sublibrary: np.random.*

**Random number functions of np.random (1)**

| function | description |
|---|---|
| rand(d0,d1,...,dn) | Creates an array of shape (d0, d1, ..., dn) of floats drawn from the uniform distribution over [0, 1) |
| randn(d0,d1,...,dn) | Creates an array of shape (d0, d1, ..., dn) drawn from the standard normal distribution |
| randint(low[,high,size]) | Creates a random integer, or an integer array of shape size, with values in [low, high) (or [0, low) if high is omitted) |
| seed(s) | Sets the random number seed to s |

Setting the same seed again reproduces the same random arrays, which makes tests repeatable.
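A minimal sketch of seed-based reproducibility using the np.random functions from the table; the seed value 10 and the array shapes are arbitrary choices.

```python
import numpy as np

np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))  # integers in [100, 200), shape (3, 4)

np.random.seed(10)  # reset to the same seed
second = np.random.randint(100, 200, (3, 4))

print(np.array_equal(first, second))  # True: same seed, same numbers

u = np.random.rand(2, 3)   # uniform floats in [0, 1), shape (2, 3)
z = np.random.randn(2, 3)  # standard normal values, shape (2, 3)
print(u.shape, z.shape)    # (2, 3) (2, 3)
```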

**Random number functions of np.random (2)**

| function | description |
|---|---|
| shuffle(a) | Randomly permutes array a in place along axis 0 (the outermost axis); a itself is changed |
| permutation(a) | Returns a new array with the elements of a randomly permuted along axis 0; a is not changed |
| choice(a[,size,replace,p]) | Draws elements from the one-dimensional array a, with probabilities p, to form a new array of shape size; replace sets whether an element may be drawn more than once (default True); by default every element has equal probability |
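A sketch contrasting the three functions; the seed, the example arrays, and the probabilities are arbitrary. shuffle() works in place and returns None, while permutation() leaves its input alone; both reorder only along axis 0, so each row stays intact.

```python
import numpy as np

np.random.seed(0)
original = np.arange(12).reshape(4, 3)

a = original.copy()
p = np.random.permutation(a)        # new array with the rows reordered
print(np.array_equal(a, original))  # True: a is untouched

result = np.random.shuffle(a)  # reorders the rows of a in place
print(result)                  # None

b = np.array([10, 20, 30])
# 2x3 draws with replacement; 30 is the most likely value
c = np.random.choice(b, size=(2, 3), p=[0.2, 0.3, 0.5])
# 3 draws without replacement: each element appears exactly once
d = np.random.choice(b, size=3, replace=False)
print(c.shape, sorted(d.tolist()))  # (2, 3) [10, 20, 30]
```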

**Random number functions of np.random (3)**

| function | description |
|---|---|
| uniform(low,high,size) | Array of shape size drawn from a uniform distribution over [low, high) |
| normal(loc,scale,size) | Array of shape size drawn from a normal distribution with mean loc and standard deviation scale |
| poisson(lam,size) | Array of shape size drawn from a Poisson distribution with event rate lam |
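A sketch of the three distribution functions; the parameters, shapes, and seed are arbitrary. The last lines illustrate, without claiming exact values, that with many samples the sample mean and standard deviation approach loc and scale.

```python
import numpy as np

np.random.seed(1)
u = np.random.uniform(0, 10, (3, 4))  # floats in [0, 10), shape (3, 4)
n = np.random.normal(10, 5, (3, 4))   # mean 10, standard deviation 5
k = np.random.poisson(3.0, (3, 4))    # non-negative integer counts, rate 3

# With many samples the sample statistics approach the parameters
big = np.random.normal(10, 5, 100_000)
print(big.mean(), big.std())  # close to 10 and 5
```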

4. Statistical functions of NumPy

Statistical functions directly provided by NumPy: np.*

**Statistical functions of NumPy (1)**

| function | description |
|---|---|
| sum(a, axis=None) | Sum of the elements of a along the given axis (an integer or a tuple of integers); all elements if axis is None |
| mean(a, axis=None) | Mean (expected value) of the elements of a along the given axis (an integer or a tuple of integers) |
| average(a, axis=None, weights=None) | Weighted average of the elements of a along the given axis |
| std(a, axis=None) | Standard deviation of the elements of a along the given axis |
| var(a, axis=None) | Variance of the elements of a along the given axis |
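A worked sketch on a small 3x5 array (chosen arbitrarily) showing how axis changes what is reduced; the comments give the values by hand calculation.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)  # rows: 0-4, 5-9, 10-14

total = np.sum(a)               # 105: sum of all elements
row_means = np.mean(a, axis=1)  # 2, 7, 12: one mean per row
col_means = np.mean(a, axis=0)  # 5, 6, 7, 8, 9: one mean per column
both = np.sum(a, axis=(0, 1))   # 105: a tuple of axes reduces over both

# Weighted average down each column, row weights 10, 5, 1:
# column 0 -> (0*10 + 5*5 + 10*1) / 16 = 2.1875
wavg = np.average(a, axis=0, weights=[10, 5, 1])

# Population statistics (ddof=0 by default): var = 280 / 15, about 18.67
sd, var = np.std(a), np.var(a)
print(total, row_means, col_means, wavg, sd, var)
```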

**Statistical functions of NumPy (2)**

| function | description |
|---|---|
| min(a), max(a) | Minimum and maximum of the elements of a |
| argmin(a), argmax(a) | Index of the minimum and maximum elements of a, counted in the flattened (one-dimensional) array |
| unravel_index(index, shape) | Converts a flat (one-dimensional) index into a multidimensional index for an array of the given shape |
| ptp(a) | Difference between the maximum and minimum elements of a (peak to peak) |
| median(a) | Median of the elements of a |
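A small sketch (the array values are arbitrary) of how argmax() and unravel_index() combine: argmax() reports a position in the flattened array, and unravel_index() converts it back into a row and column.

```python
import numpy as np

a = np.array([[3, 9, 1],
              [7, 2, 8]])

flat_max = np.argmax(a)  # 1: position of 9 in the flattened [3, 9, 1, 7, 2, 8]
flat_min = np.argmin(a)  # 2: position of 1
row, col = np.unravel_index(flat_max, a.shape)  # (0, 1): 9 is at a[0, 1]

print(np.min(a), np.max(a))  # 1 9
print(np.ptp(a))             # 8: 9 - 1
print(np.median(a))          # 5.0: mean of the middle values 3 and 7
```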