Data analysis --- numpy basics (3)

The previous two articles covered the basic usage of numpy functions and some of their extensions. This article introduces how to read and write files with the numpy library.

I. Reading and writing files with numpy

1. Storing and reading CSV files with numpy

CSV (comma-separated values) is a common file format for storing batch data.

Storage:

# Save an array to a text file
np.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', 
          header='', footer='', comments='# ', encoding=None)
  • fname:  file name (string) or file handle; may be a compressed .gz or .bz2 file

  • X:   the array to store in the file

  • fmt:   the format of each written value, for example: %d %.2f %.18e

  • delimiter:  the string that separates columns; the default is a single space

  • newline: the string that separates lines

  • header:  a string written at the beginning of the file
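As a minimal sketch of how fmt, delimiter, and header interact, the snippet below writes a small float array and prints the raw file text. The file name scores.csv and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical file inside a temporary directory, so nothing else is touched.
path = os.path.join(tempfile.mkdtemp(), "scores.csv")

scores = np.array([[1.0, 2.5], [3.25, 4.5]])

# fmt controls how each number is written; header becomes the first line,
# prefixed with the comments string (default '# ').
np.savetxt(path, scores, fmt="%.2f", delimiter=",", header="x,y")

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

The header line is written with the '# ' prefix, which is why np.loadtxt ignores it by default when the file is read back.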

Read:

# Load an array from a text file
np.loadtxt(fname,  delimiter=None, skiprows=0,
           usecols=None)
  • fname:   the name of the file to be read

  • delimiter:  the string that separates columns; the default is any whitespace

  • skiprows:  the number of lines to skip at the start of the file; the default is 0, and it is usually used to skip the file header

  • usecols:  the indices of the columns to read (0 is the first column)
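The following sketch shows skiprows and usecols working together. It assumes a file with a plain one-line header (written here with comments='' so the header has no '# ' prefix); the file name table.csv is hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "table.csv")
data = np.arange(12).reshape(3, 4)

# comments='' writes the header without the default '# ' prefix,
# so it looks like an ordinary CSV header line.
np.savetxt(path, data, fmt="%d", delimiter=",", header="a,b,c,d", comments="")

# skiprows=1 skips that header line; usecols picks columns 1 and 3 (zero-based).
cols = np.loadtxt(path, delimiter=",", skiprows=1, usecols=(1, 3), dtype=int)
print(cols)
```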

Example 1. Storage:

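# Storage
import os
import numpy as np
a = np.arange(50).reshape(5, 10)
# Make sure the target directory exists
os.makedirs('./test', exist_ok=True)
# Save as a .csv file (np.savetxt returns None, so there is nothing to assign)
np.savetxt('./test/a.csv', a, fmt='%d', delimiter=',')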

The saved file contains five lines of ten comma-separated integers each, the first being 0,1,2,3,4,5,6,7,8,9.

Example 2, read:

# Read the file
np_file = np.loadtxt('./test/a.csv', delimiter=',')
print(np_file)
# Read only the first and fifth columns
np_file1 = np.loadtxt('./test/a.csv',usecols=(0, 4), delimiter=',')
print(np_file1)

"""
np_file: [[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
           [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
           [20. 21. 22. 23. 24. 25. 26. 27. 28. 29.]
           [30. 31. 32. 33. 34. 35. 36. 37. 38. 39.]
           [40. 41. 42. 43. 44. 45. 46. 47. 48. 49.]]
np_file1: [[ 0.  4.]
           [10. 14.]
           [20. 24.]
           [30. 34.]
           [40. 44.]]
"""

Note: CSV can only effectively store one-dimensional and two-dimensional arrays, so np.savetxt() and np.loadtxt() are likewise limited to one-dimensional and two-dimensional arrays.
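A small sketch of this limitation, and of one possible workaround (not described in the original text): np.savetxt raises a ValueError for a three-dimensional array, but the array can be collapsed to two dimensions for storage and reshaped after loading, provided the original shape is known. The file name cube.csv is an assumption.

```python
import os
import tempfile

import numpy as np

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "cube.csv")
cube = np.arange(24).reshape(2, 3, 4)

# Passing the 3-D array directly is rejected by np.savetxt.
try:
    np.savetxt(path, cube, fmt="%d", delimiter=",")
    saved_directly = True
except ValueError:
    saved_directly = False

# Workaround: collapse to 2-D for storage, then restore the known shape on load.
np.savetxt(path, cube.reshape(cube.shape[0], -1), fmt="%d", delimiter=",")
restored = np.loadtxt(path, delimiter=",", dtype=int).reshape(cube.shape)
print(saved_directly, restored.shape)
```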

2. Storing and reading multi-dimensional data with numpy

Storage:

a.tofile(fid, sep="", format="%s")
  • fid:  an open file object or a file name (string)

  • sep:  the separator between array items for text output; if it is an empty string, the file is written as binary

  • format: the format string for each item in text output

Read:

np.fromfile(file, dtype=float, count=-1, sep='')
  • file:  an open file object or a file name (string)

  • dtype:  the data type of the elements to read

  • count: the number of elements to read; -1 means read the entire file

  • sep: the separator between items in a text file; if it is an empty string, the file is read as binary

Storage:

# Store a multi-dimensional array
b = np.arange(50).reshape(5, 5, 2)
b.tofile("./test/b.bat", sep=",", format="%d")

Read:

# Read the multi-dimensional array back (np.int was removed in NumPy 1.24; use int)
np.fromfile('./test/b.bat', dtype=int, sep=',')
"""
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
"""
np.fromfile('./test/b.bat', dtype=int, sep=',').reshape(5, 5, 2)
"""
array([[[ 0,  1], [ 2,  3], [ 4,  5], [ 6,  7], [ 8,  9]],
        [[10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
        [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29]],
        [[30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
        [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49]]])
"""

Note: When reading with this method, you need to know the shape and element type the array had when it was written, because the file itself stores neither. a.tofile() and np.fromfile() therefore need to be used as a pair, and the extra information can be kept in a separate metadata file.
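One way to keep that extra information is a small sidecar file next to the data. The sketch below is only an illustration: the JSON layout and the file names b.bin and b.json are assumptions, not a numpy convention.

```python
import json
import os
import tempfile

import numpy as np

folder = tempfile.mkdtemp()
data_path = os.path.join(folder, "b.bin")   # hypothetical data file
meta_path = os.path.join(folder, "b.json")  # hypothetical metadata file

b = np.arange(50, dtype=np.int64).reshape(5, 5, 2)

# sep="" (the default) writes raw binary, so shape and dtype are not stored.
b.tofile(data_path)
with open(meta_path, "w") as f:
    json.dump({"shape": b.shape, "dtype": str(b.dtype)}, f)

# Read the metadata first, then use it to interpret the raw bytes.
with open(meta_path) as f:
    meta = json.load(f)
loaded = np.fromfile(data_path, dtype=meta["dtype"]).reshape(meta["shape"])
print(loaded.shape, loaded.dtype)
```

Raw binary files written this way also depend on the byte order of the machine that wrote them, which is another reason to prefer the .npy format described in the next section when portability matters.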

3. Convenient file access in numpy

np.save(file, arr)   np.savez(file, *args, **kwds)
  • file: file name; np.save writes a single array with the .npy extension, and np.savez bundles several arrays into one archive with the .npz extension (np.savez_compressed writes a compressed archive)

  • arr: array variable

np.load() automatically recognizes .npz files and returns an object similar to a dictionary. The contents of each array can be obtained by using the array name as the key.

np.load(file)
  • file: name of the .npy or .npz file to load

Example:

# Save a three-dimensional array and load it back; shape and dtype are preserved
a = np.arange(50).reshape(5,5,2)
np.save("a.npy", a)
b = np.load('a.npy')
print(b)

Storing data this way is convenient in deep learning for saving the training set, validation set, test set, and their labels. You can load only what you need, the number of files is greatly reduced, and there is no need to change file names throughout your code. It is a better way to store data.
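As a sketch of that workflow, the snippet below bundles several arrays into a single .npz file with np.savez and reads them back by name. The array names, the toy data, and the file name dataset.npz are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical archive inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "dataset.npz")

# Toy stand-ins for real training and test data.
x_train = np.arange(12).reshape(6, 2)
y_train = np.array([0, 1, 0, 1, 0, 1])
x_test = np.arange(4).reshape(2, 2)
y_test = np.array([1, 0])

# Keyword names become the keys used to look the arrays up after loading.
np.savez(path, x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test)

# np.load returns a dictionary-like object; arrays are read only when accessed.
with np.load(path) as data:
    names = sorted(data.files)
    x_train_loaded = data["x_train"]
    y_test_loaded = data["y_test"]
print(names)
```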


Origin blog.csdn.net/weixin_39121325/article/details/99715464