Reading a CSV file in batches with pandas: read by row index

When using pandas together with fit_generator, we do not want to read all the data into memory, because the data volume may simply be too large to load at once. Here is how to read batch_size rows at a time:

1. Prepare the data:

import pandas as pd

a = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [4, 4, 4, 4],
    [5, 5, 5, 5],
    [6, 6, 6, 6],
]
a = pd.DataFrame(a)
a.to_csv("../a.csv", index=False)
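
After running this, a.csv contains a header line (the column names 0,1,2,3 written by to_csv) followed by the six data rows; keeping that header line in mind matters below when counting rows for skiprows:

0,1,2,3
1,1,1,1
2,2,2,2
3,3,3,3
4,4,4,4
5,5,5,5
6,6,6,6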

2. Read the original data:

pd.read_csv("../a.csv")
Output:
   0  1  2  3
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
3  4  4  4  4
4  5  5  5  5
5  6  6  6  6

3. Read only the first few rows:

pd.read_csv("../a.csv", nrows=2)
Output:
   0  1  2  3
0  1  1  1  1
1  2  2  2  2

4. Skip rows, either a fixed number of leading rows or rows selected by a callable:

pd.read_csv("../a.csv", skiprows=1, nrows=2)
Output:
   1  1.1  1.2  1.3
0  2    2    2    2
1  3    3    3    3

Note that skiprows=1 skips the file's header line, so the first data row (1, 1, 1, 1) is used as the header and pandas de-duplicates the repeated column names into 1, 1.1, 1.2, 1.3.

pd.read_csv("../a.csv", skiprows=lambda x: x % 2 != 0)
Output:
   0  1  2  3
0  2  2  2  2
1  4  4  4  4
2  6  6  6  6

  By using skiprows to specify how many rows to skip and nrows to specify how many rows to read from that point, you can pull the data in batches of batch_size.
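
As a minimal sketch (not from the original post), the two parameters can be wrapped in a generator that yields one batch at a time, which is the shape of input fit_generator expects. The function name batch_reader and the batch size of 2 are only illustrative; the header line is read once up front so that skipping rows does not swallow the column names:

import pandas as pd

def batch_reader(path, batch_size):
    # Read only the header line to recover the column names once.
    columns = pd.read_csv(path, nrows=0).columns
    start = 0
    while True:
        try:
            # Skip the header plus the rows already consumed, then take one batch.
            batch = pd.read_csv(path, skiprows=1 + start, nrows=batch_size,
                                header=None, names=columns)
        except pd.errors.EmptyDataError:
            break  # nothing left in the file
        if batch.empty:
            break
        yield batch
        start += batch_size

# Illustrative usage: iterate over a.csv two rows at a time.
for batch in batch_reader("../a.csv", batch_size=2):
    print(batch)

Each yielded batch is an ordinary DataFrame, so it can be converted with .to_numpy() before being fed to fit_generator.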

Origin www.cnblogs.com/dan-baishucaizi/p/12084175.html