When using pandas together with Keras's fit_generator, you usually do not want to read all of the data into memory; in fact, when the data volume is too large, you simply cannot. Here is how to read batch_size rows at a time:
1. Prepare the data:
```python
import pandas as pd

a = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [4, 4, 4, 4],
    [5, 5, 5, 5],
    [6, 6, 6, 6],
]
a = pd.DataFrame(a)
a.to_csv("../a.csv", index=False)
```
2. Read the whole file:
```python
pd.read_csv("../a.csv")
```
Output:
```
   0  1  2  3
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
3  4  4  4  4
4  5  5  5  5
5  6  6  6  6
```
3. Read only the first few rows:
```python
pd.read_csv("../a.csv", nrows=2)
```
Output:
```
   0  1  2  3
0  1  1  1  1
1  2  2  2  2
```
4. Skip rows, i.e. specify how many rows to skip from the start:
```python
pd.read_csv("../a.csv", skiprows=1, nrows=2)
```
Output:
```
   1  1.1  1.2  1.3
0  2    2    2    2
1  3    3    3    3
```
skiprows also accepts a callable that decides row by row:
```python
pd.read_csv("../a.csv", skiprows=lambda x: x % 2 != 0)
```
Output:
```
   0  1  2  3
0  2  2  2  2
1  4  4  4  4
2  6  6  6  6
```
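Note that in the `skiprows=1` example above, the header line was skipped, so the first data row (`1 1.1 1.2 1.3`) was mistaken for the header. A minimal sketch of reading one batch while keeping the real column names, assuming the same `a.csv` from step 1 (written here to the current directory so the snippet is self-contained):

```python
import pandas as pd

# Same toy data as step 1.
pd.DataFrame([[i, i, i, i] for i in range(1, 7)]).to_csv("a.csv", index=False)

# Read the header line only (no data rows), just to capture the column names.
columns = pd.read_csv("a.csv", nrows=0).columns

batch = pd.read_csv(
    "a.csv",
    skiprows=3,      # the header line + the first 2 data rows
    nrows=2,         # one batch of 2 rows
    header=None,     # the remaining lines have no header of their own
    names=columns,   # restore the original column names
)
print(batch)
```

This reads the second batch of two rows (`3 3 3 3` and `4 4 4 4`) with the correct header.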
By using skiprows to specify how many rows to skip and nrows to specify how many rows to read, you can feed the data to the model in chunks of batch_size.
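Putting the pieces together, the approach above can be sketched as a generator suitable for batched reading; `batch_reader` is a hypothetical helper name, and the toy `a.csv` is recreated here so the snippet is self-contained:

```python
import pandas as pd

def batch_reader(path, batch_size):
    """Yield DataFrames of at most batch_size rows without loading the whole file."""
    # Read the header line once so every chunk keeps the real column names.
    columns = pd.read_csv(path, nrows=0).columns
    start = 0
    while True:
        try:
            chunk = pd.read_csv(
                path,
                skiprows=start + 1,  # +1 also skips the header line
                nrows=batch_size,
                header=None,
                names=columns,
            )
        except pd.errors.EmptyDataError:  # nothing left to read
            break
        yield chunk
        start += batch_size

# Demo with the same toy data as step 1.
pd.DataFrame([[i, i, i, i] for i in range(1, 7)]).to_csv("a.csv", index=False)
batches = list(batch_reader("a.csv", batch_size=2))
print(len(batches))  # 3 batches of 2 rows each
```

As a design note, pandas also has a built-in `chunksize` parameter on `read_csv` that returns an iterator of DataFrames; the manual skiprows/nrows approach shown here is useful when you need random access to a particular batch (e.g. restarting an epoch) rather than a single forward pass over the file.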