Reading a .csv file with more than four million lines of data using numpy throws the following exception:
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20
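As a rough check of how much memory that allocation asks for, the size can be estimated from the shape and dtype in the message. This is only a sketch: the float32 comparison assumes the columns are numeric, which I have not verified for this file.

```python
import numpy as np

# Shape and dtype copied from the MemoryError message.
rows, cols = 4566386, 23
str_dtype = np.dtype('<U20')

# '<U20' stores 20 Unicode characters at 4 bytes each per element.
bytes_per_item = str_dtype.itemsize
total_bytes = rows * cols * bytes_per_item
print(f"<U20 array: {total_bytes} bytes ({total_bytes / 2**30:.2f} GiB)")

# Assumption: if every column were numeric, float32 would need 4 bytes per element.
float_bytes = rows * cols * np.dtype(np.float32).itemsize
print(f"float32 array: {float_bytes} bytes ({float_bytes / 2**30:.2f} GiB)")
```

So the string array alone needs roughly 7.8 GiB in one contiguous block, on top of the Python list of lists that is already held in memory.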
The following is my source code:
import numpy as np
import matplotlib.pyplot as mp
import sklearn.ensemble as se
import sklearn.metrics as sm
headers = None
data = []
with open('/home/tarena/桌面/i-80.csv', 'r') as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            headers = line.split(',')[2:]
        else:
            data.append(line.split(',')[2:])
headers = np.array(data)
data = np.array(data)
print(headers.shape)
print(data.shape)
The following are the results:
Traceback (most recent call last):
  File "/home/tarena/桌面/read_forest.py", line 13, in <module>
    headers = np.array(data)
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20

Process finished with exit code 1
Despite the error, we still got a result.
Experts, is there a solution?