Big data processing problem encountered with numpy

When reading a .csv file with more than four million lines of data, numpy throws the following exception:

numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20
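
For context, the size of the failed allocation follows directly from the dtype: a <U20 element reserves 20 unicode code points at 4 bytes each, i.e. 80 bytes per cell, so the whole array needs roughly 7.8 GiB. A quick sanity check (just a sketch; the row and column counts come straight from the error message):

import numpy as np

rows, cols = 4566386, 23
itemsize = np.dtype('<U20').itemsize       # 80 bytes: 20 code points x 4 bytes each
total_bytes = rows * cols * itemsize
print(f'{total_bytes / 1024**3:.1f} GiB')  # ~7.8 GiB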

The following is my source code:

import numpy as np
import matplotlib.pyplot as mp  # not used in this snippet
import sklearn.ensemble as se   # not used in this snippet
import sklearn.metrics as sm    # not used in this snippet

headers = None
data = []
with open('/home/tarena/桌面/i-80.csv', 'r') as f:
    # iterate the file object directly so the raw lines are not
    # all held in memory at once (f.readlines() would do that)
    for i, line in enumerate(f):
        fields = line.rstrip('\n').split(',')[2:]  # drop the first two columns
        if i == 0:
            headers = fields
        else:
            data.append(fields)
headers = np.array(data)  # the traceback points here; np.array(headers) was probably intended
data = np.array(data)
print(headers.shape)
print(data.shape)

The following is the output:

Traceback (most recent call last):
  File "/home/tarena/桌面/read_forest.py", line 13, in <module>
    headers = np.array(data)
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20

Process finished with exit code 1

So although it errors out, the message does tell us the result we were after: the array would have shape (4566386, 23).

Does anyone have a solution?
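
One direction I am considering (just a sketch, assuming pandas is available and that the columns after the first two are all numeric) is to skip the giant unicode array entirely and parse the values into a compact numeric dtype in chunks:

import numpy as np
import pandas as pd  # assumption: pandas is installed

# Read the file in chunks, keep only the columns after the first two,
# and parse them as float32 (4 bytes per value) instead of <U20 (80 bytes).
reader = pd.read_csv(
    '/home/tarena/桌面/i-80.csv',
    usecols=range(2, 25),  # assumption: 25 columns in total, keeping the last 23
    dtype=np.float32,      # assumption: these columns are all numeric
    chunksize=500_000,
)
data = np.vstack([chunk.to_numpy() for chunk in reader])
print(data.shape)          # expected: (4566386, 23)

At float32 the full array would take about 0.4 GiB instead of ~7.8 GiB, which should fit in memory.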

Source: www.cnblogs.com/bitrees/p/11369327.html