[Data Analysis] Selecting n rows at random from a CSV file

The CSV file contains m (4,017,277) rows in total. Randomly select n (100,000) of them and save them to another CSV file.

Note: The data type is DataFrame

import random

oldf = open('thp_zbwd_bing_01_Del_abs50.csv', 'r', encoding='UTF-8')
newf = open('thp_zbwd_bing_01_Del_abs50_Random.csv', 'w', encoding='UTF-8')

# random.sample(x, y) returns y distinct elements chosen at random from the sequence x
resultList = random.sample(range(0, 4017277), 100000)

# Read all lines into memory, then write out the selected ones
lines = oldf.readlines()
for i in resultList:
    newf.write(lines[i])

oldf.close()
newf.close()
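
The script above loads every line into memory and hard-codes the total row count. As a rough sketch of an alternative, the helper below counts the lines first, draws the indices with random.sample, and then streams the file a second time, so only the chosen indices are held in memory. The name sample_csv_rows and its parameters (has_header, seed) are hypothetical and not part of the original post. It also assumes each record occupies exactly one physical line, which fails for quoted fields that contain newlines.

```python
import random


def sample_csv_rows(src_path, dst_path, n, has_header=True, seed=None):
    """Copy n randomly chosen data rows from src_path to dst_path.

    Hypothetical helper, not from the original post. Assumes one record
    per physical line (no quoted fields containing newlines).
    """
    rng = random.Random(seed)

    # First pass: count the lines so the total is not hard-coded.
    with open(src_path, 'r', encoding='UTF-8') as src:
        total = sum(1 for _ in src)

    # Skip line 0 when it is a header so it is never drawn as a data row.
    first_data_line = 1 if has_header else 0
    # random.sample raises ValueError if n exceeds the number of data rows.
    chosen = set(rng.sample(range(first_data_line, total), n))

    # Second pass: stream the file and keep only the chosen lines.
    with open(src_path, 'r', encoding='UTF-8') as src, \
            open(dst_path, 'w', encoding='UTF-8') as dst:
        for i, line in enumerate(src):
            if (has_header and i == 0) or i in chosen:
                dst.write(line)
    return len(chosen)
```

For the files in this post the call would be sample_csv_rows('thp_zbwd_bing_01_Del_abs50.csv', 'thp_zbwd_bing_01_Del_abs50_Random.csv', 100000). Unlike the original loop, this version writes the selected rows in their original file order and copies the header line (if any) to the output.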

 


Origin www.cnblogs.com/ITCSJ/p/11411149.html