Data cleaning: removing rows with missing values from txt files

Recently I was scraping book data from Douban Books and found that many books have no score. The crawler did not handle this case specially at the time, so the gaps ended up in the saved data.
The data is comma-separated, in the form (book_id, book_name, book_score).
Part of the data is as follows:

1443021,网络营销, 
2265243,How Buildings Work, 
4022720,影子富豪查克·菲尼, 7.3 
2157526,Mind Set!, 
1431351,平家物语图典, 7.1 
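
Splitting a couple of these rows shows what the score field actually contains when a book has no score. This is only a small sketch, and it assumes the rows look exactly like the sample above, with a single space after the last comma; the variable names `rows` and `parsed` are made up for illustration.

```python
# Two rows copied from the sample data (note the space after the last comma)
rows = ["1443021,网络营销, ", "4022720,影子富豪查克·菲尼, 7.3 "]

# The third field is the score, still a plain string at this point
parsed = [row.split(",")[2] for row in rows]

for score in parsed:
    # repr() makes the whitespace visible
    print(repr(score), score is None, score.strip() == "")

# prints:
# ' ' False True
# ' 7.3 ' False False
```

So a missing score is neither None nor the text "Null"; it is a string that contains only whitespace.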

At first I tried checking whether the score was None, and then matching "Null" with a regular expression, but neither worked. What finally worked was to treat the score as a string (at this point it is just one element of the list produced by splitting the line) and compare it against a blank string. The code is as follows:

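# Drop rows with a missing score   author: wuyou
with open("BookInfo.txt", "r", encoding="utf-8") as file, \
     open("Book.txt", "w", encoding="utf-8") as newfile:  # "w" so that reruns do not append duplicate rows
    for line in file:
        info = line.rstrip("\n")        # remove the trailing newline
        book_info = info.split(",")     # split the line into fields
        if len(book_info) != 3:         # skip rows without exactly 3 fields (e.g. a comma inside the title)
            continue
        score = book_info[2].strip()    # drop the spaces around the score
        if score == "":                 # nothing left: the score is missing
            continue
        newfile.write(line)             # passed both checks, so write the row to the new file
# both files are closed automatically when the with block ends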
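
The same filtering can also be wrapped in a reusable function. The sketch below is not the original approach: it swaps in the standard library's `csv` module for the manual `split(",")`, and the function name `drop_unscored` and its parameters are hypothetical. It assumes every valid row has exactly three fields and that a score made up only of whitespace counts as missing.

```python
import csv


def drop_unscored(src_path, dst_path):
    """Copy rows that have a non-blank score; return how many rows were kept."""
    kept = 0
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            # Expect exactly three fields: book_id, book_name, book_score
            if len(row) != 3:
                continue
            # strip() treats "", " " and other whitespace-only values as missing
            if not row[2].strip():
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

With the file names used in this post, the call would be `drop_unscored("BookInfo.txt", "Book.txt")`. Returning the count makes it easy to check how many rows survived the cleaning.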

Origin blog.csdn.net/qq_39905917/article/details/104857763