Checking whether multiple URLs are reachable with Python

After a two-month delay, my personal project has finally started, and I am learning and growing along the way. I obtained a copy of OpenSources (you can find it by searching GitHub directly); interested readers can download the CSV file to practice with.
import pandas as pd
import urllib.request
import time
# Path to the downloaded sources.csv file
data = pd.read_csv('/Users/Macbook/Documents/GitHub/opensources/sources/sources.csv')
data[:30]  # take a look at the first 30 rows of data
# Write the domain column of the DataFrame to fakenews.txt
# (header=False keeps the column name out of the domain list)
data['Unnamed: 0'].to_csv('fakenews.txt', sep='\t', index=False, header=False)
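The column dump above can be sketched with a tiny in-memory DataFrame (the sample domains here are made up; in the real sources.csv the domain column is unnamed, which is why pandas labels it 'Unnamed: 0'):

```python
import pandas as pd

# Hypothetical stand-in for sources.csv: the domain column is unnamed,
# so pandas calls it 'Unnamed: 0'
df = pd.DataFrame({'Unnamed: 0': ['example.com', 'example.org']})

# header=False keeps the column name out of the domain list
df['Unnamed: 0'].to_csv('fakenews.txt', sep='\t', index=False, header=False)

with open('fakenews.txt') as f:
    print(f.read())  # one domain per line
```

Without header=False, the column name 'Unnamed: 0' would become the first line of fakenews.txt and later be treated as a URL.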


opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/49.0.2')]
file = open('fakenews.txt')
lines = file.readlines()
checklist = []
for line in lines:
    # Note: prefixing the scheme works both for crawling and for checking URL validity
    line = 'http://' + line
    checklist.append(line.replace('\n', ''))
print(checklist)
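The prefix-and-strip loop above can be packed into a small helper (the function name normalize_url is mine, not from the original post):

```python
def normalize_url(line):
    # Prefix a bare domain with a scheme and drop the trailing newline
    return 'http://' + line.replace('\n', '')

# Lines as readlines() would return them
domains = ['example.com\n', 'example.org\n']
checklist = [normalize_url(d) for d in domains]
print(checklist)  # ['http://example.com', 'http://example.org']
```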

print('Testing:')
for tempUrl in checklist:
    try:
        opener.open(tempUrl)
        print(tempUrl + ' is reachable')
        time.sleep(2)
    except urllib.error.URLError:
        print(tempUrl + ' failed to open')
        time.sleep(2)
    # try/except lets us skip over errors and keep the loop running
    except Exception as e:
        print(tempUrl, e)
    time.sleep(1)
    # pausing between requests helps avoid being flagged as a bot
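One caveat with the loop above: opener.open() is called without a timeout, so a single unresponsive host can stall the whole run. A minimal sketch of a safer check (the function url_is_reachable and the 5-second default timeout are my additions, not part of the original post):

```python
import urllib.request
import urllib.error

def url_is_reachable(url, timeout=5):
    """Return True if the URL responds, False on any network or URL error."""
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/49.0.2')]
    try:
        opener.open(url, timeout=timeout)
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # URLError: DNS/connection failures; OSError: timeouts;
        # ValueError: malformed URLs such as a missing scheme
        return False

print(url_is_reachable('not-a-url'))  # False: no scheme, raises ValueError
```

Returning a boolean instead of printing also makes it easy to collect the dead domains into a list for later review.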

Anyway, practice makes perfect. If you have any questions, feel free to discuss them here. I would also like to thank @程序姨全敏 for the ideas and support. This post has been revised to cover the errors that readers most frequently report~


Origin blog.csdn.net/weixin_42294077/article/details/109385175