Use a Python crawler to crawl Fucai (Welfare Lottery) Shuangseqiu information: if you don't read it, you won't know

Two days ago, I saw an interesting question on the Internet: Are lottery predictions reliable? Why do so many people still believe in lottery predictions?

Setting that question aside for a moment: opinions on lottery predictions vary from person to person. There are plenty of scams out there, and some people mistakenly believe that lottery predictions can be highly accurate.


Among lottery forecasters there are also people who seriously study "patterns". They fall into roughly three "schools": the data school, the chart school, and the formula school. There is one more school that does not count: the school of word puzzles, calligraphy and paintings, which belongs in the ranks of the crackpots.

Which school's predictions are reliable and accurate? I don't know, because I hardly ever buy lottery tickets (when I do, it is just for fun) and I don't study them. But whichever school you follow, you need data to study. Today I will only show you how to obtain all the data for the 3D lottery since its launch (winning numbers, number of winning bets, sales, prize payout ratio, and so on).

When crawling simple static web pages (ones without an anti-crawling mechanism), the general approach is: pick the target (the URL), observe the structure (both the link structure and the page structure), then plan and implement (choose an HTML downloader, a parser, and so on). The crawling process involves three main tools:

- HTML downloader: downloads the HTML pages
- HTML parser: parses out the useful data
- Data storage: stores the useful data in files or a database

Today we will use the requests library and the BeautifulSoup module to scrape the Welfare Lottery 3D winning-number pages and save the data to an Excel spreadsheet.

Before we start, take a look at the structure of the target page:

[Screenshots of the target page]

Notice that only one part of the target URL changes from page to page: the number x in list_x is the page number.
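The URL pattern can be sketched in a few lines. The template is the one used in the crawler code in this article and 246 is the page count quoted in the text; the names `BASE_URL`, `TOTAL_PAGES` and `page_urls` are only illustrative.

```python
# URL template as used by the crawler in this article; the page count of 246
# is the figure quoted in the text and may differ on the live site.
BASE_URL = 'http://kaijiang.zhcw.com/zhcw/html/3d/list_{}.html'
TOTAL_PAGES = 246


def page_urls(total_pages=TOTAL_PAGES):
    """Return the list-page URLs for pages 1..total_pages."""
    return [BASE_URL.format(page) for page in range(1, total_pages + 1)]


urls = page_urls()
print(urls[0])    # first list page
print(urls[-1])   # last list page
print(len(urls))  # number of requests the crawler will make
```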


Next, look at the structure of the page itself, which is very simple. The source code for each issue's lottery information is a tr node, so we can use the BeautifulSoup library to extract the information from it.
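As a rough illustration of the "one tr node per issue" structure, the sketch below pulls the cell text out of table rows. To stay dependency-free it uses the standard library's `html.parser` rather than BeautifulSoup, and both `RowParser` and `SAMPLE_HTML` are made up for illustration; the sample markup is not copied from the real page.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <em> also lands here
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample markup with invented values, NOT the real page
SAMPLE_HTML = (
    '<table>'
    '<tr><td>2018-01-01</td><td>2018001</td>'
    '<td><em>1</em> <em>2</em> <em>3</em></td></tr>'
    '<tr><td>2018-01-02</td><td>2018002</td>'
    '<td><em>4</em> <em>5</em> <em>6</em></td></tr>'
    '</table>'
)

parser = RowParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
for row in rows:
    print(row)
```

With BeautifulSoup the same idea collapses to `soup.select('tr')` followed by `item.select('td')` on each row, which is what the full crawler does.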

The overall idea: to obtain all Welfare Lottery 3D data since its launch 14 years ago (246 pages in total), we only need to send 246 separate requests. For each page we fetch, we use the BeautifulSoup library to extract the relevant information and the xlwt library to write the data into Excel. That gives us the complete Welfare Lottery 3D data. The result is as follows:

[Screenshots of the resulting Excel sheet]

The detailed code is as follows:

```python
import time

import requests
import xlwt
from bs4 import BeautifulSoup


# Download one page and return its HTML, or None if the request fails
def get_one_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/64.0.3282.140 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    return None


# Parse one page and yield one structured dict per issue
def parse_one_page(html):
    soup = BeautifulSoup(html, 'lxml')
    # [2:-1] drops the first two rows and the last row, which are not issue rows
    for item in soup.select('tr')[2:-1]:
        tds = item.select('td')
        ems = item.select('td em')  # the three <em> tags hold the winning digits
        yield {
            'time': tds[0].text,
            'issue': tds[1].text,
            'digits': ems[0].text,
            'ten_digits': ems[1].text,
            'hundred_digits': ems[2].text,
            'single_selection': tds[3].text,
            'group_selection_3': tds[4].text,
            'group_selection_6': tds[5].text,
            'sales': tds[6].text,
            'return_rates': tds[7].text,
        }


# Write the data into an Excel spreadsheet
def write_to_excel():
    f = xlwt.Workbook()
    sheet1 = f.add_sheet('3D', cell_overwrite_ok=True)
    # Column titles: draw date, issue, units digit, tens digit, hundreds digit,
    # single selection, group selection 3, group selection 6, sales, payout ratio
    row0 = ["开奖日期", "期号", "个位数", "十位数", "百位数",
            "单选", "组选3", "组选6", "销售额", "返奖比例"]
    keys = ['time', 'issue', 'digits', 'ten_digits', 'hundred_digits',
            'single_selection', 'group_selection_3', 'group_selection_6',
            'sales', 'return_rates']
    # Write the header row
    for j, title in enumerate(row0):
        sheet1.write(0, j, title)

    # Crawl every page in turn and write each issue into the sheet
    i = 0
    for k in range(1, 247):
        url = 'http://kaijiang.zhcw.com/zhcw/html/3d/list_%s.html' % k
        html = get_one_page(url)
        if html is None:
            # Skip pages that could not be downloaded instead of crashing
            print('Failed to fetch page %d, skipping.' % k)
            continue
        print('Saving page %d.' % k)
        # Write the information for each issue
        for item in parse_one_page(html):
            for col, key in enumerate(keys):
                sheet1.write(i + 1, col, item[key])
            i += 1
        time.sleep(0.5)  # pause briefly between requests
    f.save('3D.xls')


def main():
    write_to_excel()


if __name__ == '__main__':
    main()
```

At this point, all the Welfare Lottery 3D data from the past 14 years has been crawled. As for how to predict, or how the numbers will trend in the next issue? I don't know. Whether you win next time is up to you. Lottery players, this is as far as I can help!

Finally, are lottery predictions accurate? I won't go into much theoretical analysis; I will just ask two questions:

**Proposition 1:** Taking Shuangseqiu as an example, consider two tickets for the next draw: 1,2,3,4,5,6,7 and 3,4,8,11,22,29,7. Which one is more likely to win, or are their chances the same?

**Proposition 2:** The second question is simpler. Suppose you have tossed a coin 9 times and it came up heads every time. Now you toss it a 10th time: what is the probability of heads?
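Both questions can be checked with a few lines of Python. This is a minimal sketch under two assumptions that the article does not state: a Shuangseqiu ticket is 6 red balls out of 33 plus 1 blue ball out of 16, with every combination equally likely, and the coin is fair with independent tosses.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Proposition 1 (assumed rules: 6 red balls from 33, 1 blue ball from 16).
# Every specific ticket is one of the equally likely combinations, so both
# tickets in the question have exactly the same chance of the top prize.
total_tickets = comb(33, 6) * 16
p_ticket = Fraction(1, total_tickets)

# Proposition 2 (assumed fair coin): enumerate all 10-toss sequences, keep
# those whose first 9 tosses are heads, and see how often toss 10 is heads.
sequences = [s for s in product('HT', repeat=10) if s[:9] == ('H',) * 9]
p_tenth_heads = Fraction(sum(s[9] == 'H' for s in sequences), len(sequences))

print(total_tickets)   # 17721088
print(p_ticket)        # 1/17721088
print(p_tenth_heads)   # 1/2
```

Under these assumptions neither ticket is better than the other, and nine heads in a row do not change the odds of the tenth toss.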

If you still ask me whether the lottery follows any rules: in my opinion, the rule of the lottery is that there is no rule **(if you don't believe it, you can analyze all the data from the past 14 years)**. Even if there is one, it is beyond what people can compute. The lottery is entertainment and a game of luck. Even if someone gets lucky and makes money on the lottery, that does not mean the method they used can raise the odds of winning. Any money-making scheme run in the name of raising the odds of winning, even a well-intentioned one, will eventually go wrong.
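For readers who do want to analyze the data, one possible starting point is to count how often each digit appears. The sketch below is only an illustration: `digit_frequencies` is a hypothetical helper, and the draws are synthetic random numbers from a seeded generator, not real lottery results. To use it on the crawled data, replace `draws` with the three digits of each issue joined into a string.

```python
import random
from collections import Counter


def digit_frequencies(draws):
    """Count how often each digit appears across all draws.

    `draws` is an iterable of 3-character digit strings such as '407'.
    """
    counts = Counter()
    for draw in draws:
        counts.update(draw)  # adds one count per character in the string
    return counts


# Synthetic stand-in data, NOT real lottery results: 5000 uniformly random
# 3-digit draws from a seeded generator.
rng = random.Random(0)
draws = ['%03d' % rng.randrange(1000) for _ in range(5000)]

counts = digit_frequencies(draws)
expected = len(draws) * 3 / 10  # count per digit if all are equally likely
for digit in '0123456789':
    # The ratio should stay close to 1.0 if no digit is favored
    print(digit, counts[digit], round(counts[digit] / expected, 3))
```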


Origin blog.csdn.net/m0_59162248/article/details/131826427