Leaving aside whether lottery prediction is reliable at all: predictions vary from person to person, the field is full of tricks, and some people are even misled into believing that lottery predictions can be highly accurate. These are manipulators' tactics, and they are why people who do not understand how lotteries work are willing to pay for prediction materials.
Many players study the "laws" of lottery prediction. They fall into no more than three "schools": the data school, the chart school, and the formula school. One group is not counted as a school: the word-puzzle school, which can be filed under the crackpots.
So which school predicts accurately? I do not know, because I almost never buy lottery tickets (when I do, it is just for fun) and have not studied the question.
But whichever school it is, it needs data to study. This article describes how to obtain all Fucai 3D data since the lottery's inception, including the winning numbers, the number of winning bets, sales, and the prize return ratio.
Crawling web information
When crawling simple static pages with no anti-crawling mechanism, the general strategy is: choose the target (the URL), observe the structure (link structure and page structure), then plan the implementation (which HTML downloader, parser, and so on to use).
A crawler involves three kinds of tools:
- HTML downloader: downloads the HTML page;
- HTML parser: extracts the useful data;
- Data storage: saves the useful data to a file or a database.
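The three parts above can be sketched as a small pipeline. This is only an illustration under assumptions: the names `crawl`, `fake_download`, `parse_lines` and the CSV writer are hypothetical and not from the original code, the downloader is a stand-in that returns canned text so the sketch runs offline, and CSV is used for storage here instead of Excel.

```python
import csv
import io


def crawl(urls, download, parse, store):
    """Run downloader -> parser -> storage for every URL; return the record count."""
    count = 0
    for url in urls:
        html = download(url)        # step 1: downloader returns page text or None
        if html is None:            # skip pages that could not be fetched
            continue
        for record in parse(html):  # step 2: parser yields structured records
            store(record)           # step 3: storage persists each record
            count += 1
    return count


# --- Hypothetical stand-ins so the sketch runs without network access ---
def fake_download(url):
    # Pretend every page except "missing" downloads successfully.
    return None if url == "missing" else "2018-01-01,001\n2018-01-02,002"


def parse_lines(text):
    # Turn each "date,issue" line into a dict.
    for line in text.splitlines():
        date, issue = line.split(",")
        yield {"date": date, "issue": issue}


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "issue"])
writer.writeheader()

total = crawl(["page1", "missing", "page2"], fake_download, parse_lines, writer.writerow)
```

Passing the three stages in as callables keeps each one replaceable: a real downloader, an HTML parser, or an Excel writer can be swapped in without touching the loop.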
Today we will use the requests library and the BeautifulSoup module to fetch Fucai 3D lottery information from the web and save it to Excel.
Before starting, analyze the structure of the target page:
The target URL, http://kaijiang.zhcw.com/zhcw/html/3d/list_2.html, changes with each page: the number after list_ indicates the page number.
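The URL pattern can be turned into a small helper that builds the address of any page. This is a minimal sketch: the names `BASE_URL`, `page_url` and `all_page_urls` are hypothetical, and it assumes page numbering starts at 1.

```python
# URL template taken from the page address above; only the number after list_ varies.
BASE_URL = "http://kaijiang.zhcw.com/zhcw/html/3d/list_{}.html"


def page_url(page):
    """Return the URL of the given results page (assumed to be 1-based)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return BASE_URL.format(page)


def all_page_urls(total_pages):
    """Return the URLs for pages 1..total_pages in order."""
    return [page_url(n) for n in range(1, total_pages + 1)]


# 246 is the page count mentioned in this article.
urls = all_page_urls(246)
```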
Next, look at the page structure. It is also very simple: the source for each draw is a tr node, so we can use the BeautifulSoup library to extract the information from it.
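To illustrate extracting one record per tr node, here is a small sketch run against made-up markup. The sample HTML, the function name `parse_rows` and the three columns are assumptions for illustration only; the real page has more columns and its exact layout may differ. It uses Python's built-in `html.parser` backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration; the real page's columns may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Issue</th><th>Number</th></tr>
  <tr><td>2018-01-01</td><td>2018001</td><td><em>1</em><em>2</em><em>3</em></td></tr>
  <tr><td>2018-01-02</td><td>2018002</td><td><em>4</em><em>5</em><em>6</em></td></tr>
</table>
"""


def parse_rows(html):
    """Yield one dict per data row (rows without td cells are skipped)."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser backend
    for row in soup.select("tr"):
        cells = row.select("td")
        if not cells:  # the header row has only th cells
            continue
        yield {
            "date": cells[0].get_text(strip=True),
            "issue": cells[1].get_text(strip=True),
            # The winning digits sit in separate em tags; join them into one string.
            "number": "".join(em.get_text(strip=True) for em in row.select("td em")),
        }


rows = list(parse_rows(SAMPLE_HTML))
```

Skipping rows that have no td cells is one way to drop header rows without relying on fixed slice positions.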
The overall idea: to get all Fucai 3D results from the 14 years since its launch (246 pages in total), we only need to make 246 separate requests, one per page, use the BeautifulSoup library to extract the relevant information from each page, and then use the xlwt library to write the data to Excel. The result is shown below:
(nearly 5,000 records in total)
The detailed code is as follows:
import requests
from bs4 import BeautifulSoup
import xlwt
import time
# Fetch the content of one page
def get_one_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    # Return the page text only on a successful response
    if response.status_code == 200:
        return response.text
    return None
# Parse one page and structure the data
def parse_one_page(html):
    soup = BeautifulSoup(html, 'lxml')
    i = 0
    # Skip the two header rows and the final (pagination) row
    for item in soup.select('tr')[2:-1]:
        yield {
            'time': item.select('td')[i].text,
            'issue': item.select('td')[i+1].text,
            'digits': item.select('td em')[0].text,
            'ten_digits': item.select('td em')[1].text,
            'hundred_digits': item.select('td em')[2].text,
            'single_selection': item.select('td')[i+3].text,
            'group_selection_3': item.select('td')[i+4].text,
            'group_selection_6': item.select('td')[i+5].text,
            'sales': item.select('td')[i+6].text,
            'return_rates': item.select('td')[i+7].text
        }
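# Write the data to an Excel spreadsheet
def write_to_excel():
    f = xlwt.Workbook()
    sheet1 = f.add_sheet('3D', cell_overwrite_ok=True)
    # Column headers: draw date, issue, ones digit, tens digit, hundreds digit,
    # single selection, group selection 3, group selection 6, sales, prize return ratio
    row0 = ["开奖日期","期号","个位数","十位数","百位数","单数","组选3","组选6","销售额","返奖比例"]
    # Write the first (header) row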
for j