Python self-study, class 20: crawling East Money (eastmoney.com) stock data (crawler)

Two days ago I learned regular expressions and basic crawler usage, so I combined them with the UI design I learned earlier to make a stock data query tool, although it has no data analysis features yet. I started with a URL that some bloggers provide ("http://quote.eastmoney.com/stocklist.html") to fetch the stock list:

import re
import urllib
import urllib.request
def getpage(path):
    data=urllib.request.urlopen(path).read().decode('utf-8')
    return data
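# findall returns the whole match when the pattern has no group and only the group contents when it has one; escape literal parentheses as \( and \)
def getcode(data):
    regex_str = r'<li><a href="list,([69]\d{5})\.html">'  # [69], not [6|9]: inside a character class | is a literal character
    #regex_str = r'<li><a href="list,hk(\d{5})\.html">\D\d{5}\D(.*?)<'
    pat=re.compile(regex_str)  # precompile the pattern
    codelist=pat.findall(data)
    return codelist
path="http://quote.eastmoney.com/stocklist.html"
data=getpage(path)   # fetch the page source
print(data)     # print the whole page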
codelist=getcode(data)

But nothing came out. I tweaked the regular expression for a long time and it still did not work. Then I printed the page source and searched it carefully, and the stock codes and names simply were not there (WTF???). The strange thing is that the browser's developer tools do show this information, which means that this part of the page was not in the source I downloaded (argh!), presumably because it is filled in after the page loads. This problem is beyond my current ability, so I looked for another way and found a stock list on the East Money stock forum page (http://guba.eastmoney.com/remenba.aspx?type=1&tab=1).
Next, I just looked at the format of the source in the developer tools and wrote a regular expression to match it. The regular expression used to extract the stock code and name:

regex_str = r'<li><a href="list,(\d{6})\.html">\D\d{6}\D(.*?)<'

Then I found that the results got mixed up when extracting Hong Kong stocks. Damn, bad luck! I went to check the page source for the Hong Kong stocks and saw that a Hong Kong stock code has only 5 digits, with hk added in front, so I wrote a separate expression to tell them apart:

regex_str = r'<li><a href="list,hk(\d{5})\.html">\D\d{5}\D(.*?)<'
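
As a quick check of the two patterns, here is a minimal sketch that runs them against a made-up HTML fragment. The fragment and the names SAMPLE, A_SHARE and HK are assumptions modeled on the shape the patterns expect; they are not copied from the live page, whose markup may differ.

```python
import re

# Hypothetical fragment shaped like the list entries the patterns target
SAMPLE = (
    '<li><a href="list,600000.html">(600000)Example Bank</a></li>'
    '<li><a href="list,hk00700.html">(00700)Example Holdings</a></li>'
)

# Six-digit codes (Shanghai and Shenzhen) and five-digit codes prefixed with hk
A_SHARE = re.compile(r'<li><a href="list,(\d{6})\.html">\D\d{6}\D(.*?)<')
HK = re.compile(r'<li><a href="list,hk(\d{5})\.html">\D\d{5}\D(.*?)<')

# With two groups in the pattern, findall returns a list of (code, name) tuples
a_share_matches = A_SHARE.findall(SAMPLE)
hk_matches = HK.findall(SAMPLE)
print(a_share_matches)
print(hk_matches)
```

Because the six-digit pattern requires a digit right after "list,", it skips the hk entries, and the hk pattern skips everything else, so the two lists stay separate.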

With that done, the search function I wrote (SearchStock) is as follows:

import urllib
import urllib.request
import re

class Search:
    def __init__(self,slect):
        self.choice=slect
    def SearchAll(self):
        path = "http://guba.eastmoney.com/remenba.aspx?type=1&tab=" + self.choice
        if self.choice == '3':   # Hong Kong stocks use a different format and need their own pattern
            regex_str = r'<li><a href="list,hk(\d{5})\.html">\D\d{5}\D(.*?)<'
        else:
            regex_str = r'<li><a href="list,(\d{6})\.html">\D\d{6}\D(.*?)<'
        data = self.getpage(path)
        codelist = self.getcode(data, regex_str)
        if self.choice != '3':      # not Hong Kong stocks: drop the first 30 matches
            codelist = codelist[30:]
        return codelist
    def getpage(self,path):
        data = urllib.request.urlopen(path).read().decode('utf-8')
        return data

    def getcode(self,data, regex_str):
        pat = re.compile(regex_str)  # precompile the pattern
        codelist = pat.findall(data)
        return codelist

The stock list is done. It seemed a bit too simple, so I added a stock data download as well, to fetch and save the historical data. Searching online, I found a magical site, 163 (NetEase), which can download the historical data for these stocks (WOW!!!)

#http://quotes.money.163.com/service/chddata.html?code=1300133&end=20210201&fields=TCLOSE;HIGH;LOW;TOPEN;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP
#code=1300133&end=20210201   the leading 1 means the Shenzhen market (0 means Shanghai), 300133 is the stock code, 20210201 is the end date

Let's do it: download the stock data with urlretrieve(url, path) and save it as a .csv file:

import urllib.request
import urllib
url="http://quotes.money.163.com/service/chddata.html?code=1300133&end=20210201&fields=TCLOSE;HIGH;LOW;TOPEN;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP"
path="D:\\Python代码\\class20\\down\\300133.csv"
urllib.request.urlretrieve(url,path)  # download the URL to the given path

Okay, the basic function works. But if the download path is user-defined, how do I check whether the folder already exists under that path, and create it if it does not? Time for some Baidu-driven programming. I found a function in os:

    if not os.path.exists(path):
        os.makedirs(path)  # create the folder at the given path
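
The same check can also be left to os.makedirs itself, which takes an exist_ok flag. Below is a minimal sketch; the helper name ensure_folder and the temporary directory are assumptions for illustration, not part of the original tool.

```python
import os
import tempfile

def ensure_folder(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path

# Use a throwaway base directory so the example does not touch real folders
base = tempfile.mkdtemp()
target = ensure_folder(os.path.join(base, "down", "20210201"))
ensure_folder(target)  # calling it again on an existing folder raises no error
print(os.path.isdir(target))
```

Building the path with os.path.join also avoids hard-coding "\\" as the separator.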

Since the save path can be customized, the stock to download should be customizable too, so I changed the magical download address above to:

url = "http://quotes.money.163.com/service/chddata.html?code="+"0"+code[0]+"&end="+data+"&fields=TCLOSE;HIGH;LOW;TOPEN;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP"
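
The string concatenation can also be written with urllib.parse.urlencode. This is only a sketch: the function name build_history_url and the constants BASE and FIELDS are made up here, and the code passed in is assumed to already carry its market prefix.

```python
from urllib.parse import urlencode

BASE = "http://quotes.money.163.com/service/chddata.html"
FIELDS = "TCLOSE;HIGH;LOW;TOPEN;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP"

def build_history_url(prefixed_code, end_date):
    """Build the download URL from a prefixed code (e.g. 1300133) and an end date (YYYYMMDD)."""
    # safe=";" keeps the semicolons in the field list from being percent-encoded
    query = urlencode(
        {"code": prefixed_code, "end": end_date, "fields": FIELDS}, safe=";"
    )
    return BASE + "?" + query

url = build_history_url("1300133", "20210201")
print(url)
```

For these inputs the result is the same address as the hard-coded example earlier.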

After improving my download function once more, here it is (DownloadStock):

import urllib
import urllib.request
import os
class Down:
    def __init__(self,code,data,path):
        self.code=code
        self.data=data
        self.path=path
    def downloadstock(self):
        # Save into a subfolder named after the end date; create it if it does not exist
        self.path = os.path.join(self.path, self.data)
        if not os.path.exists(self.path):
            os.makedirs(self.path)  # create the folder at the given path
        url = "http://quotes.money.163.com/service/chddata.html?code=" + self.code + "&end=" + self.data + "&fields=TCLOSE;HIGH;LOW;TOPEN;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP"
        datapath = os.path.join(self.path, self.code + ".csv")
        urllib.request.urlretrieve(url, datapath)  # download the URL to datapath

I looked at the time: there is still plenty of study time left today and it is not time to play yet. OK, so I added the UI I learned before, and while I was at it used inheritance to initialize the window and simplify the code. So the interface input function (inputview), the inherited base class (BaseWindowShow), and the list display function (ListShow) came out one after another.
The interface input function (inputview):

#coding=gbk
import tkinter
from tkinter import ttk
import 爬取东方财富股票数据.SearchStock
import 爬取东方财富股票数据.DownloadStock
import 爬取东方财富股票数据.BaseWindowShow
import 爬取东方财富股票数据.ListShow
class InputView(爬取东方财富股票数据.BaseWindowShow.BaseWindowShow):
    def __init__(self):
        爬取东方财富股票数据.BaseWindowShow.BaseWindowShow.__init__(self)
        self.entry1 = tkinter.Entry(self.win)  # text box for the stock code
        self.entry1.place(x=250,y=0)
        self.entry2 = tkinter.Entry(self.win)  # text box for the end date
        self.entry2.place(x=400,y=0)
        self.entry3 = tkinter.Entry(self.win)  # text box for the save path
        self.entry3.place(x=550,y=0)
        self.comdvalue = tkinter.StringVar()  # variable holding the combobox text
        self.comboxdc = ttk.Combobox(self.win, textvariable=self.comdvalue,width=30)  # download-mode combobox
        self.comboxdc["values"] = ("单个股票数据下载","下载多个","全部下载")  # single stock / several stocks / download all
        self.comboxdc.current(0)  # select the first entry
        self.comboxdc.bind("<<ComboboxSelected>>", self.go)  # bind the selection event to self.go
        self.comboxdc.place(x=0,y=0)
        self.button2 = tkinter.Button(self.win,text = "下载",command = self.download)  # "Download" button; command binds it to self.download
        self.button2.place(x=750,y=0)
        self.button1 = tkinter.Button(self.win,text = "股票一览表",command = self.search)  # "Stock list" button; command binds it to self.search
        self.button1.place(x=300,y=50)
        self.comvalue = tkinter.StringVar()  # variable holding the combobox text
        self.comboxlist = ttk.Combobox(self.win, textvariable=self.comvalue,width=30)  # market combobox
        self.comboxlist["values"] = ("沪市", "深市", "港股")  # Shanghai / Shenzhen / Hong Kong
        self.comboxlist.current(0)  # select the first entry
        self.comboxlist.bind("<<ComboboxSelected>>", self.go)  # bind the selection event to self.go
        self.comboxlist.place(x=0,y=50)
        self.select="1"

    def go(self,*args):
        if(self.comboxlist.get()=='沪市'): # store the selected market
            self.select='1'
        elif(self.comboxlist.get()=='深市'):
            self.select='2'
        elif(self.comboxlist.get()=='港股'):
            self.select='3'

    def search(self):
        data=爬取东方财富股票数据.SearchStock.Search(self.select)
        stockdata=data.SearchAll()
        inserstr=爬取东方财富股票数据.ListShow.Listshowdata()
        if(self.comboxlist.get()=='沪市'): # in the stock URLs, sh means Shanghai, sz means Shenzhen, hk means Hong Kong
            inser="sh"
        elif(self.comboxlist.get()=='深市'):
            inser="sz"
        elif(self.comboxlist.get()=='港股'):
            inser="hk"
        for data in stockdata:
            # build the stock URL according to this scheme
            inserstr.addata(inser+data[0]+'-'+data[1]+"网址:"+"http://quote.eastmoney.com/"+inser+data[0]+".html?code="+data[0])
    def download(self):
        if (self.comboxdc.get()=="单个股票数据下载"):
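            # code: stock code, data: end date, path: save path
            # these three must be assigned again every time; they cannot be moved above the if statement to simplify the code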
            code=self.entry1.get()
            print(type(code))
            data = self.entry2.get()
            path = self.entry3.get()
            data = 爬取东方财富股票数据.DownloadStock.Down(code, data, path)
            data.downloadstock()
        elif(self.comboxdc.get()=="下载多个"):
            codelist=self.entry1.get().split(" ")
            print(codelist)
            for code in codelist:
                print(type(code))
                data = self.entry2.get()
                path = self.entry3.get()
                data = 爬取东方财富股票数据.DownloadStock.Down(code, data, path)
                data.downloadstock()
        elif(self.comboxdc.get()=="全部下载"):
            data = 爬取东方财富股票数据.SearchStock.Search(self.select)
            stockdata = data.SearchAll()
            for datas in stockdata:
                code=datas[0]
                data = self.entry2.get()
                path = self.entry3.get()
                data = 爬取东方财富股票数据.DownloadStock.Down(code, data, path)
                data.downloadstock()


The inherited base class (BaseWindowShow):

import tkinter
class BaseWindowShow:
    def __init__(self):
        self.win=tkinter.Tk() # create the main window
        self.win.geometry("800x800+300+0")   # window that displays the search data: 800x800, placed at +300+0
    def show(self):
        self.win.mainloop()

List display function (ListShow):

import 爬取东方财富股票数据.BaseWindowShow
import tkinter
class Listshowdata(爬取东方财富股票数据.BaseWindowShow.BaseWindowShow):
    def __init__(self):
        爬取东方财富股票数据.BaseWindowShow.BaseWindowShow.__init__(self)
        self.list=tkinter.Listbox(self.win,width=200)  # list box that displays the results
        self.list.pack()
    def addata(self,inserstr):
        self.list.insert(tkinter.END,inserstr)

Finally, a main module (Main) to start everything:

import 爬取东方财富股票数据.inputview

start=爬取东方财富股票数据.inputview.InputView()
start.show()

Python package: the modules above (SearchStock, DownloadStock, BaseWindowShow, ListShow, inputview) all live in one package, 爬取东方财富股票数据.

Then look at the result.
Because I was too lazy to add checks, when entering the stock code you have to add the market prefix yourself, 1 for the Shenzhen market and 0 for the Shanghai market, and the save path has to be entered with escape characters (I am too much of a beginner to handle this in the code). After a download, the folder is created and the file is saved successfully.
Downloading several stocks at once works the same way.
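
The missing check for the market prefix could look roughly like the sketch below. It follows the convention from the download URL comment (1 for Shenzhen, 0 for Shanghai). The rule that Shanghai codes start with 6 or 9 is an assumption taken from the first regular expression in this post, and the names add_market_prefix and parse_codes are made up for illustration.

```python
def add_market_prefix(code):
    """Return the 6-digit code with the 163 market prefix in front."""
    code = code.strip()
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a 6-digit stock code, got %r" % code)
    # Assumption: codes starting with 6 or 9 are Shanghai (prefix 0),
    # everything else is treated as Shenzhen (prefix 1)
    prefix = "0" if code[0] in "69" else "1"
    return prefix + code

def parse_codes(text):
    """Split the entry text on any whitespace and prefix every code."""
    return [add_market_prefix(part) for part in text.split()]

codes = parse_codes("300133  600000")
print(codes)
```

Using split() without an argument also tolerates repeated spaces, which split(" ") would turn into empty strings.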
Summary:
It's almost time to eat: ganfanren, ganfanhun (an eater in body and soul)!!!
In fact, there is still a lot that could be improved. For example, analysis and plotting (line charts, histograms) could be added, and these are all worth learning; next time I have time I will add the analysis and plotting functions. I also wanted to capture and display real-time data, but I found that the individual values are not easy to capture during trading hours: while the market is open, the page shows "-" instead of the data, and the values only appear after the market closes. I cannot think of a solution for now, and my knowledge is not yet sufficient.
This crawler is still very rough and I need to keep working at it. The harder you work, the luckier you get.


Origin blog.csdn.net/weixin_46837674/article/details/113563740