Stock Crawler Project Summary

By crawling stock data from Yahoo Finance and drawing a K-line (candlestick) chart of stock prices, the project analyzes the timing of buying and selling stocks. Several new problems came up along the way and are documented below:
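Since the end product is a K-line (candlestick) chart, here is a minimal sketch of the plotting step, assuming the third-party mplfinance package (the ticker and date range are illustrative):

import mplfinance as mpf
import pandas_datareader.data as pdr

# Illustrative ticker and date range; get_data_yahoo returns a DataFrame
# with Open/High/Low/Close/Volume columns and a DatetimeIndex
s = pdr.get_data_yahoo('AAPL', '2020-01-01', '2020-06-30')
mpf.plot(s, type='candle', volume=True, title='AAPL')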

1. Date and time processing modules in Python (time and datetime)

The time module is mainly used for time access and conversion and provides various time-related functions. Converting between the timestamp format and formatted strings cannot be done directly in one step: struct_time has to serve as the transfer station, as the sketch below shows.
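A minimal round trip using only the standard-library time module:

import time

# timestamp -> struct_time -> formatted string
ts = time.time()                            # Unix timestamp (float)
st = time.localtime(ts)                     # struct_time as the transfer station
s = time.strftime('%Y-%m-%d %H:%M:%S', st)  # formatted string

# formatted string -> struct_time -> timestamp
st2 = time.strptime(s, '%Y-%m-%d %H:%M:%S')  # back to struct_time
ts2 = time.mktime(st2)                       # back to a Unix timestamp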

2. Storage of crawled data

(1) The object returned by requests.get(url) is of type requests.models.Response. To inspect it, access its .text attribute. The data can be saved to a .csv file as follows:

import requests
import pandas as pd

r = requests.get(url)
f = open(name + '.csv', 'w')  # create a csv file to store the stock data
f.write(r.text)               # write the fetched stock data into the csv file as text
f.close()

# Read the .csv file back into a DataFrame
data = pd.read_csv(name + '.csv', index_col=0, parse_dates=True, sep=',', dayfirst=True)

# Rename the index and columns to fit this article's analysis
data.index.rename('date', inplace=True)
data.rename(columns={'Open': 'open', 'High': 'high', 'Low': 'low',
                     'Close': 'close', 'Volume': 'volume'}, inplace=True)

 

(2) If the returned data is already a DataFrame, it can be stored as a .csv file directly with DataFrame.to_csv('xx.csv'), or written into a SQL database with DataFrame.to_sql(), which takes a table name and a database connection rather than a file name (see the sketch after the code below):

import pandas as pd
import pandas_datareader.data as pdr

s = pdr.get_data_yahoo(name, begin, end)  # the returned data is already a DataFrame
s.to_csv(name + '.csv')

# Read it back and rename the index and columns to fit this article's analysis
data = pd.read_csv(name + '.csv', index_col=0, parse_dates=True, sep=',', dayfirst=True)
data.index.rename('date', inplace=True)
data.rename(columns={'Open': 'open', 'High': 'high', 'Low': 'low',
                     'Close': 'close', 'Volume': 'volume'}, inplace=True)
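For the SQL route, a minimal sketch using the standard-library sqlite3 module and the data DataFrame from above (the database file name and table name are illustrative):

import sqlite3

conn = sqlite3.connect('stocks.db')                     # illustrative database file
data.to_sql('daily_prices', conn, if_exists='replace')  # table name + connection, not a file name
conn.close()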

(3) How a crawler copes with a website's anti-crawling measures: make the request look like it comes from a regular browser by setting the User-Agent header.

import urllib.request
import urllib.error

def get_one_page(url):
    req = urllib.request.Request(url)
    # Pretend to be a regular browser by sending a User-Agent header
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36')
    try:
        response = urllib.request.urlopen(req)
        return response.read().decode('utf-8')
    except urllib.error.URLError:
        return None
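A quick usage check (the URL is illustrative):

html = get_one_page('https://finance.yahoo.com/quote/AAPL/history')
if html is not None:
    print(html[:200])  # first 200 characters of the fetched page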

 
