Python Stock Analysis Series - Getting All S&P 500 Company Stock Data (Part 6)

Welcome to Part 6 of the Python for Finance tutorial series. In the previous tutorial, we covered how to obtain the list of companies we're interested in (in our case, the S&P 500 index); now we will collect stock price data for all of those companies.

The code so far:

import bs4 as bs
import pickle
import requests

def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text.strip()  # strip the trailing newline that comes with the table cell
        tickers.append(ticker)
        
    with open("sp500tickers.pickle","wb") as f:
        pickle.dump(tickers,f)
        
    return tickers

We will add some new imports:

import datetime as dt
import os
import pandas as pd
import pandas_datareader.data as web

We will use datetime to specify dates for the Pandas datareader, and os to check for, and create, directories. You already know what pandas is for!

Let's start on our new function:

def get_data_from_yahoo(reload_sp500=False):
    
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle","rb") as f:
            tickers = pickle.load(f)

Here, I'll show a quick way to handle whether or not to reload the S&P 500 list. If we ask it to, the program will re-pull the S&P 500 list; otherwise, it will just use our pickle. Now we're ready to grab the data.
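
As a quick usage sketch (illustrative, not part of the tutorial's script): once the function is finished, you would call it one of two ways, depending on whether you want a fresh scrape:

get_data_from_yahoo()                   # use the cached sp500tickers.pickle
get_data_from_yahoo(reload_sp500=True)  # re-scrape the list from Wikipedia first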

Now we need to decide how we're going to handle the data. My tendency is to parse a website only once, and store the data locally. I don't know in advance all of the things I might do with the data, but I do know that if I'm going to pull it more than once, I might as well save it (unless it's a huge dataset, which this is not). Thus, we're going to save everything Yahoo returns to us for every stock, and keep it. To do this, we'll create a new directory and store each company's stock data in there. To begin, we need that initial directory:

    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')


You could store these datasets in the same directory as your script, but that would get very messy in my opinion. Now we're ready to pull the data. You already know how to do this; we did it in the very first tutorial!

    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2016, 12, 31)
    
    for ticker in tickers:
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, "yahoo", start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

You might want to add a force_data_update parameter to this function, because right now it will not re-pull data it already has. Since we are pulling daily data, you would want at least the latest data re-pulled. That said, if that's your situation, you might be better off using a database with a table per company, and then just pulling the most recent values from Yahoo. We'll keep it simple for now, though!
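
The tutorial keeps it simple and doesn't implement this, but a minimal sketch of such a hypothetical force_data_update flag could look like the following (the parameter name and behavior are my illustration, reusing the imports above):

def get_data_from_yahoo(reload_sp500=False, force_data_update=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)

    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')

    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2016, 12, 31)

    for ticker in tickers:
        csv_path = 'stock_dfs/{}.csv'.format(ticker)
        # force_data_update overrides the "already have it" check below:
        if force_data_update or not os.path.exists(csv_path):
            df = web.DataReader(ticker, 'yahoo', start, end)
            df.to_csv(csv_path)
        else:
            print('Already have {}'.format(ticker))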

The complete code so far:

import bs4 as bs
import datetime as dt
import os
import pandas as pd
import pandas_datareader.data as web
import pickle
import requests


def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text.strip()  # strip the trailing newline that comes with the table cell
        tickers.append(ticker)
        
    with open("sp500tickers.pickle","wb") as f:
        pickle.dump(tickers,f)
        
    return tickers

#save_sp500_tickers()


def get_data_from_yahoo(reload_sp500=False):
    
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle","rb") as f:
            tickers = pickle.load(f)
    
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')

    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2016, 12, 31)
    
    for ticker in tickers:
        # just in case your connection breaks, we'd like to save our progress!
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, "yahoo", start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

get_data_from_yahoo()

Now go ahead and run it. If Yahoo doesn't let you crawl continuously, you may need to import time and add a time.sleep(0.5) between requests. At the time of writing, Yahoo did not throttle me at all, and I was able to run the entire process without a problem. It may still take a while depending on your machine, but the good news is we won't need to do it again! In practice, again, since this is daily data, you might re-run it once per day.
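
If you do hit throttling, the change is small. Here's a sketch of the download loop with the 0.5-second pause suggested above (placing the pause right after each successful pull is my assumption):

import time  # add this with the other imports

    for ticker in tickers:
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'yahoo', start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
            time.sleep(0.5)  # pause between requests so Yahoo doesn't block us
        else:
            print('Already have {}'.format(ticker))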

Also, if your internet connection is slow, you don't need to pull every ticker; even just 10 would be enough. You can slice the list with for ticker in tickers[:10]: or something like that to speed things up.
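
For example, a sketch of the same download loop limited to the first 10 tickers:

    # While testing on a slow connection, only pull the first 10 tickers;
    # the loop body stays exactly the same as above:
    for ticker in tickers[:10]:
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'yahoo', start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))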

In the next tutorial, now that we have this data downloaded, we will compile the data we're interested in into one large pandas DataFrame.
