Python for Finance, Part 5: Automatically acquiring the S&P 500 ticker list

Welcome to Part 5 of the Python for Finance tutorial series. In this tutorial and the next few, we will work out how to gather pricing information for many companies at once, and how to process all of that data.

First, we need a list of companies. I could simply hand you the list, but acquiring a list of stocks is exactly the kind of challenge you are likely to run into on your own. In our case, we want a Python list of the S&P 500 companies.

Whether you are after the Dow Jones, the S&P 500, or the Russell 3000, the companies in these indices are likely posted somewhere. You will want to make sure the list is up to date, and it may not come in a perfect format. In our example, we will grab the list from Wikipedia: http://en.wikipedia.org/wiki/List_of_S%26P_500_companies.

Wikipedia organizes the tickers in a table, so to extract them we will use the HTML parsing library Beautiful Soup.

First, we start with a few imports:

import bs4 as bs
import pickle
import requests


bs4 is Beautiful Soup; pickle lets us easily save the list of companies rather than hitting Wikipedia every time we run (though remember to update the list from time to time!); and requests is what we will use to grab the source code of Wikipedia's page.
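
If you do not already have these libraries, they can typically be installed with pip (the lxml parser is what BeautifulSoup uses below):

pip install beautifulsoup4 requests lxml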

We start our function:

def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})


First, we visit the Wikipedia page, and the response contains our source code. To access the source code, we use the response's .text attribute, which we turn into soup using BeautifulSoup. If you are not familiar with what BeautifulSoup does for you: it converts the source code into a BeautifulSoup object that can be treated much like a typical Python object.
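
As a minimal standalone illustration of that idea (a toy HTML string, not part of the tutorial's script):

import bs4 as bs

html = '<html><body><p class="greeting">Hello</p></body></html>'
soup = bs.BeautifulSoup(html, 'lxml')
# the parsed tree can be navigated like an ordinary Python object
print(soup.find('p', {'class': 'greeting'}).text)  # prints: Hello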

At some point, Wikipedia may try to deny access to Python. At the time of writing, the code works without changing the headers. If you find that the raw source code (resp.text) does not seem to match the page you see on your own machine, add the following headers and change the resp line accordingly:

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies',
                        headers=headers)
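
It can also be worth confirming that the request actually succeeded before parsing; a small optional check using requests' built-in helper:

    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses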


Once we have our soup, we can simply search for the wikitable sortable class. The only reason I know to specify this table is that I looked at the source code in a browser first. There may come a time when you want to parse a list of stocks from a different site; maybe the data will be in a table, or in a list, or in some div tags. This is just one very specific solution. From here, we just iterate through the table:

    tickers = []
    for row in table.findAll('tr')[1:]:
        # .strip() drops the trailing newline that comes with the cell's text
        ticker = row.findAll('td')[0].text.strip()
        tickers.append(ticker)


For each row after the header row (which is why we start at [1:]), we say the ticker is the 0th column of "table data" (td); we grab its .text (stripping the trailing newline) and append the ticker to our list.
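
To see why skipping the header matters, here is a self-contained toy table (only for illustration); the header row contains th cells rather than td cells, so including it would break the lookup:

import bs4 as bs

html = '''<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M Company</td></tr>
  <tr><td>ABT</td><td>Abbott Laboratories</td></tr>
</table>'''
soup = bs.BeautifulSoup(html, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})
tickers = [row.findAll('td')[0].text for row in table.findAll('tr')[1:]]
print(tickers)  # ['MMM', 'ABT']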


Now, it would be great if we could save this list. We will use the pickle module for this, which serializes Python objects for us.

    with open("sp500tickers.pickle","wb") as f:
        pickle.dump(tickers,f)

    return tickers


We would like to save this list so that we are not making requests to Wikipedia many times a day. We can regenerate it at any time, or schedule the program to run once a month... and so on.
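
Later, when we want the saved list back without touching Wikipedia, a minimal loading sketch (using the same filename as above) looks like this:

import pickle

with open("sp500tickers.pickle", "rb") as f:
    tickers = pickle.load(f)

print(tickers[:5])  # first few tickers from the saved list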

The complete code so far:

import bs4 as bs
import pickle
import requests

def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        # .strip() drops the trailing newline that comes with the cell's text
        ticker = row.findAll('td')[0].text.strip()
        tickers.append(ticker)
        
    with open("sp500tickers.pickle","wb") as f:
        pickle.dump(tickers,f)
        
    return tickers

save_sp500_tickers()

Now that we have the tickers, we are ready to pull pricing information for all of these companies, which is what we will do in the next tutorial.
