How to scrape stock data

Today I tidied up my data extraction code and turned it into a console program. It captures Sina's industry data, stock information, and daily, weekly, and monthly stock price data, and stores everything in a database. The code download URL is provided at the end of the article. The code runs as-is: once you set the database connection string, it can start extracting data.

Data extraction comes down to two steps: finding a suitable data source, then analyzing its responses and extracting the data.

1. Data source

The free interfaces mostly come from the major portals and financial websites, such as the finance channels of Sina and NetEase, or sites like East Money (Oriental Fortune) and Hexun.

There are also dedicated API providers, such as the stock interface offered by Juhe Data (Aggregated Data).

In addition, if you want to know the constituent stocks of various indexes, such as the Shanghai Stock Exchange 50 or the Shenzhen 300, you can refer to the Shanghai Stock Exchange and Shenzhen Stock Exchange websites. Both sites list all the stocks on the two markets and offer Excel downloads, which you can then import into the database.

2. Analyze and extract data

Inspect the network requests with the browser's developer tools (F12) or with Fiddler. Ideally the data comes back as JSON, which is very easy to extract. Newtonsoft.Json can deserialize JSON into a dynamic object, which makes accessing the fields very convenient. Before dynamic objects were available, we always had to create a class matching the JSON structure and deserialize into it; that is no longer necessary. The code example is as follows:

public IList<DataAccess.Stock> GetStocks(DataAccess.StockCategory category)
{
    // Sina market-center endpoint; {0} is replaced with the category (node) code.
    var url = "http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodeData?page=1&num=900&sort=symbol&asc=1&node={0}&symbol=&_s_r_a=init";
    // Strip any surrounding double quotes from the category code.
    url = string.Format(url, category.code.Trim('"'));
    string content = getRequestContent(url);

    // Deserialize into a dynamic object, so no matching class is needed.
    dynamic stocks = Newtonsoft.Json.JsonConvert.DeserializeObject(content);

    IList<DataAccess.Stock> list = new List<DataAccess.Stock>();
    foreach (var stock in stocks)
    {
        // Copy only the fields we store.
        list.Add(new DataAccess.Stock
        {
            code = stock.code,
            symbol = stock.symbol,
            name = stock.name
        });
    }
    return list;
}
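The getRequestContent helper called in GetStocks is not shown in this article. A minimal sketch of what such a helper might look like follows; the class name HttpHelper, the method names, and the charset parameter are all assumptions for illustration, not the author's actual code. Some Chinese finance sites serve GBK rather than UTF-8, so the sketch lets the caller pick the encoding; check the response's Content-Type header to see which one applies.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical stand-in for the getRequestContent helper used in GetStocks.
public static class HttpHelper
{
    // One shared HttpClient for the whole program.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the raw bytes and decodes them with the given charset.
    // Blocking on the task is acceptable in a simple console program.
    public static string GetRequestContent(string url, string charset = "utf-8")
    {
        byte[] bytes = Client.GetByteArrayAsync(url).GetAwaiter().GetResult();
        return Decode(bytes, charset);
    }

    // Kept separate so the decoding step can be checked without a network call.
    // Note: on .NET Core / .NET 5+, legacy charsets such as "gbk" require
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
    public static string Decode(byte[] bytes, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(bytes);
    }
}
```

With a helper like this, the call in GetStocks would read something like HttpHelper.GetRequestContent(url), passing a charset only when the site does not use UTF-8.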

If the returned data is not JSON, it has to be extracted with regular expressions. How to write them depends on the response format; I use regular expressions in part of my code.
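As an illustration of the regular-expression route, here is a small sketch. It assumes a response shaped like a JavaScript variable assignment, for example var hq_str_sh600000="...,...,..."; with comma-separated fields inside the quotes. That shape, the class name QuoteParser, and the sample values are assumptions made for this example; adjust the pattern to whatever the interface you are calling actually returns.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical parser for a non-JSON response of the assumed form:
//   var hq_str_<symbol>="field1,field2,...";
public static class QuoteParser
{
    private static readonly Regex QuoteLine =
        new Regex(@"var\s+hq_str_(?<symbol>\w+)=""(?<fields>[^""]*)"";");

    // Returns false when the line does not match the assumed format.
    public static bool TryParse(string line, out string symbol, out string[] fields)
    {
        Match m = QuoteLine.Match(line);
        symbol = m.Success ? m.Groups["symbol"].Value : string.Empty;
        fields = m.Success ? m.Groups["fields"].Value.Split(',') : new string[0];
        return m.Success;
    }
}
```

For a made-up line such as var hq_str_sh600000="Sample Co,10.50,10.40,10.62"; the call QuoteParser.TryParse(line, out var symbol, out var fields) succeeds, with symbol set to sh600000 and fields holding the four comma-separated values.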

In addition, to guard against a website changing its interface, define an interface for the data source and provide several implementations of it, so that the application can switch to another source when one of them changes.
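One way to arrange this is sketched below. All of the names here (StockInfo, IStockSource, DelegateStockSource, FallbackStockSource) are hypothetical and not taken from the downloadable source code; the sketch only shows the general shape of hiding each website behind one interface and falling back to the next implementation when one fails.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data type; the real project uses its own DataAccess classes.
public class StockInfo
{
    public string Symbol;
    public string Name;
}

// Every website-specific scraper implements this one interface.
public interface IStockSource
{
    IList<StockInfo> GetStocks(string categoryCode);
}

// Wraps a function as a source; handy for quick experiments and tests.
public class DelegateStockSource : IStockSource
{
    private readonly Func<string, IList<StockInfo>> fetch;
    public DelegateStockSource(Func<string, IList<StockInfo>> fetch) { this.fetch = fetch; }
    public IList<StockInfo> GetStocks(string categoryCode) { return fetch(categoryCode); }
}

// Tries each source in order and returns the first result that succeeds.
public class FallbackStockSource : IStockSource
{
    private readonly IStockSource[] sources;
    public FallbackStockSource(params IStockSource[] sources) { this.sources = sources; }

    public IList<StockInfo> GetStocks(string categoryCode)
    {
        Exception last = null;
        foreach (var source in sources)
        {
            try { return source.GetStocks(categoryCode); }
            catch (Exception ex) { last = ex; } // remember the error, try the next source
        }
        throw new InvalidOperationException("All stock sources failed.", last);
    }
}
```

The application then depends only on IStockSource. If one website changes its interface, its implementation throws (or is removed) and the next one in the list takes over, without touching the rest of the program.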

The source code is available from the original post; follow the author's WeChat public account to get it.

Reprinted: http://www.cnblogs.com/hongyin163/p/stockdata.html
