Python data analysis - selecting funds with linear regression

1 Introduction

In the previous chapters, we used a Python crawler to scrape data and stored it in a database. With that, the basic data processing is complete, and we can move on to more advanced content. Today, let's start with fund trend analysis.

2 Fund Trend Analysis

Fund trend analysis aims to pick out funds with strong performance. What makes a fund strong? Its value rises steadily and consistently. Under normal circumstances, a fund moves up or down along a trend line, and a fund's trend is more reliable than a stock's. The figure below shows an example: the trend of the Huaxia CSI New Energy Vehicle ETF. This fund basically follows the red trend line. Today we will use a mathematical approach, linear regression, to calculate the slope of this trend and how reliable the trend is.

Here the fund trend is modeled with linear regression, assuming the trend follows $y = kx + b$, where y is the fund's rate of return and x is time. The value k is the slope. Our task is to estimate k from each fund's historical data and then use k to compare funds.
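For reference, the ordinary least-squares fit of $y = kx + b$ to $n$ points $(x_i, y_i)$ has a closed-form solution, where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$:

$$
k = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - k\bar{x}
$$

A positive $k$ means returns trend upward over time, and a larger $k$ means a steeper rise.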

3 Data capture and analysis

3.1 Fund data capture

Fetch the fund's historical return data:

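# URL for fetching a fund's historical return data
http://api.fund.eastmoney.com/pinzhong/LJSYLZS?fundCode=515030&indexcode=000300&type=y
# Parameters
fundCode: code of the fund to query
indexcode: benchmark index for comparison; defaults to the CSI 300 (000300)
type: query period. m = 1 month, q = 3 months, hy = 6 months, y = 1 year, try = 3 years, fiy = 5 years, sy = year to date, se = maximum available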
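A minimal sketch of building this URL in Python with only the standard library. The helper name `build_returns_url` is hypothetical; the endpoint, parameter names and allowed `type` values are the ones listed above. It only builds the URL; the request itself can be sent with any HTTP client.

```python
from urllib.parse import urlencode

# Endpoint taken from the article.
BASE_URL = "http://api.fund.eastmoney.com/pinzhong/LJSYLZS"

# Allowed values of the `type` parameter, per the parameter list above.
PERIODS = {"m", "q", "hy", "y", "try", "fiy", "sy", "se"}


def build_returns_url(fund_code: str, index_code: str = "000300", period: str = "y") -> str:
    """Build the historical-returns URL for one fund (hypothetical helper)."""
    if period not in PERIODS:
        raise ValueError(f"unknown period {period!r}; expected one of {sorted(PERIODS)}")
    # urlencode keeps the dict's insertion order, so the query matches the documented form.
    query = urlencode({"fundCode": fund_code, "indexcode": index_code, "type": period})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_returns_url("515030"))
```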

In the data returned by the API, series 0 is the fund itself, series 1 is the average of similar funds, and series 2 is the CSI 300.
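A small sketch of mapping those three series to readable names. The article does not show the response layout, so this assumes (hypothetically) that the parsed JSON keeps the series in a list under a `Data` key and that each series stores its points under a `data` key; check both key names against a real response before relying on them.

```python
from typing import Any

# Index meaning as described in the article: 0 = fund, 1 = similar-fund average, 2 = CSI 300.
SERIES_NAMES = ("fund", "peer_average", "csi300")


def split_series(payload: dict[str, Any]) -> dict[str, list]:
    """Map the three returned series to readable names.

    ASSUMPTION: payload["Data"] is a list of series and each series keeps its
    points under a "data" key. Both key names are unverified guesses.
    """
    series_list = payload["Data"]
    return {name: series_list[i]["data"] for i, name in enumerate(SERIES_NAMES)}
```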

The implementation code is shown in the figure.
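Whatever the exact implementation, the returned points must be turned into numeric x and y values before they can be fitted. A sketch, assuming each point is a `[Unix timestamp in milliseconds, return]` pair (an assumption, not confirmed by the article); `to_xy` is a hypothetical helper:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def to_xy(points):
    """Turn [[timestamp_ms, return], ...] into x (days since first point) and y lists.

    ASSUMPTION: each point is a [Unix timestamp in milliseconds, return value] pair.
    """
    points = sorted(points, key=lambda p: p[0])  # make sure time runs forward
    t0 = points[0][0]
    x = [(t - t0) / MS_PER_DAY for t, _ in points]
    y = [float(v) for _, v in points]
    return x, y
```

The unit chosen for x changes the numeric value of k (here k is return per day), so use the same unit for every fund you compare.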

3.2 Data Analysis

The data analysis uses matplotlib and sklearn.linear_model: the former displays the data graphically, and the latter is the linear regression tool used to calculate each fund's k value. Readers interested in the details can look up how linear regression is calculated.
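A minimal sketch of the k-value calculation with `sklearn.linear_model.LinearRegression`. The helper name `fit_trend` is hypothetical, and x and y are assumed to be plain sequences of numbers such as days and returns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_trend(x, y):
    """Fit y = k*x + b; return (k, b, score), where score is R^2 on the same data."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
    y = np.asarray(y, dtype=float)
    model = LinearRegression().fit(X, y)
    k = float(model.coef_[0])      # slope
    b = float(model.intercept_)    # intercept
    score = float(model.score(X, y))  # coefficient of determination R^2
    return k, b, score


if __name__ == "__main__":
    days = list(range(10))
    # Synthetic data for illustration only: a 0.5/day trend with alternating noise.
    returns = [0.5 * d + 2 + (0.3 if d % 2 else -0.3) for d in days]
    print(fit_trend(days, returns))
```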

The figure below shows the code for the model calculation and the graphical display.
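A minimal plotting sketch with matplotlib that draws the return series and the fitted line in red, as in the ETF chart. `plot_trend` is a hypothetical helper, and k and b are assumed to come from a regression such as the one described above.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line to get an interactive window
import matplotlib.pyplot as plt


def plot_trend(x, y, k, b, title="Fund trend", path=None):
    """Plot the return series and its fitted trend line y = k*x + b in red."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(x, y, label="return")
    ax.plot(x, [k * xi + b for xi in x], color="red", label=f"trend: y = {k:.4f}x + {b:.4f}")
    ax.set_xlabel("time")
    ax.set_ylabel("return")
    ax.set_title(title)
    ax.legend()
    if path is not None:
        fig.savefig(path)  # e.g. "trend.png"
    return fig
```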

Taking the new energy vehicle ETF as an example, we get the trend line $y = 0.3541x + b$, and the score of this linear model is 0.741. This score is already quite high. In general, the higher a fund's return, the greater its volatility and the worse a straight line fits it.
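sklearn's `LinearRegression.score` returns the coefficient of determination R², which is presumably the score quoted here. A sketch of the same quantity computed by hand shows why volatility lowers it: the more the returns scatter around the trend line, the larger SS_res and the smaller R². A perfect fit gives 1. The helper `r_squared` is hypothetical.

```python
def r_squared(x, y, k, b):
    """R^2 of the line y = k*x + b on the points (x, y): 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (k * xi + b)) ** 2 for xi, yi in zip(x, y))  # scatter around the line
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # scatter around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y values are equal")
    return 1 - ss_res / ss_tot
```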

Are there exceptions, then? Take Tianhong Zengli Short-Term Bond C (008647): as its chart shows, its score is quite high, but the k values of bond funds are much lower than those of stock funds. High risk brings high reward and low risk brings low reward: profit is compensation for risk.

4 Summary

This chapter introduced linear regression as a quantitative way to analyze fund trends and screen funds. You can apply this method to all funds and screen out those with strong trends for investment.
