A First Try at Quantitative Finance in Python: Summary of Question A, "Greater Bay Area Index Enhancement Strategy", from the 2020 "Greater Bay Area Cup" Financial Mathematical Modeling Competition (continuously updated)

Table of contents

1. Introduction

2. Preparations

(1) Topic analysis:

Provided datasets

Question

3. Problem-solving ideas:

(1) Ideas for solving the first question

4. Code implementation

(1) Part of the code for the first question


1. Introduction

        After working through the basic data mining toolkit with my boyfriend, I wanted some real problems to practice on. I came across this competition by chance and downloaded the problems from its first edition on the official website. I originally wanted to do Question B, but it is an open-dataset question. To get more practice handling an arbitrary raw dataset, we decided after some discussion to use Question A as our exercise.

        This is my first contact with financial data and my first hands-on project since learning data mining, so if you spot any mistakes, please point them out and bear with me. Let's get started!

2. Preparations

The archive with the competition problems can be downloaded from my Baidu Netdisk:

Link: https://pan.baidu.com/s/1AN2NP1_w_i72bC_vtNU2qw?pwd=bxx3
Extraction code: bxx3

(1) Topic analysis:

Provided datasets:

The problem provides two data files: market data for 30 stocks from 2011 to 2020, and market data for the Greater Bay Area Index over the same period.

Each record contains the stock code (code), the day's opening price (open), highest price (high), lowest price (low), closing price (close), trading volume (volume), and transaction amount (amount).

(The 30-stock file and the Greater Bay Area Index file have the same columns.)

Question:

 

Problem analysis:

Summary: A company in the problem proposes an investment strategy: every week, pick the 10 strongest of the 30 stocks and invest in them, rebalancing every Monday (that is, sell all holdings and buy the strong stocks of the previous week).

Requirements: Returns are calculated from Monday's closing price (close), each stock receives a fixed 10% of the principal, and the transaction fee is 2.5 per ten thousand (0.025%).

Sub-question 1: Compute the return curve of this strategy and compare it with the Greater Bay Area Index.

Sub-question 2: Vary the rebalancing day and the amount invested per stock, find the best strategy, and plot its return curve.

Sub-question 3: Keeping the rebalancing day unchanged and the investment per stock fixed at 10% of the principal, build a model on the market data and devise your own investment strategy.

Sub-question 4: With both the rebalancing day and the investment amounts free to change, build a model on the market data and devise your own strategy. (The position must be opened within 3 working days, total holdings must stay at or above 50%, and no single stock may exceed 10% of total assets.)

3. Problem-solving ideas:

I started with the first and second sub-questions, and after some discussion we arrived at a general approach (PS: discussion within the team really matters).

(1) Ideas for solving the first question:

First, we need to pin down a financial concept: the rate of return.

After looking it up, we found the formula:

Rate of return = income / original investment

The problem requires that returns be based on Monday's closing price (close), that each stock receive a fixed 10% of the principal, and that the fee be 2.5 per ten thousand. As a first pass, we wrote the formula as the following code:

price = open - open * 0.00025  # amount invested per share, net of the 2.5-per-ten-thousand fee
interest = ((close - open) / price) * 100  # rate of return, in percent
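As a quick check of the formula above, here is a minimal sketch with made-up prices. The function name `simple_return_pct` and the sample numbers are assumptions for illustration only; the fee rate is the 2.5 per ten thousand stated in the problem, and the fee is subtracted from the purchase price exactly as in the text.

```python
def simple_return_pct(open_price, close_price, fee_rate=0.00025):
    """Percentage return on one share, following the article's formula."""
    # Per-share amount after the fee is subtracted, as in the text above
    price = open_price - open_price * fee_rate
    return (close_price - open_price) / price * 100


# Hypothetical prices: bought at 50.00 yuan, valued at 51.00 yuan.
print(round(simple_return_pct(50.0, 51.0), 4))
```

With these numbers the result is slightly above 2%, because the denominator is a little smaller than the opening price.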

At this point my teammate raised a question: how should the principal be determined? After discussion, we decided to set it at 100,000.

Why not simply use the total cost of one share each of the 10 stocks bought at position opening as the principal? We observed that the opening prices of the 30 stocks range from about 25 to 110 yuan. If the ten stocks bought when opening the position all happened to cost 20-30 yuan, then the following week, when we need to switch into ten other stocks, the principal might not be enough to buy them. To avoid this, we simply set the principal at 100,000 and, as required, invest 10% of it in each stock (that is, the initial investment in each stock is 10,000).
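To make the arithmetic concrete, here is a minimal sketch of how many whole shares a 10% slice of the principal buys at the two ends of the price range quoted above. The helper name `shares_affordable` is hypothetical and not part of our competition code.

```python
def shares_affordable(principal, open_price, n_stocks=10):
    """Whole shares of one stock that an equal slice of the principal can buy."""
    budget = principal / n_stocks        # 10% of the principal per stock
    return int(budget // open_price)     # round down to whole shares


principal = 100_000
for open_price in (25.0, 110.0):         # the price range quoted above
    n = shares_affordable(principal, open_price)
    print(open_price, n, n * open_price)
```

At 25 yuan the slice buys 400 shares and is fully spent; at 110 yuan it buys 90 shares and 100 yuan is left over.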

That settles the principal. The final rate-of-return formula, in code form, is:

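'''
power: total profit on a single stock
price: total purchase cost of a single stock
Interest: rate of return, in percent
stocnum: number of shares of a single stock the principal can buy
principal: the principal
'''
principal = 100000
# opendata[i]: this Monday's opening price of stock i; opendata1[i]: next Monday's
stocnum = int((principal / 10) / opendata[i])
power = (opendata1[i] - opendata[i]) * stocnum
price = (stocnum * opendata[i]) - (stocnum * opendata[i]) * 0.00025  # fee: 2.5 per ten thousand
# powerdata and pricedata collect power and price over the 10 stocks held
Interest = (sum(powerdata) / sum(pricedata)) * 100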
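The fragment above can be wrapped into a small self-contained function. This is only a sketch under stated assumptions: the name `weekly_return_pct` and the two-stock sample are invented for illustration, and the per-stock budget is taken as an equal split over however many stocks are passed in (the formula above hard-codes 10).

```python
def weekly_return_pct(open_now, open_next, principal=100_000, fee_rate=0.00025):
    """Portfolio return in percent for one week, following the formula above.

    open_now / open_next: opening prices of the held stocks on this Monday
    and on the following Monday, in the same order.
    """
    budget = principal / len(open_now)      # equal slice per stock (assumption)
    profits, costs = [], []
    for p_now, p_next in zip(open_now, open_next):
        shares = int(budget / p_now)        # whole shares only
        cost = shares * p_now
        profits.append((p_next - p_now) * shares)
        costs.append(cost - cost * fee_rate)
    return sum(profits) / sum(costs) * 100


# Two hypothetical stocks instead of ten, to keep the arithmetic visible.
print(weekly_return_pct([50.0, 100.0], [52.0, 99.0], principal=20_000))
```

Here the first stock gains 400 on 200 shares, the second loses 100 on 100 shares, and the combined return comes out at roughly 1.5%.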

After settling how to calculate the rate of return, the rest of the discussion went smoothly.

The plan is as follows:
(1) Open the position within 3 working days: take the third day's closing price minus the first day's opening price as the profit, and open the position with the 10 stocks that have the highest profit.

(2) Split out the 30 stocks and save them separately: read the 30-stock market data, slice it by stock, and save each stock from the original CSV file to its own file for later reading.

(3) Extract each Monday's closing and opening prices: the weekly return is calculated from each Monday's closing and opening prices and determines the 10 strong stocks to invest in the following week. The stock data can therefore be sliced week by week, and the rows for each week, including Monday's close and open, extracted with the iloc method, giving the Monday data of all 30 stocks for every week.

(4) Find the weekly strong stocks: write a function that calls the function from (3) to get the current week's closing and opening prices for the 30 stocks, computes each stock's return for the week, and selects the 10 stocks with the highest return as the strong stocks to hold the following week. The function returns the codes of that week's strong stocks. Calling it in a loop from the main function yields the strong stocks for every week.

(5) Calculate the weekly return on the investment: write a function that combines the data from (3) and (4), matches the codes of the 10 strong stocks against the Monday data of the 30 stocks for each week to obtain the Monday data of those 10 stocks, and uses it to compute the weekly return of the 10-stock portfolio.

(6) Plot the array of returns as a line chart, with the year (2011-2020) on the horizontal axis.
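Steps (3) and (4) can be sketched with pandas on a tiny synthetic dataset. This is not the code we used in the competition: the column layout, the three stock codes `A`, `B`, `C`, and the prices are all made up, and the sketch takes the first trading day of each week (Monday unless it is a holiday) instead of slicing fixed 7-row windows.

```python
import pandas as pd

# Hypothetical long-format data: one row per stock per trading day.
dates = pd.bdate_range("2020-01-06", periods=10)   # two business weeks
steps = {"A": 1.0, "B": -0.5, "C": 0.2}            # daily price drift per stock
bases = {"A": 10.0, "B": 20.0, "C": 30.0}          # starting opening prices
rows = [
    {"code": code, "date": d, "open": bases[code] + i * steps[code]}
    for code in steps
    for i, d in enumerate(dates)
]
df = pd.DataFrame(rows)

# Label each row with its Monday-to-Sunday week, then keep the first trading day.
df["week"] = df["date"].dt.to_period("W")
first_days = df.sort_values("date").groupby(["code", "week"], as_index=False).first()

# One column per stock, one row per week, then the week-over-week return.
opens = first_days.pivot(index="week", columns="code", values="open")
weekly_ret = (opens / opens.shift(1) - 1).dropna()

# Strongest stocks of the latest week (top 2 here; the strategy uses top 10 of 30).
top = weekly_ret.iloc[-1].nlargest(2)
print(top.index.tolist())
```

Working on one table with a `week` label avoids keeping track of row offsets by hand, at the cost of loading all stocks at once.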

4. Code implementation

(1) Part of the code for the first question

Split out the 30 stocks and save them:

import pandas as pd

# Read the data and do some initial cleaning
def read_csv_shuju(path):
    '''
    :param path: path to the CSV file
    :return: data2, the cleaned data indexed by stock code
    '''
    data1 = pd.read_csv(path, sep=',')
    # Keep only the calendar date of each timestamp
    data1['date'] = pd.to_datetime(data1['time']).dt.date
    data2 = data1.drop(['time', 'volume', 'amount', 'open_interest'], axis=1)
    data2.set_index('code', inplace=True)
    states = ['date', 'open', 'high', 'low', 'close']
    # reindex returns a new frame, so the result must be assigned
    data2 = data2.reindex(columns=states)
    return data2

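# Slice out the rows of one stock
def write_csv_data(data, start, code):
    '''
    :param data: the unsplit data, indexed by stock code
    :param start: row offset where this stock begins (set it to the end
                  value returned by the previous call)
    :param code: code of the stock to slice out
    :return: (end, stock_code): the row offset just past this stock and
             the rows data[start:end] that belong to it
    '''
    end = start
    # Count the rows from start onward that carry this code; this relies on
    # each stock's rows being contiguous in the file.
    for i in range(start, len(data.index)):
        if data.index[i] == code:
            end += 1
    stock_code = data[start:end]
    return end, stock_code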



if __name__ == '__main__':
    # stock_data: the cleaned data, not yet split
    # stock_code: the rows of one stock after splitting by code

    # File path
    path1 = '../data/附录一:30支股票行情.csv'

    stock_data = read_csv_shuju(path1)
    print("Cleaned data:")
    print(stock_data)
    # print(len(stock_data.index))
    # print(stock_data.index[69696])

    # The parameters below must be edited by hand, once for each of the 30 stocks:
    #   start: set to the end value printed by the previous run
    #   code: set to the code of the stock to split out
    #   file name: the CSV path to save to
    start = 67431
    code = 'szse.000028'
    end, stock_code = write_csv_data(stock_data, start, code)
    # stock_code.to_csv('./code_data/30.csv')
    print('Rows for this stock:')
    print(stock_code)
    print(stock_code.index)
    print('end:')
    print(end)
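Editing start and code by hand thirty times is tedious. A possible alternative, sketched here on a made-up miniature table, is to let pandas `groupby` do the split. The code `sse.600048` and all prices below are hypothetical, and naming the output files by stock code is an assumption (our files were named 1.csv to 30.csv).

```python
import pandas as pd

# Hypothetical miniature of the 30-stock file: rows for several codes in one table.
raw = pd.DataFrame({
    "code": ["szse.000028", "szse.000028", "sse.600048", "sse.600048", "sse.600048"],
    "date": ["2011-01-04", "2011-01-05", "2011-01-04", "2011-01-05", "2011-01-06"],
    "open": [30.1, 30.4, 12.0, 12.2, 12.1],
    "close": [30.3, 30.2, 12.1, 12.3, 12.0],
})

# groupby yields one sub-frame per stock code, with no manual start/end offsets.
per_stock = {code: frame.reset_index(drop=True) for code, frame in raw.groupby("code")}

for code, frame in per_stock.items():
    print(code, len(frame))
    # frame.to_csv(f"./code_data/{code}.csv", index=False)  # assumed output path
```

Unlike the offset-based loop, this does not depend on each stock's rows being contiguous in the source file.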

Open position function:

import pandas as pd


def Stock_position_building():
    '''
    size_code: codes of the 30 stocks
    first_data: profit of each of the 30 stocks
    data1: profits as a Series indexed by stock code
    data2: ranks of the profits, best first
    data3: the ten top-ranked stocks
    data_x: closing prices of those ten stocks
    '''
    size_code = ['002027','300014','002475','000636','002449','600183','000049','002138'
                 ,'300115','600325','000069','600383','600048','001914','601318','600323'
                 ,'002152','000921','002035','000651','002233','002060','002352','002511'
                 ,'002303','002461','600872','600332','000513','000028']
    first_data, first_close = first_cycle_csv()
    first_close_df = pd.Series(data=first_close, index=size_code)
    data1 = pd.Series(data=first_data, index=size_code)
    data2 = data1.rank(method='average', ascending=False).sort_values()
    data3 = data2.iloc[:10]  # exactly ten stocks
    data_x = first_close_df.loc[data3.index]
    print(data_x)

    return data_x


def first_cycle_csv():
    """
    first_data: profit of each stock over the opening window
    first_close: each stock's closing price on the last day of the window
    path: file path, built inside the loop
    cut_up: rows 2 to 4 of the file
    open_price: opening price on the first day of the window
    close_price: closing price on the last day of the window
    """
    first_data = []
    first_close = []
    for i in range(1, 31):
        path = './data_process/code_data/' + str(i) + '.csv'
        data1 = pd.read_csv(path, sep=',')
        cut_up = data1.iloc[1:4]
        open_price = cut_up.iloc[0, 2]   # open is read from column 2
        close_price = cut_up.iloc[2, 5]  # close is read from column 5
        profit = close_price - open_price
        first_data.append(profit)
        first_close.append(close_price)
    return first_data, first_close
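Picking the ten best entries from a Series can be sketched as follows, on twelve hypothetical stocks with made-up profits. It compares the rank-then-slice approach with `Series.nlargest`, which does the same selection in one call when there are no ties at the cutoff.

```python
import pandas as pd

# Hypothetical three-day profits for 12 stocks (close on day 3 minus open on day 1).
profit = pd.Series(
    [0.5, -0.2, 1.3, 0.0, 0.9, 2.1, -1.0, 0.7, 0.3, 1.8, 0.1, 1.1],
    index=[f"S{i:02d}" for i in range(12)],
)

ranked = profit.rank(method="average", ascending=False).sort_values()
top10_by_rank = ranked.iloc[:10].index     # slice end is exclusive: exactly 10
top10_direct = profit.nlargest(10).index   # same selection in one call

print(sorted(top10_by_rank) == sorted(top10_direct))
```

Both approaches drop the two weakest entries here (`S06` and `S01`); `nlargest` also keeps the profit values themselves rather than their ranks.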

Get weekly data on the top 10 stocks:

def cycle_csv(a):
    '''
    param: a is the offset in the date list where the 7-entry window starts
    '''
    # shuzu() is defined elsewhere in the project and is not shown in this post;
    # .dt.date is called on its result, so it must return a datetime Series.
    date_data = shuzu().dt.date
    power = []
    state = ['open', 'close', 'code']
    open_data = []
    # Dates in the window, as strings so they match the CSV index
    wanted = {str(j) for j in date_data[a:a+7]}
    for i in range(1, 31):
        path = './data_process/code_data/' + str(i) + '.csv'
        data1 = pd.read_csv(path, sep=',')
        data1.set_index(['date'], inplace=True)
        # Rows of this stock whose date falls inside the window
        rows = [k for k in data1.index if str(k) in wanted]
        data_x = data1.loc[rows, state]
        open_price = data_x.iloc[0, 0]    # open on the first day in the window
        close_price = data_x.iloc[-1, 1]  # close on the last day in the window
        power.append(close_price - open_price)
        open_data.append(open_price)

    # size_code: codes of the 30 stocks
    # data_1: weekly profits as a Series indexed by stock code
    # data_2: ranks of the profits, best first
    # data_3: the ten top-ranked stocks
    size_code = ['002027','300014','002475','000636','002449','600183','000049','002138'
                 ,'300115','600325','000069','600383','600048','001914','601318','600323'
                 ,'002152','000921','002035','000651','002233','002060','002352','002511'
                 ,'002303','002461','600872','600332','000513','000028']
    data_1 = pd.Series(data=power, index=size_code)
    data_2 = data_1.rank(method='average', ascending=False).sort_values()
    data_3 = data_2.iloc[:10]
    open_data2 = pd.Series(data=open_data, index=size_code)
    # print(open_data2)
    # print(data_3)
    return data_3, open_data2
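The window lookup inside the loop can also be written with `Index.isin`. The sketch below uses a made-up single-stock table and a 7-day calendar window; the dates and prices are hypothetical, and whether the date list in our project holds calendar days or trading days is not something this sketch relies on.

```python
import pandas as pd

# Hypothetical single-stock file, indexed by date strings as in the saved CSVs.
stock = pd.DataFrame(
    {"open": [10.0, 10.2, 10.1, 10.4], "close": [10.1, 10.0, 10.3, 10.6]},
    index=pd.Index(["2011-01-04", "2011-01-05", "2011-01-06", "2011-01-10"], name="date"),
)

# Calendar window of 7 consecutive days; only some of them are trading days.
window = pd.date_range("2011-01-03", periods=7).strftime("%Y-%m-%d")

week = stock[stock.index.isin(window)]   # rows whose date falls in the window
week_profit = week["close"].iloc[-1] - week["open"].iloc[0]
print(len(week), round(week_profit, 2))
```

Three of the four rows fall inside the window, and the profit is the last close in the window minus the first open.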

Calculate the rate of return:

def interest_rate(a, principal, front_stock):
    '''
    param:
    a: offset where this week's window starts
    principal: the principal
    front_stock: the 10 strong stocks of the previous week
    '''
    if a > 3584:
        # Past this offset there is no following week to compare against,
        # so there is no return to compute.
        return None

    stockwinner, stockopen30 = cycle_csv(a)
    stockwinner2, stockopen30_2 = cycle_csv(a + 7)
    opendata = stockopen30.loc[front_stock.index]     # opening prices this week
    opendata1 = stockopen30_2.loc[front_stock.index]  # opening prices next week
    # print(opendata)
    # print(opendata1)
    powerdata = []  # profit per stock
    pricedata = []  # purchase cost per stock
    for i in range(10):
        stocnum = int((principal / 10) / opendata.iloc[i])
        # print(stocnum)
        power = (opendata1.iloc[i] - opendata.iloc[i]) * stocnum
        # fee: 2.5 per ten thousand
        price = (stocnum * opendata.iloc[i]) - (stocnum * opendata.iloc[i]) * 0.00025
        powerdata.append(power)
        pricedata.append(price)
    # allprice.append(sum(pricedata))
    # allpower.append(sum(powerdata))

    Interest = (sum(powerdata) / sum(pricedata)) * 100
    # print(Interest)

    return Interest
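Once the weekly returns are collected in a list, one way to turn them into a curve for plotting is to compound them. This is a sketch of that idea rather than the plotting code we used: the function name `cumulative_curve` and the sample values are assumptions, and compounding (instead of plotting the raw weekly rates) is a design choice.

```python
from itertools import accumulate


def cumulative_curve(weekly_pct):
    """Compound weekly percentage returns into a cumulative return curve (percent)."""
    growth = accumulate((1 + r / 100 for r in weekly_pct), lambda acc, g: acc * g)
    return [(g - 1) * 100 for g in growth]


weekly = [1.0, -2.0, 3.0]   # hypothetical weekly returns in percent
print([round(x, 4) for x in cumulative_curve(weekly)])
```

The resulting list has one point per week and can be passed directly to any line-chart function, with the dates on the horizontal axis.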



Origin blog.csdn.net/weixin_52135595/article/details/127494284