Senior management team stability calculation

    Recently a friend gave me a task: calculate the stability of a company's senior management team from the records of who holds which position.

    He provided two algorithms for calculating stability. The first follows Crutchley et al. (2002) and Yu Dongzhi and Chi Guohua (2004) and uses the following formula:

$$STMT_{t,t+1} = \frac{M_t - \#(S_t/S_{t+1})}{M_t} \times \frac{M_{t+1}}{M_t + M_{t+1}} + \frac{M_{t+1} - \#(S_{t+1}/S_t)}{M_{t+1}} \times \frac{M_t}{M_t + M_{t+1}}$$

$STMT_{t,t+1}$ indicates the stability of the team from year t to year t+1. Its value lies in [0,1]; the closer to 1, the more stable the team. $M_t$ is the total number of executives in office in year t, $\#(S_t/S_{t+1})$ is the number of executives who were in office in year t but had left in year t+1, and $\#(S_{t+1}/S_t)$ is the number of executives who were not in office in year t but were in office in year t+1 (newly added).
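A minimal sketch of this calculation, following the expression used in the scripts later in this post. The function name `stmt`, its parameter names, and the choice to return `None` when a year has no executives are assumptions made for illustration.

```python
def stmt(m_t, m_t1, left, joined):
    """Stability of the team between year t and year t+1.

    m_t    : executives in office in year t
    m_t1   : executives in office in year t+1
    left   : in office in year t but gone in year t+1
    joined : not in office in year t but in office in year t+1
    """
    if m_t == 0 or m_t1 == 0:
        # Assumed convention: the ratio is undefined for an empty team.
        return None
    total = m_t + m_t1
    return (m_t - left) / m_t * m_t1 / total + (m_t1 - joined) / m_t1 * m_t / total


# Hypothetical team: 10 executives in year t, 2 leave and 2 join.
print(stmt(10, 10, 2, 2))
# An unchanged team gives the maximum value.
print(stmt(10, 10, 0, 0))
```

Each half of the sum is a retention ratio weighted by the size of the other year's team, so the two weights always add up to 1.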

    The second is slightly modified as follows:

Here SI represents the stability from year t to year t+i.

After discussing with friends, I settled on the following definitions for some cases:

  • Counting "senior executives": the count is by position, not by person. One person may hold several posts, and that counts as several "senior executives".
  • Scope of "senior management": the material he gave me says "including chairman, directors, general manager, deputy general managers, etc., excluding external directors and independent directors." What the "etc." covers is hard to say. The data contains more than a thousand distinct positions, including cashier, accountant, factory manager, and team leader, which probably should not count as executives. There are too many to filter one by one, so for convenience senior management is defined as every position except external directors and independent directors, including all of those just mentioned.
  • The three states in office, new, and outgoing: if someone left in December 2014, were they in office in 2014 or not? They did serve for part of that year. Likewise, if someone was appointed in November 2015, does 2015 count as in office? After consulting friends, the rule is: the year of appointment and the year of departure both count as in office.
  • Empty or N/A appointment and departure dates: my friends suggested deleting records whose dates are empty or N/A. But I think stability is not directly related to a person's résumé or length of tenure, so anyone who appears in an annual report counts as in office in the report year, whether or not an appointment or departure date is given.
  • Abnormal data: 1. Some companies report several times within one year, and the reports may be duplicated, differ, or even contradict each other. 2. Some departure dates fall after the statistics deadline. For example, the deadline is December 31, 2014 and the departure date is some day in 2015. I read this as an expected resignation, but there are cases where the 2014 report says a person leaves in 2015 while the 2015 report gives no departure date for them.
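The rules in the list above can be sketched for a single position record as follows. The function `status_by_year` and the status labels are hypothetical names chosen for illustration; `None` stands for an empty or N/A date cell.

```python
from datetime import date


def status_by_year(report_year, start=None, end=None):
    """Return {year: set of statuses} for one position record.

    Statuses are 'in_office', 'new' and 'outgoing'.
    start / end are datetime.date values, or None for an empty or N/A cell.
    """
    # Appearing in a report counts as in office in the report year.
    status = {report_year: {"in_office"}}
    if start is not None:
        # The year of appointment is both new and in office.
        status.setdefault(start.year, set()).update({"new", "in_office"})
    if end is not None:
        # The year of departure is both outgoing and in office.
        status.setdefault(end.year, set()).update({"outgoing", "in_office"})
    return status


# Appointed in November 2015, left in December 2016, listed in the 2016 report.
print(status_by_year(2016, date(2015, 11, 3), date(2016, 12, 20)))
```

A departure date after the statistics deadline (abnormal case 2) simply lands in the following year's entry under this sketch.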
The first batch of data is as follows. Each record has these attributes (meaning, then allowed values):
  • Securities code: the unique identifier of the company. 6 digits, not empty.
  • Statistics deadline: when the company submitted the report. Date format, not empty.
  • Person ID: the unique identifier of a person. Variable-length number, not empty.
  • Name: not empty, may repeat.
  • Specific duties: the name of the position held. One record holds exactly one position; not empty, but the same position may be written in several different ways.
  • Start date: when the position started. Date format, may be empty or N/A.
  • End date: when the position ended. Date format, may be empty or N/A.
  • In office: whether the person holds the position when the report is submitted. 0 (not currently in office) or 1 (currently in office).
  • Term of office: time from the start of the appointment to the statistics deadline, in months. Positive integer, or empty when the start date is empty or N/A.

Because names may repeat, the person ID is used to identify an individual. The in-office flag and the term of office can be derived from the other attributes, so they need not be used. Because data may be submitted several times within a year, the results are stored in the following structure to prevent duplication:

Securities code : Year : Person ID : [In-office positions: [position 1, position 2, ...], New positions: [position 1, position 2, ...], Outgoing positions: [position 1, position 2, ...]]

It records in detail which positions a given person holds, newly takes up, and leaves at a given company in a given year.
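One way to build this nested structure without repeated existence checks is `collections.defaultdict`. This is only a sketch: the keys 0, 1 and -1 mirror the ones used in the script further down, while `add_position`, the securities code, and the person ID are made-up examples.

```python
from collections import defaultdict

IN_OFFICE, NEW, OUTGOING = 0, 1, -1  # same keys as the script in this post


def new_person():
    return {IN_OFFICE: [], NEW: [], OUTGOING: []}


# detail[stkcd][year][person_id] -> {0: [...], 1: [...], -1: [...]}
detail = defaultdict(lambda: defaultdict(lambda: defaultdict(new_person)))


def add_position(stkcd, year, person_id, status, position):
    """Record a position once, even if several reports repeat it."""
    positions = detail[stkcd][year][person_id][status]
    if position not in positions:
        positions.append(position)


# Hypothetical records: the same post reported twice within 2014.
add_position("600000", "2014", "30012345", IN_OFFICE, "general manager")
add_position("600000", "2014", "30012345", IN_OFFICE, "general manager")
add_position("600000", "2014", "30012345", NEW, "general manager")
print(detail["600000"]["2014"]["30012345"])
```

The membership check is what keeps multiple submissions within one year from inflating the counts.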

Idea: read each record to get the company's securities code, statistics deadline, person ID, and specific position. If the position is independent director or external director, skip the record (continue). If the position is not yet in that person's in-office list for the report year (the year of the statistics deadline), add it. Then take the start date of the position; if it is not empty or N/A, convert the string to a date, extract the year, and add the position to that person's new and in-office lists for that year (skipping it if already present). Departures are handled the same way. This way there is no need to work out how the appointment date, departure date, and report date relate to each other. Once everything is stored, it is easy to count the in-office, outgoing, and new positions of each company in each year and save them in M. The two formulas differ little; this time the second one (SI) is calculated, for each company and each year from 2014 to 2019, with i taken as 1 in the formula. The in-office count of the year is Mt, the outgoing count of that year is S1 in the code, the in-office count of the next year is Mt+1, and the new count of the next year is S2 in the code.
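The counting step at the end of this idea can be sketched on an in-memory example. The helper names `yearly_counts` and `formula_inputs` and the sample company are hypothetical; the list order [in office, new, outgoing] is the one the script actually fills in.

```python
def yearly_counts(company_detail):
    """company_detail: {year: {person: {0: [...], 1: [...], -1: [...]}}}.

    Returns {year: [in office, new, outgoing]}, counted by positions.
    """
    counts = {}
    for year, people in company_detail.items():
        counts[year] = [
            sum(len(p[0]) for p in people.values()),
            sum(len(p[1]) for p in people.values()),
            sum(len(p[-1]) for p in people.values()),
        ]
    return counts


def formula_inputs(counts, year):
    """Return (Mt, Mt+1, S1, S2) for `year`, or None if the next year is missing."""
    next_year = str(int(year) + 1)
    if year not in counts or next_year not in counts:
        return None
    m_t, _, s1 = counts[year]        # S1: positions vacated in year t
    m_t1, s2, _ = counts[next_year]  # S2: positions newly filled in year t+1
    return m_t, m_t1, s1, s2


# Hypothetical company: "b" leaves a directorship in 2014, "c" takes one up in 2015.
detail = {
    "2014": {"a": {0: ["chairman"], 1: [], -1: []},
             "b": {0: ["director"], 1: [], -1: ["director"]}},
    "2015": {"a": {0: ["chairman"], 1: [], -1: []},
             "c": {0: ["director"], 1: ["director"], -1: []}},
}
counts = yearly_counts(detail)
print(counts)
print(formula_inputs(counts, "2014"))
```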

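import xlrd  # note: xlrd 2.0 and later read only .xls; opening an .xlsx needs xlrd<2.0
import datetime
import xlwt
READ=True
PREPROCESS=True
CAL=True
WRITE=True
if READ:
    table=xlrd.open_workbook('高管团队任职情况.xlsx')
    t=table.sheet_by_index(0)
    
N=t.nrows
# Number in office in year t who leave in year t+1 (not used)
# Total number of executives in year t
# Number not in office in year t who are newly appointed in year t+1
# Securities codes are stored as strings
position_out=['外部董事','独立董事']  # external director, independent director
M=dict() # Yearly counts per company. Format: securities code : year : [in office, new, outgoing]
SI=dict()
# The year of departure / appointment also counts as in office
# The data has problems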

if PREPROCESS:
    Detail=dict()# Detail table. securities code : year : person id : {in office: [], outgoing: [], new: []}
    for l in range(3,N):# data starts at row index 3
        stkcd=str(int(t.cell_value(l,0)))
        Reptdt=str(t.cell_value(l,1))# statistics deadline
        PersonID=str(int(t.cell_value(l,2)))
        Position=t.cell_value(l,4)
        if Position in position_out:
            # excluded position: skip the record
            continue
        StartDate=str(t.cell_value(l,5))
        EndDate=str(t.cell_value(l,6))
        Tenure_value=str(t.cell_value(l,8))# not used
        ReptYear=Reptdt[:4]# first four characters of the string are the year
        if stkcd not in Detail:# create the entry if it does not exist
            Detail[stkcd]=dict()
        if ReptYear not in Detail[stkcd]:
            Detail[stkcd][ReptYear]=dict()
        if PersonID not in Detail[stkcd][ReptYear]:
            Detail[stkcd][ReptYear][PersonID]={0:[],-1:[],1:[]}# 0: in office, 1: new, -1: outgoing
        ReptDateTime=datetime.datetime.strptime(Reptdt,'%Y-%m-%d')
        if Position not in Detail[stkcd][ReptYear][PersonID][0]:
            Detail[stkcd][ReptYear][PersonID][0].append(Position)# add if not already listed
        if len(StartDate)>4:# longer than 4 characters means not empty and not N/A
            StartDateTime=datetime.datetime.strptime(StartDate,'%Y-%m-%d')
            if str(StartDateTime.year) not in Detail[stkcd]:
                Detail[stkcd][str(StartDateTime.year)]=dict()
            if PersonID not in Detail[stkcd][str(StartDateTime.year)]:
                Detail[stkcd][str(StartDateTime.year)][PersonID]={0:[],-1:[],1:[]}
            if Position not in Detail[stkcd][str(StartDateTime.year)][PersonID][1]:# new
                Detail[stkcd][str(StartDateTime.year)][PersonID][1].append(Position)
            if Position not in Detail[stkcd][str(StartDateTime.year)][PersonID][0]:# in office
                Detail[stkcd][str(StartDateTime.year)][PersonID][0].append(Position)
        if len(EndDate)>4:# departures are handled the same way
            EndDateTime=datetime.datetime.strptime(EndDate,'%Y-%m-%d')
            if str(EndDateTime.year) not in Detail[stkcd]:
                Detail[stkcd][str(EndDateTime.year)]=dict()
            if PersonID not in Detail[stkcd][str(EndDateTime.year)]:
                Detail[stkcd][str(EndDateTime.year)][PersonID]={0:[],-1:[],1:[]}
            if Position not in Detail[stkcd][str(EndDateTime.year)][PersonID][-1]:# outgoing
                Detail[stkcd][str(EndDateTime.year)][PersonID][-1].append(Position)
            if Position not in Detail[stkcd][str(EndDateTime.year)][PersonID][0]:# in office
                Detail[stkcd][str(EndDateTime.year)][PersonID][0].append(Position)
if CAL:
    for stkcd in Detail:
        M[stkcd]=dict()
        for year in Detail[stkcd]:
            M[stkcd][year]=[0,0,0]# initialise [in office, new, outgoing]
            for PersonID in Detail[stkcd][year]:
                M[stkcd][year][0]+=len(Detail[stkcd][year][PersonID][0])# count positions
                M[stkcd][year][1]+=len(Detail[stkcd][year][PersonID][1])
                M[stkcd][year][2]+=len(Detail[stkcd][year][PersonID][-1])
    
    for stkcd in M:
        SI[stkcd]=dict()
        for year in M[stkcd]:
            if int(year)<=2013 or int(year)>=2019:# restrict to the years 2014-2018
                continue
            NextYear=str(int(year)+1)
            Mj=M[stkcd][year][0]
            if NextYear not in M[stkcd]:
                SI[stkcd][year]=0
                continue
            Mjp1=M[stkcd][NextYear][0]
            if Mj==0 or Mjp1==0:# guard against division by zero
                SI[stkcd][year]=0
                continue
            S1=M[stkcd][year][2]   # in office in year j but not in j+1: outgoing
            S2=M[stkcd][NextYear][1] # not in office in year j but in j+1: new
            SI[stkcd][year]=(Mj-S1)/Mj*(Mjp1)/(Mj+Mjp1)+(Mjp1-S2)/Mjp1*Mj/(Mj+Mjp1)

if WRITE:
    workspace=xlwt.Workbook(encoding='utf-8')
    excel=workspace.add_sheet('sheet1',cell_overwrite_ok=True)# add the first sheet
    excel.write(0,0,'证券代码')# header: securities code
    excel.write(0,1,'2014-2015')
    excel.write(0,2,'2015-2016')
    excel.write(0,3,'2016-2017')
    excel.write(0,4,'2017-2018')
    excel.write(0,5,'2018-2019')
    c=1
    for item in SI:
        excel.write(c,0,item)
        for year in range(2014,2019):
            if str(year) in SI[item]:
                excel.write(c,year-2013,SI[item][str(year)])
            else:# no data for this year: write 0
                excel.write(c,year-2013,'0')
        c=c+1
    workspace.save('answer.xls')   

Then a friend got hold of some code, from who knows where, written by someone at Zhejiang University. It uses pandas to operate on the whole table directly. (He does not seem to count the year of departure as in office, and he uses the STMT algorithm while I use SI, so the two results differ.)

# -*- coding: utf-8 -*-
"""
Created on Fri May 15 17:03:55 2020

@author: hp
"""

import numpy as np
import pandas as pd

df=pd.read_excel("Executives.xlsx")
df.columns=['Stkcd', 'Reptdt', 'Name', 'Position_type', 'Position', 'StartDate', 'EndDate','Note']
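# Some column names are garbled, so the code needs adjusting
df.drop([0,1],axis=0,inplace=True)
df.head()

# Preprocessing
df['Reptdt']=pd.to_datetime(df['Reptdt'])
df['re_year']=df['Reptdt'].map(lambda x:x.year)

# Merge the different start and end dates that appear within one row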
def min_date(x):
    if x is None:
        return None
    else:
        c=np.array(x.split(',')).astype(str)
        return np.min(pd.to_datetime(c))
def max_date(x):
    if x is None:
        return None
    else:
        c=np.array(x.split(',')).astype(str)
        return np.max(pd.to_datetime(c))
df['StartDate']=df['StartDate'].astype(str).map(min_date)
df['EndDate']=df['EndDate'].astype(str).map(max_date)
df['Reptdt2']=df['Reptdt']

import datetime
# Statistics dates differ, so the same person appears several times; merge those records
data=df.groupby(['Stkcd','Name']).agg({'Reptdt2':'max','Reptdt':'min',
                                                    'StartDate':'min','EndDate':'max'}).reset_index()

data['EndDate']=data['EndDate'].fillna(pd.to_datetime('2020-01-01'))
data['StartDate']=data['StartDate'].fillna(data['StartDate'].min())
# Missing dates are filled in here; note that this may introduce error
data['st_year']=data['StartDate'].map(lambda x:x.year)
data['end_year']=data['EndDate'].map(lambda x:x.year)
data['re_min']=data['Reptdt'].map(lambda x:x.year)
data['re_max']=data['Reptdt2'].map(lambda x:x.year)


# Calculation
def stmt(df):
    # Work out the year range first: from the earliest to the latest statistics year
    y1=df['re_min'].min()
    y2=df['re_max'].max()
    if y1==y2:
        return None
    stmt_list=[]
    
    for i in range(y1,y2):# compute stmt for each year
        # In office: started in or before the year and left after it (or started and left within the year)
        da=((df['st_year']<=i)&(df['end_year']>i))|((df['st_year']==i)&(df['end_year']==i))
        m1=np.sum(da)
        da=((df['st_year']<=i+1)&(df['end_year']>i+1))|((df['st_year']==i+1)&(df['end_year']==i+1))
        m2=np.sum(da)
        da=(df['st_year']<=i)&(df['end_year']==i+1)
        s1=np.sum(da)# in office in year i but leaves the next year
        da=(df['st_year']==i+1)
        s2=np.sum(da)# newly appointed in year i+1
        
        if (m1==0)|(m2==0):
            stmt_list.append(0)
        else:
            stmt12=(m1-s1)/m1*m2/(m1+m2)+(m2-s2)/m2*m1/(m1+m2)
            stmt_list.append(stmt12)
    return pd.DataFrame({'year':range(y1,y2),'stmt':stmt_list})

df_result=data.groupby(['Stkcd']).apply(stmt).reset_index()

df_result.drop(['level_1'],axis=1,inplace=True)
df_result['Stkcd']=df_result['Stkcd'].astype(str)
df_result.to_excel("stmt_result.xlsx")
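The boolean masks in the pandas script above can be tried on a toy table to see how its in-office rule treats the year of departure. The DataFrame `people` and the helper `in_office` are made up for illustration; the conditions themselves are copied from the script.

```python
import pandas as pd

# Hypothetical merged table for one company: one row per person.
people = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "st_year": [2012, 2013, 2015, 2014],
    "end_year": [2020, 2015, 2020, 2014],
})


def in_office(df, year):
    """Started by `year` and left after it, or started and left within `year`."""
    return ((df["st_year"] <= year) & (df["end_year"] > year)) | (
        (df["st_year"] == year) & (df["end_year"] == year)
    )


m1 = int(in_office(people, 2014).sum())
m2 = int(in_office(people, 2015).sum())
s1 = int(((people["st_year"] <= 2014) & (people["end_year"] == 2015)).sum())
s2 = int((people["st_year"] == 2015).sum())
print(m1, m2, s1, s2)
```

Person B, who leaves in 2015, is in office for 2014 but not for 2015 under this rule, which is one reason the pandas result differs from a calculation that counts the departure year as in office.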

Later I was given a new version of the data, because in the first version companies submitted several times within a year, there was little data for 2019 so many values came out as 0, and the position information was messy. In the new version each company submits data only once, at the end of each year (December 31), and the names, start dates, and end dates of all positions held by one person appear in one record (one row), as shown below.

(The image upload failed; the server may be having a small problem.)

 

Compared with the first data set, this one replaces the person ID with the name and drops the term of office and the in-office flag. The processing is similar to the first version: split the position string, take the corresponding start and end dates in turn, and add the new, in-office, and outgoing entries.
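The splitting step can be sketched as below. It assumes, as the script does, that positions and dates are separated by ASCII commas and that dates look like `2014-05-01`; the function `split_row` and the sample cells are hypothetical.

```python
from datetime import datetime


def split_row(positions, start_dates, end_dates):
    """Pair each position with its own start and end date.

    Each argument is one comma-separated cell. A date that is missing,
    empty or 'N/A' comes back as None.
    """
    def parse(text):
        try:
            return datetime.strptime(text.strip(), "%Y-%m-%d").date()
        except ValueError:
            return None

    names = positions.split(",")
    starts = start_dates.split(",")
    ends = end_dates.split(",")
    rows = []
    for i, name in enumerate(names):
        start = parse(starts[i]) if i < len(starts) else None
        end = parse(ends[i]) if i < len(ends) else None
        rows.append((name, start, end))
    return rows


for row in split_row("chairman,general manager",
                     "2014-05-01,2016-03-15",
                     "N/A,2018-06-30"):
    print(row)
```

Indexing by position in the list, rather than looking the name up again, keeps the dates aligned even when one person holds two posts with the same name.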

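import xlrd  # note: xlrd 2.0 and later read only .xls; opening an .xlsx needs xlrd<2.0
import datetime
import xlwt
READ=True
PREPROCESS=True
CAL=True
WRITE=True
if READ:
    table=xlrd.open_workbook('高管团队稳定性指标11(1).xlsx')
    t=table.sheet_by_index(0)
    
N=t.nrows
# Number in office in year t who leave in year t+1 (not used)
# Total number of executives in year t
# Number not in office in year t who are newly appointed in year t+1
# Securities codes are stored as strings
position_out=['外部董事','独立董事']  # external director, independent director
M=dict() # Yearly counts per company. Format: securities code : year : [in office, new, outgoing]
SI=dict()
# The year of departure / appointment also counts as in office
# The data has problems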

if PREPROCESS:
    Detail=dict()# Detail table. securities code : year : person : {in office: [], outgoing: [], new: []}
    for l in range(3,N):
        stkcd=str(int(t.cell_value(l,0)))
        Reptdt=str(t.cell_value(l,1))# statistics deadline
        PersonID=(t.cell_value(l,2))# this data set identifies a person by name
        Positions=t.cell_value(l,4)
        StartDates=str(t.cell_value(l,5))
        EndDates=str(t.cell_value(l,6))  
        PositionList=Positions.split(',')# single delimiter, so str.split suffices instead of re.split
        StartDateList=StartDates.split(',')
        EndDateList=EndDates.split(',')
        ReptDateTime=datetime.datetime.strptime(Reptdt,'%Y-%m-%d')
        Year=str(ReptDateTime.year)
        if stkcd not in Detail:
            Detail[stkcd]=dict()
        if Year not in Detail[stkcd]:
            Detail[stkcd][Year]=dict()
        if PersonID not in Detail[stkcd][Year]:
            Detail[stkcd][Year][PersonID]={0:[],-1:[],1:[]}
        for idx,Position in enumerate(PositionList):# idx stays correct even if a position name repeats
            if Position in position_out:
                # excluded position: skip
                continue
            Detail[stkcd][Year][PersonID][0].append(Position)
            if len(StartDates)>=6:
                if idx<len(StartDateList):
                    StartDate=StartDateList[idx]
                else:
                    continue
                   # StartDate=StartDateList[0]
                if len(StartDate)>4:
                    StartDateTime=datetime.datetime.strptime(StartDate,'%Y-%m-%d')
                    StartYear=str(StartDateTime.year)
                    if StartYear not in Detail[stkcd]:
                        Detail[stkcd][StartYear]=dict()
                    if PersonID not in Detail[stkcd][StartYear]:
                        Detail[stkcd][StartYear][PersonID]={0:[],-1:[],1:[]}
                    if Position not in Detail[stkcd][StartYear][PersonID][1]:# new
                        Detail[stkcd][StartYear][PersonID][1].append(Position)
                    if Position not in Detail[stkcd][StartYear][PersonID][0]:# in office
                        Detail[stkcd][StartYear][PersonID][0].append(Position)
            if len(EndDates)>=6:  
                if idx<len(EndDateList):
                    EndDate=EndDateList[idx]
                else:
                    continue
                   # EndDate=EndDateList[0]
                if len(EndDate)>4:
                    EndDateTime=datetime.datetime.strptime(EndDate,'%Y-%m-%d')
                    EndYear=str(EndDateTime.year)
                    if EndYear not in Detail[stkcd]:
                        Detail[stkcd][EndYear]=dict()
                    if PersonID not in Detail[stkcd][EndYear]:
                        Detail[stkcd][EndYear][PersonID]={0:[],-1:[],1:[]}
                    if Position not in Detail[stkcd][EndYear][PersonID][-1]:# outgoing
                        Detail[stkcd][EndYear][PersonID][-1].append(Position)
                    if Position not in Detail[stkcd][EndYear][PersonID][0]:# in office
                        Detail[stkcd][EndYear][PersonID][0].append(Position)

if CAL:
    for stkcd in Detail:
        M[stkcd]=dict()
        for year in Detail[stkcd]:
            M[stkcd][year]=[0,0,0]# [in office, new, outgoing]
            for PersonID in Detail[stkcd][year]:
                M[stkcd][year][0]+=len(Detail[stkcd][year][PersonID][0])
                M[stkcd][year][1]+=len(Detail[stkcd][year][PersonID][1])
                M[stkcd][year][2]+=len(Detail[stkcd][year][PersonID][-1])
    
    for stkcd in M:
        SI[stkcd]=dict()
        for year in M[stkcd]:
            if int(year)<=2013 or int(year)>=2019:
                continue
            NextYear=str(int(year)+1)
            Mj=M[stkcd][year][0]
            if NextYear not in M[stkcd]:
                SI[stkcd][year]=0
                continue
            Mjp1=M[stkcd][NextYear][0]
            if Mj==0 or Mjp1==0:# a year listing only excluded positions has no count; avoid dividing by zero
                SI[stkcd][year]=0
                continue
            S1=M[stkcd][year][2]   # in office in year j but not in j+1: outgoing
            S2=M[stkcd][NextYear][1] # not in office in year j but in j+1: new
            SI[stkcd][year]=(Mj-S1)/Mj*(Mjp1)/(Mj+Mjp1)+(Mjp1-S2)/Mjp1*Mj/(Mj+Mjp1)

if WRITE:
    workspace=xlwt.Workbook(encoding='utf-8')
    excel=workspace.add_sheet('sheet1',cell_overwrite_ok=True)# add the first sheet
    excel.write(0,0,'证券代码')# header: securities code
    excel.write(0,1,'2014-2015')
    excel.write(0,2,'2015-2016')
    excel.write(0,3,'2016-2017')
    excel.write(0,4,'2017-2018')
    excel.write(0,5,'2018-2019')
    c=1
    for item in SI:
        excel.write(c,0,item)
        for year in range(2014,2019):
            if str(year) in SI[item]:
                excel.write(c,year-2013,SI[item][str(year)])
            else:# no data for this year: write -1
                excel.write(c,year-2013,'-1')
        c=c+1
    workspace.save('answer_new_data_3.xls')   

Another person's code for this data is the pandas script already listed above (I didn't run it on the new data).

Summary: I feel that using the pandas package to operate on the whole table can be much faster. I am in the habit of reading and processing rows one by one with xlwt/xlrd, but pandas is worth studying properly, because it removes a lot of redundancy.

Origin blog.csdn.net/qq_36614557/article/details/106177096