Converting rows to columns in pandas: a solution for typical report output

Disclaimer: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/stone0823/article/details/99716757

At work I often need to turn rows into columns so that the data can be output as a report, much like using CASE WHEN in a SQL statement. The data used here is a simplified version of real working data. The source data format is as follows:

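The table holds data from a finance-related program. ENTITY is the corporate entity, ACCOUNT is the account, S/H marks the debit or credit side (the DIRECTION column in the code below), and the remaining columns give the project type and the project code (PROJ_CODE in the code below). I have put the source data on GitHub.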

The report must show, for a required accounting period such as April 2018, the changes in investment cost (account 140401) and in fair value (account 140404).

First, read the data into a DataFrame with the read_csv() method:

import pandas as pd
import numpy as np

df = pd.read_csv('https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas%20sample%20data/project-listing.csv')
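Because the comparisons later in this article (for example ACCOUNT == 140401) assume that the numeric columns were parsed as integers, it can help to check the dtypes right after loading. The following is a minimal sketch using a small in-memory sample instead of the remote file. The rows are made up for illustration, and only the column names are taken from the code in this article.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real file has more columns and rows.
csv_text = """PROJ_CODE,ACCOUNT,YEAR,MONTH,DIRECTION,AMOUNT
P001,140401,2017,12,S,1000.0
P001,140404,2018,3,S,250.5
P002,140401,2018,4,H,300.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Integer-looking columns are parsed as integers, so numeric
# comparisons such as sample['ACCOUNT'] == 140401 work as expected.
print(sample.dtypes)
print(sample.head())
```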

Suppose the report is being prepared for April 2018. The first step is to obtain the opening balance for 2018. Account 140401 holds the original investment cost, and account 140404 holds the changes in the investment's value. The selection condition for the opening cost is: ACCOUNT equals 140401 and YEAR is less than 2018.

For this we use the numpy.where() function. Its syntax is:

out = numpy.where(condition[, x, y])

x and y are array-like. For each element, the function returns the value from x where condition is True and the value from y otherwise. For example:

>>> a = np.random.randint(1,10,8).reshape(2,4)
>>> b = np.random.randint(1,10,8).reshape(2,4)
>>> a
array([[6, 8, 8, 8],
       [1, 3, 9, 2]])
>>> b
array([[9, 7, 6, 8],
       [7, 8, 2, 7]])
>>> np.where(True, a+2, b+2)
array([[ 8, 10, 10, 10],
       [ 3,  5, 11,  4]])

a and b are both 2 x 4 arrays. Because the condition is True, the return value is the array a+2, that is, every element of a plus 2.
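A constant True condition does not show the element-by-element selection that makes numpy.where() useful. The following small sketch reuses the values of a and b from the example above, written out as fixed arrays so that the result is reproducible, and passes an array condition instead.

```python
import numpy as np

a = np.array([[6, 8, 8, 8],
              [1, 3, 9, 2]])
b = np.array([[9, 7, 6, 8],
              [7, 8, 2, 7]])

# Where a > b, take the element from a; otherwise take it from b.
larger = np.where(a > b, a, b)
print(larger)

# A scalar may be used for x or y; it is broadcast to the array shape.
kept = np.where(a > 5, a, 0)
print(kept)
```

The second call has the same shape as the report logic used later: keep the value where a condition holds, otherwise use 0.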

Now let us apply this to df. Each column of a DataFrame is a pandas.core.series.Series, which the pandas documentation describes as a "one-dimensional ndarray with axis labels (including time series)". A Series is therefore array-like, and numpy.where() can be applied to it directly:

account = df['ACCOUNT']
txyear = df['YEAR']
amount = df['AMOUNT']
begin_cost = np.where((account==140401) & (txyear<2018), amount, 0)

Displaying begin_cost gives:

array([0.00000000e+00, 3.24717000e+08, 7.60000000e+06, 2.44102600e+05,
       1.00000000e+06, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.93538440e+06,
       8.84261079e+07, 3.90906100e+05, 7.90258530e+06, 0.00000000e+00,
       0.00000000e+00])
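As the output shows, numpy.where() returns a plain NumPy array rather than a Series, so the index labels are dropped. If the index should be kept, pandas offers Series.where() as an alternative. The sketch below compares the two on a few made-up rows. The demo frame is hypothetical, and only the column names come from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names follow the article's code.
demo = pd.DataFrame({
    'ACCOUNT': [140401, 140401, 140404],
    'YEAR':    [2017, 2018, 2017],
    'AMOUNT':  [1000.0, 300.0, 250.5],
})

cond = (demo['ACCOUNT'] == 140401) & (demo['YEAR'] < 2018)

as_array = np.where(cond, demo['AMOUNT'], 0)  # numpy.ndarray, no index
as_series = demo['AMOUNT'].where(cond, 0)     # pandas Series, index preserved

print(type(as_array).__name__, as_array)
print(type(as_series).__name__, as_series.tolist())
```

Both forms give the same values. Assigning either one to a new column of the same DataFrame works, since the row order is unchanged.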

Note that when several conditions are combined in numpy.where(), each condition must be wrapped in parentheses, because & binds more tightly than comparison operators such as == and <. We use this approach to add a column to df:

df['BEGIN_COST'] = np.where(
    (df['ACCOUNT']==140401) & (df['YEAR']<2018), df['AMOUNT'], 0
)
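The note on parentheses can be checked with a short sketch. The two-row frame below is made up for illustration. Without parentheses, Python evaluates 140401 & demo['YEAR'] first and then treats the rest as a chained comparison, which pandas rejects with a ValueError because the truth value of a Series is ambiguous.

```python
import pandas as pd

# Hypothetical rows; column names follow the article's code.
demo = pd.DataFrame({
    'ACCOUNT': [140401, 140404],
    'YEAR':    [2017, 2017],
})

# With parentheses: each comparison is evaluated first, then combined.
ok = (demo['ACCOUNT'] == 140401) & (demo['YEAR'] < 2018)
print(ok.tolist())

# Without parentheses: & is applied before == and <, and the resulting
# chained comparison needs bool(Series), which raises ValueError.
try:
    demo['ACCOUNT'] == 140401 & demo['YEAR'] < 2018
    raised = False
except ValueError:
    raised = True
print(raised)
```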

Viewing df in Jupyter Notebook shows the DataFrame with the new BEGIN_COST column.


The other columns are added in the same way. The full code is:

# Opening balances: amounts booked before 2018
df['BEGIN_COST'] = np.where(
    (df['ACCOUNT']==140401) & (df['YEAR']<2018), df['AMOUNT'], 0
)

df['BEGIN_VAR'] = np.where(
    (df['ACCOUNT']==140404) & (df['YEAR']<2018), df['AMOUNT'], 0
)

# Additions in 2018 up to and including April (DIRECTION 'S')
df['PER_COST_ADD'] = np.where(
    (df['ACCOUNT']==140401) & (df['YEAR']==2018) & (df['MONTH']<=4) & (df['DIRECTION']=='S'),
    df['AMOUNT'], 0
)

df['PER_VAR_ADD'] = np.where(
    (df['ACCOUNT']==140404) & (df['YEAR']==2018) & (df['MONTH']<=4) & (df['DIRECTION']=='S'),
    df['AMOUNT'], 0
)

# Decreases in 2018 up to and including April (DIRECTION 'H')
df['PER_COST_DECT'] = np.where(
    (df['ACCOUNT']==140401) & (df['YEAR']==2018) & (df['MONTH']<=4) & (df['DIRECTION']=='H'),
    df['AMOUNT'], 0
)

df['PER_VAR_DECT'] = np.where(
    (df['ACCOUNT']==140404) & (df['YEAR']==2018) & (df['MONTH']<=4) & (df['DIRECTION']=='H'),
    df['AMOUNT'], 0
)

# Sum the new columns per project to get one report row per project code
prj_summarized = df[[
    'PROJ_CODE', 'BEGIN_COST', 'BEGIN_VAR', 'PER_COST_ADD', 'PER_VAR_ADD',
    'PER_COST_DECT', 'PER_VAR_DECT'
]].groupby('PROJ_CODE').sum()
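The six assignments differ only in the account number and the row filter, so the same logic can also be written as a loop. The following is a sketch of that idea on a few made-up rows, not the author's code. The demo frame, the report_columns mapping, and the helper variables before_2018 and in_period are hypothetical names introduced here, and only the column names come from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names follow the article's code.
demo = pd.DataFrame({
    'PROJ_CODE': ['P001', 'P001', 'P001', 'P002', 'P002'],
    'ACCOUNT':   [140401, 140401, 140404, 140401, 140404],
    'YEAR':      [2017, 2018, 2018, 2017, 2018],
    'MONTH':     [12, 3, 4, 6, 5],
    'DIRECTION': ['S', 'S', 'H', 'S', 'S'],
    'AMOUNT':    [1000.0, 200.0, 50.0, 700.0, 30.0],
})

before_2018 = demo['YEAR'] < 2018
in_period = (demo['YEAR'] == 2018) & (demo['MONTH'] <= 4)

# Report column name -> (account, row filter)
report_columns = {
    'BEGIN_COST':    (140401, before_2018),
    'BEGIN_VAR':     (140404, before_2018),
    'PER_COST_ADD':  (140401, in_period & (demo['DIRECTION'] == 'S')),
    'PER_VAR_ADD':   (140404, in_period & (demo['DIRECTION'] == 'S')),
    'PER_COST_DECT': (140401, in_period & (demo['DIRECTION'] == 'H')),
    'PER_VAR_DECT':  (140404, in_period & (demo['DIRECTION'] == 'H')),
}

# Each new column keeps AMOUNT where its condition holds, else 0.
for name, (account, row_filter) in report_columns.items():
    demo[name] = np.where((demo['ACCOUNT'] == account) & row_filter,
                          demo['AMOUNT'], 0)

# One report row per project code.
summary = demo[['PROJ_CODE', *report_columns]].groupby('PROJ_CODE').sum()
print(summary)
```

In this sample, the last row falls in May 2018, so it satisfies none of the conditions and contributes 0 to every report column.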
