propagate calculation along pandas dataframe rows - Code World

Others 2022-04-20 06:21:37 views: 0

user1403546 :

I need to propagate a calculation (for example a delay) along pandas dataframe rows.

I found a solution that uses the .iterrows() method, but it is very slow, so I was wondering whether there is a vectorized solution for this problem, since my data is huge.

Here is my approach:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    index=['task_1', 'task_2', 'task_3', 'task_4', 'task_5'],
    columns=['start_time', 'end_time'],
    data=[[1, 2], [3, 4], [6, 7], [7, 8], [10, 11]],
)

# set start delay on task 2 (every other task gets 0)
start_delay_on_task_2 = 3
df.loc['task_2', 'start_delay'] = start_delay_on_task_2
# assign the result back instead of calling fillna(inplace=True) on a column selection
df['start_delay'] = df['start_delay'].fillna(0)

# compute buffer between tasks: next task's start_time minus this task's end_time
df['buffer_to_next_task'] = df['start_time'].shift(-1) - df['end_time']

Here is the initial content of df (before the start_delay and buffer_to_next_task columns are added):

        start_time  end_time
task_1  1           2
task_2  3           4
task_3  6           7
task_4  7           8
task_5  10          11

And now the worst code ever, to compute the overall delay:

df['overall_start_delay'] = df['start_delay']
overall_start_delay_idx = df.columns.get_loc('overall_start_delay')
start_delay_idx = df.columns.get_loc('start_delay')
buffer_to_next_task_idx = df.columns.get_loc('buffer_to_next_task')
for i in range(len(df)):
    # labels are compared as strings, which works for this sample index
    if df.index[i] <= 'task_2':
        # up to and including the delayed task: use the task's own start delay
        overall_delay = df.iloc[i, start_delay_idx]
    else:
        # later tasks: previous overall delay minus the buffer that absorbs it, floored at 0
        overall_delay = max(0, df.iloc[i-1, overall_start_delay_idx] - df.iloc[i-1, buffer_to_next_task_idx])
    df.iloc[i, overall_start_delay_idx] = overall_delay

And here is the desired result:

         start_time end_time start_delay    buffer_to_next_task overall_start_delay
task_1   1          2        0.0            1.0                 0.0
task_2   3          4        3.0            2.0                 3.0
task_3   6          7        0.0            0.0                 1.0
task_4   7          8        0.0            2.0                 1.0
task_5   10         11       0.0            NaN                 0.0

Any suggestions on how to vectorize this code and avoid the for loop?
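
For reference, the loop in the question applies the recurrence d[i] = max(0, d[i-1] - buffer[i-1]) to every task after the delayed one. The sketch below writes that recurrence over plain NumPy arrays, which typically avoids most of the per-cell .iloc overhead but is still a loop, not a vectorized solution. The function name propagate_delay_loop and its delayed_pos parameter are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd


def propagate_delay_loop(start_delay, buffer_to_next, delayed_pos):
    """Apply d[i] = max(0, d[i-1] - buffer[i-1]) after position delayed_pos.

    Hypothetical helper: positions up to and including delayed_pos keep
    their own start delay, as in the question's loop.
    """
    s = np.asarray(start_delay, dtype=float)
    b = np.asarray(buffer_to_next, dtype=float)
    d = s.copy()
    for i in range(delayed_pos + 1, len(d)):
        # the buffer after the previous task absorbs part of its delay
        d[i] = max(0.0, d[i - 1] - b[i - 1])
    return d


tasks = ['task_1', 'task_2', 'task_3', 'task_4', 'task_5']
start_delay = [0, 3, 0, 0, 0]
buffer_to_next = [1, 2, 0, 2, np.nan]  # the last task has no successor

overall = pd.Series(
    propagate_delay_loop(start_delay, buffer_to_next, delayed_pos=1),
    index=tasks,
)
print(overall)
```

The trailing NaN in the buffer is never read, because the recurrence only looks at the buffer of the previous task.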

Quang Hoang :

This is a solution for one delay:

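# running total of the injected delay (0 before the delayed task)
total_delays = df.start_delay.cumsum()
(total_delays
 .sub(df.buffer_to_next_task
      .where(total_delays.gt(0), 0)   # ignore buffers before the delay starts
      .cumsum().shift(fill_value=0)   # buffer absorbed before each task
     )
 .clip(lower=0)                       # a delay cannot go below zero
)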

Output:

task_1    0.0
task_2    3.0
task_3    1.0
task_4    1.0
task_5    0.0
dtype: float64
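
To see why this works, it may help to look at the intermediate Series on the sample data. The sketch below rebuilds the two input columns by hand and names each step; the variable names active and absorbed are introduced here for illustration only. It appears to rely on all buffers being non-negative: the cumulative buffer then only grows, so once the delay is clipped to 0 it stays at 0, which matches the loop in the question.

```python
import pandas as pd

tasks = ['task_1', 'task_2', 'task_3', 'task_4', 'task_5']
start_delay = pd.Series([0.0, 3.0, 0.0, 0.0, 0.0], index=tasks)
buffer_to_next = pd.Series([1.0, 2.0, 0.0, 2.0, float('nan')], index=tasks)

# running total of the injected delay: 0, 3, 3, 3, 3
total_delays = start_delay.cumsum()

# buffers only count once the delay has started: 0, 2, 0, 2, NaN
active = buffer_to_next.where(total_delays.gt(0), 0)

# buffer absorbed before each task starts: 0, 0, 2, 2, 4
absorbed = active.cumsum().shift(fill_value=0)

# remaining delay, floored at zero: 0, 3, 1, 1, 0
overall = total_delays.sub(absorbed).clip(lower=0)
print(overall)
```

The NaN buffer of the last task ends up in the last position of the cumulative sum and is then shifted out, so it never reaches the result.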
