I have data looking like this:
Col1
time: 4
1
2
3
time: 7
4
5
6
time: 11
7
8
...
I want to add a new column and make it look like this:
Col1 Col2
time: 4 4
1 4
2 4
3 4
time: 7 7
4 7
5 7
6 7
time: 11 11
7 11
8 11
... ...
So I want to grab the value from each "time: x" row and put it in the new column for that row and every row below it, until the next "time: x" row appears. Any suggestions? I am not even sure whether the values in the column are integers or strings. I appreciate your help!
You can try something like:
df['Col2'] = (df.groupby(df['Col1'].str.contains('time:').cumsum())['Col1']
                .transform('first')
                .str.split(':').str[-1])
print(df)
Col1 Col2
0 time: 4 4
1 1 4
2 2 4
3 3 4
4 time: 7 7
5 4 7
6 5 7
7 6 7
8 time: 11 11
9 7 11
10 8 11
....
....
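For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question and assumes every value in Col1 is a string; the name group_id and the trailing .str.strip() are my additions for readability and are not part of the one-liner above.

```python
import pandas as pd

# Sample data reconstructed from the question; all values are assumed to be strings.
df = pd.DataFrame({"Col1": ["time: 4", "1", "2", "3",
                            "time: 7", "4", "5", "6",
                            "time: 11", "7", "8"]})

# Each "time:" row starts a new group; cumsum turns the boolean mask into group ids.
group_id = df["Col1"].str.contains("time:").cumsum()

# Broadcast the first row of each group to all its rows, then keep the text
# after the colon. strip() removes the space that follows the colon.
df["Col2"] = (df.groupby(group_id)["Col1"]
                .transform("first")
                .str.split(":").str[-1]
                .str.strip())

print(df)
```

Col2 holds strings here; chain .astype(int) if you need numbers.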
Explanation:
First we create a helper series that is True for every row containing "time:"
and then take its cumulative sum, so the counter increases by one at each "time:" row:
print(df['Col1'].str.contains('time:').cumsum())
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 3
9 3
10 3
Each distinct value of this helper series now identifies one group, so we group by it and use transform('first') to broadcast each group's first value to all of its rows:
print(df.groupby(df['Col1'].str.contains('time:').cumsum())['Col1'].transform('first'))
0 time: 4
1 time: 4
2 time: 4
3 time: 4
4 time: 7
5 time: 7
6 time: 7
7 time: 7
8 time: 11
9 time: 11
10 time: 11
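Once we have this result, we can chain str.split, which splits each string on ":", and take the last element of the split with .str[-1]. The result is a string that keeps the space after the colon (for example " 4"); chain .str.strip() to remove it.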
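Since you are not sure whether the column holds integers or strings, here is an alternative sketch that casts everything to str first and then uses str.extract with ffill instead of groupby. This is a different technique from the one above, not a rewrite of it. The mixed-type sample data is my assumption, and the code also assumes the first row is a "time:" row and that the time values are whole numbers.

```python
import pandas as pd

# Mixed types on purpose: the question was unsure whether values are int or str.
df = pd.DataFrame({"Col1": ["time: 4", 1, 2, 3,
                            "time: 7", 4, 5, 6,
                            "time: 11", 7, 8]})

# Cast to str so the .str accessor works on every row.
col = df["Col1"].astype(str)

# Capture the digits after "time:"; rows that do not match become NaN.
extracted = col.str.extract(r"time:\s*(\d+)", expand=False)

# Fill each NaN with the most recent "time" value, then convert to int.
# This conversion fails if the first row is not a "time:" row (leading NaN).
df["Col2"] = extracted.ffill().astype(int)

print(df)
```

The regular expression only keeps the digits, so there is no leading space to strip and the column ends up as integers.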
Hope that helps.