Split spreadsheet into pieces based on the sequences of numbers

ZheniaMagic :

I have a dataset in a spreadsheet, which is basically the data about every train trip on the New York Subway.

╔═══════╦══════╦══════════════╦════════════════╦═════════╦═══════════════╦══════════════════╗
║ trip  ║  id  ║ arrival_time ║ departure_time ║ stop_id ║ stop_sequence ║     Station      ║
╠═══════╬══════╬══════════════╬════════════════╬═════════╬═══════════════╬══════════════════╣
║ GO505 ║ 20_2 ║ 0:06:00      ║ 0:06:00        ║     237 ║             1 ║ Penn Station     ║
║ GO505 ║ 20_2 ║ 0:18:00      ║ 0:18:00        ║     214 ║             2 ║ Woodside         ║
║ GO505 ║ 20_2 ║ 0:23:00      ║ 0:23:00        ║      55 ║             3 ║ Forest Hills     ║
║ GO505 ║ 20_2 ║ 0:25:00      ║ 0:25:00        ║     107 ║             4 ║ Kew Gardens      ║
║ GO505 ║ 20_2 ║ 0:29:00      ║ 0:32:00        ║     102 ║             5 ║ Jamaica          ║
║ GO505 ║ 20_2 ║ 0:47:00      ║ 0:47:00        ║     183 ║             6 ║ Rockville Centre ║
║ GO505 ║ 20_2 ║ 0:50:00      ║ 0:50:00        ║     225 ║             7 ║ Baldwin          ║
║ GO505 ║ 20_2 ║ 0:53:00      ║ 0:53:00        ║      64 ║             8 ║ Freeport         ║
║ GO505 ║ 20_2 ║ 0:56:00      ║ 0:56:00        ║     226 ║             9 ║ Merrick          ║
║ GO505 ║ 20_2 ║ 0:59:00      ║ 0:59:00        ║      16 ║            10 ║ Bellmore         ║
║ GO505 ║ 20_2 ║ 1:02:00      ║ 1:02:00        ║     215 ║            11 ║ Wantagh          ║
║ GO505 ║ 20_2 ║ 1:05:00      ║ 1:05:00        ║     187 ║            12 ║ Seaford          ║
║ GO505 ║ 20_2 ║ 1:07:00      ║ 1:07:00        ║     136 ║            13 ║ Massapequa       ║
║ GO505 ║ 20_2 ║ 1:09:00      ║ 1:09:00        ║     135 ║            14 ║ Massapequa Park  ║
║ GO505 ║ 20_2 ║ 1:12:00      ║ 1:12:00        ║       8 ║            15 ║ Amityville       ║
║ GO505 ║ 20_2 ║ 1:15:00      ║ 1:15:00        ║      38 ║            16 ║ Copiague         ║
║ GO505 ║ 20_2 ║ 1:18:00      ║ 1:18:00        ║     117 ║            17 ║ Lindenhurst      ║
║ GO505 ║ 20_2 ║ 1:23:00      ║ 1:23:00        ║      27 ║            18 ║ Babylon          ║
║ GO505 ║ 20_3 ║ 1:00:00      ║ 1:00:00        ║      27 ║             1 ║ Babylon          ║
║ GO505 ║ 20_3 ║ 1:05:00      ║ 1:05:00        ║     117 ║             2 ║ Lindenhurst      ║
║ GO505 ║ 20_3 ║ 1:08:00      ║ 1:08:00        ║      38 ║             3 ║ Copiague         ║
║ GO505 ║ 20_3 ║ 1:10:00      ║ 1:10:00        ║       8 ║             4 ║ Amityville       ║
║ GO505 ║ 20_3 ║ 1:13:00      ║ 1:13:00        ║     135 ║             5 ║ Massapequa Park  ║
╚═══════╩══════╩══════════════╩════════════════╩═════════╩═══════════════╩══════════════════╝

I need to split it into parts based on the runs of numbers in stop_sequence. Each run from 1 to n (here, 18) is one trip of the train. For example, I need to compute the duration of each trip (the departure_time of the last stop_sequence minus the arrival_time of the first stop_sequence), and there are about 5,000 trips. How can I do this? I would like to split the column into separate trips in Python with pandas and calculate the time for each trip, but I do not know how.

My expected output is

trip id ║ Duration of the trip

GO505 20_2 ║ x:xx:xx

GO505 20_3 ║ x:xx:xx

I am new to data science. Please help!

Theza :

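Assuming the trip data is in range A:G, enter this formula in cell I1:

=QUERY({ArrayFormula(A:A&" "&B:B),ArrayFormula(VALUE(C:D))},"select Col1,max(Col3)-min(Col2) where Col1!=' ' group by Col1 label max(Col3)-min(Col2) 'Duration of the trip' format max(Col3)-min(Col2) 'hh:mm:ss'")

It joins trip and id into one key, groups the rows by that key, and subtracts the earliest arrival_time from the latest departure_time in each group.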
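For the pandas route mentioned in the question, a minimal sketch along the same lines might look like the following. It assumes the sheet has been loaded into a DataFrame with the column names shown in the question, and that trip plus id together identify one run of stop_sequence. The function name trip_durations and the small sample frame are made up for illustration.

```python
import pandas as pd


def trip_durations(df):
    """Return one row per (trip, id) with the duration of that trip."""
    out = df.copy()
    # Times such as "0:06:00" are parsed as elapsed time, not clock time.
    out["arrival_time"] = pd.to_timedelta(out["arrival_time"])
    out["departure_time"] = pd.to_timedelta(out["departure_time"])

    # Order the stops inside each trip so "first" and "last" are meaningful.
    ordered = out.sort_values(["trip", "id", "stop_sequence"])
    grouped = ordered.groupby(["trip", "id"], sort=False).agg(
        start=("arrival_time", "first"),   # arrival at the first stop
        end=("departure_time", "last"),    # departure from the last stop
    )
    grouped["duration"] = grouped["end"] - grouped["start"]
    return grouped["duration"].reset_index()


# A few rows copied from the table in the question (first and last stops).
sample = pd.DataFrame(
    {
        "trip": ["GO505"] * 4,
        "id": ["20_2", "20_2", "20_3", "20_3"],
        "arrival_time": ["0:06:00", "1:23:00", "1:00:00", "1:13:00"],
        "departure_time": ["0:06:00", "1:23:00", "1:00:00", "1:13:00"],
        "stop_sequence": [1, 18, 1, 5],
    }
)
result = trip_durations(sample)
print(result)
```

If trip and id do not uniquely identify a run in the full data, one possible alternative is to build a run counter from the places where stop_sequence restarts, for example (df["stop_sequence"] == 1).cumsum(), and group by that counter instead.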



