How to aggregate values in pandas by whether the first few digits match (Python)

I recently ran into this problem:
how to use pandas to aggregate values by whether the first few digits of a field match. (For example, for rows where the first four digits of one field are the same, sum another field separately for each group.)


So, first prepare a set of sample data


import pandas as pd
df = pd.DataFrame({
    "party_id": [1101910000, 1101910000, 1101910000, 6523930000, 6523930000, 6523930000],
    "value": [1, 3, 5, 2, 1, 3],
})
print(df)



Requirement example: for rows whose party_id values share the same first four digits, sum the value field within each group.

Ideas:

  • First, convert the party_id field to strings. This can be done with the DataFrame's astype() method.
  • Then, use the .str.slice() method to keep only the first four characters.
  • After slicing, use the groupby() method to group by the sliced value, then sum() to total each group.
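
# astype() returns a new DataFrame, so df itself is left unchanged
df1 = df.astype({"party_id": str})
# Keep only the first four characters as the grouping key
df1["party_id"] = df1["party_id"].str.slice(0, 4)
print(df1.groupby("party_id").sum())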
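
A possible variation on the same idea, shown here only as a sketch: groupby() also accepts a Series as the key, so the four-character prefix can be built on the fly without overwriting party_id. The names prefix and result are made up for this illustration and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "party_id": [1101910000, 1101910000, 1101910000, 6523930000, 6523930000, 6523930000],
    "value": [1, 3, 5, 2, 1, 3],
})

# Build the grouping key separately; the party_id column in df is not modified.
# "prefix" is a hypothetical name chosen for this example.
prefix = df["party_id"].astype(str).str[:4].rename("prefix")

# Sum only the "value" column within each prefix group.
result = df.groupby(prefix)["value"].sum()
print(result)
```

Here the full party_id values stay available in df for later steps, and selecting ["value"] before sum() makes it explicit which column is being totalled.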

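
If every party_id is known to have the same number of digits, the string conversion might be skipped altogether by using integer division. The sketch below assumes all IDs have exactly 10 digits, which holds for the sample data but is an assumption, not something the approach checks; prefix and totals are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "party_id": [1101910000, 1101910000, 1101910000, 6523930000, 6523930000, 6523930000],
    "value": [1, 3, 5, 2, 1, 3],
})

# Assumption: every party_id has exactly 10 digits, so dropping the
# last 6 digits with integer division leaves the first 4.
prefix = (df["party_id"] // 10**6).rename("prefix")

# Group by the integer prefix and sum the "value" column.
totals = df.groupby(prefix)["value"].sum()
print(totals)
```

With IDs of varying length this would pick up a different number of leading digits per row, so the string-slicing approach is the safer choice in that case.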


If you have better ideas, please share them in the comment area!


Origin blog.csdn.net/weixin_48964486/article/details/123622284