I recently ran into this problem: how can pandas aggregate values depending on whether the first few digits of a field are the same? (For example, rows whose field shares the same first four digits should have another field summed separately for each group.)
First, prepare some sample data:
```python
import pandas as pd

df = pd.DataFrame({
    "party_id": [1101910000, 1101910000, 1101910000, 6523930000, 6523930000, 6523930000],
    "value": [1, 3, 5, 2, 1, 3],
})
print(df)
```
Requirement: for rows whose party_id values share the same first four digits, sum the value field separately for each group.
Approach:
- First, convert the party_id field to strings with the DataFrame astype() method.
- Then, use the .str.slice() method to keep only the first four characters.
- Finally, group by the sliced party_id with groupby() and call sum() to total the values in each group.
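The three steps above can also be chained so that the sliced prefix is used directly as the grouping key, without building a converted copy of the whole DataFrame. This is a minimal sketch using the same sample data; `prefix` and `result` are just illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    "party_id": [1101910000, 1101910000, 1101910000, 6523930000, 6523930000, 6523930000],
    "value": [1, 3, 5, 2, 1, 3],
})

# Grouping key: the first four characters of party_id as a string.
# df itself is left unchanged.
prefix = df["party_id"].astype(str).str.slice(0, 4)

# Group the value column by that key and sum each group.
result = df.groupby(prefix)["value"].sum()
print(result)
```

Because `prefix` is a Series named `party_id`, the resulting index is also labelled `party_id`, and the original integer column stays available in `df` for later use.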
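```python
# Convert party_id to strings so it can be sliced character by character
df1 = df.astype({"party_id": str})
# Keep only the first four characters of each party_id
df1["party_id"] = df1["party_id"].str.slice(0, 4)
# Group rows sharing the same prefix and sum their values
print(df1.groupby("party_id").sum())
```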
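To reuse this for other prefix lengths, the same steps can be wrapped in a function. The sketch below is illustrative only: `sum_by_prefix` and `sum_by_prefix_numeric` are hypothetical helper names, not part of pandas. The numeric variant avoids string conversion by integer division, but it assumes every key has exactly `total_digits` digits (10 in the sample data); the string version does not need that assumption, though any leading zeros are already lost once an ID is stored as an integer.

```python
import pandas as pd


def sum_by_prefix(frame: pd.DataFrame, key: str, value: str, n_digits: int) -> pd.Series:
    """Hypothetical helper: sum `value` over rows whose `key` shares the first `n_digits` characters."""
    prefix = frame[key].astype(str).str.slice(0, n_digits)
    return frame.groupby(prefix)[value].sum()


def sum_by_prefix_numeric(frame: pd.DataFrame, key: str, value: str,
                          n_digits: int, total_digits: int) -> pd.Series:
    """Hypothetical integer-only variant.

    Assumes every key is an integer with exactly `total_digits` digits,
    so dropping the trailing digits leaves the leading `n_digits`.
    """
    prefix = frame[key] // 10 ** (total_digits - n_digits)
    return frame.groupby(prefix)[value].sum()


df = pd.DataFrame({
    "party_id": [1101910000, 1101910000, 1101910000, 6523930000, 6523930000, 6523930000],
    "value": [1, 3, 5, 2, 1, 3],
})

by4 = sum_by_prefix(df, "party_id", "value", 4)              # string keys "1101", "6523"
by2 = sum_by_prefix(df, "party_id", "value", 2)              # string keys "11", "65"
by4_num = sum_by_prefix_numeric(df, "party_id", "value", 4, 10)  # integer keys 1101, 6523
print(by4, by2, by4_num, sep="\n\n")
```

The string approach is the more general of the two; the integer approach only makes sense when the ID width is fixed and known in advance.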
If you have a better approach, please share it in the comments!