Visiony10 :
My first question on StackOverflow. Please be good to me :)
Hello, I just started a small data science project, and I ultimately want to create a pie chart with matplotlib showing each device model's share of the site's overall traffic (e.g. 30% iPhone, 20% iPad, 10% Mac, etc.).
useragent count
iPhone 11298
Mac 3206
iPad 627
SM-N960F 433
SM-N950F 430
... ...
K330 1
K220 1
SM-J737P 1
SM-J737T1 1
0PFJ50 1
[1991 rows x 2 columns]
As the output above shows, there are 1,991 records. I am preparing the data for plotting and want to display only 5 rows: the top 4 devices, plus a fifth row labeled Others that holds the sum of all the remaining items.
The expected output is like this:
useragent count
iPhone 11298
Mac 3206
iPad 627
SM-N960F 433
Others 9000
Thank you so much!
jezrael :
Use:
import pandas as pd
# sort by count first, if the data is not already sorted
df1 = df.sort_values('count', ascending=False)
# take the top 4 rows
df2 = df1.head(4)
# sum `count` over every row after the first 4
summed = df1['count'].iloc[4:].sum()
# build a one-row DataFrame holding the remaining total
df3 = pd.DataFrame({'useragent': ['Other'], 'count': [summed]})
# join the two together
df4 = pd.concat([df2, df3], sort=False, ignore_index=True)
print(df4)
useragent count
0 iPhone 11298
1 Mac 3206
2 iPad 627
3 SM-N960F 433
4 Other 435
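For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frame contains only the rows visible in the question's printout (the elided `...` rows are left out), and the helper name `top_n_with_other` is my own, not a pandas function.

```python
import pandas as pd

# Sample data: only the rows visible in the question's printout.
df = pd.DataFrame({
    "useragent": ["iPhone", "Mac", "iPad", "SM-N960F", "SM-N950F",
                  "K330", "K220", "SM-J737P", "SM-J737T1", "0PFJ50"],
    "count": [11298, 3206, 627, 433, 430, 1, 1, 1, 1, 1],
})


def top_n_with_other(frame, n=4, label="Other"):
    """Keep the n largest rows by 'count' and sum the rest into one row.

    Hypothetical helper for illustration, not part of pandas.
    """
    ordered = frame.sort_values("count", ascending=False)
    top = ordered.head(n)
    # Total of every row that did not make the top n.
    rest_total = ordered["count"].iloc[n:].sum()
    other = pd.DataFrame({"useragent": [label], "count": [rest_total]})
    return pd.concat([top, other], ignore_index=True)


result = top_n_with_other(df, n=4)
print(result)
```

Because the remaining rows are summed rather than dropped, the total of `count` in the result should match the total in the input, which is what a pie chart of traffic share needs.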
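EDIT: To group by a threshold on `count` instead of a fixed number of rows:
# rows whose count is above the threshold
mask = df['count'] > 500
# keep those rows with boolean indexing
df2 = df[mask]
# invert the mask and sum the counts of everything else
summed = df.loc[~mask, 'count'].sum()
# same as above: one-row DataFrame for the rest, then join
df3 = pd.DataFrame({'useragent': ['Other'], 'count': [summed]})
df5 = pd.concat([df2, df3], sort=False, ignore_index=True)
print(df5)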
useragent count
0 iPhone 11298
1 Mac 3206
2 iPad 627
3 Other 868
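Since the goal in the question was a pie chart, either result can be handed to matplotlib. A rough sketch, assuming the top-4 values printed earlier (rebuilt here by hand so the snippet runs on its own); the chart title is just a placeholder of mine.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the top-4 output printed earlier in this answer.
df4 = pd.DataFrame({
    "useragent": ["iPhone", "Mac", "iPad", "SM-N960F", "Other"],
    "count": [11298, 3206, 627, 433, 435],
})

# Share of total traffic per label, in percent.
shares = df4["count"] / df4["count"].sum() * 100
print(shares.round(1))

fig, ax = plt.subplots()
# autopct writes each wedge's percentage onto the chart.
ax.pie(df4["count"], labels=df4["useragent"], autopct="%1.1f%%")
ax.set_title("Traffic by device model")  # placeholder title
```

Call `plt.show()` to display the figure or `fig.savefig(...)` to write it to a file. `ax.pie` normalizes the counts itself, so `shares` is only there if you also want the percentages as numbers.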