For a college project I'm working with the Johns Hopkins Coronavirus COVID-19 dataset: https://github.com/CSSEGISandData/COVID-19. I am trying to simplify the dataset. This is what it looks like now:
Country Date Confirmed Deaths Recovered
2600 Mainland China 2020-02-28 410.0 7.0 257.0
2601 Iran 2020-02-28 388.0 34.0 73.0
2602 Mainland China 2020-02-28 337.0 3.0 279.0
2603 Mainland China 2020-02-28 318.0 6.0 277.0
2604 Mainland China 2020-02-28 296.0 1.0 235.0
... ... ... ... ... ...
2695 US 2020-02-25 1.0 0.0 1.0
2696 US 2020-02-24 0.0 0.0 0.0
2697 US 2020-02-24 0.0 0.0 0.0
2698 US 2020-02-24 0.0 0.0 0.0
2699 Mainland China 2020-02-29 66337.0 2727.0 28993.0
I want to sum the Confirmed, Deaths and Recovered values of all rows that have the same Country and Date.
For instance, rows 2600, 2602, 2603 and 2604 have the same Country and Date, so I want to combine them into one row and sum Confirmed, Deaths and Recovered separately. That would give the following row:
2600 Mainland China 2020-02-28 1361.0 17.0 1048.0
What I have so far:
duplicateRowsDF = df[df.duplicated(['Country', 'Date'])]
duplicateRowsDF
Hope somebody can help me out, preferably with, but not limited to, Pandas. Thanks in advance.
What about using groupby? If you do this:
df.groupby(by=['Country', 'Date']).sum()
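All rows with the same Country and Date will be grouped into a single row holding the sum of each remaining column. Country and Date become the index of the result; call .reset_index() on it if you want them back as regular columns.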
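To check this against the example in the question, here is a minimal sketch. The small DataFrame is rebuilt by hand from rows 2600 to 2604 shown above (it is not loaded from the real dataset), and the name `totals` is just an illustrative choice. Passing `as_index=False` is one way to keep Country and Date as ordinary columns.

```python
import pandas as pd

# Rows 2600-2604 from the question, typed in by hand for illustration.
df = pd.DataFrame(
    {
        "Country": ["Mainland China", "Iran", "Mainland China",
                    "Mainland China", "Mainland China"],
        "Date": ["2020-02-28"] * 5,
        "Confirmed": [410.0, 388.0, 337.0, 318.0, 296.0],
        "Deaths": [7.0, 34.0, 3.0, 6.0, 1.0],
        "Recovered": [257.0, 73.0, 279.0, 277.0, 235.0],
    },
    index=[2600, 2601, 2602, 2603, 2604],
)

# One row per (Country, Date) pair, with each numeric column summed.
# as_index=False keeps Country and Date as regular columns.
totals = df.groupby(["Country", "Date"], as_index=False)[
    ["Confirmed", "Deaths", "Recovered"]
].sum()

print(totals)
```

The Mainland China row for 2020-02-28 comes out as 1361.0, 17.0 and 1048.0, matching the row asked for in the question. Note that the original row labels (2600 and so on) are not kept; the result gets a fresh 0-based index.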