Tools/materials: Jupyter Notebook, Python 3
Statistical calculation function exercise
import pandas as pd
import numpy as np
1. Create a DataFrame with 3 rows and 5 columns of random integers in the range 0-9 (the upper bound 10 passed to randint is exclusive)
df1 = pd.DataFrame(np.random.randint(0,10,(3,5)))
df1
|   | 0 | 1 | 2 | 3 | 4 |
| 0 | 9 | 6 | 5 | 7 | 6 |
| 1 | 4 | 5 | 4 | 7 | 5 |
| 2 | 9 | 1 | 9 | 1 | 7 |
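Because randint draws new numbers on every run, the values above will not reappear when the cell is re-executed. A minimal sketch of two ways to get repeatable data: rebuilding df1 from the values printed above, or using a seeded generator. The name df_seeded and the seed 0 are assumptions chosen for illustration and are not part of the original exercise; the seeded frame will hold different numbers from the sample.

```python
import numpy as np
import pandas as pd

# Option 1: rebuild df1 from the sample values printed above,
# so every later result in this exercise can be reproduced exactly.
df1 = pd.DataFrame([[9, 6, 5, 7, 6],
                    [4, 5, 4, 7, 5],
                    [9, 1, 9, 1, 7]])
print(df1)

# Option 2 (illustrative): a seeded generator gives the same random
# integers on every run. The seed value 0 is arbitrary.
rng = np.random.default_rng(0)
df_seeded = pd.DataFrame(rng.integers(0, 10, size=(3, 5)))
print(df_seeded)
```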
2. Compute the maximum, minimum, and mean of each row (axis=1)
df1.max(axis=1)
0 9
1 7
2 9
dtype: int32
df1.min(axis=1)
0 5
1 4
2 1
dtype: int32
df1.mean(axis=1)
0 6.6
1 5.0
2 5.4
dtype: float64
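A row-wise cumulative sum works the same way, by passing axis=1 to cumsum. The sketch below assumes df1 holds the sample values shown earlier; the name row_cumsum is introduced here for illustration only.

```python
import pandas as pd

# df1 rebuilt from the sample values printed earlier in this exercise.
df1 = pd.DataFrame([[9, 6, 5, 7, 6],
                    [4, 5, 4, 7, 5],
                    [9, 1, 9, 1, 7]])

# axis=1 accumulates from left to right within each row.
row_cumsum = df1.cumsum(axis=1)
print(row_cumsum)
# Row 0 becomes 9, 15, 20, 27, 33; the last column equals df1.sum(axis=1).
```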
3. Compute the sum, mean, and cumulative product of each column
Note: by default (axis=0) these methods operate down each column, so no axis argument is needed
df1.sum()
0 22
1 12
2 18
3 15
4 18
dtype: int64
df1.mean()
0 7.333333
1 4.000000
2 6.000000
3 5.000000
4 6.000000
dtype: float64
df1.cumprod()
|   | 0   | 1  | 2   | 3  | 4   |
| 0 | 9   | 6  | 5   | 7  | 6   |
| 1 | 36  | 30 | 20  | 49 | 30  |
| 2 | 324 | 30 | 180 | 49 | 210 |
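To make the axis behaviour concrete, the sketch below compares the default call with an explicit axis argument and checks that the last row of cumprod() matches prod(). It assumes df1 holds the sample values shown earlier; the variable names are introduced here for illustration.

```python
import pandas as pd

# df1 rebuilt from the sample values printed earlier in this exercise.
df1 = pd.DataFrame([[9, 6, 5, 7, 6],
                    [4, 5, 4, 7, 5],
                    [9, 1, 9, 1, 7]])

col_sum_default = df1.sum()          # no axis given: one result per column
col_sum_explicit = df1.sum(axis=0)   # identical to the default
row_sum = df1.sum(axis=1)            # one result per row

# The final row of the running product is the full column product.
last_cumprod = df1.cumprod().iloc[-1]
col_prod = df1.prod()

print(col_sum_default.equals(col_sum_explicit))
print(row_sum.tolist())
print(last_cumprod.tolist(), col_prod.tolist())
```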
4. Use describe() to output several summary statistics at once
df1.describe()
|       | 0        | 1        | 2        | 3        | 4   |
| count | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.0 |
| mean  | 7.333333 | 4.000000 | 6.000000 | 5.000000 | 6.0 |
| std   | 2.886751 | 2.645751 | 2.645751 | 3.464102 | 1.0 |
| min   | 4.000000 | 1.000000 | 4.000000 | 1.000000 | 5.0 |
| 25%   | 6.500000 | 3.000000 | 4.500000 | 4.000000 | 5.5 |
| 50%   | 9.000000 | 5.000000 | 5.000000 | 7.000000 | 6.0 |
| 75%   | 9.000000 | 5.500000 | 7.000000 | 7.000000 | 6.5 |
| max   | 9.000000 | 6.000000 | 9.000000 | 7.000000 | 7.0 |
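The rows of describe() can be cross-checked against the individual methods. The sketch below assumes df1 holds the sample values shown earlier. It illustrates that the std row is the sample standard deviation (ddof=1), that the percentile rows match quantile() with its default linear interpolation, and that the percentiles parameter selects other cut points. The variable names are introduced here for illustration.

```python
import pandas as pd

# df1 rebuilt from the sample values printed earlier in this exercise.
df1 = pd.DataFrame([[9, 6, 5, 7, 6],
                    [4, 5, 4, 7, 5],
                    [9, 1, 9, 1, 7]])

summary = df1.describe()

# describe() reports the sample standard deviation (divides by n - 1).
std_check = df1.std(ddof=1)

# The "25%" row matches quantile(0.25) with linear interpolation.
q25 = df1.quantile(0.25)

# Other percentiles can be requested through the percentiles parameter.
custom = df1.describe(percentiles=[0.1, 0.9])

print(summary.loc["std"].round(6).tolist())
print(q25.tolist())
print(custom.index.tolist())
```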