Reading notes on "Python for Data Analysis": pandas basics (c)

Summarizing and computing descriptive statistics

pandas objects come equipped with a set of common mathematical and statistical methods. Consider the following simple DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame([[1.4,np.nan],[7.1,4.5],
                   [np.nan,np.nan],[0.75,-1.3]],
                  index = list('abcd'),
                  columns=['one','tow'])
#     one  tow
# a  1.40  NaN
# b  7.10  4.5
# c   NaN  NaN
# d  0.75 -1.3

Calling the sum method adds up the values down the rows of each column and returns the column totals as a Series:

df.sum()
# one    9.25
# tow    3.20
# dtype: float64

To sum across the columns instead, giving one total per row, pass the axis parameter:

df.sum(axis = 1)
# a     1.40
# b    11.60
# c     0.00
# d    -0.55
# dtype: float64

NaN values are excluded automatically. Passing skipna=False turns this off, so any row that contains a NaN produces NaN:

df.sum(axis = 1,skipna = False)
# a      NaN
# b    11.60
# c      NaN
# d    -0.55
# dtype: float64
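
As a minimal sketch of how axis and skipna interact, the snippet below rebuilds the same DataFrame and also tries the min_count parameter of sum, which is not mentioned in the notes above. It assumes a reasonably recent pandas version; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, 4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=list('abcd'),
                  columns=['one', 'tow'])

# Column totals (default axis=0): NaN values are skipped.
col_totals = df.sum()

# Row totals: the all-NaN row 'c' sums to 0.0 because nothing is left to add.
row_totals = df.sum(axis=1)

# Assumption: min_count=1 requires at least one valid value per row,
# so row 'c' becomes NaN instead of 0.0.
row_totals_strict = df.sum(axis=1, min_count=1)

# Other reductions such as mean take the same axis argument.
row_means = df.mean(axis=1)
```

With min_count=1 an empty row stays distinguishable from a row whose values happen to add up to zero.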

The idxmax and idxmin methods return the index label at which the maximum or minimum value occurs, for each column or for each row:

df.idxmax(axis = 1)
# a    one
# b    one
# c    NaN
# d    one
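
For comparison, here is a short sketch of the column-wise default (axis=0) on the same data. The DataFrame is rebuilt so the snippet runs on its own, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, 4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=list('abcd'),
                  columns=['one', 'tow'])

# Row label of the largest value in each column.
col_max_labels = df.idxmax()   # one -> 'b', tow -> 'b'

# Row label of the smallest value in each column.
col_min_labels = df.idxmin()   # one -> 'd', tow -> 'd'

# The returned labels can be fed back into .loc to fetch the value itself.
largest_one = df.loc[col_max_labels['one'], 'one']   # 7.1
```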

The describe method produces several summary statistics in one call:

df.describe()
#             one       tow
# count  3.000000  2.000000
# mean   3.083333  1.600000
# std    3.493685  4.101219
# min    0.750000 -1.300000
# 25%    1.075000  0.150000
# 50%    1.400000  1.600000
# 75%    4.250000  3.050000
# max    7.100000  4.500000
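
describe also accepts non-numeric data, where it reports a different set of statistics. A minimal sketch, using a made-up Series that is not part of the notes above:

```python
import pandas as pd

# Hypothetical categorical-like data: 'a' appears 8 times, 'b' and 'c' 4 times each.
labels = pd.Series(['a', 'a', 'b', 'c'] * 4)

summary = labels.describe()
# summary holds four entries:
#   count  -> number of non-NA values (16)
#   unique -> number of distinct values (3)
#   top    -> most frequent value ('a')
#   freq   -> how often the top value occurs (8)
```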

Common descriptive and summary statistics methods:

method          description
count           Number of non-NA values
describe        Compute a set of summary statistics for a Series or for each DataFrame column
min, max        Compute the minimum and maximum values
argmin, argmax  Compute the integer positions at which the minimum and maximum values occur
idxmin, idxmax  Compute the index labels at which the minimum and maximum values occur
quantile        Compute a sample quantile, ranging from 0 to 1
sum             Sum of the values
mean            Mean of the values
median          Median (50% quantile) of the values
mad             Mean absolute deviation from the mean (deprecated in pandas 1.5, removed in 2.0)
prod            Product of all values
var             Sample variance of the values
std             Sample standard deviation of the values
skew            Sample skewness (third moment) of the values
kurt            Sample kurtosis (fourth moment) of the values
cumsum          Cumulative sum of the values
cummin, cummax  Cumulative minimum and maximum of the values
cumprod         Cumulative product of the values
diff            First arithmetic difference (useful for time series)
pct_change      Percent change between consecutive values
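
To make a few of these methods concrete, here is a small sketch on a made-up Series. The values and variable names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 9.0, 15.0])  # hypothetical values

running_total = prices.cumsum()        # 10, 22, 31, 46
running_max = prices.cummax()          # 10, 12, 12, 15
step_change = prices.diff()            # NaN, 2, -3, 6
relative_change = prices.pct_change()  # NaN, then roughly 0.2, -0.25, 0.667

median_price = prices.quantile(0.5)    # 11.0, halfway between 10 and 12
sample_var = prices.var()              # sample variance (divides by n - 1): 7.0
sample_std = prices.std()              # square root of the sample variance
```

The accumulations (cumsum, cummax) and diff/pct_change return a Series of the same length as the input, whereas quantile, var and std reduce it to a single number.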

Another class of methods extracts information about the values contained in a Series. Consider the following Series object:

obj = pd.Series(list('cadaabbcc'))
# 0    c
# 1    a
# 2    d
# 3    a
# 4    a
# 5    b
# 6    b
# 7    c
# 8    c
# dtype: object

The first is the unique method, which returns an array of the distinct values in the Series. The values come back in the order they first appear, not necessarily in sorted order:

uniques = obj.unique()
# ['c' 'a' 'd' 'b']
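
If a sorted result or just the number of distinct values is needed, something along these lines should work. This is a sketch rather than the only way to do it; sorted is plain Python and nunique is a Series method not covered in the notes above.

```python
import pandas as pd

obj = pd.Series(list('cadaabbcc'))

uniques = obj.unique()            # order of first appearance: c, a, d, b
sorted_uniques = sorted(uniques)  # plain Python list: ['a', 'b', 'c', 'd']
n_unique = obj.nunique()          # number of distinct values: 4
```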

The value_counts method computes how many times each value occurs in the Series:

obj.value_counts()
# a    3
# c    3
# b    2
# d    1
# dtype: int64
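
value_counts also takes a normalize parameter, which the notes do not cover. A minimal sketch, assuming a recent pandas version, that turns the counts into relative frequencies and shows the sort option alongside it:

```python
import pandas as pd

obj = pd.Series(list('cadaabbcc'))

counts = obj.value_counts()                 # a: 3, c: 3, b: 2, d: 1
freqs = obj.value_counts(normalize=True)    # counts divided by the total (9)
unsorted_counts = obj.value_counts(sort=False)  # same counts, not ordered by count

# The result is an ordinary Series indexed by value, so single counts
# can be looked up by label.
count_of_b = counts['b']                    # 2
```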

By default the result is sorted by count in descending order; pass sort=False to turn the sorting off.

The isin method performs a vectorized set-membership check: it tests each element against a given collection of values and returns a Boolean Series, which can then be used to filter the data:

obj.isin(['b','c'])
# 0     True
# 1    False
# 2    False
# 3    False
# 4    False
# 5     True
# 6     True
# 7     True
# 8     True
# dtype: bool
obj[obj.isin(['b','c'])]
# 0    c
# 5    b
# 6    b
# 7    c
# 8    c
# dtype: object
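
The Boolean mask returned by isin can also be inverted or counted. A short sketch on the same Series, with illustrative variable names:

```python
import pandas as pd

obj = pd.Series(list('cadaabbcc'))

mask = obj.isin(['b', 'c'])

kept = obj[mask]        # elements that are 'b' or 'c'
dropped = obj[~mask]    # ~ inverts the mask: everything except 'b' and 'c'
n_matches = mask.sum()  # True counts as 1, so this is the number of matches: 5
```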

The unique, value-count, and set-membership methods are summarized below:

method        description
isin          Compute a Boolean array indicating whether each Series value is contained in the passed sequence of values
match         Compute integer indices for each value in an array into another array of distinct values; helpful for data alignment and join-type operations
unique        Compute the array of unique values in a Series, returned in the order observed
value_counts  Return a Series whose index is the unique values and whose values are their counts, sorted by count in descending order
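
The match entry refers to an older top-level function that, as far as I can tell, is no longer available in current pandas. The same kind of lookup can be sketched with Index.get_indexer; the two Series below are made-up example data.

```python
import pandas as pd

to_match = pd.Series(['c', 'a', 'b', 'b', 'c', 'a'])
unique_vals = pd.Series(['c', 'b', 'a'])

# For each value in to_match, find its integer position in unique_vals.
positions = pd.Index(unique_vals).get_indexer(to_match)
# 'c' is at position 0, 'b' at 1, 'a' at 2, so the result is [0, 2, 1, 1, 0, 2].
```

Values that do not occur in unique_vals would be reported as -1.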

Origin blog.csdn.net/pnd237/article/details/104363239