Computing value frequencies for a DataFrame or Series in Python

In pandas, value_counts is the usual way to check how often each value occurs in your data.

  • With a Series
    import pandas as pd

    ss = pd.Series(['Tokyo', 'Nagoya', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'])
    ss.value_counts()  # counts how many times each distinct value occurs in the Series
    Tokyo     3
    Nagoya    2
    Osaka     1
    dtype: int64
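
If you want proportions rather than raw counts, value_counts also accepts normalize=True. The snippet below is a minimal sketch that reuses the same sample Series; the variable names counts and ratios are only for illustration.

```python
import pandas as pd

ss = pd.Series(['Tokyo', 'Nagoya', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'])

# Absolute counts, sorted from most to least frequent
counts = ss.value_counts()

# Relative frequencies: each count divided by the total number of values
ratios = ss.value_counts(normalize=True)

print(counts)
print(ratios)
```

Here Tokyo accounts for 3 of the 6 entries, so its relative frequency is 0.5.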

  • With a DataFrame
    import pandas as pd

    # Build a DataFrame with two columns; value_counts is then applied to each column
    df = pd.DataFrame({'a': ['Tokyo', 'Osaka', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'],
                       'b': ['Osaka', 'Osaka', 'Osaka', 'Tokyo', 'Tokyo', 'Tokyo']})
    print(df)
            a      b
    0   Tokyo  Osaka
    1   Osaka  Osaka
    2  Nagoya  Osaka
    3   Osaka  Tokyo
    4   Tokyo  Tokyo
    5   Tokyo  Tokyo

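    # pd.Series.value_counts is used here because the top-level pd.value_counts is deprecated in recent pandas releases
    df.apply(pd.Series.value_counts)
            a    b
    Nagoya  1  NaN  # Nagoya does not appear in column b, so its count is shown as NaN
    Osaka   2  3.0
    Tokyo   3  3.0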
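
Because of the NaN, column b is returned as floats. If you would rather see 0 for values that never appear, one option is to fill the gaps and cast back to integers. This is a minimal sketch on the same sample data; the names freq and freq_int are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['Tokyo', 'Osaka', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'],
    'b': ['Osaka', 'Osaka', 'Osaka', 'Tokyo', 'Tokyo', 'Tokyo'],
})

# Count each column separately; values missing from a column come back as NaN
freq = df.apply(pd.Series.value_counts)

# Replace NaN with 0, then convert the counts back to integers
freq_int = freq.fillna(0).astype(int)
print(freq_int)
```

With this, the entry for Nagoya in column b becomes 0 instead of NaN, and both columns hold integer counts.
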
Reference:
http://ailaby.com/dataframe_value_counts/

Reposted from blog.csdn.net/qq_39521554/article/details/81052311