Python手册(Machine Learning)--statsmodels(ANOVA)

ANOVA


Python手册(Machine Learning)–statsmodels(GettingStarted)
Python手册(Machine Learning)–statsmodels(Regression)
Python手册(Machine Learning)–statsmodels(ANOVA)
Python手册(Machine Learning)–statsmodels(Tables+Imputation)
Python手册(Machine Learning)–statsmodels(MultivariateStatistics)
Python手册(Machine Learning)–statsmodels(TimeSeries)
Python手册(Machine Learning)–statsmodels(Survival)
Python手册(Machine Learning)–statsmodels(Graphics)


方差分析

http://www.statsmodels.org/stable/anova.html

方差分析(Analysis of Variance,简称ANOVA),又称“变异数分析”,为数据分析中常见的统计模型,主要为探讨连续型(Continuous)因变量(Dependent variable)与类别型自变量(Independent variable)的关系。当自变量的因子等于或超过三个类别时,检验各类别平均值是否相等,采用方差分析。
广义t检验中,方差相等(Equality of variance)的合并t检验(Pooled T-test)视为是方差分析的一种。t检验分析两组平均数是否相等,方差分析也采用相同的计算概念,实际上,当方差分析套用在合并t检验的分析上时,产生的F值则会等于t检验的平方项。

总偏差平方和 S S t = S S b + S S w SSt = SSb + SSw

statsmodels包含anova_lm模型,用于使用线性OLSModel进行方差分析,和AnovaRM模型,用于重复测量方差分析(包含平衡数据方差分析)。

Module Reference desc
anova_lm(*args, **kwargs) Anova table for one or more fitted linear models
AnovaRM(data, depvar, subject[, within, …]) Repeated measures Anova using least squares regression
In [1]: import statsmodels.api as sm
In [2]: from statsmodels.formula.api import ols
In [3]: moore = sm.datasets.get_rdataset("Moore", "car",
   ...:                                  cache=True) # load data
In [4]: data = moore.data
In [5]: data = data.rename(columns={"partner.status":
   ...:                             "partner_status"}) # make name pythonic
In [6]: moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)',
   ...:                 data=data).fit()
In [7]: table = sm.stats.anova_lm(moore_lm, typ=2) # Type 2 ANOVA DataFrame
In [8]: print(table)
                                              sum_sq    df          F  \
C(fcategory, Sum)                          11.614700   2.0   0.276958   
C(partner_status, Sum)                    212.213778   1.0  10.120692   
C(fcategory, Sum):C(partner_status, Sum)  175.488928   2.0   4.184623   
Residual                                  817.763961  39.0        NaN   

                                            PR(>F)  
C(fcategory, Sum)                         0.759564  
C(partner_status, Sum)                    0.002874  
C(fcategory, Sum):C(partner_status, Sum)  0.022572  
Residual                                       NaN  

猜你喜欢

转载自blog.csdn.net/qq_41518277/article/details/85101240