SPSS: Regression

1. Simple (one-predictor) linear regression

 

 

Simple linear regression analysis, multiple linear regression analysis

[Simple linear regression analysis]
Given the value of one variable, use it to obtain the predicted value of another variable.
Independent variable (predictor variable); dependent variable (criterion variable)
1. Objective: obtain the predicted value of the dependent variable from the value of one independent variable
2. Required data: dependent variable (continuous) + independent variable (continuous or dichotomous)
3. Assumptions: a. The observations are independent. b. The two variables follow a normal distribution: in the population, the values of each variable must be normally distributed, and for any given value of one variable, the values of the other variable should also be normally distributed. c. Homogeneity of variance: the population variance of the dependent variable is the same for every value of the independent variable.
4. Equation: Y = a + bX. Y is the predicted value of the dependent variable (not the true value), a is the intercept on the y-axis, b is the slope of the regression equation, and X is the value of the independent variable.
5. Hypothesis testing: assuming the null hypothesis is true (b = 0), if the observed result is unlikely (p value ≤ 0.05), the null hypothesis is rejected, i.e. the regression coefficient is not equal to 0;
if the observed result is likely (p value greater than 0.05), the null hypothesis is not rejected, i.e. there is no evidence that the regression coefficient differs from 0
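As a rough illustration of items 4 and 5, the sketch below fits Y = a + bX by ordinary least squares and computes the t statistic used to test b = 0. The function name and the five data points are made up for illustration; SPSS reports the same quantities in its Coefficients table, along with the p value, which is not computed here.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of Y = a + bX; returns (a, b, t statistic for b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    # Residual sum of squares and standard error of the slope (n - 2 df)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return a, b, b / se_b

# Hypothetical data, not taken from the supermarket file
toy_x = [1, 2, 3, 4, 5]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]
a_hat, b_hat, t_b = fit_simple_regression(toy_x, toy_y)
print(f"Y = {a_hat:.2f} + {b_hat:.2f}X, t for b = {t_b:.1f}")
```

A large absolute t value corresponds to a small p value, which is the case where the null hypothesis b = 0 is rejected.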

 

Exercise:
The file holds three consecutive years of sales data for a supermarket, with five variables (month, quarter, advertising cost, customer traffic, and sales)

and 36 records in total. Predict sales from advertising cost: roughly what are sales when the advertising cost is 200,000? Data: supermarket sales data .sav.

 

 

 

 

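6. Specific steps:
   a. Import the data.
   b. Analyze the data: Analyze - Regression - Linear
   c. Interpret the output:
      Descriptive Statistics: common summary statistics
      Correlations: the correlation coefficient between the two variables; here it is 0.816. Two-tailed p value = 2 * one-tailed p value < 0.05
                   Null hypothesis: H0: ρ = 0 (no correlation)
                   Alternative hypothesis: H1: ρ ≠ 0 (correlated)
                   Conclusion: the linear correlation is significant
       Variables Entered/Removed: the independent variables used for prediction (predictors)
       Model Summary: R (multiple correlation coefficient; with one predictor it equals the absolute value of the Pearson correlation coefficient), range 0 to 1; R^2; adjusted R^2: the degree to which the dependent variable can be predicted by the independent variable
                       Std. Error of the Estimate: the degree to which the dependent variable cannot be predicted by the independent variable
                       R^2 multiplied by 100% gives the share of the total variance of the dependent variable that is explained by the independent variable
                       e.g. advertising cost explains 66.6% of the variance in sales
                       e.g. when advertising cost is used to predict sales, the average prediction error of the regression equation is 46.9953
       ANOVA: tests whether the independent variable is a significant predictor of the dependent variable
                      p value < 0.05: reject the null hypothesis; advertising cost is a significant predictor of sales
       Coefficients: used to build the regression equation and to test the hypotheses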

 

 
 

 

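                 Y = a + bX = 377 + 14.475X
                 Prediction: Y = 377 + 14.475 * 20 = 666.50, where X = 20 stands for an advertising cost of 200,000 (the variable is apparently recorded in units of 10,000). To save predicted values: Analyze - Regression - Linear - Save - Unstandardized
                 The true values and the predicted values differ (the larger R is, the closer the predictions are to the true values)
                 Advertising cost: p value < 0.05, reject the null hypothesis; advertising cost is a significant predictor of sales
                 Standardized coefficient: the regression coefficient obtained when both the independent and the dependent variable are standardized (in simple linear regression, beta equals the Pearson correlation coefficient)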
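The numbers quoted in the output can be checked with a few lines of Python. The sketch below assumes only the reported values (a = 377, b = 14.475, r = 0.816); the helper `model_summary` and its toy data are hypothetical and simply show how R^2 and the standard error of the estimate are defined for a one-predictor model.

```python
import math

def predict_sales(ad_cost, a=377.0, b=14.475):
    """Predicted sales from the reported equation Y = 377 + 14.475X."""
    return a + b * ad_cost

def model_summary(x, y):
    """Return (r, r_squared, std_error_of_estimate) for a one-predictor fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation
    sse = syy - sxy ** 2 / sxx           # residual sum of squares
    see = math.sqrt(sse / (n - 2))       # standard error of the estimate
    return r, r * r, see

print(round(predict_sales(20), 2))       # advertising cost X = 20
print(round(0.816 ** 2, 3))              # share of variance explained by r = 0.816

# Hypothetical data, only to exercise model_summary
r, r2, see = model_summary([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(r2, 4), round(see, 4))
```

Squaring the reported correlation gives about 0.666, which matches the 66.6% of variance attributed to advertising cost.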

        Y=108.90-0.358*X

 

Case file

CCSS_Sample.sav: build the regression equation that uses age (S3) to predict the overall confidence index.

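Multiple linear regression analysis

Given the values of two or more different variables, use them to predict the value of another variable
Dependent variable (criterion variable); independent variables (predictor variables)
1. Objective: obtain the predicted value of the dependent variable from the values of two or more different variables
2. Required data:
    Dependent variable: continuous + independent variables: continuous or dichotomous
3. Assumptions:
   a. The observations are independent
   b. The variables follow a multivariate normal distribution in the population: the values of each variable are normally distributed, and every combination of a variable with the other variables is also normally distributed
   c. Homogeneity of variance: the population variance of the dependent variable is the same for every combination of values of the independent variables
4. Multiple regression equation:
    Y = β0 + β1 X1 + β2 X2 + … + βn Xn
    Y is the predicted value of the dependent variable (not the true value), β0 is the intercept on the y-axis, βn is the n-th coefficient of the regression equation, and Xn is the value of the n-th independent variable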
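To make the equation concrete, here is a minimal sketch that estimates β0 … βn by least squares using the normal equations, in plain Python. The function name and the data are hypothetical (the y values are generated exactly from Y = 1 + 2 X1 - 3 X2), and this is not how SPSS computes the fit internally.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    m, p = len(X), len(X[0])
    # Augmented system (X'X | X'y)
    A = [[sum(X[k][i] * X[k][j] for k in range(m)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(m))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated from Y = 1 + 2*X1 - 3*X2
toy_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
toy_y = [-3, 2, -5, 0, -4]
print([round(b, 6) for b in fit_multiple_regression(toy_rows, toy_y)])
```

Because the toy data contain no noise, the fit recovers the generating coefficients; with real data the residuals would be non-zero.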

 

 

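5. Null and alternative hypotheses:
    n hypothesis tests, one for each regression coefficient (β)
    Null hypotheses: H0: "regression coefficient β1 equals 0", i.e. β1 = 0
                 H0: "regression coefficient β2 equals 0", i.e. β2 = 0
                 H0: "regression coefficient β3 equals 0", i.e. β3 = 0
    Alternative hypotheses: H1: "regression coefficient β1 is not equal to 0", i.e. β1 ≠ 0
                    H1: "regression coefficient β2 is not equal to 0", i.e. β2 ≠ 0
                    H1: "regression coefficient β3 is not equal to 0", i.e. β3 ≠ 0
    Test of the regression equation as a whole: R^2 is the share of the dependent variable's variance that is explained
    Null hypothesis: H0: R^2 = 0
    Alternative hypothesis: H1: R^2 > 0
6. Decision rule for the hypothesis tests:
    Assuming the null hypothesis is true, if the observed result is unlikely (p value less than or equal to 0.05), reject the null hypothesis;
                                        if the observed result is likely (p value greater than 0.05), do not reject the null hypothesis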
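The test of the whole equation (H0: R^2 = 0) rests on an F statistic that can be computed from R^2, the number of cases n, and the number of predictors k. The sketch below uses the standard textbook formulas for F and for adjusted R^2; the input values are hypothetical, and the p value itself (read from the F distribution with k and n - k - 1 degrees of freedom) is not computed here.

```python
def overall_f_statistic(r_squared, n, k):
    """F statistic for H0: R^2 = 0 with k predictors and n cases."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))

def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2, which penalizes additional predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical values: R^2 = 0.80, 30 cases, 3 predictors
f_value = overall_f_statistic(0.80, n=30, k=3)
adj_r2 = adjusted_r_squared(0.80, n=30, k=3)
print(round(f_value, 2), round(adj_r2, 4))
```

SPSS reports this F value and its p value in the ANOVA table, and adjusted R^2 in the Model Summary table.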
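7. Specific steps:
   a. Import the data
   b. Analyze the data
   c. Interpret the results: [Enter method]
      Descriptive Statistics: summary statistics
      Correlations: the higher the correlation, the better the prediction
      Variables Entered/Removed and Model Summary: R is the multiple correlation coefficient (the absolute value of the correlation between the observed values of the dependent variable and the values predicted by the regression)
                                 e.g. all of the entered independent variables together explain 84.9% of the variance in solid waste output (the dependent variable)
      ANOVA: tests the significance of the regression, i.e. the equation as a whole
                  p value below 0.05: reject the null hypothesis and accept the alternative; the regression equation significantly predicts solid waste output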

 

 
 

 

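      Coefficients: used to build the regression equation and to test the coefficient of each independent variable
                alpha = 0.05; if the p value ≤ 0.05, reject the null hypothesis: that independent variable predicts the dependent variable
                Remove every independent variable that is not a significant predictor, then refit the regression equation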



 

 

 
 
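       Interpreting the regression coefficients (with the other independent variables held constant):
                 If a coefficient is negative, each one-unit increase in that independent variable lowers the predicted dependent variable by the size of the coefficient;
                 If a coefficient is positive, each one-unit increase in that independent variable raises the predicted dependent variable by the size of the coefficient;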
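This reading of the coefficients can be checked numerically: raise one predictor by one unit, keep the others fixed, and compare the two predictions. The coefficients and inputs below are hypothetical and serve only to show the effect.

```python
def predict(coeffs, xs):
    """Predicted Y for coefficients [b0, b1, ..., bn] and predictor values xs."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], xs))

# Hypothetical model: Y = 10 + 2.5*X1 - 1.5*X2
coeffs = [10.0, 2.5, -1.5]

base = predict(coeffs, [4, 6])
x1_plus_one = predict(coeffs, [5, 6])    # X1 raised by one unit, X2 fixed
x2_plus_one = predict(coeffs, [4, 7])    # X2 raised by one unit, X1 fixed

print(x1_plus_one - base)   # change equals the positive coefficient of X1
print(x2_plus_one - base)   # change equals the negative coefficient of X2
```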

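       [Stepwise method]
        Screens the predictors automatically, step by step, and returns the best model it finds (a high R^2 with only significant predictors retained)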

        [Automatic Linear Modeling]
          Builds the model using an information criterion (AICC)
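For orientation, the sketch below computes one common least-squares form of the corrected Akaike information criterion and compares two candidate models; the lower value is preferred. The formula variant, the function name, and the numbers are assumptions for illustration, and the exact expression SPSS uses may include additional constants.

```python
import math

def aicc(sse, n, k):
    """Corrected AIC for a least-squares model.

    sse: residual sum of squares, n: number of cases,
    k: number of estimated parameters (assumed here to include the intercept).
    """
    aic = n * math.log(sse / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical comparison on 36 cases:
# a 2-parameter model versus a 4-parameter model with slightly lower SSE
small_model = aicc(sse=1000.0, n=36, k=2)
large_model = aicc(sse=950.0, n=36, k=4)
print(round(small_model, 2), round(large_model, 2))
print("prefer small model" if small_model < large_model else "prefer large model")
```

The extra parameters of the larger model are not justified by its small drop in SSE, so the criterion favors the smaller model in this toy comparison.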

 

 


Origin www.cnblogs.com/foremostxl/p/12232250.html