R data analysis: understanding and practice of polynomial regression and response surface analysis

Today I will share a statistical method called response surface analysis, which is used to test congruence hypotheses about pairs of variables. It originated as an engineering method and is now used more and more in organizational behavior, management, marketing, and related fields.

Congruence hypotheses state that the agreement (i.e., congruence) between two constructs should positively (or negatively) affect some outcome variable. Such hypotheses play a central role in many disciplines, for example, Marketing (Kim & Hsieh, 2003), Organizational Behavior (Caniëls & Veld, 2019), and Purchasing (Caniëls, Vos, Schiele, & Pulles, 2018).

Response surface analysis is especially well suited to exploring the effects of consistency and inconsistency. Say you have two independent variables and one dependent variable, and you want to see how the dependent variable changes when the two independent variables move consistently (both increase or both decrease) and when they move inconsistently (one increases while the other decreases). That is the situation in which to use response surface analysis.

Response surface analysis (RSA) is an approach that allows examining the extent to which combinations of two predictive variables relate to one outcome variable. The method is particularly interesting in cases where (in)congruence between the two predictive variables is a central consideration of the study.

Testing a consistency hypothesis requires polynomial regression. The advantage of response surface analysis is that it draws the polynomial regression results in 3D, so we can clearly see how the dependent variable changes under every combination of the independent variables. The corresponding hypotheses are then tested with combinations of the polynomial coefficients.

The foundation of RSA is the visualization of the results of the regression equation on a three-dimensional graph. Instead of directly interpreting the results of the polynomial regression analysis, the coefficients are used to examine what is called a ‘response surface pattern’. The response surface is a graph that provides a three-dimensional visual representation of the data to aid interpretation.

Response surface analysis can also test interactions and is not restricted by the linearity assumption. Today I will go through the ideas behind response surface analysis and how to carry it out, based on two interesting papers.

Theoretical understanding

Suppose you had never heard of response surface analysis and wanted to study how the inconsistency between two variables affects an outcome. What would you do? For example, you want to study the effect of parental expectations x1 and children's interests x2 on children's achievement y, and to check whether closer agreement between parental expectations and children's interests goes with higher achievement later on. How would you do it?

Compute a new variable? Take the absolute value of x1 - x2 as a new independent variable, call it the gap between the two, and regress y on it?

This is probably the first thing most students think of.

There are two problems here: 1. Information is lost; 2. You cannot tell whether a given effect arises because x1 is larger than x2 or because x2 is larger than x1.

Initially, these approaches combine two predictor variables into a single score, which reduces the available information. For this reason, the difference scores confuse the effects of each of the component measures on the result. The difference scores do not tell us the extent to which each of the component measures contributes to the outcome variable.

So this idea is not wrong as such, but it is not a good one.
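A tiny illustration of the two problems, using made-up numbers (the values and the name `gap` are hypothetical, chosen only to make the point visible):

```r
# Hypothetical scores for four parent-child pairs (illustration only)
x1 <- c(5, 2, 4, 1)   # parental expectations
x2 <- c(2, 5, 1, 4)   # children's interests

gap <- abs(x1 - x2)   # absolute difference score
print(gap)

# All four pairs end up with the same gap, although the direction of the
# mismatch differs (x1 > x2 in pairs 1 and 3, x1 < x2 in pairs 2 and 4)
# and the overall level of the two scores differs as well.
print(sign(x1 - x2))
print(x1 + x2)
```

A regression of y on `gap` alone would treat these four pairs as identical, which is exactly the information loss described above.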

A better approach here is polynomial regression:

$$Z = b_0 + b_1 X + b_2 Y + b_3 X^2 + b_4 XY + b_5 Y^2$$

In this formula, X and Y are the two independent variables and Z is the dependent variable. Besides the linear terms, the formula contains the second-order terms X^2, XY, and Y^2. A formula like this can be shown graphically.

In the figure, the two independent variables are on the x and y axes, and the dependent variable (the model's response value) is on the z axis. The response values for all combinations of x and y form a curved surface, called the response surface.

For example, for a specific (x, y) point marked by the circle on the bottom plane, the corresponding Z value is the star on the response surface.
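To make the idea of a response surface concrete, here is a minimal sketch in base R. It assumes simulated data (the data-generating model is hypothetical) and fits the quadratic polynomial with `lm()` rather than with any dedicated package, then predicts z on a grid of (x, y) combinations and draws the surface:

```r
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
# Hypothetical data-generating model: z is highest when x and y agree
z <- 1 + 0.5 * x + 0.5 * y - 0.4 * (x - y)^2 + rnorm(n, sd = 0.3)

# Quadratic polynomial regression with terms x, y, x^2, x*y, y^2
fit <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2))

# Predict z on a regular grid of (x, y) combinations
gx <- seq(-2, 2, length.out = 30)
gy <- seq(-2, 2, length.out = 30)
grid <- expand.grid(x = gx, y = gy)   # x varies fastest
zhat <- matrix(predict(fit, newdata = grid), nrow = length(gx))

# Draw the response surface: rows of zhat follow gx, columns follow gy
persp(gx, gy, zhat, theta = -30, phi = 20,
      xlab = "x", ylab = "y", zlab = "predicted z",
      ticktype = "detailed", col = "lightblue")
```

Each cell of `zhat` plays the role of the star in the figure: the model's response value for one (x, y) point on the bottom plane.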

When reading the figure, two lines deserve special attention: the line of congruence (LOC), called the consistency line below, and the line of incongruence (LOIC), called the inconsistency line below.

  • line of consistency

The consistency line consists of all points where x equals y; it is the 45° line on the xy plane. The part of the response surface above this line shows how z changes as x and y change together. In the figure, this is the part of the surface above the red line. It can be seen that Z is always largest at points where x and y agree.

  • line of inconsistency

The inconsistency line consists of all points where x and y are opposite numbers (x = -y). It is perpendicular to the consistency line on the xy plane; in the figure above, it is the blue line on the xy plane. It can be seen that the greater the difference between x and y, the lower the value of Z.

With this visual representation, we can easily see how Z changes under every change in x and y.
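The two lines can also be looked at as plain 2D slices through the surface. The sketch below assumes simulated data with a hypothetical congruence effect and a quadratic model fitted with base R's `lm()`; it predicts z along x = y and along x = -y and plots both curves:

```r
set.seed(5)
n <- 200
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
# Hypothetical outcome: z drops as x and y move apart
z <- 1 - 0.4 * (x - y)^2 + rnorm(n, sd = 0.3)
fit <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2))

s <- seq(-2, 2, length.out = 41)
z_loc  <- predict(fit, newdata = data.frame(x = s, y =  s))  # consistency line, x = y
z_loic <- predict(fit, newdata = data.frame(x = s, y = -s))  # inconsistency line, x = -y

plot(s, z_loc, type = "l", col = "red", ylim = range(z_loc, z_loic),
     xlab = "x", ylab = "predicted z")
lines(s, z_loic, col = "blue")
legend("bottom", legend = c("x = y", "x = -y"),
       col = c("red", "blue"), lty = 1)
```

With this made-up data the red curve stays nearly flat, while the blue curve falls away on both sides of x = 0, which is the pattern described for the two lines above.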

Combined with the coefficients of the polynomial model, we can also test the corresponding hypotheses.

Let's go back to the part of the response surface above the consistency line. On this line x = y, so the expression for Z becomes a quadratic function of X alone:

$$Z = b_0 + a_1 X + a_2 X^2, \quad \text{where } a_1 = b_1 + b_2 \text{ and } a_2 = b_3 + b_4 + b_5$$

Here b0 is the intercept of the polynomial regression. The coefficient a2 determines whether the surface along the consistency line is a straight line or a curve, and a1 determines its slope.

Now look at the part of the response surface above the inconsistency line. On this line x = -y, so the expression for Z is again a quadratic function of X alone:

$$Z = b_0 + a_3 X + a_4 X^2, \quad \text{where } a_3 = b_1 - b_2 \text{ and } a_4 = b_3 - b_4 + b_5$$

Here b0 is the intercept of the polynomial regression. The coefficient a4 determines whether the surface along the inconsistency line is a straight line or a curve, and a3 determines its slope.
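For reference, both reduced equations follow from direct substitution into the full quadratic model. The notation assumes the usual form with intercept b0 and with b1 to b5 attached to X, Y, X², XY, and Y² in that order, which is the ordering implied by the definitions of a1 to a4:

```latex
\begin{aligned}
Z &= b_0 + b_1 X + b_2 Y + b_3 X^2 + b_4 XY + b_5 Y^2 \\[4pt]
Y = X:\quad
Z &= b_0 + b_1 X + b_2 X + b_3 X^2 + b_4 X^2 + b_5 X^2 \\
  &= b_0 + (b_1 + b_2)\,X + (b_3 + b_4 + b_5)\,X^2
   = b_0 + a_1 X + a_2 X^2 \\[4pt]
Y = -X:\quad
Z &= b_0 + b_1 X - b_2 X + b_3 X^2 - b_4 X^2 + b_5 X^2 \\
  &= b_0 + (b_1 - b_2)\,X + (b_3 - b_4 + b_5)\,X^2
   = b_0 + a_3 X + a_4 X^2
\end{aligned}
```

The only sign changes come from the Y and XY terms, which is why b2 and b4 enter a3 and a4 with a minus sign.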

Different combinations of these coefficients give different shapes of the response surface. In the figure below, for example, a1 > 0 (along the consistency line, Z follows a straight line sloping upward) and a4 < 0 (along the inconsistency line, Z follows a curve that opens downward).
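Since a1 to a4 are linear combinations of the regression coefficients, their estimates and standard errors can be computed from the fitted model. The sketch below shows one common way to do this in base R, assuming simulated data (hypothetical) and a model fitted with `lm()`; the weight matrix `W` is an illustrative helper, not part of any package:

```r
set.seed(2)
n <- 300
x <- rnorm(n)
y <- rnorm(n)
# Hypothetical outcome (illustration only)
z <- 0.3 * x + 0.3 * y - 0.2 * x^2 + 0.4 * x * y - 0.2 * y^2 +
  rnorm(n, sd = 0.5)

fit <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2))
b <- coef(fit)[-1]        # b1..b5, intercept dropped
V <- vcov(fit)[-1, -1]    # covariance matrix of b1..b5

# Each row holds the weights that turn b1..b5 into one surface parameter
W <- rbind(a1 = c(1,  1, 0,  0, 0),   # b1 + b2
           a2 = c(0,  0, 1,  1, 1),   # b3 + b4 + b5
           a3 = c(1, -1, 0,  0, 0),   # b1 - b2
           a4 = c(0,  0, 1, -1, 1))   # b3 - b4 + b5

est  <- drop(W %*% b)
se   <- sqrt(diag(W %*% V %*% t(W)))  # SE of a linear combination
tval <- est / se
pval <- 2 * pt(abs(tval), df = df.residual(fit), lower.tail = FALSE)

print(round(cbind(estimate = est, se = se, t = tval, p = pval), 3))
```

In this simulated example the true values are a1 = 0.6, a2 = 0, a3 = 0, and a4 = -0.8, so the output should show a clearly positive a1 and a clearly negative a4.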

With the coefficients introduced above, we can test the corresponding hypotheses. Next, let's look at two real examples.

The first paper is:

Bai, Q., Lei, L., Hsueh, F. H., Yu, X., Hu, H., Wang, X., & Wang, P. (2020). Parent-adolescent congruence in phubbing and adolescents’ depressive symptoms: A moderated polynomial regression with response surface analyses. Journal of Affective Disorders, 275, 127-135.

The article studies the effect of phubbing (keeping one's head down over a phone) on adolescents' depressive symptoms, considering parents' phubbing and adolescents' phubbing at the same time. After fitting a polynomial regression model, the authors carried out a response surface analysis. The main results are as follows:

The authors put the two independent variables of interest, parents' phubbing and adolescents' phubbing, on the x and y axes, and the dependent variable, adolescents' depressive symptoms, on the z axis. The graph shows how depressive symptoms change when x and y increase together and when they change inconsistently, which answers the research question.

In presenting the results, the authors report the coefficient and p-value for the line of congruence, which addresses Hypothesis 4:

That is, the slope along the consistency line is significantly positive, which means that when parents and adolescents both phub more (consistent phubbing), adolescents' risk of depression increases. The coefficients for the inconsistency line are interpreted in the same way.

The authors also tested the moderation effect with this kind of analysis. The original description in the methods section is shown in the figure below. The method used is called hierarchical regression analysis:

It nests several regressions and compares the R-squared of the models to judge, from a data-driven perspective, whether the interaction terms belong in the model. The principle is: if R-squared becomes significantly larger after the interaction terms are added, the interaction terms significantly increase the explanatory power of the model.
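The nested comparison can be sketched in base R as follows. This is only an illustration under assumptions: the data are simulated, the moderator `m` is hypothetical, and the exact set of terms used in the paper may differ:

```r
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rnorm(n)
m <- rnorm(n)   # hypothetical moderator
# Hypothetical outcome in which m changes the effect of x
z <- 0.3 * x + 0.3 * y + 0.2 * m + 0.3 * m * x + rnorm(n, sd = 0.5)
d <- data.frame(x, y, m, z)

# Step 1: polynomial terms plus the moderator's main effect
step1 <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2) + m, data = d)
# Step 2: add the moderator's interactions with the five polynomial terms
step2 <- update(step1,
                . ~ . + m:x + m:y + m:I(x^2) + m:I(x * y) + m:I(y^2))

r2 <- c(step1 = summary(step1)$r.squared,
        step2 = summary(step2)$r.squared)
print(r2)
print(diff(r2))               # change in R-squared between the two steps
print(anova(step1, step2))    # F test for the added block of terms
```

A significant F test in the `anova()` output corresponds to the "R-squared becomes significantly larger" criterion described above.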

The second article comes from the field of management:

Lee, K., Woo, H. G., & Joshi, K. (2017). Pro-innovation culture, ambidexterity and new product development performance: Polynomial regression and response surface analysis. European Management Journal, 35(2), 249-260.

The article uses response surface analysis to test the following two hypotheses:

NPD performance will increase as both exploitation and exploration increase simultaneously.

NPD performance will decrease as the imbalance between exploitation and exploration increases in either direction.

These hypotheses again concern consistent and inconsistent movement of the two independent variables. The authors give a figure and a table to answer them:

Hypothesis 5 in the original text states that when the two independent variables increase together, the dependent variable also increases. To test it, the article sets the two independent variables in the polynomial regression equal to each other, which simplifies the equation and its coefficients.

That is, b1+b2 in the table above must be positive and b3+b4+b5 must be non-significant. Because the second condition is not satisfied, the authors conclude that Hypothesis 5 is not supported.

Similarly, Hypothesis 6 states that when the two independent variables are inconsistent, the dependent variable becomes smaller. To test it, the article sets the two independent variables to opposite numbers. For Hypothesis 6 to hold, the coefficient of the linear term in the simplified equation should be significantly negative, and the coefficient of the quadratic term should be 0 or significantly negative.

That is, b1-b2 in the table above should be significantly negative and b3-b4+b5 should be 0 or negative. Because these two conditions are not met, the authors conclude that Hypothesis 6 is not supported.

With one figure and one table, the article answers its research questions. That was a brief introduction to two example papers that use response surface analysis; please read the originals for the details of the write-up. Now let's see how to do it in practice.

Practice steps

SPSS can do response surface analysis, but here we only cover the R approach. Response surface analysis takes two steps:

Conceptually, RSA is divided into two stages: (a) running a polynomial regression model and (b) using the results of the model to generate a response surface and analyze the importance of the effects.

You can use the rsm package to do response surface analysis in R. The first step is to fit a polynomial regression with second-degree terms. For example, suppose I have a data set with the variables x, y, and z.
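The original data are not reproduced here, so to follow along you could use a simulated stand-in. The sketch below is purely hypothetical (the value ranges and the data-generating model are my assumptions); it only creates a data frame named `data` with the three columns x, y, and z:

```r
set.seed(4)
n <- 200
# Hypothetical predictors on a 1-5 scale
data <- data.frame(x = runif(n, 1, 5),
                   y = runif(n, 1, 5))
# Hypothetical outcome: higher when x and y are both high and close together
data$z <- with(data, 2 + 0.3 * x + 0.3 * y - 0.25 * (x - y)^2) +
  rnorm(n, sd = 0.4)

head(data)
```

Any data frame with two numeric predictors and one numeric outcome would serve the same purpose.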

First I need to run a quadratic polynomial regression:

library(rsm)
fit <- rsm(z ~ SO(x, y), data = data)  # SO(): full second-order model in x and y

After running, call summary(fit) on the object returned by the function above to get the results of the quadratic polynomial regression.

The output shows the coefficient of each term in x and y, and we combine these coefficients to test our research hypotheses.

The second step is to visualize the model results as a response surface. The code is as follows:

persp(fit, ~ x + y,
      col = "lightblue", main = "Worked example",
      xlabs = c("x", "y"), zlab = "z",
      r = 50, d = 30, expand = 1, box = TRUE,
      # ltheta = 10, lphi = 99,
      shade = 0.1, theta = -15, phi = 15,
      # axes = FALSE,
      contours = list(z = "bottom"),  # draw contour lines on the bottom plane
      cex.lab = 1,
      cex.axis = 0.5,
      ticktype = "detailed",
      at = xs(fit))

In the above code, fit is the model object returned by rsm(). After running, the response surface plot is as follows:

From the figure above, you can see directly how z changes as x and y change in different ways.

This concludes the response surface analysis.

Origin blog.csdn.net/tm_ggplot2/article/details/130994076