When should regression analysis be used? What do control variables mean?


The text below was written by Chen Ming Xin, School of Business, Dalian University of Technology. Contact email: [email protected]

The author’s previous article: What is random allocation, why is it important, and how does it affect causality?
Regression analysis describes the relationship between a set of independent variables and a dependent variable. It produces a regression equation whose coefficients represent the relationship between each independent variable and the dependent variable. You can also use this equation to make predictions.
As a statistician, I should tell you that I love all statistical analyses equally, the way parents love all their children. But shhh! I have a secret: regression analysis is my favorite, because its flexibility makes it useful in so many different situations. In fact, I describe regression analysis as taking correlation analysis to the next level.
In this article, I explain the capabilities of regression analysis, the types of relationships between variables it can assess, how it controls for variables, and why I like it so much. You will learn when you should consider using regression analysis.
Use regression to analyze various relationships.
Regression analysis can handle many tasks. For example, you can use it to do the following:
• Model multiple independent variables
• Include continuous and categorical variables
• Use polynomial terms to model curvature
• Use interaction terms to assess whether the effect of one independent variable depends on the value of another variable
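As a rough illustration of the capabilities listed above, the following sketch fits a single model that contains a continuous variable, a categorical variable, a polynomial term, and an interaction term. The data are simulated and the variable names (x, group) and true coefficients are assumptions made purely for illustration; the fit uses ordinary least squares via NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated (hypothetical) data: one continuous and one categorical predictor.
x = rng.uniform(-2, 2, size=n)        # continuous variable
group = rng.integers(0, 2, size=n)    # categorical variable coded as 0/1

# Assumed true relationship, including curvature and an interaction.
y = (1.0 + 2.0 * x + 0.5 * x**2 + 1.5 * group + 1.0 * x * group
     + rng.normal(0, 0.5, size=n))

# Design matrix: intercept, x, x^2 (polynomial term),
# group dummy (categorical), x*group (interaction term).
X = np.column_stack([np.ones(n), x, x**2, group, x * group])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["intercept", "x", "x^2", "group", "x:group"]
for name, b in zip(names, coef):
    print(f"{name:>9}: {b: .3f}")
```

With this much simulated data, the estimated coefficients should land close to the assumed true values, which is what lets one model describe several kinds of relationships at once.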
These capabilities are cool, but they do not yet include one of the most magical ones: regression analysis can untangle very complex problems in which the variables are entwined like spaghetti. Imagine that you, as a researcher, need to study the following questions:
• Do socioeconomic status and race affect educational achievement?
• Will education and IQ affect income?
• Do exercise habits and balanced diet affect weight?
• Are coffee and smoking related to the risk of death?
• Does a particular exercise intervention affect bone density, and is that effect distinct from the effect of other physical activities?
In all these research questions, the independent variables are intertwined (correlated) and jointly affect the dependent variable. How do you untangle this web of related variables? Which variables are statistically significant, and what role does each one play? Regression comes to the rescue, because you can use it to analyze all of these situations.
Use regression analysis to control independent variables.
As I said, regression analysis describes how changes in each independent variable are related to changes in the dependent variable. The key point is that regression statistically controls for every variable in your model.
What do control variables mean?
When you perform regression analysis, you need to isolate the role each variable plays in the model. For example, I took part in an exercise intervention study that aimed to determine whether the intervention increased the subjects' bone mineral density. We needed to separate the role of the exercise intervention from every other factor that may affect bone mineral density, from diet to other physical activity.
To accomplish this goal, you must minimize the effect of confounding variables. Regression analysis does this by estimating the effect that a change in one independent variable has on the dependent variable while holding the other independent variables constant. This process lets you understand the role each independent variable plays without worrying about the influence of the other variables in the model.
How do you control the other variables in the regression?
The beauty of regression analysis is that you hold the other independent variables constant simply by including them in the model! Let's take a look at an example.
A recent study analyzed the effect of coffee intake on mortality. The initial results indicated that higher coffee intake was associated with a higher risk of death. However, coffee drinkers often smoke, and the researchers had not included smoking in their initial model. After they added smoking to the model, the regression results indicated that coffee intake was associated with a lower risk of death, while smoking was associated with a higher risk. The model isolates the role of each variable while holding the other constant: you can assess the effect of coffee intake while controlling for smoking, and, just as conveniently, the effect of smoking while controlling for coffee intake.
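The sign reversal described in the coffee example can be reproduced with a small simulation. This is only a sketch on made-up data, not the study's data or model: the effect sizes, the link between smoking and coffee intake, and the helper function ols are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated (hypothetical) data: smokers tend to drink more coffee.
smoking = rng.integers(0, 2, size=n).astype(float)        # 1 = smoker
coffee = 2.0 + 2.0 * smoking + rng.normal(0, 1, size=n)   # cups per day

# Assumed truth: coffee lowers the risk score, smoking raises it.
risk = -0.5 * coffee + 3.0 * smoking + rng.normal(0, 1, size=n)

def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(risk, coffee)            # smoking omitted
full = ols(risk, coffee, smoking)    # smoking included, so it is held constant

print("coffee coefficient, smoking omitted: ", round(short[1], 3))
print("coffee coefficient, smoking included:", round(full[1], 3))
print("smoking coefficient:                 ", round(full[2], 3))
```

When smoking is left out, the coffee coefficient absorbs part of the smoking effect and comes out positive; once smoking is in the model, the coffee coefficient moves back toward the assumed negative value.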
Note that this study also illustrates how omitting a relevant variable can produce misleading results. Leaving out an important variable leaves it uncontrolled, which can bias the estimates for the variables that are in the model. This problem is particularly relevant to observational studies, where the effects of omitted variables may be unbalanced across the groups being compared. In true experiments, by contrast, random assignment of treatment tends to distribute the effects of these variables evenly, which reduces omitted variable bias.
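The contrast between observational data and a randomized experiment can also be sketched with simulated data. Everything here is hypothetical: the omitted variable is labeled diet, the treatment is labeled exercise, and the true treatment effect is set to 1.0 by assumption. In both cases the regression omits diet on purpose.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
true_effect = 1.0   # assumed effect of exercise on the outcome

diet = rng.normal(0, 1, size=n)   # omitted variable (never put in the model)

# Observational setting: people with better diets are more likely to exercise.
exercise_obs = (diet + rng.normal(0, 1, size=n) > 0).astype(float)
# Experimental setting: exercise is assigned by coin flip, independent of diet.
exercise_rand = rng.integers(0, 2, size=n).astype(float)

def outcome(exercise):
    """Simulated outcome that depends on both exercise and diet."""
    return true_effect * exercise + 2.0 * diet + rng.normal(0, 1, size=n)

def slope(y, x):
    """Coefficient on x from a regression of y on an intercept and x only."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

slope_obs = slope(outcome(exercise_obs), exercise_obs)
slope_rand = slope(outcome(exercise_rand), exercise_rand)

print("estimated effect, observational:", round(slope_obs, 3))
print("estimated effect, randomized:   ", round(slope_rand, 3))
```

In the observational setting the estimate is inflated because exercise stands in for diet; under random assignment the same incomplete model gives an estimate near the assumed true effect.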
How to interpret the regression results
When answering questions with regression analysis, you must first fit the model and verify that it is adequate. Then examine the regression coefficients and P values. When a P value is low (typically < 0.05), the independent variable is statistically significant. Each coefficient represents the mean change in the dependent variable for a one-unit change in the given independent variable, holding the other independent variables constant.
Suppose the dependent variable is income and the independent variables include IQ and education (along with other relevant variables). The regression output would look like this:
[Regression output table: coefficients and P values for IQ and education]
The P values in this table are all less than 0.05, which indicates that education and IQ are both statistically significant. The coefficient for IQ indicates that each additional IQ point is associated with an average increase in income of about 4.80, holding the other variables constant. Likewise, each additional unit of education is associated with an average increase in income of 24.22, holding the other variables constant.
Regression analysis is a form of inferential statistics. The P values help determine whether the relationships you observe in your sample also exist in the larger population.
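To make the interpretation concrete, here is a minimal sketch that computes coefficients, standard errors, and P values for an income model. The data are simulated, with the coefficients 4.80 and 24.22 from the text used as the assumed true values; the intercept, noise level, and sample size are invented. The P values use a large-sample normal approximation rather than the exact t distribution, which is an assumption that keeps the example dependent only on NumPy and the standard library.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Simulated (hypothetical) sample; education is mildly correlated with IQ.
iq = rng.normal(100, 15, size=n)
education = rng.normal(14, 3, size=n) + 0.05 * (iq - 100)
income = 50.0 + 4.80 * iq + 24.22 * education + rng.normal(0, 100, size=n)

X = np.column_stack([np.ones(n), iq, education])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)

# Standard errors from the usual OLS covariance estimate.
residuals = income - X @ coef
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Two-sided P values, normal approximation to the t distribution.
t_stats = coef / se
p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]

for name, b, s, p in zip(["intercept", "IQ", "education"], coef, se, p_values):
    print(f"{name:>9}: coef={b:8.3f}  se={s:7.3f}  p={p:.3g}")

# The fitted equation can also be used for prediction,
# for example at IQ = 110 and 16 units of education.
predicted_income = np.array([1.0, 110.0, 16.0]) @ coef
print("predicted income:", round(predicted_income, 1))
```

Each coefficient is read as the average change in income for a one-unit change in that variable with the other variable held constant, and a P value below 0.05 marks the variable as statistically significant.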
Obtaining reliable regression results
With the great power of regression comes great responsibility. Sorry, but that is the truth. To obtain reliable regression results, you must do the following:
• Specify the correct regression model. As we have seen, if you do not include all the important variables in the model, the results will be biased.
• Check your residual plots. Make sure your model fits the data adequately.
• Check for multicollinearity, that is, correlation between the independent variables. As we have seen, some multicollinearity is acceptable; too much, however, can cause problems.
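One common way to check the multicollinearity item in the list above is the variance inflation factor (VIF), which the text does not name; it is brought in here as an illustration. The sketch below computes it by regressing each independent variable on the others. The helper name vif and the simulated variables are assumptions, and the frequently quoted cutoffs of about 5 or 10 are rules of thumb rather than strict limits.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()   # R^2 of column j on the rest
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)    # mildly correlated with x1
x3 = x1 + 0.1 * rng.normal(size=n)    # nearly a copy of x1

vifs_mild = vif(np.column_stack([x1, x2]))
vifs_severe = vif(np.column_stack([x1, x2, x3]))

print("VIFs with mild correlation:  ", np.round(vifs_mild, 2))
print("VIFs with a near-duplicate:  ", np.round(vifs_severe, 2))
```

Mild correlation leaves the VIFs near 1, while adding a near-duplicate variable pushes the VIFs for x1 and x3 far above the usual rule-of-thumb cutoffs.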
Regression analysis gives you the ability to separate out the effects in complex research questions. By modeling and controlling for all the relevant variables, you can untangle the spaghetti of relationships and assess the role each independent variable plays.
Reference: https://statisticsbyjim.com/


Origin blog.51cto.com/15057855/2676741