Machine Learning with R in Ecological Economics (data collection and cleaning, comprehensive model evaluation, data analysis and visualization, spatial effects, causal inference, and more)

In recent years, breakthroughs in artificial intelligence have had a major impact on many areas of the economy and society. Machine learning, which combines statistics, data science, and computer science, is one of the mainstream directions of artificial intelligence, and it is rapidly being integrated into econometric research. On the surface, machine learning typically uses big data while econometrics typically uses smaller samples, but this distinction is becoming increasingly blurred, and machine learning is playing a growing role in economics, especially at the intersection of economics and other disciplines. R is the mainstream computer language for statistical modeling. It is very convenient for machine learning, and its learning curve is gentler than Python's, making it a natural first choice for this kind of work.

In this course, we start from the practical needs of thesis writing. We first briefly introduce the basic theories and research methods of economics, so that you understand how to select a topic and structure a thesis. We then focus on data collection and cleaning, comprehensive model evaluation, data analysis and visualization, spatial effects, causal inference, and related topics, so that you can master R-based techniques for economic research as quickly as possible. We also introduce auxiliary software commonly used in thesis writing, to make the writing process as painless as possible.

Theoretical basis and software introduction

1.1 Basic Principles of Economics

Main content:

Economic thinking paradigm, resource allocation, efficiency and fairness (in the field of classical economics).

A brief overview of Gregory Mankiw's Ten Principles of Economics.

For example, David Ricardo's Principle of Comparative Advantage.

For example, opportunity cost, and the U-shaped cost curves: MC (marginal cost) and ATC (average total cost).

Under the rational-agent assumption, market allocation may be the optimal solution.

The anchoring effect in Dan Ariely's Predictably Irrational.

1.2 Basic Ideas of Probability and Statistics

1.2.1 Common Concepts in Probability and Statistics

The birth of probability; the "lady tasting tea" problem.

Normal distribution

Confidence intervals

P-values
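As a quick illustration of these concepts (not part of the original outline; the simulated data here are hypothetical), a confidence interval and p-value can both be read off a one-sample t-test in R:

```r
# Simulate 30 observations from a normal distribution (mean 5, sd 2)
set.seed(42)
x <- rnorm(30, mean = 5, sd = 2)

# One-sample t-test of H0: true mean = 0
res <- t.test(x, mu = 0)
res$conf.int   # 95% confidence interval for the mean
res$p.value    # p-value of the test
```

Because the true mean is far from 0, the p-value is tiny and the confidence interval excludes 0.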

1.2.2 Evaluation (single index evaluation and composite index evaluation)

Single index evaluation: such as GDP

Composite index evaluation

Index System Evaluation

1.2.3 Causal inference

Concept: causal inference is the process of determining a causal relationship from the conditions under which an outcome occurs. The most reliable way to establish causality is a randomized controlled trial, but such trials are time-consuming, expensive, and unable to characterize individual differences; causal inference from observational data is therefore considered instead. Frameworks for this include the potential outcomes framework and structural causal models. The causal inference methods based on structural causal models are reviewed below.
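The logic of the randomized controlled trial can be sketched with a tiny simulation (entirely hypothetical data; the true effect size of 2 is chosen for illustration):

```r
set.seed(1)
n <- 10000

# Simulated RCT: treatment assigned by a fair coin flip,
# with a true individual treatment effect of 2
treat   <- rbinom(n, 1, 0.5)
outcome <- 1 + 2 * treat + rnorm(n)

# Under randomization, a simple difference in means estimates the
# average treatment effect (ATE)
ate_hat <- mean(outcome[treat == 1]) - mean(outcome[treat == 0])
round(ate_hat, 2)   # close to the true effect of 2
```

With observational data, treatment is not randomly assigned, which is why the potential outcomes and structural causal model frameworks mentioned above are needed.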

Levels of evidence: single case, multiple cases, randomized controlled trials, evidence-based analysis, mechanism analysis.

1.3 Machine Learning for Evaluation and Causal Inference (Introduction to Algorithms)

1.3.1 KNN and Kmeans

The KNN (K-nearest neighbors) method was first proposed by Cover and Hart in 1968. It is theoretically mature and one of the simplest machine learning algorithms. The idea is simple and intuitive: if most of the K samples most similar to a given sample in feature space (that is, its nearest neighbors) belong to a certain category, then that sample belongs to the same category. In the classification decision, the method assigns a sample's category based only on the categories of its one or several nearest neighbors.
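A minimal sketch of this idea in R, using the built-in iris data and the `knn()` function from the `class` package (one of R's recommended packages); the train/test split is arbitrary:

```r
library(class)  # provides knn()

set.seed(1)
idx   <- sample(nrow(iris), 100)   # 100 rows for training
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]

# Classify each test flower by majority vote among its k = 5 nearest neighbors
pred <- knn(train, test, cl = iris$Species[idx], k = 5)
mean(pred == iris$Species[-idx])   # classification accuracy on held-out rows
```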

K-means, in contrast, is an unsupervised clustering algorithm: it partitions the samples into K clusters by repeatedly assigning each point to the nearest cluster center and then recomputing the centers.
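K-means is available in base R via `kmeans()`; a short sketch on the iris measurements (the species labels are withheld and only used afterwards for comparison):

```r
set.seed(1)
# Cluster the four iris measurements into 3 groups; nstart = 25 restarts
# the algorithm from 25 random initializations and keeps the best result
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

table(km$cluster, iris$Species)   # compare clusters with the true species
```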

1.3.2 Delphi and AHP

The Delphi method takes its name from the oracle at Delphi. In the 1950s, the RAND Corporation of the United States, in cooperation with the Douglas Aircraft Company, developed it as an effective and reliable method of collecting expert opinions. The method has since been widely applied in business, the military, education, health care, and other fields. Its application in medicine began with research on nursing work, where it demonstrated its usefulness and applicability and has been adopted by a growing number of researchers.

AHP (Analytic Hierarchy Process) is a practical multi-alternative, multi-objective decision-making method proposed by the American operations researcher T. L. Saaty in the 1970s. It is a decision analysis method that combines qualitative and quantitative analysis. It is often applied to multi-objective, multi-criterion, multi-element, multi-level unstructured complex decision problems, especially strategic decisions, and is very widely applicable.
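The core AHP computation can be sketched in a few lines of base R: weights are the normalized principal eigenvector of a pairwise comparison matrix, followed by Saaty's consistency check. The 3x3 matrix below is hypothetical:

```r
# Hypothetical pairwise comparison matrix on Saaty's 1-9 scale
A <- matrix(c(1,   3,   5,
              1/3, 1,   3,
              1/5, 1/3, 1), nrow = 3, byrow = TRUE)

# Weights = principal eigenvector, normalized to sum to 1
ev <- eigen(A)
w  <- Re(ev$vectors[, 1]) / sum(Re(ev$vectors[, 1]))

# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI
lambda_max <- Re(ev$values[1])
n  <- nrow(A)
CI <- (lambda_max - n) / (n - 1)
RI <- 0.58          # Saaty's random index for n = 3
CR <- CI / RI       # CR < 0.1 is conventionally considered acceptable
round(w, 3)
```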

1.3.3 Entropy weight method

TOPSIS-Entropy Weight Method

The entropy weight method calculates the weight of each indicator from the information entropy of the data, allowing a comprehensive evaluation of multi-indicator targets. The TOPSIS method can further refine the results of the entropy weight method, making the evaluation more objective and reasonable [23–25].

The first step is to standardize the data:
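A compact sketch of the standardization step and the entropy weight calculation, on a hypothetical decision matrix of 5 alternatives and 3 positive (benefit-type) indicators:

```r
set.seed(1)
# Hypothetical decision matrix: 5 alternatives x 3 positive indicators
X <- matrix(runif(15, 10, 100), nrow = 5)

# Step 1: min-max standardization (formula for benefit-type indicators)
Z <- apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))

# Step 2: proportions, entropy per indicator, then weights
P <- apply(Z, 2, function(v) v / sum(v))
k <- 1 / log(nrow(P))
e <- apply(P, 2, function(p) -k * sum(ifelse(p > 0, p * log(p), 0)))
w <- (1 - e) / sum(1 - e)   # entropy weights, summing to 1
round(w, 3)
```

Cost-type indicators would use the reversed standardization (max - v) / (max - min); TOPSIS would then rank alternatives by their distance to the weighted ideal and anti-ideal solutions.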

1.3.4 Random Forest Algorithm

A large branch of machine learning is called ensemble learning. Its basic idea is to combine multiple classifiers into an integrated classifier with better predictive performance. In a sense, ensemble methods confirm an old Chinese saying: three cobblers with their wits combined are a match for Zhuge Liang.
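A minimal random forest sketch, assuming the `randomForest` package is installed; each of the 500 trees is grown on a bootstrap sample, and out-of-bag rows give an honest error estimate for free:

```r
library(randomForest)  # assumes install.packages("randomForest") has been run

set.seed(1)
# An ensemble of 500 decision trees voting on the iris species
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

rf$confusion   # out-of-bag confusion matrix
```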

1.3.5 Neural network

Neural network learning proceeds in two stages. The first is the forward propagation stage, in which the actual output of each layer's nodes is computed layer by layer, starting from the input layer. The second is the backward correction stage, in which the connection weights are adjusted in reverse according to the output error, so as to reduce that error [27].
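A small sketch with the `nnet` package (a recommended package shipped with R) fitting a single-hidden-layer feedforward network; note that `nnet` minimizes the output error by quasi-Newton optimization rather than textbook backpropagation, but the forward-compute / error-driven weight-correction cycle is the same:

```r
library(nnet)  # single-hidden-layer feedforward networks

set.seed(1)
# 4 hidden units; weights iteratively adjusted to reduce output error
fit  <- nnet(Species ~ ., data = iris, size = 4, maxit = 200, trace = FALSE)

pred <- predict(fit, iris, type = "class")
mean(pred == iris$Species)   # training accuracy
```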

1.4 Common software introduction

Excel, R, Stata, Photoshop, ArcGIS, SPSS, GeoDa, Python, NoteExpress, EndNote

Topic 2

Data Acquisition and Collation

2.1 Introduction to data types

Quantitative data, categorical data

Cross-sectional data, time series data, panel data

2.2 Data Acquisition

Published papers, statistics bureaus, statistical yearbooks, related websites, purchased data

https://www.ceads.net.cn/

Statistical Yearbook

Thesis annotation

2.3 Data collation

Common format conversions and the filling (imputation) of missing values
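Two common ways to fill a missing value in R, shown on a hypothetical four-year series; for time series, linear interpolation usually distorts the data less than mean imputation:

```r
gdp <- c(90, NA, 102, 110)   # hypothetical series with one missing year

# Option 1: mean imputation (replace NA with the mean of observed values)
gdp_mean <- ifelse(is.na(gdp), mean(gdp, na.rm = TRUE), gdp)

# Option 2: linear interpolation between neighboring observed points
gdp_interp <- approx(seq_along(gdp), gdp, xout = seq_along(gdp))$y
gdp_interp   # the gap is filled midway between 90 and 102
```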

Common evaluation methods and detailed walkthroughs of the related software (with case details)

3.1 Calculation of agricultural carbon emissions

3.2 Calculation of Carbon Emissions from Energy Consumption

3.3 Comprehensive evaluation method

Entering the formulas and carrying out the entropy weight method in practice

https://gongshi.wang/

3.4 Data Analysis and Data Visualization

Introduction to common data visualization methods

Box plots, histograms, line charts, geographic graphics, etc.

The three laws of geography and spatial autocorrelation analysis
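Spatial autocorrelation is usually summarized with Moran's I. As a self-contained sketch (hypothetical data: four regions on a line, each region neighboring the adjacent ones), it can be computed by hand in base R; for real analyses the `spdep` package's `moran.test()` is the standard tool:

```r
x <- c(10, 12, 30, 33)                 # e.g. carbon emissions by region
W <- matrix(c(0, 1, 0, 0,              # binary contiguity weights:
              1, 0, 1, 0,              # region i and j are neighbors
              0, 1, 0, 1,              # iff W[i, j] == 1
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

z <- x - mean(x)
n <- length(x)

# Moran's I = (n / sum(W)) * sum_ij W_ij z_i z_j / sum_i z_i^2
I <- (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
I   # > 0 indicates positive spatial autocorrelation (similar neighbors)
```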

3.5 Random Forest Regression Modeling

3.5.1 Model construction and optimization of related parameters

3.5.2 Effect evaluation of the model

3.5.3 Analysis of model results

3.5.4 Driving Factors and Mechanism Analysis (Attribution Analysis, Driving Mechanisms)
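For attribution with a random forest, permutation variable importance is the usual starting point. A sketch on R's built-in mtcars data (standing in for emissions drivers), again assuming the `randomForest` package is installed:

```r
library(randomForest)

set.seed(1)
# Regression forest; importance = TRUE records how much the error grows
# when each predictor is randomly permuted
rf  <- randomForest(mpg ~ ., data = mtcars, importance = TRUE, ntree = 500)

imp <- importance(rf, type = 1)          # type = 1: %IncMSE (permutation)
imp[order(-imp[, 1]), , drop = FALSE]    # predictors ranked by importance
```

Predictors with a large %IncMSE are the model's candidate driving factors; partial dependence plots can then show the direction of each effect.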

3.6 Neural Network Regression Modeling

The content is the same as above.

Comparison with other models

Key points of writing and explanation of cases

4.1 Overall writing points

4.1.1 A good start is half the battle (Introduction)

The source of the topic of the article

4.1.2 Writing method of literature review

4.1.3 Selection of Research Methods and Editing of Formulas

4.1.4 Data Analysis and Visualization (Analysis)

4.1.5 Two Ways of Writing for Discussion (Discussion)

4.1.6 Writing of conclusion and abstract

4.1.7 Mentality construction, journal selection and submission

4.2 Case explanation

4.2.1 Introduction to two common types of papers

Introduction to Experimental Types of Articles

Introduction to Model Computing Articles

4.2.2 Case

Spatial-temporal characteristics and trend prediction of agricultural carbon emissions in Shanxi Province from 2000 to 2020

Assessment of Xinjiang's Agricultural Carbon Emissions and Analysis of Driving Factors Based on Machine Learning Algorithm

Driving factors and decoupling effects of carbon emissions in Northwest China

Regional differences and distribution dynamic evolution of high-quality agricultural development in China


Origin: blog.csdn.net/weixin_46433038/article/details/132449795