Machine Learning Knowledge and Experience Sharing Six: Decision Tree

Python is widely used in deep learning, while R is commonly used in machine learning for data prediction and data processing. More posts on machine learning and data prediction will follow, so interested readers are welcome to keep following. If you have any questions, you can follow and send a private message.

Table of contents

1. Introduction to R language

2. R language installation (Windows as an example)

3. R language book sharing

4. Common errors reported when running R language

1. Introduction to R language

R is a free, open-source programming language and statistical software environment with strong capabilities in statistical computing and graphics. It was originally developed by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand, and it is now developed and maintained jointly by statisticians and programmers around the world. R supports a wide range of statistical methods, including linear and nonlinear modeling, classical statistics and econometrics, time series analysis, classification, and clustering. It also has a powerful graphics system that can produce a variety of high-quality statistical plots. Beyond being free and open source, R's advantages include its data processing and visualization functions, interoperability with other programming languages and data formats, freely developed extension packages, community support, and portability. The large number of extension packages is one of R's defining features: they provide tools for machine learning, deep learning, natural language processing, network analysis, and more. In short, R is not only a tool for statisticians and data scientists but also a high-level programming language widely used across science, engineering, and business.
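As a small illustration of the statistical modeling and graphics mentioned above, the following sketch fits a linear model and draws a scatter plot with the fitted line. It uses the mtcars dataset that ships with R, which is a choice made here for illustration and is not taken from the original article.

```r
# Minimal sketch: linear modeling and plotting with base R only.
# mtcars is a built-in example dataset (an assumption for this illustration).
fit <- lm(mpg ~ wt, data = mtcars)

# Show the estimated intercept and slope
print(coef(fit))

# Scatter plot of weight against fuel economy, with the fitted line on top
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Linear fit on mtcars")
abline(fit, col = "red")
```

No extension packages are needed for this example; lm, plot, and abline are all part of a standard R installation.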

2. R language installation (Windows as an example)

Installing R is similar to installing Python:

1. Download the latest R for Windows installer from the official R website (https://www.r-project.org/).
2. Run the downloaded installer and follow the prompts. By default, R is installed into the C:\Program Files\R folder.
3. The installer may ask you to choose installation options, such as the 32-bit or 64-bit version or the graphical user interface. Choose according to your needs.
4. Wait for the installer to finish. It creates a shortcut to R that can be launched from the Start menu or from the desktop icon.

Then install RStudio, an IDE for the R language.
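Once the installation has finished, a quick check like the following can be run in the R console (or in RStudio) to confirm that R starts correctly. This is only a sketch, and the exact output depends on the version and install location on your machine.

```r
# Print the version string of the running R installation
cat(R.version.string, "\n")

# Print the directory R was installed into
cat(R.home(), "\n")

# Print the folders where extension packages will be installed
print(.libPaths())
```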

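The following is an example of R code:

# Install the package (only needed once per R installation)
install.packages("dplyr")
# Load the required library
library(dplyr)

# Read the CSV file into a data frame
df <- read.csv("data.csv")

# Data cleaning: drop rows where attr_1 is NA, then remove unneeded columns
df <- df %>%
  filter(!is.na(attr_1)) %>%
  select(-c(attr_2, attr_3))

# Group by attr_1 and compute summary statistics for each group
result <- df %>%
  group_by(attr_1) %>%
  summarise(count = n(),
            mean_val = mean(attr_4),
            max_val = max(attr_5))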

The code above performs the following steps:
1. Install and load the dplyr library, which simplifies data cleaning, grouping, and summary statistics.
2. Read data from a CSV file and store it in the data frame df.
3. Clean df by removing rows where attr_1 is NA and dropping the unneeded columns attr_2 and attr_3.
4. Group the cleaned data by attr_1 and compute, for each group, the number of rows, the mean of attr_4, and the maximum of attr_5.
5. Store the final result in the data frame result.
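To try these steps without a data.csv file, the sketch below builds a small data frame in memory and runs the same pipeline on it. The values are made up purely for illustration, and the na.rm = TRUE arguments are an addition here so that missing values in attr_4 or attr_5 would not turn the summaries into NA. It assumes dplyr is already installed.

```r
library(dplyr)

# Hypothetical stand-in for data.csv (values invented for this example)
df <- data.frame(
  attr_1 = c("a", "a", "b", NA, "b"),
  attr_2 = 1:5,
  attr_3 = letters[1:5],
  attr_4 = c(1, 3, 2, 9, 4),
  attr_5 = c(10, 20, 30, 40, 50)
)

# Cleaning: keep rows with a known attr_1 and drop the unneeded columns
cleaned <- df %>%
  filter(!is.na(attr_1)) %>%
  select(-c(attr_2, attr_3))

# Grouping and summary statistics per value of attr_1
result <- cleaned %>%
  group_by(attr_1) %>%
  summarise(
    count = n(),
    mean_val = mean(attr_4, na.rm = TRUE),
    max_val = max(attr_5, na.rm = TRUE)
  )

print(result)
```

The row with a missing attr_1 is removed during cleaning, so only the groups "a" and "b" appear in result.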

3. R language book sharing

Readers who would like the books can follow and send a private message.

Baidu network disk link: https://pan.baidu.com/s/1hFIjbbk6h8uQVmATX5O_AQ
Extraction code: available by private message after following

 

4. Common errors reported when running R language

Common error: Error in loadNamespace(x) : there is no package called 'ggbeeswarm'

Cause of error: Like Python, R relies on a large number of libraries. This error means that the required package (here, ggbeeswarm) is not installed.

Solution: install.packages("ggbeeswarm")
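To avoid running into this error one package at a time, a script can check up front which packages are missing. The sketch below is one possible approach; missing_packages and install_missing are hypothetical helper names introduced here, built on the standard requireNamespace and install.packages functions.

```r
# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Example usage (commented out because it downloads from CRAN):
# install_missing(c("dplyr", "ggbeeswarm"))
```

Checking before installing means packages that are already present are not downloaded again each time the script runs.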

Follow-up posts will continue to explain machine learning algorithms implemented in R, such as decision tree, random forest, and regression network. If you are interested, please keep following. If you have any questions, you can follow and leave a message.


Origin blog.csdn.net/m0_70388905/article/details/130978943