Python basic library and machine learning notes

1 Introduction

This section introduces commonly used third-party libraries in Python. These are open source libraries that implement various computing functions, and they greatly extend the application scenarios and computing capabilities of Python. Here we mainly introduce the basic use of three libraries, NumPy, pandas, and Matplotlib, followed by a brief look at scikit-learn. NumPy is a numerical computing library for matrix operations and high-dimensional array operations; pandas is a library for data preprocessing, data manipulation, and data analysis; Matplotlib is an easy-to-use data visualization library with a rich set of plotting capabilities. Next, we introduce simple applications of these libraries one by one.

1.1 NumPy

NumPy Chinese official website: https://www.numpy.org.cn/
NumPy is the fundamental package for scientific computing with Python. Among other things, it contains:

A powerful N-dimensional array object.
Sophisticated (broadcasting) functions.
Tools for integrating C/C++ and Fortran code.
Useful linear algebra, Fourier transform, and random number capabilities.

Besides its obvious scientific uses, NumPy can also be used as an efficient multidimensional container for generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and quickly with a wide variety of databases.

NumPy is licensed under the BSD license, allowing reuse with few restrictions.
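
As a rough illustration of the features listed above, the following minimal sketch builds a small array, applies broadcasting, computes a matrix product, and draws random numbers. The array values and the seed are arbitrary choices made for this example.

```python
import numpy as np

# A 2-D array (matrix) built from nested lists
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)  # (2, 3)

# Broadcasting: the 1-D row is added to every row of `a`
row = np.array([10, 20, 30])
b = a + row
print(b)

# Linear algebra: product of a (2x3) with its transpose (3x2) gives a 2x2 matrix
c = a @ a.T
print(c)

# Random numbers from a seeded generator (the seed 0 is an arbitrary choice)
rng = np.random.default_rng(0)
r = rng.random(3)
print(r)
```

Broadcasting avoids writing an explicit loop over the rows, which is usually both shorter and faster.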

1.2 pandas

pandas Chinese official website: https://pypandas.cn/
The pandas library is a very important and commonly used library in data analysis. It uses data frames to make data processing and manipulation easy and fast, and it has applications in time series, visualization, and other areas. Next, we briefly introduce how to use pandas, including how to generate series and data tables, data aggregation and grouping operations, and data visualization functions. After importing, the pandas library is usually referred to by the alias pd.
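
A minimal sketch of the operations mentioned above (creating a series and a data table, then grouping and aggregating) might look like this. The column names and values are hypothetical and exist only for illustration.

```python
import pandas as pd

# A Series: a one-dimensional labelled array
s = pd.Series([1.5, 2.0, 3.5], index=["a", "b", "c"])
print(s["b"])

# A DataFrame: a table with named columns (hypothetical example data)
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "area": [56, 32, 78, 160],
    "price": [90, 65, 125, 272],
})

# Grouping and aggregation: mean price per city
mean_price = df.groupby("city")["price"].mean()
print(mean_price)

# Several aggregations at once, using named aggregation
summary = df.groupby("city").agg(
    total_area=("area", "sum"),
    max_price=("price", "max"),
)
print(summary)
```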

1.3 Matplotlib

Matplotlib Chinese official website: https://matplotlib.org.cn/
Matplotlib is a Python plotting library with rich plotting functions. pyplot is one of its modules; it provides a plotting interface similar to MATLAB and can draw 2D, 3D, and other kinds of figures, which makes it a good helper for data visualization. Next, we will briefly introduce how to use it.
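
As a small sketch of the pyplot interface, the following draws two curves with labels and a legend. The non-interactive "Agg" backend and the in-memory buffer are assumptions made so that the script runs without a display; in an interactive session one would typically call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Sine and cosine")
ax.legend()

# Render the figure as PNG into memory; a file name could be passed instead
buf = io.BytesIO()
fig.savefig(buf, format="png")
```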

1.4 sklearn (scikit-learn)

sklearn (scikit-learn) is a Python-based machine learning library.
It provides simple and efficient tools for data mining and data analysis that can be reused in various contexts.
It is built on NumPy, SciPy, and Matplotlib, and it is open source and commercially usable under the BSD license.

GitHub Pages (outside China): https://sklearn.apachecn.org
Gitee Pages (within China): https://apachecn.gitee.io/sklearn-doc-zh
Third-party sklearn Chinese documentation site: http://www.scikitlearn.com.cn

2 Preliminary study of machine learning models

For a given data set, the steps of modeling and analysis with machine learning algorithms are largely fixed. Let's first look at a practical machine learning application.
Assume that the price of a house depends only on its area. Table 1-1 shows the area and price of some houses. Predict the price of a house with an area of 40 square meters.
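(Table 1-1: house area and price. The table image is not available; the same values appear as the arrays x and y in the code below.)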

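import numpy as np
import matplotlib
matplotlib.use('TkAgg')  # interactive Tk backend; change or remove if Tk is unavailable
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the data: house area (x) and price (y)
x = np.array([56, 32, 78, 160, 240, 89, 91, 69, 43])
y = np.array([90, 65, 125, 272, 312, 147, 159, 109, 78])
# Reshape into column vectors, the 2-D layout scikit-learn expects
X = x.reshape(-1, 1)
Y = y.reshape(-1, 1)

# Explore the data with a scatter plot
plt.figure(figsize=(4, 3))  # initialize the figure window
plt.scatter(X, Y, s=50)     # plot of the raw data
plt.title("Raw Data")
plt.show()

# Train the model and predict
model = LinearRegression()
model.fit(X, Y)
x1 = np.array([40]).reshape(-1, 1)  # data to predict
x1_pre = model.predict(x1)          # predicted price for an area of 40 m2
print(x1_pre)

# Visualize the result, including the predicted point
plt.figure(figsize=(4, 3))
plt.scatter(X, Y)        # plot of the raw data
b = model.intercept_     # intercept
a = model.coef_          # slope

y_line = a * X + b       # fitted line evaluated at the original areas

plt.plot(X, y_line)
y1 = a * x1 + b
plt.scatter(x1, y1, color='r')  # predicted point in red
plt.show()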

Running the program shows that for a house area of 40 m2 the model predicts 79.59645966, that is, a price of about 796,000 yuan.
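
As a cross-check on this result, the slope and intercept of a one-variable least-squares line can also be computed directly from the closed-form formulas, without scikit-learn. This is a minimal sketch using the same data; the variable names are chosen for this example only.

```python
import numpy as np

x = np.array([56, 32, 78, 160, 240, 89, 91, 69, 43], dtype=float)
y = np.array([90, 65, 125, 272, 312, 147, 159, 109, 78], dtype=float)

# Least-squares slope: covariance of x and y divided by the variance of x
x_mean, y_mean = x.mean(), y.mean()
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# The fitted line passes through the point of means
b = y_mean - a * x_mean

# Prediction for an area of 40 m2; this should agree with the
# value of roughly 79.596 reported above
pred_40 = a * 40 + b
print(a, b, pred_40)
```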
The above is an implementation of simple (one-variable) linear regression. In reality, however, housing prices are affected by many factors: not only the area, but also the geographical location, the plot ratio of the community, and so on. Handling several factors at once requires multiple linear regression or other machine learning models for fitting.
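
With scikit-learn, multiple linear regression uses the same LinearRegression class; the only change is that each sample has several feature columns. The sketch below uses entirely hypothetical features and synthetic prices generated from a known linear rule, so that the fitted coefficients can be checked against that rule. It is not real housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features per house:
# [area in m2, distance to city centre in km, plot ratio]
X = np.array([
    [56, 5.0, 2.0],
    [32, 3.0, 2.5],
    [78, 8.0, 1.8],
    [160, 12.0, 1.2],
    [89, 6.0, 2.2],
    [43, 2.0, 3.0],
])

# Synthetic prices from an assumed linear rule (no noise)
true_coef = np.array([1.2, -2.0, 5.0])
true_intercept = 30.0
y = X @ true_coef + true_intercept

model = LinearRegression()
model.fit(X, y)

# With noise-free data the fit recovers the assumed rule
print(model.coef_, model.intercept_)

# Predict for a new hypothetical house
new_house = np.array([[40, 4.0, 2.0]])
pred = model.predict(new_house)
print(pred)
```

With real data the prices contain noise, so the coefficients would only be estimates and the model would need to be evaluated on held-out data.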
In machine learning, in addition to simple and multiple linear regression, commonly used models include logistic regression, clustering, decision trees, random forests, support vector machines, and naive Bayes. The steps for using these models are basically the same:
①data preprocessing and exploration;
②data feature engineering;
③build model;
④train model;
⑤model prediction;
⑥evaluate model.
For example, the simple linear regression model for housing price prediction above went through five of these steps (feature engineering was not needed):
(1) Data preprocessing and exploration: organize the data and convert it into a format suitable for the model.
(2) Building the model: use model=LinearRegression() to build a linear regression model.
(3) Training the model: model.fit(X, Y).
(4) Model prediction: model.predict(x1).
(5) Evaluating the model: use visualization to judge the model's predictions intuitively.
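
The five steps above can be sketched compactly as follows. Here the visual evaluation is supplemented with two numeric metrics from sklearn.metrics. Scoring on the training data, as done here, is an assumption made to keep the example short; it tends to be optimistic compared with scoring on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# (1) Data preprocessing: reshape the areas into a 2-D feature matrix
x = np.array([56, 32, 78, 160, 240, 89, 91, 69, 43])
y = np.array([90, 65, 125, 272, 312, 147, 159, 109, 78])
X = x.reshape(-1, 1)

# (2) Build the model
model = LinearRegression()

# (3) Train the model
model.fit(X, y)

# (4) Predict: fitted values for the known houses and a new 40 m2 house
y_fit = model.predict(X)
pred_40 = model.predict([[40]])

# (5) Evaluate: R^2 (closer to 1 is better) and mean squared error
r2 = r2_score(y, y_fit)
mse = mean_squared_error(y, y_fit)
print(pred_40, r2, mse)
```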
In practical applications of machine learning models, data preprocessing and exploration and data feature engineering are the two parts with the largest workload. Fully understanding the data, organizing it into an appropriate format, and extracting useful features from it often take a great deal of time. The final step is to evaluate the resulting model effectively.

Origin blog.csdn.net/wokaowokaowokao12345/article/details/128418869