Chapter 10 Initial Exploration of Linear Regression

10.1 Initial exploration of data 

10.1.1 Draw a scatter plot

# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Relationship between work experience and salary
experience = [2,1,3,2,3,6,5,7,8,9,10,2,3,5,3,6,4,5,2,3] # years of work experience
salary = [6000,5000,8000,6000,10000,18000,20000,20000,20000,24000,30000,7000,
          8000,15000,9000,21000,9000,18000,8000,12000] # salary

# Pair each observation and load the pairs into a DataFrame
list_of_tuples = list(zip(experience,salary))
df = pd.DataFrame(list_of_tuples,columns=["experience","salary"])
describe = df.describe() # summary statistics: count, mean, std, min, quartiles, max
corr = df.corr() # Pearson correlation matrix

# Scatter plot of experience against salary
plt.scatter(df.experience,df.salary)
plt.xlabel("experience")
plt.ylabel("salary")
plt.show()

 

Figures 1, 2, and 3 show the original data, the summary statistics from df.describe(), and the correlation matrix from df.corr(), respectively.

 

Figure 4 Scatter plot of working years and salary

Figure 5 Histogram of working years

Figure 6 Heat map
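
The listing in this section imports seaborn but does not show how the histogram and the heat map in Figures 5 and 6 were produced. The following is a minimal sketch of one possible way to draw them, assuming the heat map visualizes the correlation matrix returned by df.corr(). The helper name plot_histogram_and_heatmap, the bin count, and the color map are illustrative choices, not taken from the original post.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Same observations as in the scatter plot listing
experience = [2, 1, 3, 2, 3, 6, 5, 7, 8, 9, 10, 2, 3, 5, 3, 6, 4, 5, 2, 3]
salary = [6000, 5000, 8000, 6000, 10000, 18000, 20000, 20000, 20000, 24000,
          30000, 7000, 8000, 15000, 9000, 21000, 9000, 18000, 8000, 12000]
df = pd.DataFrame({"experience": experience, "salary": salary})


def plot_histogram_and_heatmap(frame):
    """Hypothetical helper: histogram of experience next to a correlation heat map."""
    fig, (ax_hist, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram of working years (bin count is an arbitrary choice)
    ax_hist.hist(frame["experience"], bins=10)
    ax_hist.set_xlabel("experience")
    ax_hist.set_ylabel("count")

    # Heat map of the Pearson correlation matrix, annotated with the values
    sns.heatmap(frame.corr(), annot=True, vmin=-1, vmax=1,
                cmap="coolwarm", ax=ax_heat)

    fig.tight_layout()
    return fig


fig = plot_histogram_and_heatmap(df)
```

Calling plt.show() afterwards displays the figure in an interactive session.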

10.1.2 Covariance

Covariance measures the direction of the linear relationship between two variables. If X and Y move together, so that Y tends to rise when X rises, the covariance is positive. If Y tends to fall when X rises, the covariance is negative. Covariance shows only the direction of the relationship, however. Its magnitude depends on the units of X and Y, so it does not indicate how strongly the two variables are related. Because covariance can take any value from negative infinity to positive infinity, correlation is usually used instead of covariance. Correlation ranges from -1 to 1.
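
The difference between the two measures can be checked on the experience and salary data from this chapter. The sketch below computes the sample covariance and the Pearson correlation by hand, then repeats the calculation with salary divided by 1000. The helper names covariance and correlation and the rescaling factor are assumptions made for illustration. NumPy's np.cov and np.corrcoef provide the same quantities directly.

```python
import numpy as np

experience = np.array([2, 1, 3, 2, 3, 6, 5, 7, 8, 9, 10, 2, 3, 5, 3, 6, 4, 5, 2, 3],
                      dtype=float)
salary = np.array([6000, 5000, 8000, 6000, 10000, 18000, 20000, 20000, 20000, 24000,
                   30000, 7000, 8000, 15000, 9000, 21000, 9000, 18000, 8000, 12000],
                  dtype=float)


def covariance(x, y):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by both sample standard deviations."""
    return covariance(x, y) / (x.std(ddof=1) * y.std(ddof=1))


# Original units
cov_raw = covariance(experience, salary)
corr_raw = correlation(experience, salary)

# Same data with salary expressed in thousands
cov_rescaled = covariance(experience, salary / 1000)
corr_rescaled = correlation(experience, salary / 1000)

print("covariance: ", cov_raw, "->", cov_rescaled)    # changes with the units
print("correlation:", corr_raw, "->", corr_rescaled)  # unaffected by the units
```

Rescaling salary shrinks the covariance by the same factor of 1000 while the correlation stays the same, which is why correlation is the more convenient measure of how strongly two variables are related.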

