The code is attached directly below; you only need to change these two path lines:
test = pd.read_csv(r'D:\wangyong\Wang\kaggle\year\test.csv')
train = pd.read_csv(r'D:\wangyong\Wang\kaggle\year\train.csv')
Here is the full source code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Raw strings (r'...') keep Windows backslashes from being read as escapes
test = pd.read_csv(r'D:\wangyong\Wang\kaggle\year\test.csv')
train = pd.read_csv(r'D:\wangyong\Wang\kaggle\year\train.csv')

print(train.head())
print(train.describe())
print(test.describe())

key_train = train.keys()
key_test = test.keys()
print(key_train)
print(key_test)

# Skip the first column (presumably the id) and, for each remaining feature,
# overlay the train and test density estimates on one figure
for i in range(1, len(key_test)):
    train_data = train[key_train[i]].tolist()
    test_data = test[key_test[i]].tolist()
    plt.figure(figsize=(8, 4), dpi=150)
    # shade= was renamed fill= in newer seaborn versions
    sns.kdeplot(train_data, color="Red", fill=True)
    ax = sns.kdeplot(test_data, color="Blue", fill=True)
    ax.set_xlabel(key_train[i])  # label with the feature actually plotted
    ax.set_ylabel("values")
    ax.legend(["train", "test"])
    plt.show()
That is the general effect: you can see that for some features there is still a noticeable gap between the training-set and test-set distributions.
The point of all this is simply to compare the distributions of the training set and the test set visually.
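The gap the plots show by eye can also be put into a number. Below is a minimal sketch, not part of the original script, of the two-sample Kolmogorov-Smirnov statistic: the largest vertical distance between the two empirical CDFs (0 means identical samples, values near 1 mean almost disjoint ones). `ks_statistic` is a hypothetical helper name and the Gaussian samples are synthetic stand-ins for a train and a test column.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            # Tie: step both pointers past the shared value
            v = a[i]
            while i < n and a[i] == v:
                i += 1
            while j < m and b[j] == v:
                j += 1
        d = max(d, abs(i / n - j / m))
    return d

# Synthetic example: same distribution gives a small statistic,
# a shifted distribution gives a large one.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [random.gauss(0, 1) for _ in range(1000)]
z = [random.gauss(3, 1) for _ in range(1000)]
print(ks_statistic(x, y))  # small: train and test look alike
print(ks_statistic(x, z))  # large: clear distribution gap
```

On the real data you would call it per column, e.g. `ks_statistic(train[col], test[col])`; scipy users can get the same statistic plus a p-value from `scipy.stats.ks_2samp`.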