II. Matplotlib data visualization: scatter plot

1. Concept

A scatter plot displays the values of two sets of data: the position of each point is determined by the values of the two variables. It consists of a set of unconnected points and is used to observe the correlation between two variables, for example height vs. weight or temperature vs. latitude.

2. Draw a simple scatter plot

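import matplotlib.pyplot as plt

# Sample data: height and weight of six people
height = [161, 170, 182, 175, 173, 165]
weight = [50, 58, 80, 70, 69, 55]

# Each (height, weight) pair becomes one point
plt.scatter(height, weight)
plt.xlabel('height')
plt.ylabel('weight')
plt.show()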
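
To make the trend in the height and weight data easier to see, one option (not part of the original post) is to overlay a least-squares straight line. This is a minimal sketch that assumes a linear fit is reasonable for this data: np.polyfit(x, y, 1) returns the slope and intercept of the best-fit line, which plt.plot then draws on top of the points.

```python
import numpy as np
import matplotlib.pyplot as plt

height = [161, 170, 182, 175, 173, 165]
weight = [50, 58, 80, 70, 69, 55]

# Least-squares fit of a degree-1 polynomial: weight ≈ slope * height + intercept
slope, intercept = np.polyfit(height, weight, 1)

# Evaluate the fitted line across the range of observed heights
xs = np.linspace(min(height), max(height), 50)

plt.scatter(height, weight)
plt.plot(xs, slope * xs + intercept, color='gray', linestyle='--')
plt.xlabel('height')
plt.ylabel('weight')
plt.show()
```

A positive slope matches the upward-sloping cloud of points, i.e. a positive correlation.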

III. What scatter plots are used for

Scatter plots are mainly used to observe how two variables are correlated. There are three kinds of correlation: positive correlation, negative correlation, and no correlation.

(1) No correlation

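import numpy as np
import matplotlib.pyplot as plt

N = 1000
# x and y are drawn independently from the standard normal distribution,
# so there is no relationship between them
x = np.random.randn(N)
y = np.random.randn(N)

plt.scatter(x, y)
plt.show()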

(2) Positive correlation

import numpy as np
import matplotlib.pyplot as plt

N = 1000
x = np.random.randn(N)
# y rises together with x, plus some random noise
y = np.random.randn(N) * 0.5 + x

plt.scatter(x, y)
plt.show()

(3) Negative correlation

import numpy as np
import matplotlib.pyplot as plt

N = 1000
x = np.random.randn(N)
# y falls as x rises, plus some random noise
y = np.random.randn(N) * 0.5 - x

plt.scatter(x, y)
plt.show()
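
The three cases can also be drawn side by side and compared with a number. The sketch below is an illustrative addition, not from the original post. It uses plt.subplots for three panels and np.corrcoef to compute the Pearson correlation coefficient r for each case. r is near 0 for unrelated data, positive for positive correlation, and negative for negative correlation. The seeded generator np.random.default_rng(0) is used only so the result is reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible
N = 1000
x = rng.standard_normal(N)

# Same constructions as the three examples above
cases = {
    'No correlation': rng.standard_normal(N),
    'Positive correlation': rng.standard_normal(N) * 0.5 + x,
    'Negative correlation': rng.standard_normal(N) * 0.5 - x,
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
r_values = {}
for ax, (title, y) in zip(axes, cases.items()):
    # Pearson r is the off-diagonal entry of the 2x2 correlation matrix
    r = np.corrcoef(x, y)[0, 1]
    r_values[title] = r
    ax.scatter(x, y, s=5, alpha=0.5)
    ax.set_title(f'{title} (r = {r:.2f})')

fig.tight_layout()
plt.show()
```

For y = 0.5 * noise ± x, the variance of y is 1.25 and its covariance with x is ±1, so the theoretical r is ±1/√1.25 ≈ ±0.89; the sample values should land close to that.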

IV. Common parameters of the scatter function

In plt.scatter(x, y, s=100, c='r', marker='o', alpha=0.5), s is the size of each point, given as an area in points squared; c is the point color, which by default is the next color of the color cycle (blue for the first series); marker is the point shape, which defaults to 'o' (a circle); and alpha is the transparency, with points fully opaque by default (equivalent to alpha=1). For the full list of parameters, see the official documentation at https://matplotlib.org/

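import numpy as np
import matplotlib.pyplot as plt

N = 100
x = np.random.randn(N)
y = np.random.randn(N) * 0.5 - x

# Large (s=100), red (c='r'), right-pointing triangles (marker='>'), half transparent (alpha=0.5)
plt.scatter(x, y, s=100, c='r', marker='>', alpha=0.5)
plt.show()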
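
Besides a single value, s and c also accept one value per point, so size and color can encode extra variables. This is a minimal sketch in which the per-point values are just random numbers for illustration (sizes and values are hypothetical names). When c is an array of numbers, it is mapped to colors through the colormap given by cmap, and plt.colorbar shows that mapping.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N = 100
x = rng.standard_normal(N)
y = rng.standard_normal(N) * 0.5 - x

sizes = rng.uniform(20, 200, N)  # hypothetical per-point areas, in points squared
values = rng.uniform(0, 1, N)    # hypothetical per-point values, mapped to color

sc = plt.scatter(x, y, s=sizes, c=values, cmap='viridis', marker='o', alpha=0.5)
plt.colorbar(sc, label='value')
plt.show()
```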

V. Homework

1. Use the 000001.SH data.

2. Calculate diff, the difference between the high price and the opening price.

3. Plot each day's diff against the previous day's diff as a scatter plot, and check whether the two are correlated.

Adjust the plot's color with c, point size with s, transparency with alpha, and point shape with marker.

Since I don't have the 000001.SH data, I randomly generate a 000001.SH.csv file to complete the homework.

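import numpy as np
import matplotlib.pyplot as plt

# Randomly generate 000001.SH.csv and save it to the current directory.
# Column 1 is the opening price, column 2 the high price; the high is kept >= the open.
open_price = np.random.randint(100, 500, 100)
high_price = open_price + np.random.randint(0, 50, 100)
np.savetxt('000001.SH.csv', np.column_stack((open_price, high_price)), fmt='%d', delimiter=',')

# Read the data back from 000001.SH.csv
open_price, high_price = np.loadtxt('000001.SH.csv', dtype=int, delimiter=',', unpack=True)
diff = high_price - open_price
print(diff)

today = diff[1:]       # diff for day 2 .. day N
yesterday = diff[:-1]  # diff for day 1 .. day N-1

# Plot today's diff against the previous day's diff
plt.scatter(today, yesterday, s=50, c='r', marker='<', alpha=0.3)
plt.xlabel("today's diff")
plt.ylabel("previous day's diff")
plt.show()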
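
The homework asks whether each day's diff is related to the previous day's. Besides inspecting the scatter plot, np.corrcoef gives the Pearson correlation coefficient of the two shifted series. This self-contained sketch uses synthetic data in place of the real 000001.SH file, and the rule for generating open and high prices is an assumption. Because the days are generated independently, r should come out close to 0 (no correlation); with real index data the value would have to be checked.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100

# Hypothetical synthetic prices: high = open + a non-negative random amount
open_price = rng.integers(100, 500, N)
high_price = open_price + rng.integers(0, 50, N)
diff = high_price - open_price

today = diff[1:]       # diff on days 2..N
yesterday = diff[:-1]  # diff on days 1..N-1

# Pearson correlation between each day's diff and the previous day's diff
r = np.corrcoef(today, yesterday)[0, 1]
print(f'lag-1 correlation of diff: {r:.3f}')
```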


Origin blog.csdn.net/qq_40836442/article/details/112329439