1. Concept
A scatter plot displays the values of two variables: the coordinates of each point are determined by the values of those two variables. It is made up of a set of unconnected points and is used to observe the correlation between two variables, for example height and weight, or temperature and latitude.
2. Draw a simple scatter plot
import numpy as np
import matplotlib.pyplot as plt
height=[161,170,182,175,173,165]
weight=[50,58,80,70,69,55]
plt.scatter(height,weight)
plt.show()
3. The role of the scatter plot
A scatter plot is mainly used to observe the correlation between data. There are three cases: positive correlation, negative correlation, and no correlation.
(1) No correlation
import numpy as np
import matplotlib.pyplot as plt
N=1000
x=np.random.randn(N)
y=np.random.randn(N)
plt.scatter(x,y)
plt.show()
(2) Positive correlation
import numpy as np
import matplotlib.pyplot as plt
N=1000
x=np.random.randn(N)
y=np.random.randn(N)*0.5+x
plt.scatter(x,y)
plt.show()
(3) Negative correlation
import numpy as np
import matplotlib.pyplot as plt
N=1000
x=np.random.randn(N)
y=np.random.randn(N)*0.5-x
plt.scatter(x,y)
plt.show()
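The three cases above can also be checked numerically. The following is a minimal sketch, assuming NumPy's np.corrcoef is an acceptable way to measure linear correlation; the seeded generator (np.random.default_rng(0)) and the variable names are choices made here for reproducibility, not part of the original examples.

```python
import numpy as np

# Assumption: a seeded generator is used so the numbers are reproducible.
rng = np.random.default_rng(0)
N = 1000
x = rng.standard_normal(N)

# Same constructions as the three scatter plots above.
y_none = rng.standard_normal(N)            # independent of x
y_pos = rng.standard_normal(N) * 0.5 + x   # rises with x
y_neg = rng.standard_normal(N) * 0.5 - x   # falls as x rises

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the
# Pearson coefficient between the two inputs.
r_none = np.corrcoef(x, y_none)[0, 1]
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]

print(f"no correlation:       r = {r_none:+.3f}")
print(f"positive correlation: r = {r_pos:+.3f}")
print(f"negative correlation: r = {r_neg:+.3f}")
```

A coefficient near 0 matches the round cloud of points, while values near +1 or -1 match the clouds stretched along a rising or falling diagonal.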
4. Some parameters of the scatter function
In plt.scatter(x,y,s=100,c='r',marker='o',alpha=0.5), s is the area of each point (in points squared), c is the color of the points (blue by default), marker is the shape of the points (o, a circle, by default), and alpha is the transparency (by default the points are fully opaque, equivalent to alpha=1). The full list of parameters can be found on the official website https://matplotlib.org/
import numpy as np
import matplotlib.pyplot as plt
N=100
x=np.random.randn(N)
y=np.random.randn(N)*0.5-x
plt.scatter(x,y,s=100,c='r',marker='>',alpha=0.5)
plt.show()
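Besides single values, s and c also accept one value per point, which lets size and color carry extra information. The following is a rough sketch under a few assumptions: the Agg backend and the output file name scatter_styled.png are chosen here so the script runs without a display, and the "viridis" colormap is only one possible choice. For interactive use, drop the matplotlib.use line and call plt.show() instead of savefig.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
N = 100
x = rng.standard_normal(N)
y = rng.standard_normal(N) * 0.5 - x

# One marker area (in points squared) per point, between 20 and 100.
sizes = 20 + 80 * rng.random(N)

fig, ax = plt.subplots()
# c=y colors each point by its y value through the colormap.
points = ax.scatter(x, y, s=sizes, c=y, cmap="viridis", marker="o", alpha=0.6)
fig.colorbar(points, ax=ax, label="y value")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Hypothetical output file name.
fig.savefig("scatter_styled.png")
```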
5. Homework
1. Use 000001.SH data
2. Calculate diff, the difference between the highest price and the opening price
3. Draw a scatter plot of diff on consecutive days (today against yesterday) and study whether the values are correlated
Adjust the appearance of the scatter plot: color (c), point size (s), transparency (alpha), point shape (marker)
Since I do not have the 000001.SH data, I randomly generate a 000001.SH.csv file to complete the homework.
import numpy as np
import matplotlib.pyplot as plt
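# Randomly generate the 000001.SH.csv file and save it to the current directory
a=[]
for i in range(100):
    # Each row holds two random integers in [100, 500)
    a.append(np.random.randint(100,500,2))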
np.savetxt('000001.SH.csv',np.array(a),fmt='%d',delimiter=',')
# Read the data from the 000001.SH.csv file
a,b=np.loadtxt('000001.SH.csv',dtype=int,delimiter=',',unpack=True)
diff=b-a
print(diff)
today=diff[1:]
yesterday=diff[:-1]
# Plot today's diff (x axis) against yesterday's diff (y axis)
plt.scatter(today,yesterday,s=50,c='r',marker='<',alpha=0.3)
plt.show()
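With real price data the file would normally have a header row instead of two anonymous columns. The sketch below shows one way the diff step might look in that case. The column names open and high, and every number in the sample text, are made up for illustration; the layout of an actual 000001.SH export may well differ.

```python
import io
import numpy as np

# Made-up sample standing in for a real price file (hypothetical columns).
csv_text = """open,high
100.0,103.5
102.0,102.8
101.5,104.0
103.0,105.5
104.0,104.2
"""

# names=True takes the field names from the header row, so columns can be
# accessed by name rather than by position.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Difference between the highest price and the opening price for each day.
diff = data["high"] - data["open"]

# Pair each day's diff with the previous day's diff.
today = diff[1:]
yesterday = diff[:-1]
print(today)
print(yesterday)
```

The today and yesterday arrays can then be passed to plt.scatter exactly as in the script above.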