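Table of contents
1. Introduction to the bar chart
(1) Introduction
(2) Advantages and disadvantages
(3) Scope of application
2. Data introduction
(1) Data composition
(2) Data selection
3. Python database connection configuration and data extraction settings
(1) Call library and connection syntax
(2) Explanation of syntax parameters
(3) Data extraction settings
4. Global variable configuration
(1) Font canvas configuration
(2) Title and label settings
5. Database data drawing
(1) Call the drawing function and make graphics
(2) Full code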
1. Introduction to the bar chart
(1) Introduction
A bar chart (also called a bar graph or column chart) is a statistical chart that represents values by the lengths of rectangles. A series of vertical bars of varying heights is used to compare two or more values (at different times or under different conditions) of a single variable, and it is usually applied to smaller data sets. Bar charts can also be arranged horizontally or extended to multiple dimensions.
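As a minimal sketch of the two orientations mentioned above, the following draws the same values once as vertical bars with bar() and once as horizontal bars with barh(). The category names, the numbers, and the output file name are made-up placeholders, not data from this article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical values, for illustration only
categories = ["A", "B", "C"]
values = [5, 3, 8]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
vertical = ax_v.bar(categories, values)     # bar height encodes the value
horizontal = ax_h.barh(categories, values)  # bar width encodes the value
ax_v.set_title("Vertical")
ax_h.set_title("Horizontal")
fig.savefig("bar_orientations.png")         # hypothetical output file name
```

Both calls receive the same two sequences; only the axis that carries the value changes.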
(2) Advantages and disadvantages
Advantages:
① Users can easily grasp a large amount of data and the relationships within it.
② Visual symbols let users read the underlying data more quickly and intuitively.
Disadvantages:
Bar charts are only suitable for small and medium-sized data sets.
(3) Scope of application
Bar charts suit two-dimensional data sets, for example when comparing how values change over a period of time.
2. Data introduction
(1) Data composition
The data for the bar chart comes from the orders table (orders) in the database, which contains 21 columns such as order number (ORDER_ID), order date (ORDER_DATE), and store name (SITE).
(2) Data selection
Following the definition and scope of application of the bar chart, the data chosen for this plot must be countable and comparable, so we use the sales manager and the order profit.
In Navicat, the following SQL statement calculates the total profit of each sales manager in 2019:
SELECT MANAGER, SUM(PROFIT) AS TotalProfit FROM orders WHERE FY='2019' GROUP BY MANAGER
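To see what this kind of query returns without a MySQL server, here is a small sketch that runs the same statement against an in-memory SQLite table. SQLite is only a stand-in for MySQL here, the table has just the three columns the query touches, and all rows are invented for illustration.

```python
import sqlite3

# In-memory stand-in for the orders table; the rows are invented for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (MANAGER TEXT, FY TEXT, PROFIT REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Alice", "2019", 100.0),
        ("Alice", "2019", 50.0),
        ("Bob", "2019", 70.0),
        ("Bob", "2018", 999.0),  # excluded by the FY filter
    ],
)

sql = (
    "SELECT MANAGER, SUM(PROFIT) AS TotalProfit "
    "FROM orders WHERE FY = '2019' GROUP BY MANAGER"
)
rows = sorted(conn.execute(sql).fetchall())  # one (manager, total) pair per manager
conn.close()
print(rows)
```

The WHERE clause filters rows before grouping, so the 2018 row never reaches SUM(PROFIT).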
3. Python database connection configuration and data extraction settings
(1) Call library and connection syntax
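If the pymysql library is not installed, install it with pip install pymysql.
import pymysql
import pandas as pd # used for data import (pd.read_sql_query() runs a SQL statement and returns the result as a DataFrame df)
import matplotlib.pyplot as plt # used for plotting (plt.plot() line chart, plt.bar() bar chart, ...)
# 1. Connect to the MySQL database: create a database connection
# (replace the placeholder values with your own settings)
conn = pymysql.connect(host='ip', port=3306, user='username', password='password', db='database_name')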
(2) Explanation of syntax parameters
After importing the libraries, create a connection with pymysql.connect. The connection parameters are as follows:
host: host name; an IP address can also be given
port: database port number; the default MySQL port is 3306
user: username
password: user password
db: database name
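One possible way to keep these parameters out of the source code is to read them from environment variables. The sketch below is only an illustration: the MYSQL_* variable names and both helper functions are assumptions made for this example, not part of pymysql.

```python
import os


def build_conn_params(env=os.environ):
    """Collect pymysql.connect() keyword arguments from environment variables.

    The MYSQL_* names are a convention assumed for this sketch.
    """
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),  # pymysql expects an int here
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "db": env.get("MYSQL_DB", "mydb"),
    }


def open_connection():
    """Hypothetical helper that opens the connection with the collected parameters."""
    import pymysql  # imported lazily so build_conn_params() works without the driver
    return pymysql.connect(**build_conn_params())
```

Calling build_conn_params({"MYSQL_PORT": "3307"}) returns the defaults with the port replaced by the integer 3307.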
(3) Data extraction settings
Extracting data from the connected database requires a SQL query. Python also offers a simple way to run it: pandas can execute the statement and return the result as a DataFrame.
# 2. Create a SQL statement
# -- total profit of each sales manager in 2019
sql = r"SELECT MANAGER, SUM(PROFIT) as TotalProfit FROM orders where FY='2019' group by MANAGER"
# 3. Execute the SQL statement and fetch the aggregated result
# (recent pandas versions may warn that DBAPI connections other than sqlite3 are untested)
df = pd.read_sql_query(sql, conn)
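If the fiscal year should not be hard-coded in the SQL string, read_sql_query also accepts a params argument. The sketch below shows the idea with sqlite3 standing in for MySQL and invented rows. Note that sqlite3 uses ? as its placeholder, whereas pymysql uses %s, so the SQL text would need that one change.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# sqlite3 stands in for MySQL here; the table contents are invented for illustration
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE orders (MANAGER TEXT, FY TEXT, PROFIT REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("Alice", "2019", 100.0), ("Alice", "2019", 50.0), ("Bob", "2019", 70.0)],
    )
    sql = (
        "SELECT MANAGER, SUM(PROFIT) AS TotalProfit "
        "FROM orders WHERE FY = ? GROUP BY MANAGER ORDER BY MANAGER"
    )
    # The year is passed separately instead of being formatted into the SQL text
    df = pd.read_sql_query(sql, conn, params=("2019",))

print(df)
```

The resulting DataFrame has one row per manager and the two columns MANAGER and TotalProfit, which is the shape plt.bar() needs later.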
4. Global variable configuration
(1) Font canvas configuration
When drawing with matplotlib, these font and canvas settings can be placed right after the imports and treated as fixed settings. The parameters were introduced in the previous article on drawing with plot(); please refer to it for details.
plt.rcParams['font.sans-serif'] = 'SimHei' # use a font with Chinese glyphs so that Chinese text is displayed
plt.rcParams['axes.unicode_minus'] = False # display the minus sign '-' correctly with the Chinese font
# figure size in pixels = figsize (inches) x dpi, here 600x400
plt.rcParams['figure.figsize'] = (6,4) # 6x4 inches
plt.rcParams['figure.dpi'] = 100 # 100 dots per inch
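The relationship between figsize, dpi, and the pixel size of the figure can be checked directly. The following sketch uses plt.rc_context, which applies the settings only inside the with-block instead of globally. It is a small illustration, not part of the original workflow.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

# rc_context applies the settings only inside the with-block
with plt.rc_context({"figure.figsize": (6, 4), "figure.dpi": 100}):
    fig = plt.figure()
    # size in inches multiplied by dots per inch gives the size in pixels
    width_px, height_px = fig.get_size_inches() * fig.dpi

print(width_px, height_px)
plt.close(fig)
```

With a 6x4 inch figure at 100 dpi this gives 600 by 400 pixels.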
(2) Title and label settings
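title() sets the title, xlabel() and ylabel() set the x-axis and y-axis labels, and grid() configures the grid lines.
# labels and title
plt.title("每个销售经理2019年的利润总额") # "Total profit of each sales manager in 2019"
plt.ylabel("利润额") # "Profit"
plt.xlabel('经理') # "Manager"
# grid lines
plt.grid(axis='y')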
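Introduction to grid line setting parameters
plt.grid() # with no arguments, toggles the grid lines on or off
plt.grid(1) # show grid lines (1 is equivalent to True, 0 to False)
plt.grid(True) # show grid lines
plt.grid(visible=True) # show grid lines (this keyword was named b before Matplotlib 3.5)
plt.grid(visible=1) # show grid lines
plt.grid(visible=True, axis='x') # only grid lines at the x-axis ticks (vertical lines)
plt.grid(visible=True, axis='y') # only grid lines at the y-axis ticks (horizontal lines)
# which='major' is the default: grid lines are drawn at the major ticks.
# which='minor' draws them at the minor ticks, which are off by default, so nothing
# appears until minor ticks are enabled (e.g. with plt.minorticks_on()).
# which='both' draws grid lines at both major and minor ticks.
plt.grid(visible=1, which='major')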
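The effect of the axis and which parameters can be seen in a small sketch with made-up values. It passes the visibility flag positionally, which works whether the keyword is named b (older Matplotlib) or visible (3.5 and later), and it enables minor ticks on the y-axis only, because minor grid lines need minor ticks to attach to. The output file name is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [5, 3, 8])  # hypothetical values, for illustration only

ax.grid(True, axis="y")                            # major grid lines on the y-axis only
ax.yaxis.set_minor_locator(AutoMinorLocator())     # minor ticks are off by default
ax.grid(True, which="minor", axis="y", linestyle=":", linewidth=0.5)
ax.set_axisbelow(True)                             # draw the grid behind the bars

# Inspect whether the major grid lines of each axis are switched on
y_major_on = ax.yaxis.get_gridlines()[0].get_visible()
x_major_on = ax.xaxis.get_gridlines()[0].get_visible()
fig.savefig("grid_demo.png")                       # hypothetical output file name
```

Here the y-axis grid lines are on while the x-axis grid lines stay off, which matches plt.grid(axis='y') in the tutorial code.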
5. Database data drawing
(1) Call the drawing function and make graphics
Use a for loop to write each manager's value above the corresponding bar.
# show the y value above each bar
for index, value in df['TotalProfit'].items():
    plt.text(index, value, round(value), ha='center', va='bottom', color='k')
# use the query result from above as the x and y values
plt.bar(df['MANAGER'], df['TotalProfit'])
The resulting chart is shown in the figure:
(2) Full code
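import pymysql
import pandas as pd # used for data import (pd.read_sql_query() runs a SQL statement and returns the result as a DataFrame df)
import matplotlib.pyplot as plt # used for plotting (plt.plot() line chart, plt.bar() bar chart, ...)
plt.rcParams['font.sans-serif'] = 'SimHei' # use a font with Chinese glyphs so that Chinese text is displayed
plt.rcParams['axes.unicode_minus'] = False # display the minus sign '-' correctly with the Chinese font
# figure size in pixels = figsize (inches) x dpi, here 600x400
plt.rcParams['figure.figsize'] = (6,4) # 6x4 inches
plt.rcParams['figure.dpi'] = 100 # 100 dots per inch
# create the connection (replace the credentials with your own)
conn = pymysql.connect(host='localhost', port=3306, user='root', password='your_password', db='mydb')
# define the query
sql = r"SELECT MANAGER, SUM(PROFIT) as TotalProfit FROM orders where FY='2019' group by MANAGER"
# execute the SQL statement and store the aggregated result
df = pd.read_sql_query(sql, conn)
# the data is now in df, so the connection can be closed
conn.close()
# draw the bars
plt.bar(df['MANAGER'], df['TotalProfit'])
# grid lines on the y-axis
plt.grid(axis='y')
# title: "Total profit of each sales manager in 2019"
plt.title("每个销售经理2019年的利润总额")
# y-axis label: "Profit"
plt.ylabel("利润额")
# x-axis label: "Manager name"
plt.xlabel("经理姓名")
# write the corresponding value above each bar
for index, value in df['TotalProfit'].items():
    plt.text(index, value, round(value), ha='center', va='bottom', color='k')
# display the figure when running as a script
plt.show()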
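As an alternative to the plt.text() loop, Matplotlib 3.4 and later provide Axes.bar_label, which places one label per bar. The sketch below uses invented manager names and profits in place of df['MANAGER'] and df['TotalProfit'], so it runs without a database. The output file name is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-ins for df['MANAGER'] and df['TotalProfit']
managers = ["Alice", "Bob", "Carol"]
profits = [1200.4, 950.0, 1430.7]

fig, ax = plt.subplots()
bars = ax.bar(managers, profits)
# One text label per bar, formatted without decimals (requires Matplotlib >= 3.4)
labels = ax.bar_label(bars, fmt="%.0f")
ax.grid(axis="y")
fig.savefig("bar_labels.png")  # hypothetical output file name

print([t.get_text() for t in labels])
```

The labels are rounded by the format string rather than by round(), and their positions follow the bars automatically.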