Connecting Python to a database and drawing a bar chart with matplotlib (bar())

Table of contents

1. Introduction to the bar chart

(1) Introduction

(2) Advantages and disadvantages

(3) Scope of application

2. Data introduction

(1) Data composition

(2) Data selection

3. Python database connection configuration and data extraction settings

(1) Call library and connection syntax

(2) Explanation of syntax parameters

(3) Data extraction settings

4. Global variable configuration

(1) Font canvas configuration

(2) Title and label settings

5. Database data drawing

(1) Call the drawing function and make graphics

(2) Full code


1. Introduction to the bar chart

(1) Introduction

A bar chart (also called a bar graph or column chart) is a statistical chart in which the length of each rectangle encodes a value. A series of vertical bars of varying heights is used to compare two or more values (at different times or under different conditions) of a single variable, and it is usually applied to smaller data sets. Bar charts can also be drawn horizontally or extended to show several dimensions.
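As a minimal sketch of the vertical and horizontal forms described above, the following draws the same values with bar() and barh(). The categories and values are made-up sample data, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data: one value per category
categories = ["A", "B", "C"]
values = [5, 3, 8]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
vertical = ax_v.bar(categories, values)      # bar height encodes the value
horizontal = ax_h.barh(categories, values)   # bar width encodes the value
ax_v.set_title("Vertical bars: bar()")
ax_h.set_title("Horizontal bars: barh()")
fig.tight_layout()
```

In both cases the value is carried by the length of the rectangle; only the orientation changes.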

(2) Advantages and disadvantages

Advantages:

① Bar charts make it easy to grasp a large amount of data and the relationships within it.

② Visual symbols let users read the underlying data more quickly and intuitively.

Disadvantages:

Bar charts are only suitable for small and medium-sized data sets.

(3) Scope of application

Bar charts suit two-dimensional data sets (a category and a value), for example when comparing how data changes over a period of time.

2. Data introduction

(1) Data composition

The data for the bar chart comes from the orders table (orders) in the database. The table contains 21 columns, including order number (ORDER_ID), order date (ORDER_DATE), and store name (SITE).

(2) Data selection

Given the definition and scope of a bar chart, the data plotted here must be aggregated counts or totals that can be compared across categories, so we use the sales manager and the order profit.

In Navicat, the following SQL statement calculates the total 2019 profit for each sales manager:

SELECT MANAGER, SUM(PROFIT) AS TotalProfit FROM orders WHERE FY='2019' GROUP BY MANAGER
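To make clear what this query computes, here is a rough pandas equivalent of the filter, grouping, and sum. The rows are invented sample data standing in for the real table, so only the shape of the result is meaningful.

```python
import pandas as pd

# Hypothetical rows standing in for the orders table (values are made up)
orders = pd.DataFrame({
    "MANAGER": ["Ann", "Bob", "Ann", "Bob", "Cid"],
    "FY": ["2019", "2019", "2019", "2018", "2019"],
    "PROFIT": [100.0, 50.0, 25.0, 70.0, 10.0],
})

# WHERE FY='2019'                        -> boolean filter
# GROUP BY MANAGER + SUM(PROFIT) AS ...  -> groupby + sum + rename
result = (
    orders[orders["FY"] == "2019"]
    .groupby("MANAGER", as_index=False)["PROFIT"]
    .sum()
    .rename(columns={"PROFIT": "TotalProfit"})
)
print(result)
```

The result has one row per manager and two columns, MANAGER and TotalProfit, which is exactly the shape a bar chart needs.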

3. Python database connection configuration and data extraction settings

(1) Call library and connection syntax

If the pymysql library is not installed, install it with pip install pymysql.

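import pymysql
import pandas as pd  # loads query results (pd.read_sql_query() runs a SQL statement and returns a DataFrame)
import matplotlib.pyplot as plt  # plotting (plt.plot() for line charts, plt.bar() for bar charts, ...)

# 1. Connect to the MySQL database: create the connection
# Replace the placeholder values with your own server settings
conn = pymysql.connect(host='ip', port=3306, user='username', password='password', db='database_name')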

(2) Explanation of syntax parameters

After importing the libraries, create a connection with pymysql.connect(). The connection parameters are as follows:

host: host name or IP address of the database server

port: database port number; the default MySQL port is 3306

user: username

password: user password

db: database name
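One way to keep these parameters out of the script is to read them from environment variables. The sketch below assumes variable names such as MYSQL_HOST and default values that are purely illustrative, not part of pymysql; open_connection() additionally needs pymysql installed and a reachable server.

```python
import os


def mysql_settings(env=None):
    """Collect pymysql.connect() keyword arguments from environment variables.

    The variable names (MYSQL_HOST, ...) and defaults are hypothetical,
    chosen only for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),  # port must be an int
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "db": env.get("MYSQL_DB", "mydb"),
    }


def open_connection(env=None):
    """Open a connection; requires pymysql and a running MySQL server."""
    import pymysql  # imported here so mysql_settings() works without the driver
    return pymysql.connect(**mysql_settings(env))


settings = mysql_settings({"MYSQL_HOST": "127.0.0.1", "MYSQL_PORT": "3307"})
print(settings["host"], settings["port"])
```

Note that port is converted to an integer, since environment variables are always strings.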

(3) Data extraction settings

Extracting data means sending a SQL query over the open connection. pandas offers a simple way to do this from Python: pd.read_sql_query() runs the statement and returns the result as a DataFrame.

# 2. Write the SQL statement
# -- total 2019 profit for each sales manager

sql = r"SELECT MANAGER, SUM(PROFIT) as TotalProfit FROM orders where FY='2019' group by MANAGER"

# 3. Run the SQL statement and load the query result
df = pd.read_sql_query(sql, conn)
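To try the extraction step without a MySQL server, the sketch below swaps in an in-memory SQLite database from the standard library. The table contents are invented, and SQLite is only a stand-in here; the article itself uses pymysql.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for the MySQL server (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (MANAGER TEXT, FY TEXT, PROFIT REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Ann", "2019", 100.0), ("Bob", "2019", 50.0),
     ("Ann", "2019", 25.0), ("Bob", "2018", 70.0)],
)

sql = ("SELECT MANAGER, SUM(PROFIT) AS TotalProfit "
       "FROM orders WHERE FY='2019' GROUP BY MANAGER ORDER BY MANAGER")
df = pd.read_sql_query(sql, conn)  # column names come from the SELECT list
conn.close()

print(df.columns.tolist())
print(df)
```

The column alias TotalProfit becomes the DataFrame column name, which is why the plotting code can later refer to df['TotalProfit'].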

4. Global variable configuration

(1) Font canvas configuration

These font and canvas settings can be placed right after the imports whenever you draw with matplotlib, and can be treated as fixed boilerplate. The parameters were covered in the earlier article on the plot() function; see that article for details.

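plt.rcParams['font.sans-serif'] = 'SimHei'  # use a Chinese font so Chinese text renders
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign '-' correctly with that font

# figure resolution: 6x4 inches at 100 dpi = 600x400 pixels
plt.rcParams['figure.figsize'] = (6,4)  # width x height in inches
plt.rcParams['figure.dpi'] = 100        # 100 dots per inch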
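The pixel size of a figure is its size in inches multiplied by the dpi. The sketch below checks this for the values used above; it applies them through plt.rc_context() so the global settings are left untouched, and uses the Agg backend only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# rc_context applies the settings only inside the with-block
with plt.rc_context({"figure.figsize": (6, 4), "figure.dpi": 100}):
    fig = plt.figure()
    width_px, height_px = fig.get_size_inches() * fig.dpi

print(width_px, height_px)
plt.close(fig)
```

With a 6x4 inch figure at 100 dpi, the canvas is 600 pixels wide and 400 pixels high.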

(2) Title and label settings

title() sets the chart title, ylabel() and xlabel() set the axis labels, and grid() configures the grid lines.

# Title and axis labels
plt.title("每个销售经理2019年的利润总额")  # "Total profit of each sales manager in 2019"
plt.ylabel("利润额")  # "Profit"
plt.xlabel('经理')  # "Manager"
# Grid lines
plt.grid(axis='y')

Grid line parameters:

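plt.grid()  # no arguments: toggles the grid lines (turns them on if they are off)
plt.grid(1)  # show grid lines (1 is treated as True, 0 as False)
plt.grid(True)  # show grid lines
plt.grid(visible=True)  # show grid lines; the keyword was named b before matplotlib 3.5
plt.grid(visible=1)  # show grid lines
plt.grid(visible=True, axis='x')  # only the x-axis grid lines
plt.grid(visible=True, axis='y')  # only the y-axis grid lines
plt.grid(visible=1, which='major')  # 'major' is the default: lines at major ticks; 'minor' draws lines at minor ticks, which stay invisible until minor ticks are enabled (e.g. plt.minorticks_on()); 'both' draws both sets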
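The following sketch shows the axis and which parameters together on made-up data. It passes the on/off flag positionally, which should work regardless of what the installed matplotlib version calls that keyword, and it turns minor ticks on explicitly because a minor grid has nothing to attach to otherwise.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [5, 3, 8])  # hypothetical sample data

ax.grid(True, axis="y")   # major grid lines on the y-axis only
ax.minorticks_on()        # minor ticks are off by default
ax.grid(True, axis="y", which="minor", linestyle=":", linewidth=0.5)

# Inspect which major grid lines are switched on
major_on = all(line.get_visible() for line in ax.yaxis.get_gridlines())
x_on = any(line.get_visible() for line in ax.xaxis.get_gridlines())
print(major_on, x_on)
```

For a bar chart, restricting the grid to the y-axis is usually enough, since the bars already mark the x positions.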

5. Database data drawing

(1) Call the drawing function and make graphics

Use a for loop to write each manager's value above the corresponding bar.

# Write each value above its bar
for index,value in df['TotalProfit'].items():
    plt.text(index,value,round(value),ha='center',va='bottom',color='k')
# Plot the query result: managers on the x-axis, total profit on the y-axis
plt.bar(df['MANAGER'], df['TotalProfit'])
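As an alternative to the loop, matplotlib 3.4 and later provide Axes.bar_label(), which places a label on every bar of a container. This is a swapped-in technique rather than the approach used above, and the DataFrame below is invented sample data in the same shape as the query result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical query result (names and numbers are made up)
df = pd.DataFrame({"MANAGER": ["Ann", "Bob", "Cid"],
                   "TotalProfit": [120.4, 98.6, 143.0]})

fig, ax = plt.subplots()
bars = ax.bar(df["MANAGER"], df["TotalProfit"])
labels = ax.bar_label(bars, fmt="%.0f")  # requires matplotlib >= 3.4

print([t.get_text() for t in labels])
```

bar_label() takes the positions from the bars themselves, so it does not depend on the DataFrame index matching the bar positions the way the loop does.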

The resulting chart:

[Figure: bar chart of the total 2019 profit for each sales manager]

(2) Full code

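import pymysql
import pandas as pd  # loads query results (pd.read_sql_query() runs a SQL statement and returns a DataFrame)
import matplotlib.pyplot as plt  # plotting (plt.plot() for line charts, plt.bar() for bar charts, ...)
plt.rcParams['font.sans-serif'] = 'SimHei'  # use a Chinese font so Chinese text renders
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign '-' correctly with that font
# figure resolution: 6x4 inches at 100 dpi = 600x400 pixels
plt.rcParams['figure.figsize'] = (6,4)  # width x height in inches
plt.rcParams['figure.dpi'] = 100        # 100 dots per inch

# Open the connection
conn = pymysql.connect(host='localhost',port=3306,user='root',password='9812yang',db='mydb')
# Define the query
sql = r"SELECT MANAGER, SUM(PROFIT) as TotalProfit FROM orders where FY='2019' group by MANAGER"
# Run the SQL statement and store the query result
df = pd.read_sql_query(sql, conn)
# The data is now in memory, so the connection can be closed
conn.close()
# Draw the bars
plt.bar(df['MANAGER'], df['TotalProfit'])
# Grid lines on the y-axis
plt.grid(axis='y')
# Title
plt.title("每个销售经理2019年的利润总额")  # "Total profit of each sales manager in 2019"
# y-axis label
plt.ylabel("利润额")  # "Profit"
# x-axis label
plt.xlabel("经理姓名")  # "Manager name"
# Write each value above its bar
for index,value in df['TotalProfit'].items():
    plt.text(index,value,round(value),ha='center',va='bottom',color='k')
# Display the figure (needed when running as a script)
plt.show()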
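For reuse, the plotting steps can be separated from the database access. The sketch below wraps them in a function that accepts any DataFrame with MANAGER and TotalProfit columns; the function name plot_profit, the English labels, and the sample data are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_profit(df, ax=None):
    """Draw total profit per manager as a labelled bar chart and return the Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4), dpi=100)
    ax.bar(df["MANAGER"], df["TotalProfit"])
    ax.grid(axis="y")
    ax.set_title("Total profit per sales manager, 2019")
    ax.set_xlabel("Manager")
    ax.set_ylabel("Profit")
    # Bars sit at positions 0, 1, 2, ... so enumerate gives the x coordinate
    for position, value in enumerate(df["TotalProfit"]):
        ax.text(position, value, f"{value:.0f}", ha="center", va="bottom", color="k")
    return ax


# Hypothetical query result (names and numbers are made up)
sample = pd.DataFrame({"MANAGER": ["Ann", "Bob", "Cid"],
                       "TotalProfit": [120.4, 98.6, 143.0]})
ax = plot_profit(sample)
```

In a real script, the DataFrame returned by pd.read_sql_query() would be passed in place of the sample data.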

Origin blog.csdn.net/Sheenky/article/details/125043265