Summary and introduction of 28 data visualization charts

Data visualization is a lingua franca of its own: it can convey information to people from all walks of life, breaking down barriers of language and technical background. Raw data is some combination of numbers and words, and visualization shows the information contained in that data.

"Data visualization helps bridge the gap between numbers and words" - Brie E. Anderson.

There are many no-code and low-code data visualization tools, such as Tableau, Power BI, and Microsoft Excel, but the best tool for a data science practitioner is Python. When we work on data science projects we should pay attention to data visualization, because it is the easiest way to represent information and gain insight into data.

So in this post, we're going to put together all the data visualization charts we can. If you are a beginner in data science, this article is a good place to start.

Data visualization is a method of representing data and information graphically. It can be described as using charts, animations, infographics, etc. to transform data into a context that can be visualized. It helps to spot trends and patterns in data.

If you were given a dataset in tabular format with hundreds of rows, you would probably struggle to make sense of it. Proper data visualization helps you find the trends, outliers, and patterns in your data.

Basic Data Visualization

Here we summarize 9 basic data visualization charts, the simplest ones that we commonly use in our daily work.

Frequency table

Frequency is a count of the number of times a value occurs. A frequency table is a way of representing frequencies in a table. The form is shown below.
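
As a minimal sketch of the idea, the snippet below counts how often each value occurs with Python's collections.Counter and prints the result as a two-column table. The colors list is made-up sample data, not taken from any dataset in this article.

```python
from collections import Counter

# Hypothetical sample data: one categorical observation per entry.
colors = ["red", "blue", "red", "green", "blue", "red"]

# Counter maps each distinct value to the number of times it occurs.
freq = Counter(colors)

# Print the frequency table, most frequent value first.
print(f"{'value':<8}{'frequency':>10}")
for value, count in freq.most_common():
    print(f"{value:<8}{count:>10}")
```

With pandas, a call such as Series.value_counts() produces the same kind of table for a column of a DataFrame.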

Scatter Plot

A scatter plot draws two numerical variables as points in a two-dimensional coordinate system. It makes the distribution of the data, and the relationship between the two variables, easy to see.

Line Plot

A line plot is similar to a scatter plot, but the points are connected sequentially with a continuous line. Line plots are more intuitive when looking for how the data changes along a sequence, such as time.

In the picture above, you can see how the weight continues to change.

Bar Chart

A bar chart is mainly used to represent the frequency of categorical variables as bars. The different heights of the bars represent the magnitude of each frequency.

Histogram

The concept of a histogram is the same as that of a bar chart. In a bar chart, frequencies are shown in discrete bars for categorical variables, while a histogram shows frequencies for continuous intervals. It can be used to find the frequency of a continuous variable within an interval.
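
To make the difference between the two charts concrete, here is a small sketch using matplotlib (my choice of library for illustration; the article does not prescribe one). The categories and weights are invented sample values.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data.
categories = ["A", "B", "C"]
category_counts = [5, 9, 3]                          # frequency per category
weights = [51, 55, 58, 60, 61, 63, 64, 66, 70, 79]   # continuous measurements

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart: one separate bar per category.
ax_bar.bar(categories, category_counts)
ax_bar.set_title("Bar chart")

# Histogram: the value range is cut into 3 equal-width intervals and
# each bar shows how many values fall into that interval.
counts, edges, _ = ax_hist.hist(weights, bins=3)
ax_hist.set_title("Histogram")

# Call fig.savefig("charts.png") or plt.show() to look at the result.
```

The bins argument controls how many intervals the continuous range is split into, which has no counterpart in the bar chart.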

Pie Chart

Pie charts represent frequency as a percentage of a circle. Each element occupies a share of the circle's area according to its frequency percentage.

Exploded Pie Chart

An exploded pie chart is the same as a pie chart, except that one or more slices are pulled away from the center to highlight those elements.
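
A possible way to draw this with matplotlib is sketched below. The labels and sizes are invented, and the explode list (one offset per slice, as a fraction of the radius) decides which slice is pulled out.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical category shares.
labels = ["A", "B", "C", "D"]
sizes = [40, 30, 20, 10]

# Offset of each slice from the center; only slice "B" is pulled out.
explode = [0, 0.1, 0, 0]

fig, ax = plt.subplots()
ax.pie(sizes, explode=explode, labels=labels, autopct="%1.0f%%")
ax.set_aspect("equal")  # keep the pie circular

# Call fig.savefig("pie.png") or plt.show() to look at the result.
```

Setting every entry of explode to 0 gives the ordinary pie chart from the previous section.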

Distribution Plot

Distribution plots can show the distribution of continuous variables.

Box Plot

A box plot is a standardized way to display the distribution of data based on its five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It can also display outliers.
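
The numbers behind a box plot can be computed by hand, as in the sketch below. It assumes the common convention that points more than 1.5 times the interquartile range away from the box count as outliers; the article does not state which rule its figure uses, and the data is invented.

```python
import statistics

# Hypothetical sample with a few unusually small and large values.
data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 120]

# Quartiles: the three cut points that split the sorted data into four parts.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# The whiskers end at the most extreme values that are not outliers.
inliers = [x for x in data if lower_fence <= x <= upper_fence]
summary = (min(inliers), q1, median, q3, max(inliers))

print("five-number summary:", summary)
print("outliers:", outliers)
```

Different libraries interpolate quartiles slightly differently, so the exact box edges may vary a little from tool to tool.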

Intermediate Data Visualization

The intermediate charts extend the basic ones, and we have summarized 8 of them here.

Stacked Bar Chart

A stacked bar chart is a special type of bar chart. It can carry more information than a traditional bar chart, because each bar is split into segments for a second categorical variable [2].

Grouped Bar Chart

As the name suggests, a grouped bar chart is a special type of bar chart in which the bars are divided into groups. It is mainly used to compare two categorical variables.

Stacked Area Chart

Stacked area charts plot several area series stacked on top of each other. The height of each series is determined by its value at each data point.

Pareto Diagram

Pareto charts combine a bar chart and a line chart: individual values are shown as bars in descending order, and the cumulative total is shown as a line.
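
The two ingredients of a Pareto chart, the sorted bars and the cumulative line, can be prepared as in this small sketch. The defect counts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical defect counts per cause.
defects = {"scratch": 12, "dent": 45, "misprint": 8, "crack": 30, "other": 5}

# Bars: categories sorted by value in descending order.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

# Line: running total expressed as a percentage of the grand total.
total = sum(defects.values())
running_totals = accumulate(count for _, count in ordered)
cumulative_pct = [100 * running / total for running in running_totals]

for (cause, count), pct in zip(ordered, cumulative_pct):
    print(f"{cause:<10}{count:>4}{pct:>8.1f}%")
```

In a plot, the counts would go on the left axis as bars and cumulative_pct on a second axis as a line.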

Donut Chart

A donut chart is simply a pie chart with the center cut out. Although it conveys the same meaning as a pie chart, it has some advantages. In a pie chart, readers often misjudge the area that belongs to each category. Because the center is removed, a donut chart draws the reader's attention to the outer arcs instead, and the empty inner circle can be used to display additional information.

Heatmap

A heatmap is a rectangular grid divided into cells, where the color of each cell represents a value or intensity.

Radar Chart

A radar chart displays multivariate data as a two-dimensional chart, with three or more variables represented on axes that start from the same point. The spokes from the center are called radii, and the distance along each radius represents the value of that variable. The angle between the radii carries no information.

Treemap

Treemaps display hierarchical data in the form of nested rectangles.

Advanced Data Visualization

These charts are more complex and less common in general, but they are very useful for specific tasks. 10 charts are summarized here.

Parallel Coordinate Plot

Because we live in three-dimensional space, general visualization deals with at most three dimensions. When data has more than three dimensions, we often use PCA or t-SNE to reduce the dimensionality before plotting, but dimensionality reduction can lose a large amount of information. When we need to consider all the features at once, a parallel coordinates plot is needed.
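
One way to draw such a plot is the parallel_coordinates helper in pandas.plotting, sketched below. The four features and the label column are invented sample data; any DataFrame with numeric columns and one class column could be used the same way.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical data: four numeric features plus a class label.
df = pd.DataFrame({
    "f1": [5.1, 4.9, 6.3, 5.8],
    "f2": [3.5, 3.0, 3.3, 2.7],
    "f3": [1.4, 1.4, 6.0, 5.1],
    "f4": [0.2, 0.2, 2.5, 1.9],
    "label": ["a", "a", "b", "b"],
})

# One vertical axis per feature; each row becomes a line that crosses
# all axes and is colored by its value in the "label" column.
ax = parallel_coordinates(df, class_column="label")

# Call ax.figure.savefig("parallel.png") to look at the result.
```

Features on very different scales are usually normalized first, otherwise the feature with the largest range dominates the picture.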

Hexagonal Binning

The hexagonal binning plot is a method to visually represent the density of two-dimensional numerical data points with hexagons.
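
A short matplotlib sketch of hexagonal binning follows. The points are randomly generated for illustration, and gridsize (the number of hexagons across the x direction) is an arbitrary choice.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: 2000 correlated points.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(2000)]
y = [xi + random.gauss(0, 0.5) for xi in x]

fig, ax = plt.subplots()

# Each hexagon is colored by the number of points that fall inside it.
hb = ax.hexbin(x, y, gridsize=20, cmap="Blues")
fig.colorbar(hb, ax=ax, label="points per hexagon")

# Call fig.savefig("hexbin.png") or plt.show() to look at the result.
```

Compared with a scatter plot of the same data, overlapping points no longer hide each other, because only the count per hexagon is drawn.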

Contour Plot

A 2D contour density plot is another way to visualize the density of data points within a specific area. It is convenient for finding the joint density of two numerical variables. For example, the chart below shows how many data points are in each shaded area.

QQ-Plot

QQ stands for quantile-quantile. A QQ-plot is a way to visually check whether a numeric variable is normally distributed, by plotting its quantiles against those of a normal distribution.
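
The points of a QQ-plot can be computed with the standard library alone, as in this sketch. The sample is randomly generated, and the plotting positions (i + 0.5) / n are one common convention that I am assuming here; statistical packages may use slightly different ones.

```python
import random
from statistics import NormalDist, mean, stdev

# Hypothetical sample drawn from a normal distribution.
random.seed(42)
sample = [random.gauss(50, 10) for _ in range(200)]
n = len(sample)

# Observed quantiles: simply the sorted sample values.
observed = sorted(sample)

# Theoretical quantiles: where the i-th of n values would sit under a
# normal distribution with the sample's mean and standard deviation.
reference = NormalDist(mean(sample), stdev(sample))
theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]

# A QQ-plot draws these pairs as points; for normally distributed data
# they lie close to the diagonal line y = x.
pairs = list(zip(theoretical, observed))
print(pairs[:3])
```

If the points bend away from the diagonal at the ends, the tails of the data are heavier or lighter than those of a normal distribution.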

Violin Plot

Violin plots and box plots are related. The extra piece of information that a violin plot provides is the density distribution. Simply put, it is a box plot combined with a density plot.

Boxen Plot

The boxen plot is a newer type of box plot provided by the seaborn library. In a box plot, the box is drawn at the quartiles. In a boxen plot, the data is divided into more quantiles, which gives more insight into the shape of the distribution, especially in the tails.

Point Plot

A point plot is a line chart whose points carry vertical lines called error bars.

The central tendency of the numerical variable is represented by the position of the points in the figure above, and the error bars represent the uncertainty (confidence interval) around it. The line connecting the points makes it easy to compare the numeric variable across different categorical values.
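
The values behind one point and its error bar can be sketched as follows. This uses a simple normal-approximation 95% interval (mean plus or minus 1.96 standard errors), which is my assumption for illustration; plotting libraries may compute the interval differently, for example by bootstrapping. The groups are invented data.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements of one numeric variable, grouped by category.
groups = {
    "A": [12, 15, 11, 14, 13],
    "B": [20, 22, 19, 24, 21],
}

def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    center = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return center, half_width

# One point (the mean) and one error bar (the interval) per category.
for name, values in groups.items():
    center, half_width = mean_with_ci(values)
    print(f"{name}: {center:.2f} +/- {half_width:.2f}")
```

Each category contributes one point at the mean and one error bar spanning the interval, and the points are then joined by a line.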

Swarm plot

A swarm plot is another interesting plot, inspired by the "beeswarm", in which we can see how the values in each category are distributed along the value axis.

Word Cloud

In a word cloud, all words are plotted in a specific area, and frequently occurring words are highlighted (shown in a larger font).

Sunburst Chart

A sunburst chart is a customized version of a donut or pie chart that integrates some additional hierarchical information into the chart.

Geospatial Data Visualization

Geospatial data visualization focuses on the relationship between data and its physical location, and geospatial visualization is unique in that it is not large in scale.

Geographic visualizations overlay variables on a map, using latitude and longitude to display information.

Maps are the main focus of geospatial visualization. They range from depicting streets, towns, parks, or subdivisions to showing the boundaries of a country, continent, or entire planet. They act as containers for additional data. They can help identify problems, track changes, understand trends, and make predictions that are relevant to specific places and times. For that reason, geospatial visualization is explained separately here.

Some Python libraries and tools for geospatial data visualization

Tableau, Power BI, ArcGIS, QGIS, etc. can all be used for complex geospatial data visualization. Many Python libraries, along with a few R packages, are also well suited to geospatial data visualization, such as

  • Geoplot
  • Folium
  • GeoPandas
  • PySAL
  • rworldmap (an R package)
  • rworldxtra (an R package)
  • etc.

I will use Folium to show some implementations of the visualization.

The hospital dataset from HIFLD is used here, which contains hospital locations and other hospital information. According to its license information, this data can be displayed publicly.

There are 34 columns in the main dataset. For demonstration purposes, I will use the columns "ADDRESS", "STATE", "TYPE", "STATUS", "POPULATION", "LATITUDE", and "LONGITUDE". "LATITUDE" and "LONGITUDE" determine the location of each hospital on the map, STATE, TYPE, and STATUS are used for filtering, and ADDRESS and POPULATION serve as metadata for the custom markers on the map.

Draw a basic map

Import the libraries needed to draw the map.

import pandas as pd
import folium
from folium.plugins import MarkerCluster

Load the dataset.

hosp_df = pd.read_csv('/work/Hospitals.csv')

Filter data.

WORKING_COLS = ["ADDRESS", "STATE", "TYPE", "STATUS", "POPULATION", "LATITUDE", "LONGITUDE"]
STATE = "CA"
hosp_df = hosp_df.loc[hosp_df["STATE"] == STATE, WORKING_COLS]
hosp_df.head(5)

Some data preprocessing: keep only the rows with a non-negative POPULATION value.

hosp_df = hosp_df[hosp_df["POPULATION"] >= 0]
hosp_df.describe()

Draw a map

Folium provides folium.Map(), which takes a location argument as a list containing a latitude and longitude pair and generates a map around that location. Passing the mean latitude and longitude of the data centers the generated map on the data.

m=folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=6)
m
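
The map above is centered on the mean coordinates with a hand-picked zoom_start. As a rough sketch of an alternative, the helper below (bounding_box is a hypothetical name, not part of Folium) computes the south-west and north-east corners of the data, using a few invented coordinates in place of the LATITUDE and LONGITUDE columns.

```python
def bounding_box(latitudes, longitudes):
    """Return [[south, west], [north, east]] for the given coordinates."""
    lats = list(latitudes)
    lons = list(longitudes)
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

# Hypothetical coordinates standing in for the LATITUDE / LONGITUDE columns.
lats = [34.05, 37.77, 32.72]
lons = [-118.24, -122.42, -117.16]

# Center of the data, the same idea as calling .mean() on each column.
center = [sum(lats) / len(lats), sum(lons) / len(lons)]

# Corners of the smallest rectangle that contains every point.
bounds = bounding_box(lats, lons)

print("center:", center)
print("bounds:", bounds)
```

With a real map object, the result could be passed to m.fit_bounds(bounds), which lets Folium choose a zoom level that shows every point instead of relying on a fixed zoom_start.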

The triangle points in the figure are the data points contained in our dataset.

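Add layers

The default tile layer in Folium is OpenStreetMap. We can add other tile layers, such as Stamen Terrain, Stamen Watercolor, and CartoDB Positron, to get different base map styles.

Use folium.TileLayer to add multiple layers to a single map and folium.LayerControl to switch between them interactively. Note that the Stamen tiles are now hosted by Stadia Maps, and recent Folium releases may reject the bare 'Stamen ...' names used below; if that happens, remove those three lines or pass a tile URL together with an attr (attribution) argument.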

m=folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=6)
folium.TileLayer('cartodbdark_matter').add_to(m)
folium.TileLayer('cartodbpositron').add_to(m)
folium.TileLayer('Stamen Terrain').add_to(m)
folium.TileLayer('Stamen Toner').add_to(m)
folium.TileLayer('Stamen Water Color').add_to(m)
folium.LayerControl().add_to(m)
m

You can see that the layer selection button appears in the upper right corner.

Generate map markers

In an interactive map, markers are important for specifying locations. folium.Marker creates a marker at a given position.

m=folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=8)

hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']]
        ).add_to(m),
    axis=1)
m

Custom markers

You can also customize the markers, for example with a different icon per status, a popup, and a tooltip.

m=folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)

def get_icon(status):
  if status == "OPEN":
    return folium.Icon(icon='heart',
                       color='black',
                       icon_color='#2ecc71'
                       )
  else:
    return folium.Icon(icon='glyphicon-off',
                       color='red')

hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        #color='red',
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        icon=get_icon(row['STATUS']),
        ).add_to(m),
    axis=1)
m

Generate Bubble Chart

To represent numerical values on the map, we can draw circles of different sizes by binding the circle radius to a value in the dataset. In our case, we represent the population covered by each hospital with a radius proportional to its population value.

m=folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)

def get_radius(pop):
  return int(pop / 20)

hosp_df.apply(
    lambda row: folium.CircleMarker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        radius=get_radius(row['POPULATION']),
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        stroke=True,
        weight=1,
        color="#3186cc",
        fill=True,
        fill_color="#3186cc",
        opacity=0.9,
        fill_opacity=0.25,
        ).add_to(m),
    axis=1)
m
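
The get_radius function above scales the radius linearly, so a hospital with twice the population gets a circle with four times the area. A common alternative, offered here as an assumption on my part and not as something the original code does, is to scale the radius with the square root of the value so that the area is proportional to it. scaled_radius, max_value, and max_radius are hypothetical names.

```python
from math import sqrt

def scaled_radius(value, max_value, max_radius=30.0):
    """Radius in pixels such that the circle's area grows linearly with value."""
    if max_value <= 0 or value <= 0:
        return 0.0
    return max_radius * sqrt(value / max_value)

# Quadrupling the value doubles the radius, so the area quadruples too.
for population in [25, 100, 400]:
    radius = scaled_radius(population, max_value=400)
    print(population, round(radius, 1))
```

In the bubble chart code, this could replace get_radius by passing the column maximum, for example hosp_df['POPULATION'].max(), as max_value. folium.CircleMarker interprets radius in pixels, so capping it with max_radius also keeps the largest bubbles from covering the map.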

Generate marker clusters

When working on a map with dense data points, use marker clusters to avoid confusing situations where many nearby markers overlap each other. Folium provides an easy way to set up marker clusters: add the markers to a folium.plugins.MarkerCluster instance instead of adding them to the map directly.

m=folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)

cluster = MarkerCluster(name="Hospitals")

def get_icon(status):
  if status == "OPEN":
    return folium.Icon(icon='heart',
                       color='black',
                       icon_color='#2ecc71'
                       )
  else:
    return folium.Icon(icon='glyphicon-off',
                       color='red')

hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        icon=get_icon(row['STATUS']),
        ).add_to(cluster),
    axis=1)
cluster.add_to(m)
m

When the mouse hovers over a cluster, it shows the boundaries of the area covered by that cluster. This default behavior can be turned off by setting the showCoverageOnHover option to false, as follows.

cluster = MarkerCluster(name="Hospitals", options={"showCoverageOnHover": False})

Summary

This post is a bit long, but I am confident that it will help you a lot. I've put together an overview of nearly all common chart types, along with some methods for geolocation visualization. I hope this article helps you.

https://avoid.overfit.cn/post/93e1e9cadcb84b13bf6a44b981a41843

Author: Md. Zubair

Origin blog.csdn.net/m0_46510245/article/details/128576139