Interactive data visualization with Plotly and Python

Python is a great tool for data exploration and analysis, thanks to libraries such as numpy, pandas, and matplotlib. During exploration and analysis it is very important to understand the data we are dealing with, and a visual representation of the data goes a long way toward that.

We often use Jupyter notebooks for these projects because they are fast, easy to use, and let us interact and play with our data. However, there is a limit to what we can do. When we work with charts, we usually reach for libraries like matplotlib or seaborn, but these render static images. Details get lost in a static image, so we end up re-tuning and re-rendering our charts to explore different parts of the data. Wouldn't it be great if we could interact with our charts by zooming in, or by adding contextual information to our data points with hover interactions? This is where Plotly can help us.

Plotly is a Python library for making interactive, publication-quality charts such as line plots, scatter plots, area plots, bar charts, error bars, box plots, histograms, heatmaps, subplots, and more.

But enough talk, let's start making some charts.


Install dependencies

Before we build anything, let's install the dependencies. I like to use pipenv, but the same applies to Anaconda or other package managers.

Here is the list of dependencies we need:

  • jupyter. A web app that lets you create and share documents containing live code, equations, and more. You know it!
  • pandas. A very powerful data analysis library, which we will use to process our data.
  • numpy. A library for scientific computing in Python, which we will use for math and for generating random numbers.
  • seaborn. Statistical data visualization based on matplotlib, which we will use to load some sample data.
  • cufflinks. Lets plotly work directly with pandas.
  • plotly. The interactive charting library.

Here are the commands to install them:

pipenv install jupyter
pipenv install plotly cufflinks pandas seaborn numpy
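Once the install finishes, a quick check that the libraries are importable can save some confusion later. The snippet below is only a sketch: the helper name missing_packages is made up for illustration, and jupyter is left out of the list because we launch it from the command line rather than import it.

```python
import importlib.util

# Import names of the libraries used in this article.
REQUIRED = ["pandas", "numpy", "seaborn", "cufflinks", "plotly"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If you save it to a file, running that file through pipenv run python uses the same environment the packages were installed into.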

Getting started

To get started, we need to fire up our jupyter notebook and create a new document:

pipenv run jupyter notebook

Once we're there, we can start adding some code. Since this post is not a tutorial on Jupyter Notebooks, I will focus only on the code and not on how to use the notebook itself.

Let's start by importing the libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import cufflinks as cf

Plotly with the help of other libraries can render plots in different contexts, such as in jupyter notebooks, online in plotly dashboards, etc. By default, the library works in offline mode, which is exactly what we want. However, we also need to tell cufflinks that we are going to use offline mode for the graph. This setup can be done programmatically by adding the following cell to our notebook:

cf.go_offline()

Now we're ready to grab some data and start plotting.


Generate random data

I don't want to focus too much on how the data is loaded or retrieved, so we will simply generate random data for the charts. In a new cell, we can use pandas and numpy to build a table with 300 rows and three columns:

df = pd.DataFrame(np.random.randn(300, 3), columns = ["X", "Y", "Z"])

Awesome, using numpy we can generate our random numbers and load them into a pandas DataFrame. Let's see what our data looks like:

df.head()

And what we get is:

          X         Y         Z
0  0.176117  1.221648  1.201206
1  1.931615 -2.303667  1.914741
2  1.213322 -0.434855 -0.639277
3  0.763220  0.118211 -0.838034
4  0.245442  0.697897  1.169540
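Because the numbers are random, your values will differ from the ones printed above. If you want the same data on every run, one option is to seed numpy's generator. This is a small sketch, not part of the original notebook: the helper name make_random_frame and the seed value 42 are arbitrary choices.

```python
import numpy as np
import pandas as pd


def make_random_frame(rows=300, seed=42):
    """Build a DataFrame of standard-normal samples with columns X, Y and Z."""
    rng = np.random.default_rng(seed)  # seeded generator: same numbers every run
    return pd.DataFrame(rng.standard_normal((rows, 3)), columns=["X", "Y", "Z"])


df = make_random_frame()
print(df.shape)  # (300, 3)
```

The resulting DataFrame has the same shape and column names as the one above, so the rest of the examples work with it unchanged.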

Looks good! Time to draw some charts.


Our first plot

A convenient way to plot DataFrames is to use the iplot method available on Series and DataFrames, thanks to cufflinks. Let's start with all the defaults.

df.iplot()

Line Chart - all defaults

At first glance it looks like any other chart, but hover over it and you'll see something magical. A toolbar appears at the top right that lets you zoom, pan, and more. You can also zoom in by drawing an area on the chart, or see a tooltip on each data point with additional information, such as its value.

Our chart above is certainly better than a static one, but it's still not great. Let's try rendering the same data as a scatter plot.

df.iplot(mode = "markers")

Marker plot

Not bad, but not great either. The dots are too big, so let's resize them a bit.

df.iplot(mode = "markers", size = 5)

Marker Plot -- Custom Data Point Size
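If you are curious what a marker plot like this amounts to in plain Plotly terms, the sketch below describes a similar figure as an ordinary dictionary: one scatter trace per column, plotted against the row index. This is an approximation for illustration, not cufflinks' actual implementation, and the helper name marker_figure_spec is made up.

```python
import numpy as np
import pandas as pd


def marker_figure_spec(frame, size=5):
    """Describe one marker trace per column as a plain Plotly figure dict."""
    traces = [
        {
            "type": "scatter",
            "mode": "markers",           # dots only, no connecting lines
            "name": str(column),         # shown in the legend
            "x": frame.index.tolist(),   # row index on the x axis
            "y": frame[column].tolist(),
            "marker": {"size": size},    # dot size in pixels
        }
        for column in frame.columns
    ]
    return {"data": traces, "layout": {"title": {"text": "Marker plot"}}}


demo = pd.DataFrame(np.random.randn(10, 3), columns=["X", "Y", "Z"])
spec = marker_figure_spec(demo)
print(len(spec["data"]))  # 3
```

In a notebook with plotly installed, passing the dictionary to plotly.io.show(spec) should render it as an interactive chart.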

Much better! Next, let's try something different.

Bar chart

Let's forget about our randomly generated dataset for a moment and let's load a popular dataset from the seaborn library to render some other chart types.

titanic = sns.load_dataset("titanic")
titanic.head()

The dataset we will be dealing with is called "Titanic" and it contains information about what happened to the people who traveled on the Titanic on that tragic day.

A key variable in this dataset is survived, which holds Boolean information: 0 for people who died and 1 for people who survived the accident. Let's build a bar chart to see how many men and women survived:

titanic.iplot(kind = "bar", x = "sex", y = "survived")

Bar chart

Trends are easy to see. However, if you just share the chart, it's impossible to know what we're talking about, since it has no title and no axis labels. Let's fix that:

titanic.iplot(kind = "bar", x = "sex", y = "survived", title = "Survivors", xTitle = "Sex", yTitle = "Number of survivors")

Bar chart with title

So much better.
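One thing to keep in mind: the calls above hand the raw passenger rows to iplot, so the chart is built from one 0/1 value per passenger. If you want an explicit count of survivors per group, one option is to aggregate with pandas first. The sketch below uses a few made-up rows as a stand-in for the real dataset, and the helper name survivors_by_sex is hypothetical.

```python
import pandas as pd


def survivors_by_sex(passengers):
    """Count survivors per sex by summing the 0/1 'survived' column."""
    return passengers.groupby("sex")["survived"].sum()


# Made-up stand-in rows with the same two columns as the Titanic dataset.
sample = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female"],
    "survived": [0, 1, 1, 1, 0],
})

counts = survivors_by_sex(sample)
print(counts["female"], counts["male"])  # prints: 2 1
```

With the real data, survivors_by_sex(titanic) returns a Series with one value per sex. Since cufflinks adds iplot to Series as well, calling .iplot(kind = "bar") on that result should draw one bar per group.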

But what if we want to draw a horizontal bar chart? Very simple.

titanic.iplot(kind = "barh", x = "sex", y = "survived")

Horizontal bar chart

Great! Let's explore some more features.


Themes

Our chart looks good so far, but maybe we want to use a different color scheme. Fortunately, there is a set of themes we can use to render our charts. Let's list them and switch to another.

List the themes:

cf.getThemes()

It should output something like this:

['ggplot', 'pearl', 'solar', 'space', 'white', 'polar', 'henanigans']

We can switch the theme for all future charts by simply adding:

cf.set_config_file(theme="solar")

Now, if we render our bar chart again, we get something like this:

titanic.iplot(kind = "bar", x = "sex", y = "survived")

Bar chart with solar theme

Dark mode is one of my favorites, but try them out and let me know which one is yours.


Surface chart

So far we've rendered 2D charts, but plotly also supports 3D charts. Let's build one to have some fun. The next chart is a 3D surface plot. For this we need to create some data with pandas, as shown below.

df = pd.DataFrame({"A": [100, 200, 300, 200, 100], "B": [100, 200, 300, 200, 100], "C": [100, 200, 300, 200, 100]})
df.head()

You should get something like:

     A    B    C
0  100  100  100
1  200  200  200
2  300  300  300
3  200  200  200
4  100  100  100

Now, let's "surface" it onto a 3D chart:

df.iplot(kind = "surface")

Surface plot

It looks really good! The colors are a bit bright, though, so let's change the color scale to make it more visually appealing:

df.iplot(kind = "surface", colorscale = "rdylbu")

Surface plot with custom color scale

That's pretty cool, but that's not all. Have you tried interacting with the chart in the notebook? You can even rotate it!
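A surface plot treats the DataFrame as a grid of heights, so a 5 by 3 table gives a fairly coarse shape. To get something smoother to rotate, you could generate a denser grid with numpy. The sketch below is just one way to do it: the ripple formula and the helper name ripple_frame are my own choices for illustration.

```python
import numpy as np
import pandas as pd


def ripple_frame(points=50):
    """Return a square DataFrame whose values form a ripple-shaped height map."""
    axis = np.linspace(-5, 5, points)
    x, y = np.meshgrid(axis, axis)
    z = np.sin(np.sqrt(x**2 + y**2))  # height at every (x, y) grid position
    return pd.DataFrame(z)


ripple = ripple_frame()
print(ripple.shape)  # (50, 50)
```

From there, ripple.iplot(kind = "surface", colorscale = "rdylbu") should render it the same way as the small table above.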


Summary

Plotly is a great charting alternative for data exploration and analysis. As you can see, it provides interactive charts that help you spot outliers and navigate through them to gain a better understanding of your data. I probably won't use plotly for every dataset, but it's a really interesting library that is worth checking out.

Thank you for reading!

Origin blog.csdn.net/weixin_73136678/article/details/128805136