Creating a Dashboard with the Matplotlib Library 📈

The purpose of this tutorial is to build visualizations that support the data science process. We may employ visualizations during exploratory analysis, before or after processing the data, or even deliver a chart or a dashboard as the final product. Knowing how to create a visualization, regardless of the tool, is therefore of fundamental importance.

Visit the Jupyter Notebook to see the concepts covered about Data Visualization with Matplotlib. Note: important functions, outputs, and terms are in bold to facilitate understanding.

Matplotlib

In this first step we will create graphics in matplotlib, manipulate their formatting, and make the adjustments needed so the data can be plotted properly.

• Import packages

import matplotlib as mpl
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd
from IPython.display import Image 
%matplotlib inline
mpl.__version__    
'3.3.3'

• Matplotlib Styles

Once matplotlib is loaded, the library provides several built-in styles: templates that can be used to create graphics without the need to configure everything from scratch.

print(plt.style.available)
['seaborn-dark', 'seaborn-darkgrid', 'seaborn-ticks', 'fivethirtyeight', 'seaborn-whitegrid', 'classic', '_classic_test', 'fast', 'seaborn-talk', 'seaborn-dark-palette', 'seaborn-bright', 'seaborn-pastel', 'grayscale', 'seaborn-notebook', 'ggplot', 'seaborn-colorblind', 'seaborn-muted', 'seaborn', 'Solarize_Light2', 'seaborn-paper', 'bmh', 'tableau-colorblind10', 'seaborn-white', 'dark_background', 'seaborn-poster', 'seaborn-deep']

Each of the styles above defines its own configuration of colors, sizes, element positioning, and so on.
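As a quick illustration (a minimal sketch using two of the built-in styles listed above, independent of the custom styles created later in this tutorial), a style can be applied globally with plt.style.use or temporarily with plt.style.context:

```python
import matplotlib.pyplot as plt

# Apply a built-in style globally; every plot created afterwards uses it
plt.style.use("ggplot")

# Or apply a style only temporarily, without changing the global configuration
with plt.style.context("dark_background"):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([0, 1, 2], [1, 3, 2])
```

Using the context manager is handy in notebooks, where you often want one dark or minimalist chart without restyling the whole session.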

• Create function 

Let's define a function to create a plot. Once a piece of code starts to repeat over and over, it is convenient to turn that repetition into a function - we create functions to avoid repetition.

First, let's define some random values with the randn function from the random module of the numpy package.

Then we create the plot area with subplots. As output we get the figure and axes objects.

Next we define the parameters of the graph: the x values, number of bins, histogram type, and density are applied to the axes, and then we set the title and axis labels. Finally, we use the show() function to display the graph in the notebook:

def plot_1():
    # 5000 samples from a normal distribution, in 6 columns (one per series)
    x = np.random.randn(5000, 6)

    (figure, axes) = plt.subplots(figsize = (16, 10))

    (n, bins, patches) = axes.hist(x, 12,
                                   density = 1,
                                   histtype = 'bar',
                                   label = ['Color 1', 'Color 2', 'Color 3', 'Color 4', 'Color 5', 'Color 6'])

    axes.set_title("Histogram\nFor\nNormal Distribution", fontsize = 25)
    axes.set_xlabel("Data", fontsize = 16)
    axes.set_ylabel("Frequency", fontsize = 16)
    axes.legend()
    plt.show()

• Call function

Let's call the created function:

plot_1()


  • The data comes from the 5000 random randn values;
  • The colors correspond to the six series named in the label list;
  • The chart type was set through the histtype parameter;
  • The graph legend was enabled on the axes with legend();
  • The x and y axis labels were defined with set_xlabel() and set_ylabel();
  • The main title was defined with set_title().

Customize charts

We can create our own styles, that is, completely customize the graphics. We will run a command directly on the operating system:

• Windows users

When running the code below, we list the contents of the directory:

!dir styles

• Mac and Linux users

When running the code below, we list the contents of the directory:

!ls -l styles

Query styles in the directory

We have two files in the directory with the .mplstyle extension - these are matplotlib styles. Let's look at one of these files:

• Windows users

When running the code below, we display the contents of personalstyle-1 from the styles directory:

!type styles\personalstyle-1.mplstyle

• Mac and Linux users

When running the code below, we display the contents of personalstyle-1 from the styles directory:

!cat styles/personalstyle-1.mplstyle

• How to use custom style

We call the function plt.style.use and point to the directory where the style text file is stored:

plt.style.use("styles/personalstyle-1.mplstyle")

• Call plot function

We call again the plot function defined at the beginning:

plot_1()

Note that the chart started in the default matplotlib format and, by changing the style, gained a more professional look.

To assist in our work, we will use the automobile dataset from the UCI Machine Learning Repository: UCI Automobile Data Set. We will load the csv file and, from this dataset, build our graphics.

Python modularization

Let's open the directory and find the three files with the .py extension - three very important modules: generatedata.py, generateplot.py, and radar.py. We can open these files in a text editor and explore them a bit more.

As we build our analysis process, we end up using code that repeats across multiple projects. For example, most Machine Learning algorithms require normalized data before predictive modeling can be applied, so we will have to normalize the data in several different projects we work on.

So that we do not have to keep repeating this code, we can create a Python module and reuse it in every project that needs it. Automating our work helps us be more productive and, consequently, get better results. With a custom module, it is enough to load it and call the function it defines - to modularize is to professionalize the work of a Data Scientist.
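As a minimal sketch of the idea (the module name mymodule and the helper function below are hypothetical, not part of this tutorial's lib directory), a reusable module is just a .py file holding the shared functions:

```python
# Contents of a hypothetical mymodule.py, saved once and reused across projects
def min_max_normalize(values):
    """Scale a sequence of numbers onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# In any notebook, we would then simply `import mymodule` and call the function
print(min_max_normalize([10, 20, 30]))  # -> [0.0, 0.5, 1.0]
```

Any project that needs this transformation now imports the module instead of copying the code.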

Using Pandas to Load Data

First, let's import the sys package, which gives access to interpreter settings, and call the append function on sys.path, adding the lib directory so that it is recognized by the Jupyter Notebook we are working in.

Next we import the modules that are allocated in the lib: generatedata.py, generateplot.py, and radar.py.

import sys
sys.path.append("lib")
import generatedata, generateplot, radar

• Call to module function 

We will call the get_raw_data function - checking it in the generatedata.py module, we can see that it stores the csv file name in data_file and calls read_csv from Pandas; that is, whenever we want to load a different csv file, we just change the file name in data_file inside the module:

data = generatedata.get_raw_data()

""" Load full set
def get_raw_data():    
data_file = "cars.csv"    
return pd.read_csv(data_file) """

data.head()


• Another function of the module

Just as we called get_raw_data, we can call get_limited_data - it loads only a few variables from the dataset:

data_subset = generatedata.get_limited_data()

""" Check limited data
def get_limited_data(cols = None, lower_bound = None):    
    if not cols:       
       cols = limited_columns    
    data = get_raw_data()[cols]    
    if lower_bound:        
       (makes, _) = get_make_counts(data, lower_bound)       
        data   ​= data[data["make"].isin(makes)]    
    return data """

data_subset.head()

The function results in a subset - only a few variables were returned from our dataset. This way, we can load the complete dataset or just a few variables, according to our ultimate goal.

generatedata.get_all_auto_makes()

""" Search only car manufacturers
def get_all_auto_makes():     
    return pd.Series(get_raw_data()["make"]).unique()
array(['audi', 'bmw', 'chevrolet', 'dodge', 'honda', 'jaguar', 'mazda', 'mercedes-benz', 'mitsubishi', 'nissan', 'peugot', 'plymouth','porsche', 'saab', 'subaru', 'toyota', 'volkswagen', 'volvo'],dtype=object)

With get_all_auto_makes we retrieve only the car manufacturers in the set, from the Series named make.

The functions are ready; we just call them. The get_make_counts function counts the manufacturers in the dataset, returning two values (manufacturers, totals) as output.

(automakers, total) = generatedata.get_make_counts(data_subset)

""" # get count
def get_make_counts(pddata, lower_bound=0):    
    counts = []    
    filtered_makes = []
    for make in get_all_auto_makes():        
        data = get_make_data(make, pddata)        
        count = len(data.index)        
        if count >= lower_bound:            
           filtered_makes.append(make)            
           counts.append(count)    
    return (filtered_makes, list(zip(filtered_makes, counts))) """

total
[('audi', 4),
 ('bmw', 4),
 ('chevrolet', 3),
 ('dodge', 8),
 ('honda', 13),
 ('jaguar', 1),
 ('mazda', 11),
 ('mercedes-benz', 5),
 ('mitsubishi', 10),
 ('nissan', 18),
 ('peugot', 7),
 ('plymouth', 6),
 ('porsche', 1),
 ('saab', 6),
 ('subaru', 12),
 ('toyota', 31),
 ('volkswagen', 8),
 ('volvo', 11)]
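For comparison, the same kind of per-manufacturer count can be sketched directly with pandas value_counts (using a tiny hypothetical frame here rather than the cars data):

```python
import pandas as pd

# Tiny stand-in for the cars dataset; only the "make" column matters here
df = pd.DataFrame({"make": ["audi", "bmw", "audi", "toyota", "toyota", "toyota"]})

# Count rows per manufacturer, then keep only makes meeting a lower bound
counts = df["make"].value_counts()
frequent = counts[counts >= 2]
print(sorted(frequent.items()))  # -> [('audi', 2), ('toyota', 3)]
```

The module's get_make_counts adds the filtering loop on top of exactly this kind of counting.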

• Number of indexes

Get the number of index entries (rows) in the set or table:

len(data.index)
141

Normalizing Data

When we work with a dataset whose variables are on very different scales, it may be necessary to normalize the data, that is, to put the data on the same scale - a statistical task.

First, let's create a copy of our dataset with the copy function and assign it to norm_data - a good practice when we are about to transform the data.

norm_data = data.copy()

Next, let's rename the horsepower column:

norm_data.rename(columns = {"horsepower" : "power"}, inplace = True)
norm_data.head()

To normalize, we will call norm_column - this function lives in generatedata.py.

The norm_column function works with the minimum and maximum values, i.e. (value - min) / (max - min) - an elementary mathematical operation that normalizes the data by placing it on the same scale.

The norm_column function receives as parameters the column name col_name, the dataframe pddata, and inverted. If inverted is True, the function executes the if inverted block, that is, the last line of code of the function:

def norm_column(col_name, pddata, inverted = False):
    pddata[col_name] -= pddata[col_name].min()
    pddata[col_name] /= pddata[col_name].max()
    if inverted:
        pddata[col_name] = 1 - pddata[col_name]
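As a quick sanity check of that formula, here is a self-contained version of the same min-max logic applied to a hypothetical three-row column:

```python
import pandas as pd

def norm_column(col_name, pddata, inverted=False):
    # (value - min) / (max - min): maps the column onto the [0, 1] range
    pddata[col_name] -= pddata[col_name].min()
    pddata[col_name] /= pddata[col_name].max()
    if inverted:
        pddata[col_name] = 1 - pddata[col_name]

df = pd.DataFrame({"power": [50.0, 100.0, 150.0]})
norm_column("power", df)
print(df["power"].tolist())  # -> [0.0, 0.5, 1.0]
```

The minimum lands at 0, the maximum at 1, and every other value falls proportionally in between.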

• Normalize columns

Let's normalize some columns. Since it is not necessary to normalize string or categorical variables, we normalize only the numeric columns.

# higher values
generatedata.norm_columns(["city mpg", "highway mpg", "power"], norm_data)

norm_data.head()

Comparing the two tables, the left one holds the original data and the right one the normalized data, that is, data on the same numerical scale. We don't change the information contained in the data at all, only the scale - the data still represents the same thing. This is useful for chart construction and predictive modeling.

However, some variables may require a different way of normalizing. The previous normalization rewarded higher values; now we will normalize so that lower values score higher - applying the inverted normalization with the invert_norm_columns function of the generatedata.py module, which calls norm_column and this time passes the inverted parameter as True.

For this inverted normalization, we pass the variables where lower is better:

# normalize lower values
generatedata.invert_norm_columns(["price", "weight", "riskiness", "losses"], norm_data)

norm_data.head()
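The effect of the inverted variant can be sketched in isolation (again with hypothetical numbers): after the flip, the lowest original value maps to 1 and the highest to 0:

```python
import pandas as pd

df = pd.DataFrame({"price": [10000.0, 15000.0, 20000.0]})

# Standard min-max normalization first...
df["price"] = (df["price"] - df["price"].min()) / (df["price"].max() - df["price"].min())
# ...then flip the scale so the cheapest car gets the highest score
df["price"] = 1 - df["price"]
print(df["price"].tolist())  # -> [1.0, 0.5, 0.0]
```

This is why price, weight, riskiness, and losses are normalized inverted: for those variables, smaller is better.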

Having all variables now normalized, we are ready to start the plot series.

Plots

First, we call plt.figure, which creates a figure - a plot area with figsize dimensions - and then create a GridSpec, a kind of drawing grid. Next we call the make_autos_price_plot function from the generateplot.py module, which defines the title, sets the chart type to scatter plot, defines the default labels, and adjusts a few parameters to place the chart exactly where we want it; all that is left is a simple call to make_autos_price_plot:

figure = plt.figure(figsize = (15, 5))
prices_gs = mpl.gridspec.GridSpec(1, 1)
prices_axes = generateplot.make_autos_price_plot(figure, prices_gs, data)

plt.show()

• Vertical Dispersion Plot

figure = plt.figure(figsize = (15, 5))
mpg_gs = mpl.gridspec.GridSpec(1, 1)
mpg_axes = generateplot.make_autos_mpg_plot(figure, mpg_gs, data)

plt.show()

• Stacked Bar Plot

figure = plt.figure(figsize = (15, 5))
risk_gs = mpl.gridspec.GridSpec(1, 1)
risk_axes = generateplot.make_autos_riskiness_plot(figure, risk_gs, norm_data)

plt.show()

• Inverted Stacked Bar Plot

figure = plt.figure(figsize = (15, 5))
loss_gs = mpl.gridspec.GridSpec(1, 1)
loss_axes = generateplot.make_autos_losses_plot(figure, loss_gs, norm_data)

plt.show()

• Standard bar chart

figure = plt.figure(figsize = (15, 5))
risk_loss_gs = mpl.gridspec.GridSpec(1, 1)
risk_loss_axes = generateplot.make_autos_loss_and_risk_plot(figure, risk_loss_gs, norm_data)

plt.show()

With all this, we were able to create several different graphs, in an organized and effective way, by calling functions from the plotting module generateplot.py.

• Radar Graph

Finally, we will create the radar graph - a very complex chart type. Because of that complexity we use a dedicated module, radar.py, and the graph may take a while to render.

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import Image
import warnings
#warnings.filterwarnings('ignore')
import matplotlib
%matplotlib inline

import sys
sys.path.append("lib")
import generatedata, generateplot, radar
#plt.style.use("styles/personalstyle-1.mplstyle")
data = generatedata.get_raw_data()
data.head()
data_subset = generatedata.get_limited_data()
data_subset.head()
data = generatedata.get_limited_data(lower_bound = 6)
data.head()
norm_data = data.copy()
norm_data.rename(columns = {"horsepower": "power"}, inplace = True)
figure = plt.figure(figsize = (15, 5))
radar_gs = mpl.gridspec.GridSpec(3, 7,
           height_ratios = [1, 10, 10],
           wspace = 0.50,
           hspace = 0.60,
           top = 0.95,
           bottom = 0.25)
radar_axes = generateplot.make_autos_radar_plot(figure, gs=radar_gs, pddata=norm_data)
plt.show()

Below we have the radar chart with the title 'Radar plot with 7 dimensions for 12 manufacturers' - each manufacturer is named at the top of its radar. The dimensions are the variables arranged around each radar, showing the profile of each manufacturer through individual plots.


• Combined Plots - Dashboard

Let's now build a dashboard with matplotlib. The diagram below is known as a wireframe - a very common term in design, giving a general idea of what will be built.

This wireframe shows a dashboard template - a dashboard is a set of charts; that is, we define our plot area and, within it, organize the graphics that were created individually into one combined plot, the dashboard.

--------------------------------------------
|               overall title              |
--------------------------------------------
|               price ranges               |
--------------------------------------------
| combined loss/risk |                     |
|                    |        radar        |
----------------------        plots        |
|  risk   |   loss   |                     |
--------------------------------------------
|                   mpg                    |
--------------------------------------------

Below we build the wireframe in a fragmented way, layer by layer: we draw the figure with matplotlib pyplot and then the grid, the area where the graphics will be placed. Once the figure is in place, we add the title layer, the overall title - the first layer, where the subplot is defined and added.

# building layers (without data)
figure = plt.figure(figsize=(10, 8))
gs_master = mpl.gridspec.GridSpec(4, 2, height_ratios=[1, 2, 8, 2])

## ----------------------------- ## -----------------------------

# layer 1 - Title
gs_1 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec=gs_master[0, :])
title_axes = figure.add_subplot(gs_1[0])

## ----------------------------- ## -----------------------------

# layer 2 - Price
gs_2 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec=gs_master[1, :])
price_axes = figure.add_subplot(gs_2[0])

## ----------------------------- ## -----------------------------

# layer 3 - Risks and Radar
gs_31 = mpl.gridspec.GridSpecFromSubplotSpec(2, 2, height_ratios=[2, 1], subplot_spec=gs_master[2, :1])
risk_and_loss_axes = figure.add_subplot(gs_31[0, :])
risk_axes = figure.add_subplot(gs_31[1, :1])
loss_axes = figure.add_subplot(gs_31[1:, 1])
gs_32 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec=gs_master[2, 1])
radar_axes = figure.add_subplot(gs_32[0])

## ----------------------------- ## -----------------------------

# layer 4 - MPG
gs_4 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec=gs_master[3, :])
mpg_axes = figure.add_subplot(gs_4[0])

## ----------------------------- ## -----------------------------

# joins layers still without data
gs_master.tight_layout(figure)
plt.show()

The second layer is the price chart, the third layer is subdivided between the risk and radar charts, and the fourth layer holds MPG; finally we call tight_layout() to join all these layers in the same area.


Once the wireframe is created, we can plot the charts in the reserved areas within that grid. From here, we call the charts for each area:

# building layers with data
figure = plt.figure(figsize = (15, 15))
gs_master = mpl.gridspec.GridSpec(4, 2, 
                                  height_ratios = [1, 24, 128, 32], 
                                  hspace = 0, 
                                  wspace = 0)

## ----------------------------- ## -----------------------------

# layer 1 - title
gs_1 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec = gs_master[0, :])
title_axes = figure.add_subplot(gs_1[0])
title_axes.set_title("Plots", fontsize = 30, color = "#cdced1")
generateplot.hide_axes(title_axes)

## ----------------------------- ## -----------------------------

# layer 2 - price
gs_2 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec = gs_master[1, :])
price_axes = figure.add_subplot(gs_2[0])
generateplot.make_autos_price_plot(figure, 
                                   pddata = data, 
                                   axes = price_axes)

## ----------------------------- ## -----------------------------

# layer 3 - risks
gs_31 = mpl.gridspec.GridSpecFromSubplotSpec(2, 2, 
                                             height_ratios = [2, 1], 
                                             hspace = 0.4, 
                                             subplot_spec = gs_master[2, :1])

risk_and_loss_axes = figure.add_subplot(gs_31[0, :])
generateplot.make_autos_loss_and_risk_plot(figure, 
                                           pddata = norm_data, 
                                           axes = risk_and_loss_axes, 
                                           x_label = False, 
                                           rotate_ticks = True)

risk_axes = figure.add_subplot(gs_31[1, :1])
generateplot.make_autos_riskiness_plot(figure, 
                                       pddata = norm_data, 
                                       axes = risk_axes, 
                                       legend = False, 
                                       labels = False)

loss_axes = figure.add_subplot(gs_31[1:, 1])
generateplot.make_autos_losses_plot(figure, 
                                    pddata = norm_data, 
                                    axes = loss_axes, 
                                    legend = False, 
                                    labels = False)

## ----------------------------- ## -----------------------------

# layer 3 - radar
gs_32 = mpl.gridspec.GridSpecFromSubplotSpec(5, 3, 
                               height_ratios = [1, 20, 20, 20, 20], 
                               hspace = 0.6, 
                               wspace = 0, 
                               subplot_spec = gs_master[2, 1])
(rows, cols) = geometry = gs_32.get_geometry()
title_axes = figure.add_subplot(gs_32[0, :])
inner_axes = []
projection = radar.RadarAxes(spoke_count = len(norm_data.groupby("make").mean().columns))

for m in [n for n in gs_32][cols:]:
    inner_axes.append(figure.add_subplot(m, projection = projection))
generateplot.make_autos_radar_plot(figure, 
                                   pddata = norm_data, 
                                   title_axes = title_axes, 
                                   inner_axes = inner_axes, 
                                   legend_axes = False, 
                                   geometry = geometry)

## ----------------------------- ## -----------------------------

# layer 4 - MPG
gs_4 = mpl.gridspec.GridSpecFromSubplotSpec(1, 1, subplot_spec = gs_master[3, :])
mpg_axes = figure.add_subplot(gs_4[0])
generateplot.make_autos_mpg_plot(figure, 
                                 pddata = data, 
                                 axes = mpg_axes)

## ----------------------------- ## -----------------------------

# joining layers
gs_master.tight_layout(figure)
plt.show()

When we join all these layers, we have as output a complete dashboard:


We can understand, then, that a dashboard is a set of charts in the same plot area. We can build our own dashboard from scratch, without having to pay for proprietary tools; on the other hand, the programming involved adds complexity.

This dashboard could be used as the final result of our work. It could be a dashboard for real-time data monitoring, sales forecasting or historical data analysis - it always depends on our goal.

Thank you.

Anello – Medium
