Learn Data Analysis with Python: Find out the practical code for Matplotlib (Data Visualization)

Step-by-step Python code for data visualization (exploring Matplotlib)

Introduction

If we want to apply for a data analyst or data scientist role, we need to know at least one of the programming languages used in such roles, such as R, Python, or Scala. I have chosen Python for data analysis. The earlier posts in this series cover data cleaning and data interpretation:

https://arpitatechcorner.wordpress.com/2021/02/23/learn-data-analysis-with-python-find-out-the-practical-code-for-data-cleaning/

https://arpitatechcorner.wordpress.com/2021/03/03/learn-data-analysis-with-python-find-out-the-practical-code-for-data-interpretation/

Now we are in the most interesting phase of data analysis. I would say the data visualization step is the visual representation of the effort put into the previous steps. A picture is worth a thousand words. In this blog, we will work with Matplotlib. This is a beginner's guide.

Definition

Matplotlib is a cross-platform Python library for plotting two-dimensional graphs, which are also known as plots.

Architecture

Before moving to code, let’s understand the architecture of this library.

Anatomy of a Figure

The output graph which is created by Matplotlib is known as a figure.
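
To make the anatomy concrete, here is a minimal sketch of one figure that contains a single plotting area (an Axes). The x and y values are made up for illustration and are not part of the iris data.

```python
import matplotlib.pyplot as plt

# A Figure is the whole canvas; an Axes is one plotting area inside it.
fig, ax = plt.subplots(figsize=(6, 4))

# Illustrative values only (not taken from the iris data).
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# ax.plot returns a list of line objects; we unpack the single line.
(line,) = ax.plot(x, y, marker="o", label="y = x squared")

# Labels, title, and legend belong to the Axes, not to the Figure.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("One Axes inside one Figure")
ax.legend()

print(type(fig).__name__)  # class name of the figure object
print(len(fig.axes))       # how many Axes the figure holds
```

In a notebook the figure is displayed automatically; in a script, call plt.show() at the end.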

Basic Plotting

Code: Suppress FutureWarning messages first, so that they do not clutter the output.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

Code: Import required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

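Code: Get data from Kaggle (https://www.kaggle.com/gpreda/iris-dataset) and load it into a data frame.

# Assumption: the Kaggle CSV was downloaded and saved locally as iris.csv; adjust the path to your copy
df = pd.read_csv('iris.csv')

# Assign to variables before aggregating data
fld1 = df['species']
fld2 = df['sepal_length']
fld3 = df['sepal_width']
fld4 = df['petal_length']
fld5 = df['petal_width']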

Code: Create an aggregated data frame and create variables for analysis

# Select several columns with a list (double brackets); recent pandas versions reject a bare tuple here
df_Rev = df.groupby(["species"], as_index=False)[["sepal_length", "sepal_width", "petal_length", "petal_width"]].mean()

# After aggregation
col1 = df_Rev['species']
col2 = df_Rev['sepal_length']
col3 = df_Rev['sepal_width']
col4 = df_Rev['petal_length']
col5 = df_Rev['petal_width']
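
To see what the aggregation step produces, here is a small sketch on a made-up data frame. The name toy and its values are hypothetical and only mimic the column style of the iris data.

```python
import pandas as pd

# Tiny made-up frame, for illustration only.
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal_length": [5.0, 5.2, 6.4, 6.6],
    "sepal_width": [3.4, 3.6, 3.0, 3.2],
})

# One row per species, holding the mean of each selected column.
# as_index=False keeps species as a normal column instead of the index.
toy_mean = toy.groupby("species", as_index=False)[["sepal_length", "sepal_width"]].mean()
print(toy_mean)
```

The four input rows collapse to two rows, one per species, which is why the aggregated frame suits line and bar plots with species on the x axis.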

Line Plot

The line plot is typically used to represent a relationship between two continuous variables.

Code: Plot Simple Line Graph. Here we will use the aggregated data frame.

plt.plot(col1,col2)
plt.show()

Code: Add color, line style, marker, axis labels, and title to the line graph

# marker='1' draws a small downward tri marker at each data point
plt.plot(col1, col2, color='purple', linestyle='-.', marker='1')
plt.xlabel("species")
plt.ylabel("sepal_length")
plt.title("species - sepal_length")

plt.show()

Bar Plot

Normally, bar plots are graphs that use bars to compare different categories of data.

Code: Draw a vertical bar chart with user-defined color, axis labels, and title.

plt.bar(col1, col2, color='green')
plt.xlabel('species')
plt.ylabel('sepal_length')
plt.title('Analysis')
plt.show()

Code: Draw a horizontal bar chart with the required information

plt.barh(col1, col3, color='magenta')
plt.xlabel('sepal_width')
plt.ylabel('species')
plt.title('Analysis')
plt.show()

Scatter and Bubble Plot

A scatter plot is used to compare two variables and to look for any correlation between them.

A bubble plot is a variant of the scatter plot in which each point is drawn as a bubble whose size (and optionally color) represents additional variables.

Code: Draw a scatter plot with user-defined color, axis labels, and title. Here we are using the disaggregated (original) data frame.

plt.scatter(fld4, fld5, c='orange')
plt.xlabel('petal_length')
plt.ylabel('petal_width')
plt.title('Analysis')
plt.show()

Code: Add size, color, and appearance to scatter plot and make it a bubble plot

# Use the map function to turn each species name into a number that drives the color
species = df['species'].map({"setosa": 0, "versicolor": 1, "virginica": 2})

# s sets the bubble size, c the color value, alpha the transparency
plt.scatter(fld4, fld5, s=50*fld4*fld5, c=species, alpha=0.3)
plt.xlabel('petal_length')
plt.ylabel('petal_width')
plt.title('Bubble Analysis')
plt.show()

Histogram Plot

The histogram plot shows the distribution of a continuous variable by splitting its values into a chosen number of bins and counting the values in each bin.

Code: Draw a histogram with user-defined bins and color. Here we are using the disaggregated data frame.

nbin = 15
plt.hist(fld3, bins=nbin, color='brown')
plt.xlabel("Sepal Width")
plt.ylabel("Frequency")
plt.title("Distribution")
plt.show()
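
To see what the binning does to the numbers, the sketch below counts values per bin with NumPy's histogram function, which uses the same equal-width binning idea. The values are made up for illustration and are not iris measurements.

```python
import numpy as np

# Made-up measurements, for illustration only.
values = np.array([2.0, 2.2, 2.9, 3.0, 3.0, 3.1, 3.4, 3.7, 4.0, 4.4])

# Split the range from min to max into 4 equal-width bins and count values per bin.
counts, edges = np.histogram(values, bins=4)

# Each bin is [left, right); only the last bin also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count} values")
```

Each bar of the histogram has the height of one of these counts, so changing the number of bins changes how coarse or fine the distribution looks.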

Box Plot

The box plot is used to visualize the descriptive statistics of a continuous variable.

Code: Draw a simple box plot with the disaggregated data frame.

data = [df['sepal_width'], df['petal_width'], df['sepal_length'], df['petal_length']]
plt.boxplot(data)
plt.show()
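
The descriptive statistics behind a box plot can be computed by hand. The sketch below uses made-up values and assumes the common convention, which is also Matplotlib's default (whis=1.5), that points further than 1.5 times the interquartile range from the box are drawn as outliers.

```python
import numpy as np

# Made-up measurements, for illustration only.
values = np.array([2.0, 2.5, 2.8, 3.0, 3.0, 3.2, 3.4, 3.6, 4.4])

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("outliers:", outliers)
```

The whiskers then extend to the most extreme values that still lie inside the fences.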

Violin plot

The violin plot combines a box plot with a density plot (a smoothed histogram). It shows the full distribution of the data together with the mean/median, min, and max values.

Code: Create a violin plot from the same data set as the box plot. Here showmeans=True displays the mean value. To display the median instead, use showmedians=True.

plt.violinplot(data, showmeans=True)
plt.show()

Heatmap plot

A heatmap represents the values of a matrix as colors of varying intensity.

Code: Create a heatmap using disaggregated data

# Drop the non-numeric species column before computing the correlation matrix
corr = df.drop(columns='species').corr()
plt.imshow(corr)
# Side color bar
plt.colorbar()
# X axis and Y axis column names
plt.xticks(range(len(corr)),corr.columns, rotation=20)
plt.yticks(range(len(corr)),corr.columns)
plt.show()

Pie plot

A pie plot is used to represent the contribution of various categories to the total.

Code: Create a pie plot with explode, autopct, startangle and equal arguments.

# Create data set for the pie plot
import pandas as pd
Emp = ['Jane', 'Johny', 'Boby', 'Jon', 'Mary']
Salary = [9500, 7800, 7600, 9500, 7700]
SalaryList = list(zip(Emp, Salary))
df_sal = pd.DataFrame(data=SalaryList, columns=['Emp', 'Salary'])
df_sal

# explode pulls a slice outwards; here the last slice (Mary) is offset by 0.15 of the radius.
# autopct is the format string for the percentage labels; '%1.1f%%' shows one decimal place.
# startangle sets the angle at which the first slice starts; the other slices follow anticlockwise.
# plt.axis('equal') keeps the x and y scales equal so that the pie is drawn as a circle.

plt.pie(df_sal['Salary'], labels=df_sal['Emp'], explode=(0, 0, 0, 0, 0.15), startangle=90, autopct='%1.1f%%')
plt.axis('equal')
plt.show()
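
To see what the autopct format string does, the sketch below applies the same '%1.1f%%' pattern by hand to the salary values used above. This only illustrates the formatting; the variable names are chosen for this example.

```python
# Same salary values as in the pie plot data set.
salaries = [9500, 7800, 7600, 9500, 7700]
total = sum(salaries)

# '%1.1f' formats a number with one decimal place; '%%' prints a literal percent sign.
labels = ["%1.1f%%" % (100 * s / total) for s in salaries]
print(labels)  # ['22.6%', '18.5%', '18.1%', '22.6%', '18.3%']
```

Changing the pattern to '%1.0f%%' would show whole percentages instead.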

Multiple graphs on the same axes

Here we have multiple graphs, but we are going to plot them on the same axes. We are using the aggregated data frame for this visualization.

Code: Display multiple graphs on the same axes.

plt.plot(col1, col2, label='sepal_length', color='purple')
plt.plot(col1, col3, label='sepal_width', color='orange')

plt.xlabel('species')
plt.ylabel('sepal_length and sepal_width')
plt.title('Comparison')
plt.legend(loc='best')
plt.show()

One Figure, Multiple Subplots

This is one of the most interesting visualizations. The same plotting area (figure) holds multiple graphs, which are known as subplots. The syntax for a subplot is subplot(nrows, ncols, index), so subplot(221) is the first cell of a 2 x 2 grid.

Code: Display subplots in one figure.

fig = plt.figure(figsize=(10, 10))
plt.suptitle('Multiple Plots - One Figure')

# Declare instances of subplots (2 x 2 grid, positions 1 to 4)
ax1 = plt.subplot(221)
ax2 = plt.subplot(222)
ax3 = plt.subplot(223)
ax4 = plt.subplot(224)

# Define subplot1 (labels and title go on ax1, the same axes as the line)
ax1.plot(col1, col2, color='purple')
ax1.set_xlabel("species")
ax1.set_ylabel("sepal_length")
ax1.set_title("species - sepal_length")

# Define subplot2
ax2.scatter(fld4, fld5, c='orange')
ax2.set_xlabel('petal_length')
ax2.set_ylabel('petal_width')
ax2.set_title('Analysis')

# Define subplot3
ax3.pie(df_sal['Salary'], labels=df_sal['Emp'], explode=(0, 0, 0, 0, 0.15), startangle=90, autopct='%1.1f%%')
ax3.axis('equal')
ax3.set_title("Salary Analysis")

# Define subplot4
ax4.violinplot(data, showmeans=True)
ax4.set_title("Distribution")

plt.show()
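
As an alternative sketch, plt.subplots can create the figure and the whole grid of axes in one call, which avoids declaring each subplot separately. The x and y values below are made up for illustration; the plot types are only examples.

```python
import matplotlib.pyplot as plt

# Create one figure with a 2 x 2 grid of axes in a single call.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))
fig.suptitle("Multiple Plots - One Figure")

# Illustrative values only (not taken from the iris data).
x = [1, 2, 3, 4]
y = [2, 4, 1, 3]

# axes is a 2-D array: axes[row, column]
axes[0, 0].plot(x, y, color="purple")
axes[0, 0].set_title("Line")

axes[0, 1].scatter(x, y, c="orange")
axes[0, 1].set_title("Scatter")

axes[1, 0].bar(x, y, color="green")
axes[1, 0].set_title("Bar")

axes[1, 1].hist(y, bins=3, color="brown")
axes[1, 1].set_title("Histogram")

# Adjust spacing so titles and tick labels do not overlap.
fig.tight_layout()
```

In a notebook the figure is displayed automatically; in a script, call plt.show() at the end.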

Conclusion

In this blog, we learned how to write Python code for basic data visualization with the Matplotlib library.

If you want to know more about it, please refer to Matplotlib 3.0 Cookbook by Srinivasa Rao Poladi and https://matplotlib.org/