Data Analysis on Student Performance using MS Power BI and Python

Image from https://unsplash.com/

In data analysis world, there are plenty of opportunities to make integration among different technologies.  In that context, I tried to explore the integration feature between Power BI and Python. In this blog, I am capturing my learning steps.

Data Set:

 To start with our own data analysis project, it is the best option to take the data from Kaggle.  Here is the data link https://www.kaggle.com/spscientist/students-performance-in-exams

Abstraction:

For any project whether it is small or big, my process is to get some idea about the business domain.  Here domain is not very complicated; it is based on students’ performance in examination.  

After analyzing the data file, here are some noted points

  • Three different type to scores
  • Gender , group category 
  • Parents education level
  • Test preparation

Data Analysis with Power BI and Python

There are two methods in Power BI to integrate with Python.

  1. We can use Python script (.py) data source connection process  and any Power BI visual options including Python Visual
  2. We can directly import the data (csv. Xls etc) in Power BI and use Python visual

Option 1: Data Source = Python Script

To implement the option 1 I had followed the below steps.

  1. Install Python in the system.
  2. Install Python packages which are required for this project.
    1.  numpy ,
    1. pandas,
    1. seaborn ,
    1. matplotlib
  3. If Power BI is already installed then go to File – > Options and Settings – > Options -> Python Scripting. Now change in two places, one for python installation directories and another for python IDE. Save the settings and now ready for integration.
How to integrate Python with Python
  • Click on Get Data -> Other – > Python Script, paste python script for data import.
How to integrate Python with Python
How to integrate Python with Python
How to integrate Python with Python
  • Tables are available for report creation
  • I had created some reports  with some basic visuals
    • Consider aggregation rule Avg instead of Sum
    • Create reports based on gender, ethnicity, math score, writing score, reading score, parental  level of education,  test preparation
How to integrate Python with Python
  • Publish the page to Power BI service
How to integrate Python with Python

Option 2.1: Data Source = Python Script, Visual = Python

  1. Here data source is Python Script, but I used Python as visual.
  2. Click on “Python visual” under Visualizations. 
  3. For this type of report, we need to select one field and then Python script editor is available for modification.
How to integrate Python with Python
  • I had created four reports.  Attaching codes for reference.
#Report1
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("E:/For GitHub/StudentsPerformance/StudentsPerformance.csv")
Set theme
sns.set_style('whitegrid')
Univariate Analysis for math score
sns.distplot(df['math score'], bins=10);
plt.show()
#Report2
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("E:/For GitHub/StudentsPerformance/StudentsPerformance.csv")
Set theme
sns.set_style('whitegrid')
Univariate Analysis for reading score
sns.distplot(df['reading score'], bins=10,color='green' );
plt.show()
#Report3
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("E:/For GitHub/StudentsPerformance/StudentsPerformance.csv")
Set theme
sns.set_style('whitegrid')
Univariate Analysis for writing score
sns.distplot(df['writing score'], bins=10, color='purple' );
plt.show()
#Report4
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("E:/For GitHub/StudentsPerformance/StudentsPerformance.csv")
Set theme
sns.set_style('whitegrid')
sns.pairplot(df, hue="gender", height=2.5);
plt.show()
How to integrate Python with Python

Option 2.2: Data Source = csv, excel etc, Visual = Python

In this context, we will follow the same visual logic. Here only difference is  data source.

Troubleshooting

I had faced some challenges during this project.

Power BI Desktop: Please note that latest Python version is not fully compatible with Power BI. For example there are version restrictions for different python packages. If those versions are not supported, there will be so many error messages.  To avoid such kind of error messages, please check all the supported versions from the below link

https://docs.microsoft.com/en-us/power-bi/connect-data/service-python-packages-support#:~:text=Current%20Python%20runtime%3A%20Python%203.7,%2C%20MIT%2B%2C%20and%20so%20on.

Power BI Service: To see these python reports in Power BI Service, we need licensed id.

YouTube

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: