Learn Data Analysis with Python: Find out the Practical Code for Data Interpretation

Step-by-step Python code for understanding data (statistical analysis, sorting, pivot tables, etc.)

Image from https://unsplash.com/

Introduction

To apply for a data analyst or data scientist role, you need to know at least one of the programming languages commonly used in those roles, such as R, Python, or Scala. For this series, I have chosen Python for data analysis.

For the practical code covering data loading from different sources and data cleaning, please see the earlier posts below:

https://arpitatechcorner.wordpress.com/2021/02/09/learn-data-analysis-with-python-find-out-the-practical-code-for-data-loading-and-saving/

https://arpitatechcorner.wordpress.com/2021/02/23/learn-data-analysis-with-python-find-out-the-practical-code-for-data-cleaning/

With the data cleaned, we move on to the actual analysis stage. Here we can perform operations such as statistical analysis, sorting, and data aggregation with pivot tables.

Statistical Analysis

For descriptive statistics, the pandas describe() method returns a detailed summary of the numeric columns in a single call. Alternatively, individual aggregate functions compute each measure on its own, as below.

# Creating the dataset
import pandas as pd
Emp = ['Jane', 'Johny', 'Boby', 'Jon', 'Mary', 'Jony', 'Alice', 'Melica']
Salary = [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000]
# Pair each employee with a salary, then build a two-column DataFrame
SalaryList = list(zip(Emp, Salary))
df = pd.DataFrame(data=SalaryList, columns=['Emp', 'Salary'])
df
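Before computing the measures one at a time, it can help to see describe() on the same data. The short sketch below rebuilds the dataset with a dict instead of zip (an equivalent construction) and checks two rows of the summary against the individual calls that follow.

```python
import pandas as pd

Emp = ['Jane', 'Johny', 'Boby', 'Jon', 'Mary', 'Jony', 'Alice', 'Melica']
Salary = [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000]
df = pd.DataFrame({'Emp': Emp, 'Salary': Salary})

# By default describe() summarizes only numeric columns, so 'Emp' is skipped.
# Rows: count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)

# Each row of the summary matches the corresponding individual aggregate call
print(summary.loc['mean', 'Salary'] == df['Salary'].mean())
print(summary.loc['50%', 'Salary'] == df['Salary'].median())
```

The 50% row is the median, and the 25% and 75% rows are the first and third quartiles returned by quantile().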

df['Salary'].count()         # number of non-missing values
df['Salary'].mean()          # arithmetic average
df['Salary'].std()           # standard deviation
df['Salary'].min()           # minimum
df['Salary'].max()           # maximum
df['Salary'].quantile(.25)   # first quartile
df['Salary'].quantile(.5)    # second quartile (same as the median)
df['Salary'].quantile(.75)   # third quartile
df['Salary'].median()        # the middle value when the values are sorted
df['Salary'].mode()          # the most common value(s); a Series, since there can be ties
df['Salary'].var()           # variance of the values in the column
df.var(numeric_only=True)    # variance of every numeric column (skips the text column 'Emp')
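One detail worth knowing about std() and var(): pandas computes the sample statistics by default (dividing by n - 1, i.e. ddof=1), whereas NumPy's np.var divides by n (ddof=0). A small sketch on the same salary values showing how the two relate:

```python
import numpy as np
import pandas as pd

salary = pd.Series([9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000])
n = len(salary)

# pandas default: sample variance, sum of squared deviations / (n - 1)
sample_var = salary.var()
# NumPy default: population variance, sum of squared deviations / n
population_var = np.var(salary.to_numpy())

# Rescaling by n / (n - 1) converts one into the other
print(np.isclose(sample_var, population_var * n / (n - 1)))
# Passing ddof=0 makes pandas match NumPy
print(np.isclose(salary.var(ddof=0), population_var))
# The standard deviation is the square root of the variance
print(np.isclose(salary.std(), np.sqrt(sample_var)))
```

For large datasets the difference is negligible, but with only eight rows it changes the result noticeably.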

If you want to know more about descriptive statistics, please have a look at my posts on the topic:

https://arpitatechcorner.wordpress.com/2020/11/23/descriptive-statistics-with-python-part-1/

https://arpitatechcorner.wordpress.com/2021/01/17/how-to-calculate-central-tendency-and-asymmetry-measures-in-statistics-and-python/

https://arpitatechcorner.wordpress.com/2021/02/23/how-to-calculate-variability-measures-variance-sd-etc-in-statistics-and-python/

Sorting Data

Sometimes we need to rearrange the data. To do this, we can use the pandas sort_values() method, which returns a new, sorted DataFrame.

# Sorting by Salary, descending
df = df.sort_values(by='Salary', ascending=False)
df.head(10)

# Sorting by Salary, then Emp, both ascending
df = df.sort_values(by=['Salary', 'Emp'], ascending=[True, True])
df.head(10)
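The ascending list does not have to use one direction for every column. The sketch below ranks by Salary from highest to lowest while breaking ties alphabetically by Emp, and then uses nlargest() as a shortcut for fetching only the top rows. The variable names ranked and top3 are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Emp': ['Jane', 'Johny', 'Boby', 'Jon', 'Mary', 'Jony', 'Alice', 'Melica'],
    'Salary': [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000],
})

# Highest salary first; employees with equal salaries are listed A to Z
ranked = df.sort_values(by=['Salary', 'Emp'], ascending=[False, True])
print(ranked)

# nlargest() returns the top n rows by a column without sorting the whole frame;
# among tied values it keeps the row that appears first in the original order
top3 = df.nlargest(3, 'Salary')
print(top3)
```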

Data Interpretation using Pivot Table

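Pivot tables summarize a dataset by grouping rows on one or more categorical columns and aggregating the numeric values in each group, which makes them one of the most widely used tools in data analysis. Let's extract some meaning from a sample dataset using the pandas pivot_table() function.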

salarydata.csv (Image by Author)

# Create the DataFrame
import pandas as pd
df = pd.read_csv('salarydata.csv')
df.head()
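The contents of salarydata.csv appear only as an image in this post. To try the commands without the file, a minimal sketch can build a hypothetical stand-in in memory. The column names (gender, age, salary, bonus) are assumptions inferred from the pivot-table code, and the rows are made-up values, not the author's data.

```python
import io

import pandas as pd

# Hypothetical contents for salarydata.csv (illustrative values only)
csv_text = """gender,age,salary,bonus
F,30,9000,500
M,30,8000,400
F,50,9500,700
M,50,8500,600
F,50,10000,900
M,40,7500,500
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```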

Code: Get Averages of All Numeric Columns Categorized by Gender

pd.pivot_table(df, index=['gender'])  # every non-index column is averaged, so they should all be numeric

Code: Average Salary by Gender (by default, the aggregate function is the mean)

pd.pivot_table(df, values=['salary'], index=['gender'])
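Under the hood, a pivot table with a single index field is a group-by followed by an aggregation. The sketch below, on the same hypothetical stand-in data (assumed columns gender, age, salary, bonus with made-up values), computes the average salary both ways and compares the results.

```python
import pandas as pd

# Hypothetical stand-in for salarydata.csv (made-up values)
df = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'age':    [30, 30, 50, 50, 50, 40],
    'salary': [9000, 8000, 9500, 8500, 10000, 7500],
    'bonus':  [500, 400, 700, 600, 900, 500],
})

# pivot_table with the default aggregate function (mean)
pivot = pd.pivot_table(df, values=['salary'], index=['gender'])

# The same numbers via groupby: group rows by gender, then average salary
grouped = df.groupby('gender')[['salary']].mean()

# Compare values rather than dtypes: pivot_table may cast whole-number
# means back to integers, while groupby().mean() returns floats
same = (pivot['salary'] == grouped['salary']).all()
print(pivot)
print(same)
```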

Code: Minimum Salary by Gender

pd.pivot_table(df, values=['salary'], index=['gender'], aggfunc='min')
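aggfunc also accepts a list, which produces one set of columns per function. The following sketch computes the minimum and maximum salary together on hypothetical stand-in data (assumed columns, made-up values). The result has two-level column labels of the form (function, column).

```python
import pandas as pd

# Hypothetical stand-in for salarydata.csv (made-up values)
df = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'age':    [30, 30, 50, 50, 50, 40],
    'salary': [9000, 8000, 9500, 8500, 10000, 7500],
    'bonus':  [500, 400, 700, 600, 900, 500],
})

# One column per (aggregate function, value column) pair
table = pd.pivot_table(df, values=['salary'], index=['gender'], aggfunc=['min', 'max'])
print(table)

# Select a single column with its two-level label
print(table[('min', 'salary')])
```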

Code: Maximum Salary by Gender and Age (grouping on two categorical fields)

pd.pivot_table(df, index=['gender', 'age'], aggfunc='max', values=['salary'])
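Instead of stacking both fields in the row index, the second field can be spread across the columns with the columns parameter, which gives a wider, cross-tab-style layout. A sketch on hypothetical stand-in data (assumed columns, made-up values). Combinations with no rows show up as NaN.

```python
import pandas as pd

# Hypothetical stand-in for salarydata.csv (made-up values)
df = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'age':    [30, 30, 50, 50, 50, 40],
    'salary': [9000, 8000, 9500, 8500, 10000, 7500],
    'bonus':  [500, 400, 700, 600, 900, 500],
})

# Gender down the rows, age across the columns, maximum salary in each cell
wide = pd.pivot_table(df, index=['gender'], columns=['age'], values=['salary'], aggfunc='max')
print(wide)

# No rows have gender 'F' and age 40, so that cell is missing (NaN)
print(wide[('salary', 40)]['F'])
```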

Code: Average Salary and Bonus by Gender

pd.pivot_table(df, index=['gender'], aggfunc='mean', values=['salary', 'bonus'])
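When different columns call for different summaries, aggfunc can be a dict that maps each value column to its own function. This sketch averages salary but totals bonus per gender, again on hypothetical stand-in data (assumed columns, made-up values).

```python
import pandas as pd

# Hypothetical stand-in for salarydata.csv (made-up values)
df = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'age':    [30, 30, 50, 50, 50, 40],
    'salary': [9000, 8000, 9500, 8500, 10000, 7500],
    'bonus':  [500, 400, 700, 600, 900, 500],
})

# Mean of salary, sum of bonus, for each gender
result = pd.pivot_table(
    df,
    index=['gender'],
    values=['salary', 'bonus'],
    aggfunc={'salary': 'mean', 'bonus': 'sum'},
)
print(result)
```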

Code: Average Salary and Bonus by Gender, with a Filter Condition (age above 45)

df2 = df.loc[df['age'] > 45]  # keep only the rows where age is above 45
pd.pivot_table(df2, index=['gender'], aggfunc='mean', values=['salary', 'bonus'])
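The same filter can also be written with DataFrame.query(), which takes the condition as a string. The sketch below, on hypothetical stand-in data (assumed columns, made-up values), checks that both filters select the same rows before pivoting.

```python
import pandas as pd

# Hypothetical stand-in for salarydata.csv (made-up values)
df = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'age':    [30, 30, 50, 50, 50, 40],
    'salary': [9000, 8000, 9500, 8500, 10000, 7500],
    'bonus':  [500, 400, 700, 600, 900, 500],
})

# Two equivalent ways to keep only rows with age above 45
via_loc = df.loc[df['age'] > 45]
via_query = df.query('age > 45')
print(via_query.equals(via_loc))

# Pivot the filtered rows exactly as in the post
table = pd.pivot_table(via_query, index=['gender'], aggfunc='mean', values=['salary', 'bonus'])
print(table)
```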

Conclusion

In this post, we learned how to use Python for data interpretation: computing descriptive statistics, sorting data, and summarizing it with pivot tables. If you have any questions, please post them in the comments section.
