# Learn Data Analysis with Python: Find out the Practical Code for Data Interpretation

Step-by-step Python code for understanding data (statistical analysis, pivot tables, data sorting, etc.)

# Introduction

If you want to apply for a data analyst or data scientist role, you need to know at least one of the programming languages used in such roles, such as R, Python, or Scala. For this series, I have chosen Python for data analysis.

If you want to see the practical code for loading data from different sources and for cleaning it, please check the link below.

https://arpitatechcorner.wordpress.com/2021/02/23/learn-data-analysis-with-python-find-out-the-practical-code-for-data-cleaning/

After data cleaning, we reach the actual data analysis stage. In this phase, we can perform operations such as statistical analysis and data aggregation with pivot tables.

# Statistical Analysis

For descriptive statistical analysis, we can use the describe() method of pandas to get a detailed summary in one call. Using the individual aggregate functions, we can also compute each measure separately.

# Creating Dataset
import pandas as pd
Emp = ['Jane','Johny','Boby','Jon','Mary','Jony','Alice','Melica']
Salary = [9500,7800,7600,9500,7700,7800,9900,10000]
SalaryList = list(zip(Emp,Salary)) # pair each employee with a salary
df = pd.DataFrame(data = SalaryList,columns=['Emp', 'Salary'])
df

df['Salary'].count() # number of values
df['Salary'].mean() # arithmetic average
df['Salary'].std() # standard deviation
df['Salary'].min() # minimum
df['Salary'].max() # maximum
df['Salary'].quantile(.25) # first quartile
df['Salary'].quantile(.5) # second quartile (same as the median)
df['Salary'].quantile(.75) # third quartile
df['Salary'].median() # the middle value when the values are sorted
df['Salary'].mode() # the most common value(s), returned as a Series
df['Salary'].var() # variance of the values in a column
df.var(numeric_only=True) # variance of all numeric columns (skips text columns such as Emp)
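
The summary call and the individual measures described above can be tried together in a short sketch. It rebuilds the same salary data so that it runs on its own; the variable names `summary` and `measures` are only illustrative.

```python
import pandas as pd

# Same illustrative salary data as in the listing above
df = pd.DataFrame({
    "Emp": ["Jane", "Johny", "Boby", "Jon", "Mary", "Jony", "Alice", "Melica"],
    "Salary": [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000],
})

# describe() returns count, mean, std, min, the three quartiles, and max
summary = df["Salary"].describe()
print(summary)

# agg() computes a chosen list of measures in a single call
measures = df["Salary"].agg(["mean", "median", "var"])
print(measures)
```

describe() is convenient for a first look, while agg() is handy when only a few specific measures are needed.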

https://arpitatechcorner.wordpress.com/2020/11/23/descriptive-statistics-with-python-part-1/

https://arpitatechcorner.wordpress.com/2021/01/17/how-to-calculate-central-tendency-and-asymmetry-measures-in-statistics-and-python/

https://arpitatechcorner.wordpress.com/2021/02/23/how-to-calculate-variability-measures-variance-sd-etc-in-statistics-and-python/

# Sorting Data

Sometimes we need to rearrange the data. To do this, we can use the sort_values() method of pandas.

# Sorting by Salary Descending
df = df.sort_values(by='Salary', ascending=False)

# Sorting by Salary,Emp Ascending
df = df.sort_values(by=['Salary', 'Emp'], ascending=[True, True])
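
The two sorts above can also be mixed: each column in `by` can have its own direction. The following is a minimal sketch on the same salary data; the name `ranked` and the use of `ignore_index=True` (which renumbers the rows after sorting) are my own additions for illustration.

```python
import pandas as pd

# Same illustrative salary data as in the Statistical Analysis section
df = pd.DataFrame({
    "Emp": ["Jane", "Johny", "Boby", "Jon", "Mary", "Jony", "Alice", "Melica"],
    "Salary": [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000],
})

# Salary from high to low; equal salaries are ordered by employee name A to Z
ranked = df.sort_values(
    by=["Salary", "Emp"],
    ascending=[False, True],
    ignore_index=True,  # renumber rows 0..n-1 in the new order
)
print(ranked)
```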

# Data Interpretation using Pivot Table

Pivot tables are a popular way to summarize data by category. Let's find some meaning in the data using the pivot_table function of pandas.

# Create Data frame
# Note: the examples below need a df with the columns gender, age, salary, and bonus
import pandas as pd
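
The pivot examples in this section use columns named gender, age, salary, and bonus, but the data frame itself is not shown. To try them out, one option is a small made-up data set like the one below. All values here are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the pivot examples
df = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "F", "F"],
    "age": [50, 38, 47, 50, 38, 47, 50, 38],
    "salary": [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000],
    "bonus": [900, 500, 400, 800, 450, 500, 950, 1000],
})
print(df)
```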

Code: Get Averages of All Numeric Columns Categorized by Gender

pd.pivot_table(df, index=['gender'])

Code: Average Salary by Gender. The default aggregate function is the mean

pd.pivot_table(df, values=['salary'], index=['gender'])

Code: Minimum Salary by Gender

pd.pivot_table(df, values=['salary'], index=['gender'], aggfunc='min')

Code: Max Salary by Gender and Age. Using two categorical fields in the index gives one row per combination

pd.pivot_table(df, index=['gender','age'], aggfunc='max', values=['salary'])
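
When two categorical fields are involved, the second one can also go into the `columns` parameter of pivot_table, so the result reads like a cross table. The sketch below uses a hypothetical data set (made-up values, with the column names used in this section); combinations with no rows show up as NaN.

```python
import pandas as pd

# Hypothetical sample data with the columns used in this section
df = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "F", "F"],
    "age": [50, 38, 47, 50, 38, 47, 50, 38],
    "salary": [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000],
    "bonus": [900, 500, 400, 800, 450, 500, 950, 1000],
})

# One row per gender, one column per age, each cell holding the max salary
wide = pd.pivot_table(df, index="gender", columns="age", values="salary", aggfunc="max")
print(wide)
```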

Code: Average Salary and Bonus by Gender

pd.pivot_table(df, index=['gender'], aggfunc='mean', values=['salary','bonus'])

Code: Average Salary and Bonus by Gender, with a Filter Condition (age above 45)

df2 = df.loc[df['age'] > 45]
pd.pivot_table(df2, index=['gender'], aggfunc='mean', values=['salary','bonus'])
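
The filter-then-pivot pattern above can be combined with a different aggregate per column by passing a dictionary to `aggfunc`. The following is a minimal sketch on hypothetical data (made-up values, with the column names used in this section); the choice of mean for salary and max for bonus is only an example.

```python
import pandas as pd

# Hypothetical sample data with the columns used in this section
df = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "F", "F"],
    "age": [50, 38, 47, 50, 38, 47, 50, 38],
    "salary": [9500, 7800, 7600, 9500, 7700, 7800, 9900, 10000],
    "bonus": [900, 500, 400, 800, 450, 500, 950, 1000],
})

# Keep only the rows where age is above 45
older = df.loc[df["age"] > 45]

# Average salary and highest bonus per gender, computed on the filtered rows
result = pd.pivot_table(
    older,
    index=["gender"],
    values=["salary", "bonus"],
    aggfunc={"salary": "mean", "bonus": "max"},
)
print(result)
```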

# Conclusion

In this blog, we learned how to write Python code for data interpretation: descriptive statistics, sorting, and pivot tables. If you have any questions, please post them in the comment section.