Descriptive Statistics with Python: Types of Data & Levels of Measurement

Introduction: Statistics is a basic prerequisite for learning data science or machine learning. Anyone attending a data science or data analyst interview today should expect some questions on statistics. For someone new to the data science world, it can be difficult to connect statistical concepts to Python code.

In this blog post I pair Python code with key notes on descriptive statistics.

What is Statistics? To answer that, we first need two terms: population and sample. A population is the complete data set collected for analysis, and its size is denoted N. A sample is a subset of the population, and its size is denoted n. A statistic is a numerical summary computed from sample data. A parameter is the corresponding numerical summary of the population data.

For example, a company wants to conduct an employee satisfaction survey across the entire company. You are asked to collect opinions from the members of your project and submit the result to the HR manager. Is this population data or sample data? What is the reported value called?

Answer: It is sample data, and the reported value is called a statistic, because you collected data from the members of only one project, which is a small part of the whole company's data.
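The survey example can be sketched in a few lines of Python. This is only an illustration: the company size, the 1 to 5 satisfaction scale, the scores themselves, and the choice of the first 20 employees as "your project" are all made-up assumptions, not data from a real survey.

```python
import random
import statistics

# Hypothetical data: satisfaction scores (1-5) for a made-up company of 200 employees.
random.seed(42)
population = [random.randint(1, 5) for _ in range(200)]

# Assumption: the first 20 employees are the members of your project (the sample).
sample = population[:20]

N = len(population)  # population size
n = len(sample)      # sample size

parameter = statistics.mean(population)  # mean of the population: a parameter
statistic = statistics.mean(sample)      # mean of the sample: a statistic

print(f"N = {N}, population mean (parameter) = {parameter:.2f}")
print(f"n = {n}, sample mean (statistic) = {statistic:.2f}")
```

The two means will usually differ a little, which is why a value computed from a sample is treated as an estimate of the population parameter.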

Types of Data: Now we need to understand how many types of data there are. There are two types: categorical and numerical. Numerical data is further divided into discrete and continuous.


Now let’s start with the Python code.

Import all required packages in the Jupyter notebook:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from matplotlib.ticker import PercentFormatter

# Type of Data : Categorical and Numerical
cate_list=["Apple","Banana","Orange"]
cate_list
Output: ['Apple', 'Banana', 'Orange']
cate_dict={1:"Apple",2:"Banana",3:"Orange"}
cate_dict
Output: {1: 'Apple', 2: 'Banana', 3: 'Orange'}
num_list=[1,2,6,7,4]
num_list
Output: [1, 2, 6, 7, 4]
print(np.random.rand(5))
print(np.random.randint(1,16,10))
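# Type of Numerical data : Discrete and Continuous.
# Discrete data takes countable values: Age in completed years and Month number below are both discrete.
# Continuous data can take any value in a range, e.g. height, weight, or the np.random.rand() values above.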
# Convert dictionary to data frame 
data={'Age': [20,25,20,35,40,45,50,55,60,65,70,75],'Month':[1,2,3,4,5,6,7,8,9,10,11,12]} 
df1=pd.DataFrame.from_dict(data) 
df1
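As a rough sketch of how these three types can show up in a data frame, the example below separates columns by dtype. The column names and values are hypothetical, and the dtype is only a proxy: an integer column is not guaranteed to be discrete in the statistical sense, so treat this as a first check rather than a rule.

```python
import pandas as pd

# Hypothetical records; the column names and values are made up for illustration.
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Orange"],  # categorical
    "Items_Sold": [12, 7, 30],               # numerical, discrete (counts)
    "Weight_kg": [1.25, 0.80, 2.40],         # numerical, continuous (measurements)
})

# Split columns into numerical and non-numerical (categorical) by dtype.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Within the numerical columns, use integer vs. float dtype as a rough proxy.
discrete_cols = [c for c in numerical_cols if pd.api.types.is_integer_dtype(df[c])]
continuous_cols = [c for c in numerical_cols if pd.api.types.is_float_dtype(df[c])]

print("Categorical:", categorical_cols)
print("Discrete:", discrete_cols)
print("Continuous:", continuous_cols)
```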

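Levels of Measurement: There are two groups of levels of measurement, qualitative and quantitative. The two qualitative levels are nominal (categories with no order) and ordinal (categories with a meaningful order). The two quantitative levels are interval (differences are meaningful, but there is no true zero) and ratio (differences and ratios are both meaningful because there is a true zero).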

# Nominal Data
nominal_dict={'Gender': ['Female','Male'],'Hair_Color': ['Black','White']}
df2=pd.DataFrame.from_dict(nominal_dict)
df2
Output:
     Gender  Hair_Color
 0    Female  Black
 1    Male    White
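Because nominal categories have no order, counting is about the only summary that makes sense for them. The sketch below, using made-up hair colours, shows frequencies and the mode; a mean or median would not be meaningful here.

```python
import pandas as pd

# Hypothetical nominal data; the values are made up for illustration.
hair = pd.Series(["Black", "White", "Black", "Brown", "Black"], name="Hair_Color")

counts = hair.value_counts()  # frequency of each category
most_common = hair.mode()[0]  # the most frequent category (the mode)

print(counts)
print("Mode:", most_common)
```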

# Ordinal Data
ordinal_dict={'Rating': ['Satisfied','Avg Satisfied','Not Satisfied']}
df3=pd.DataFrame.from_dict(ordinal_dict)
df3
Output:
       Rating
 0    Satisfied
 1    Avg Satisfied
 2    Not Satisfied
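A plain column of strings does not record that one rating ranks above another. One way to make the order explicit, sketched here with hypothetical responses, is an ordered pandas Categorical; the ordering of the three levels is my assumption about what the ratings mean.

```python
import pandas as pd

# Assumed ranking of the levels, from lowest to highest.
levels = ["Not Satisfied", "Avg Satisfied", "Satisfied"]

# Hypothetical survey responses stored as an ordered categorical.
ratings = pd.Series(
    pd.Categorical(
        ["Satisfied", "Not Satisfied", "Avg Satisfied", "Satisfied"],
        categories=levels,
        ordered=True,
    ),
    name="Rating",
)

lowest = ratings.min()                             # lowest rating given
highest = ratings.max()                            # highest rating given
sorted_ratings = ratings.sort_values().tolist()    # sorted by rank, not alphabetically
at_least_avg = int((ratings >= "Avg Satisfied").sum())  # comparisons respect the order

print(lowest, "|", highest)
print(sorted_ratings)
print("At least Avg Satisfied:", at_least_avg)
```

Sorting and comparisons follow the declared order, but the distances between levels are still undefined, so averaging the ratings would not be justified.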
# Interval Data : temperature in degrees Celsius (differences are meaningful, but 0 is not a true zero).
interval_data={'Temperature_C':[20,25,30]}
df4=pd.DataFrame.from_dict(interval_data)
df4
Output:
      Temperature_C
 0    20
 1    25
 2    30
# Ratio Data : measurement of heights. 
ratio_data={'Height':[160,167,170]} 
df5=pd.DataFrame.from_dict(ratio_data) 
df5
Output:
    Height
 0    160
 1    167
 2    170
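The practical difference between the interval and ratio levels is whether ratios mean anything. The short sketch below uses made-up temperatures and heights to show this: dividing Celsius values gives a number, but not a physically meaningful one, while dividing heights does.

```python
# Interval scale: Celsius has an arbitrary zero point.
c_low, c_high = 20.0, 40.0     # hypothetical temperatures
difference_c = c_high - c_low  # a 20-degree difference is meaningful
naive_ratio = c_high / c_low   # 2.0, but 40 C is not "twice as hot" as 20 C

# Converting to Kelvin (a scale with a true zero) shows the actual ratio.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low  # roughly 1.07

# Ratio scale: height in cm has a true zero, so the ratio is meaningful.
h_child, h_adult = 80, 160     # hypothetical heights
height_ratio = h_adult / h_child  # the adult is twice as tall

print(f"Celsius difference: {difference_c}")
print(f"Naive Celsius ratio: {naive_ratio}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"Height ratio: {height_ratio}")
```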

Conclusion: So far we have covered the different types of data and the levels of measurement. The next blog post will discuss central tendency (the next part of descriptive statistics) with Python.
