# Descriptive Statistics with Python: Types of Data & Levels of Measurement

Introduction: Learning data science or machine learning requires a basic grounding in statistics. Anyone interviewing for a data science or data analyst role should expect questions on statistics. For someone new to data science, it can be hard to connect statistical concepts to the Python code that applies them.

In this post I pair Python code with key notes on descriptive statistics.

What is Statistics? To answer that, we first need two terms: population and sample. A population is the complete set of data we want to analyze, and its size is denoted N. A sample is a subset of the population, and its size is denoted n. Statistics is the mathematical analysis and representation of sample data, and a value computed from a sample is called a statistic. A parameter is the corresponding value computed from the whole population.

For example, a company wants to survey employee satisfaction across the entire company. You are asked to collect opinions from the members of your project and submit them to the HR manager. Is this population data or sample data? What is a value computed from it called?

Answer: It is sample data, and a value computed from it is called a statistic, because you collected data only from the members of one project, which is a small part of the whole company.
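The distinction between a parameter and a statistic can be sketched in a few lines. The scores below are simulated, and the names `population` and `sample` are hypothetical; the sample is drawn at random here, which is a simplifying assumption (the members of a single project are not really a random draw from the company).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: satisfaction scores (1-10) for all N employees
N = 1000
population = rng.integers(1, 11, size=N)

# Hypothetical sample: n employees, standing in for one project's members
n = 25
sample = rng.choice(population, size=n, replace=False)

parameter = population.mean()  # summarizes the whole population
statistic = sample.mean()      # summarizes only the sample

print(f"N = {N}, population mean (parameter) = {parameter:.2f}")
print(f"n = {n}, sample mean (statistic) = {statistic:.2f}")
```

The two means will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter.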

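Types of Data: Next we need to understand the types of data. There are two main types: categorical and numerical. Numerical data is further divided into discrete and continuous. Discrete data takes countable values (such as a number of items), while continuous data can take any value within a range (such as height or weight).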

Import all the required packages in a Jupyter notebook:

```python
# Import the packages used in this notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter
%matplotlib inline

# Types of data: categorical and numerical
cate_list = ["Apple", "Banana", "Orange"]
cate_list
# Output: ['Apple', 'Banana', 'Orange']

cate_dict = {1: "Apple", 2: "Banana", 3: "Orange"}
cate_dict
# Output: {1: 'Apple', 2: 'Banana', 3: 'Orange'}

num_list = [1, 2, 6, 7, 4]
num_list
# Output: [1, 2, 6, 7, 4]

print(np.random.rand(5))             # 5 random floats in [0, 1): continuous values
print(np.random.randint(1, 16, 10))  # 10 random integers from 1 to 15: discrete values

# Types of numerical data: discrete and continuous.
# Month numbers are discrete. Age is continuous in principle, but recorded
# in whole years (as here) it is treated as discrete.
# Convert a dictionary to a DataFrame
data = {'Age': [20, 25, 20, 35, 40, 45, 50, 55, 60, 65, 70, 75],
        'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}
df1 = pd.DataFrame.from_dict(data)
df1
```
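As a rough sketch of how these types show up in practice, pandas column dtypes can give a first guess at which columns are categorical and which are numerical. The DataFrame below is hypothetical, and treating integer columns as discrete and float columns as continuous is only a heuristic, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical records mixing the kinds of data described above
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Orange"],  # categorical
    "Count": [3, 5, 2],                      # numerical, discrete
    "Weight_kg": [0.18, 0.12, 0.15],         # numerical, continuous
})

# Split columns by dtype: numeric vs. everything else
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Heuristic only: integer dtype suggests discrete, float dtype suggests continuous
discrete_cols = df.select_dtypes(include=np.integer).columns.tolist()
continuous_cols = df.select_dtypes(include=np.floating).columns.tolist()

print("Categorical:", categorical_cols)
print("Numerical:", numeric_cols)
print("Discrete (by dtype):", discrete_cols)
print("Continuous (by dtype):", continuous_cols)
```

The dtype is a storage detail, so the final call still depends on what the column means: a numeric ID column, for instance, is stored as integers but is categorical.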

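Levels of Measurement: Levels of measurement fall into two groups, qualitative and quantitative. The two qualitative levels are nominal and ordinal; the two quantitative levels are interval and ratio. Nominal categories have no order, ordinal categories have a meaningful order, interval values are evenly spaced but have no true zero, and ratio values have a true zero.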

```python
# Nominal data: categories with no inherent order
nominal_dict = {'Gender': ['Female', 'Male'], 'Hair_Color': ['Black', 'White']}
df2 = pd.DataFrame.from_dict(nominal_dict)
df2
# Output:
#    Gender Hair_Color
# 0  Female      Black
# 1    Male      White

# Ordinal data: categories with a meaningful order
ordinal_dict = {'Rating': ['Satisfied', 'Avg Satisfied', 'Not Satisfied']}
df3 = pd.DataFrame.from_dict(ordinal_dict)
df3
# Output:
#           Rating
# 0      Satisfied
# 1  Avg Satisfied
# 2  Not Satisfied
```
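Plain strings do not carry the order that makes ordinal data ordinal. One possible way to encode it is an ordered pandas `Categorical`; the sketch below assumes the three rating levels run from "Not Satisfied" (lowest) to "Satisfied" (highest), and the variable names are illustrative.

```python
import pandas as pd

# Assumed order of the rating levels, lowest to highest
levels = ["Not Satisfied", "Avg Satisfied", "Satisfied"]

ratings = pd.Series(
    pd.Categorical(
        ["Satisfied", "Avg Satisfied", "Not Satisfied", "Satisfied"],
        categories=levels,
        ordered=True,
    )
)

# With an explicit order, comparisons and sorting become meaningful
lowest, highest = ratings.min(), ratings.max()
at_least_avg = (ratings >= "Avg Satisfied").tolist()
sorted_ratings = ratings.sort_values().tolist()

print(lowest, "|", highest)
print(at_least_avg)
print(sorted_ratings)
```

For nominal data such as hair color, the same column could be stored as a categorical without `ordered=True`, since ranking its values has no meaning.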
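```python
# Interval data: evenly spaced values with no true zero, e.g. temperature in °C.
# (Income has a true zero, so it is ratio data rather than interval data.)
interval_data = {'Temperature_C': [20, 25, 30]}
df4 = pd.DataFrame.from_dict(interval_data)
df4
# Output:
#    Temperature_C
# 0             20
# 1             25
# 2             30

# Ratio data: values with a true zero, e.g. heights in cm
ratio_data = {'Height': [160, 167, 170]}
df5 = pd.DataFrame.from_dict(ratio_data)
df5
# Output:
#    Height
# 0     160
# 1     167
# 2     170
```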
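One way to see the difference between interval and ratio scales is to check whether a ratio survives a change of units. The sketch below uses hypothetical temperature and height values: differences are preserved in both cases, but only the ratio-scale variable keeps the same ratio.

```python
import math

# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low                       # looks like "twice as hot"
ratio_kelvin = (c_high + 273.15) / (c_low + 273.15)  # same temperatures in kelvin

# Differences, however, are the same on both scales
diff_celsius = c_high - c_low
diff_kelvin = (c_high + 273.15) - (c_low + 273.15)

# Ratio scale: height has a true zero, so ratios survive a unit change
h_short_cm, h_tall_cm = 85.0, 170.0
ratio_cm = h_tall_cm / h_short_cm
ratio_inches = (h_tall_cm / 2.54) / (h_short_cm / 2.54)

print(f"Temperature ratio: {ratio_celsius:.3f} (°C) vs {ratio_kelvin:.3f} (K)")
print(f"Temperature difference: {diff_celsius:.1f} (°C) vs {diff_kelvin:.1f} (K)")
print(f"Height ratio: {ratio_cm:.3f} (cm) vs {ratio_inches:.3f} (in)")
```

So saying that 170 cm is twice 85 cm is meaningful, while saying that 20 °C is twice as hot as 10 °C is not.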

Conclusion: So far we have covered the different types of variables. In the next post, I will discuss central tendency (the next part of descriptive statistics) with Python.