For any data analysis project, we follow the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework.
In this framework, data profiling is one of the key steps at the start of a project.
In the previous blogs of this series (Data Analysis in Power BI), we got an overview of the Power BI tool and how to import data into it from various data sources.
In this blog, we will look at how to describe a data set after importing it.
What is Data Profiling?
- For any data analysis project, data profiling is the key step for understanding the overall structure of the data.
- It helps you describe how consistent the data is.
- If profiling reveals limitations in the data source, you can look for a better one.
- Without data profiling, quality problems in the data can carry through to the results, so they will not be in a properly presentable format.
For this walkthrough, I use the US Superstore dataset from Kaggle.
- Let’s start with the Get Data option under the Home tab. As this is a CSV file, select the Text/CSV option from the drop-down list.
- Select the file named US Superstore data.csv.
- After you select the file, Power BI displays a preview of the data.
- Click Load and save the data.
Identify Data Inconsistency
After importing data, you can proceed with the first step of data profiling.
The short video below shows the steps for identifying data anomalies in your data set.
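To make the idea of a data anomaly concrete outside Power BI, here is a minimal Python sketch that counts valid, empty, and error cells per column. The sample rows, the `NUMERIC_COLUMNS` set, and the helper names `column_quality` and `is_number` are all assumptions made for illustration. They are not taken from the Superstore file or from Power BI itself.

```python
import csv
import io

# Hypothetical sample rows standing in for a few Superstore-style columns.
SAMPLE_CSV = """Order ID,Sales,Quantity
CA-1001,261.96,2
CA-1002,,3
CA-1003,abc,5
CA-1004,14.62,
"""

# Columns we expect to hold numbers (an assumption made for this sketch).
NUMERIC_COLUMNS = {"Sales", "Quantity"}


def is_number(text):
    """Return True when the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_quality(rows, numeric_columns):
    """Count valid, empty, and error cells for each column."""
    quality = {}
    for row in rows:
        for column, value in row.items():
            counts = quality.setdefault(
                column, {"valid": 0, "empty": 0, "error": 0}
            )
            text = (value or "").strip()
            if text == "":
                counts["empty"] += 1
            elif column in numeric_columns and not is_number(text):
                # A non-numeric value in a numeric column counts as an error.
                counts["error"] += 1
            else:
                counts["valid"] += 1
    return quality


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
quality = column_quality(rows, NUMERIC_COLUMNS)
for column, counts in quality.items():
    print(column, counts)
```

A column with many empty or error cells is a good candidate for cleaning before you build any visuals on top of it.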
Analyze Data Distribution
Now it’s time to find out how data is distributed across all the columns.
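As a rough illustration of what "distribution" means for a single column, the Python sketch below counts how often each value appears, how many distinct values there are, and how many values appear exactly once. The `ship_modes` list is made-up sample data and `column_distribution` is a hypothetical helper, not a Power BI function.

```python
from collections import Counter

# Hypothetical values for one column, for example a shipping-mode column.
ship_modes = [
    "Second Class",
    "Standard Class",
    "Standard Class",
    "First Class",
    "Standard Class",
    "Same Day",
    "Second Class",
]


def column_distribution(values):
    """Return value frequencies, the distinct count, and the unique count."""
    frequencies = Counter(values)
    # Distinct: how many different values occur at all.
    distinct = len(frequencies)
    # Unique: how many values occur exactly once.
    unique = sum(1 for count in frequencies.values() if count == 1)
    return frequencies, distinct, unique


frequencies, distinct, unique = column_distribution(ship_modes)
for value, count in frequencies.most_common():
    print(f"{value}: {count}")
print("distinct:", distinct, "unique:", unique)
```

A column where nearly every value is unique behaves like an identifier, while a column with only a few distinct values is usually a category you can group by.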
Examine Data Statistics
Power BI can extract statistical information from a data set with a single check box.
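For readers who want to see what such column statistics look like in code, here is a small Python sketch using only the standard library. The `sales` values are invented sample numbers and `column_statistics` is a hypothetical helper. The choice of the sample standard deviation (`statistics.stdev`) is also an assumption of this sketch and may differ from what a given tool reports.

```python
import statistics

# Hypothetical numeric values for one column, for example a sales column.
sales = [10.0, 20.0, 30.0, 40.0]


def column_statistics(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation; needs at least two values.
        "stdev": statistics.stdev(values),
    }


stats = column_statistics(sales)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick way to spot skew: a large gap between the two often points to outliers worth a closer look.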
In this blog, you learned about the following:
- What is Data Profiling
- Identify Data Inconsistency
- Analyze Data Distribution
- Examine Data Statistics
In my next blog, we will go into more detail.
If you have any questions related to this project, please feel free to post your comments.
Please visit my website for other technical resources.
Please like, comment, and subscribe to my YouTube channel, which you have already seen above. 🙂 Keep learning.