Introduction to Big Data

Dear friends, since my school days I have always kept notes based on my understanding of each topic, and I still do the same.

Before moving to Big Data analytics, let’s get some idea about data analytics. Data analytics is the process of analyzing data with qualitative and quantitative methods in order to gain valuable insights.

Data can be of two types in nature: qualitative or quantitative.
Qualitative data covers information that is observable but not necessarily measurable. E.g.: color, texture, smell, taste, appearance, beauty, etc.
Quantitative data covers entities that deal with numbers or that can be measured. E.g.: length, height, area, volume, weight, speed, time, temperature, humidity, cost of goods, ages of a set of connected people, number of likes and pokes, etc.
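
To make the difference concrete, here is a small Python sketch. The fruit records and the helper `is_quantitative` are made up for illustration; the point is only that qualitative values are usually counted by category, while quantitative values support arithmetic.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: "color" is qualitative, "weight_kg" is quantitative.
fruits = [
    {"color": "red", "weight_kg": 0.18},
    {"color": "green", "weight_kg": 0.20},
    {"color": "red", "weight_kg": 0.22},
]

def is_quantitative(values):
    """Treat a column as quantitative when every value is a number."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )

colors = [f["color"] for f in fruits]
weights = [f["weight_kg"] for f in fruits]

# Qualitative data is usually summarized by counting categories.
color_counts = Counter(colors)

# Quantitative data supports arithmetic such as an average.
average_weight = mean(weights)
```

Here `color_counts` tells us how often each color appears, while `average_weight` only makes sense because weights are numbers.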

There are two concepts to know here: exploratory data analysis (EDA) and confirmatory data analysis (CDA).

Difference between EDA and CDA

EDA is used to explore data and find patterns in it, as well as relationships between its various elements. CDA is used to provide an insight or conclusion for a specific question, based on a hypothesis and statistical techniques, or on simple observation of the data.
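
The following Python sketch shows the two styles side by side. The page-load times are invented numbers, and the permutation test is just one possible confirmatory technique that I picked because it needs only the standard library.

```python
from itertools import combinations
from statistics import mean, median

# Hypothetical page-load times in seconds for two site designs.
design_a = [2.1, 2.4, 2.2, 2.6, 2.3]
design_b = [1.8, 1.9, 2.0, 1.7, 2.1]

# EDA: look at the data without a fixed question in mind.
summary = {
    "a": (min(design_a), median(design_a), max(design_a)),
    "b": (min(design_b), median(design_b), max(design_b)),
}

# CDA: test a specific hypothesis, "design B is faster than design A",
# with an exact permutation test on the difference of means.
def permutation_p_value(a, b):
    observed = mean(a) - mean(b)
    pooled = a + b
    total = sum(pooled)
    splits = 0
    extreme = 0
    # Try every way of relabelling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / len(a) - (total - sum_a) / len(b)
        splits += 1
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / splits

p_value = permutation_p_value(design_a, design_b)
```

The `summary` dictionary is the exploratory part: it just describes what is there. The `p_value` is the confirmatory part: a small value suggests the observed difference is unlikely to be due to chance alone.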

What is Big Data?

I feel that nowadays almost everyone in the IT industry is aware of this term. But how is it related to the analytics world? Global eCommerce companies, all the famous social media companies, and indeed most companies run their business using data analytics and base many of their decisions on it. So we can think about what kind of data they are collecting, how much data they might be collecting, and how they might be using it. Each of these questions is related to Big Data.

Let’s understand some important features of Big Data. We call these the four Vs. They are the main factors used to identify Big Data.

Volume: The size of the data
Velocity: The speed at which the data is generated
Variety: The range of formats the data comes in, such as text, CSV, pictures, videos, audio, log records, etc.
Veracity: The correctness and usefulness of the data

I have also come across some more Vs of Big Data.

Variability: The data is constantly changing
Visualization: Presenting the data in a readable and accessible manner after it has been processed

Big Data is large and is increasing every day; it is also messy, noisy, and constantly changing. It is available in a variety of formats and cannot be put to use without analysis and visualization.

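There are three types of data in Big Data.

  • Structured: Highly organized data, such as rows and columns in a database table
  • Semi-Structured: Data that does not follow the table models used in an RDBMS but still carries tags or keys that organize it, such as JSON or XML
  • Unstructured: Highly unorganized data, such as free text, pictures, or videos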
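
A tiny Python sketch of the three types, using made-up sample values and only the standard library:

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table or CSV export.
structured = io.StringIO("id,name,age\n1,Asha,31\n2,Ravi,27\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = '[{"id": 1, "tags": ["sale"]}, {"id": 2, "note": "gift"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no predefined fields; here we only count words.
unstructured = "Great phone, battery lasts two days!"
word_count = len(unstructured.split())
```

Notice that every row in `rows` has the same columns, the two entries in `records` have different keys, and the review text has no fields at all until some analysis extracts them.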

Data Measurement

The smallest unit of data is the bit; eight bits make up a byte, which is the basic unit for measuring data size. The higher units of measurement are the kilobyte, megabyte, gigabyte, terabyte, petabyte and exabyte.
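
As a quick illustration, the helper below converts a byte count into these units. The function name `human_readable` is my own, and it assumes the binary convention where each unit is 1024 times the previous one; the decimal (SI) convention uses 1000 instead, which you can get by passing `base=1000`.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes, base=1024):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit.
        if value < base or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= base

bits_in_one_kilobyte = 1024 * 8  # 8 bits per byte
```

For example, `human_readable(1536)` gives `1.50 KB` and `human_readable(1024 ** 4)` gives `1.00 TB`.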

