Learn Data Analysis with Python: practical code for loading and saving data.

Step-by-step Python code for loading and saving data from different sources.

Introduction

To apply for a data analyst or data scientist role, it is necessary to know one of the programming languages used in such roles, for example R, Python, or Scala. Here I have selected Python for data analysis. This post covers how data loading and saving can be done with Python.

With Python libraries like pandas, we can import and export data in different formats and from different sources. I am using some of the most common and useful formats: CSV, Excel, and SQL.

CSV File

This is the most common file format. Let us look at the code. I have taken the data from Kaggle:

wine-price-rating: updated wine ratings and price data to assist making better purchase decisions (www.kaggle.com)

Loading / Import

The file above has a proper header, i.e. a row of column names, so we can load it with the following code.

import pandas as pd
df = pd.read_csv("white-wine-price-rating.csv")
df.head()

If the column header is not present, use the code below.

import pandas as pd
df = pd.read_csv("white-wine-price-rating_no_Header.csv", header=None)
df.head()
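
If the file has no header row, we can also supply our own column names while loading, using the names argument of read_csv. A minimal sketch; the column names below are only examples, so use the ones that match your file.

import pandas as pd
# Load a CSV without a header row and assign column names ourselves
# (the names below are placeholders, not the real columns of the Kaggle file)
df = pd.read_csv("white-wine-price-rating_no_Header.csv",
                 header=None,
                 names=["price", "rating"])
df.head()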

Saving / Export

To export a DataFrame to a CSV file, use the following code.

import pandas as pd
empnames = ['Ala', 'Alice', 'Edward', 'Kevin', 'Nathan']
empid = [76, 95, 77, 78, 99]
EmpList = zip(empnames, empid)
df_1 = pd.DataFrame(data=EmpList, columns=['Names', 'ID'])
df_1.to_csv('Employee_with_header.csv', index=False, header=True)
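
A quick way to verify the export is to read the saved file straight back into a DataFrame:

import pandas as pd
# Read back the file we just wrote and confirm the two columns are present
df_check = pd.read_csv('Employee_with_header.csv')
df_check.head()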

Excel File

Let’s see how an Excel file can be used as a data source in Python.

Loading / Import

Link to the Kaggle dataset used here:

Iris Dataset: the Iris dataset in all kinds of data formats (www.kaggle.com)

import pandas as pd
df = pd.read_excel("iris.xlsx")
df.head()
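
By default, read_excel loads only the first sheet. If the workbook has several sheets, the sheet_name argument selects one by name or by position. A small sketch; the sheet name "Sheet2" is just an example and may not exist in the iris workbook.

import pandas as pd
# Load a specific sheet by name; sheet_name also accepts an integer position
df_sheet2 = pd.read_excel("iris.xlsx", sheet_name="Sheet2")
df_sheet2.head()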

Saving / Export

There are several ways to save data in Excel format: saving a workbook with a single sheet, saving with multiple sheets, and merging data from multiple Excel files. Let us look at the Python code for each of these.

Code: Saving in Excel format with a single sheet

import pandas as pd
brandnames = ['Nike', 'Adidas', 'HM', 'Puma', 'Reebok']
prices = [150, 64, 45, 35, 99]
PriceList = zip(brandnames, prices)
df = pd.DataFrame(data=PriceList, columns=['BrandNames', 'Prices'])
# The "with" block closes the writer and saves the file automatically
with pd.ExcelWriter('ProductPrice.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1')

Code: Saving in Excel format with more than one sheet

import pandas as pd
# Sheet 1
brandnames = ['Nike', 'Adidas', 'HM', 'Puma', 'Reebok']
prices = [150, 64, 45, 35, 99]
PriceList = zip(brandnames, prices)
df = pd.DataFrame(data=PriceList, columns=['BrandNames', 'Prices'])
# Sheet 2
empnames = ['Ala', 'Alice', 'Edward', 'Kevin', 'Nathan']
empid = [76, 95, 77, 78, 99]
EmpList = zip(empnames, empid)
df2 = pd.DataFrame(data=EmpList, columns=['Names', 'ID'])
# Saving both DataFrames to one Excel file, one sheet each
with pd.ExcelWriter('dataframe.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')
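
The reverse direction also works: passing sheet_name=None to read_excel returns a dictionary of DataFrames, one per sheet, so we can load back both sheets of the file we just wrote.

import pandas as pd
# sheet_name=None loads every sheet; the result is a dict keyed by sheet name
sheets = pd.read_excel('dataframe.xlsx', sheet_name=None)
print(sheets['Sheet1'].head())
print(sheets['Sheet2'].head())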

Code: Merging data from Multiple Excel files

If we have multiple Excel files with the same column headers but different data, and for analysis we need to merge those files, we can follow the sample code below. Here the file names start with "Adata", for example Adata1.xlsx, Adata2.xlsx, and so on.


# Combining Data from Multiple Excel Files
import pandas as pd
import glob

# Read every file that matches the pattern and collect the DataFrames
frames = [pd.read_excel(f) for f in glob.glob("Adata*.xlsx")]
# Concatenate them into one DataFrame with a fresh index
all_data = pd.concat(frames, ignore_index=True)
all_data.head(6)
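
Once the files are combined, the merged DataFrame can be saved back out like any other, for example to a single consolidated Excel file. The output file name below is just an example; I picked one that does not match the "Adata*" pattern so it is not read in again on the next run.

# Save the combined data to one Excel file for further analysis
all_data.to_excel('combined_data.xlsx', index=False)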

SQL Database

We have seen how CSV and Excel files are used for data loading and saving. Now it is time to learn how to connect to a SQL database as a data source. Here I am using MS SQL Server; you can try any other database and let me know the differences.

To start with, install the package for the ODBC connection.

pip install pyodbc

Now import the package and set up the connection.

import pyodbc
# Driver = database driver, Server = database server name, Database = database name
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=DESKTOP-5PILMM6;'
                      'Database=Movies;'
                      'Trusted_Connection=yes;')

Loading / Import

Code: Execute a SELECT query to fetch the data.

cursor = conn.cursor()
cursor.execute('SELECT * FROM Movies.dbo.Actor')
for row in cursor:
    print(row)
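
Instead of looping over the cursor, pandas can also run the query and return the result directly as a DataFrame, which is handy for analysis. A minimal sketch using the same open connection; note that recent pandas versions may warn that a raw pyodbc connection is not a SQLAlchemy connectable, but the query still runs.

import pandas as pd
# Load the query result straight into a DataFrame using the open connection
df_actors = pd.read_sql('SELECT * FROM Movies.dbo.Actor', conn)
df_actors.head()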


Saving / Export

It gets even more interesting when we create a table and insert some data into it. Let's see how we can achieve this.

One assumption is that we know the name of the database in which the table will be created, and that the connection opened above is still active.

# Create table testEmp with an integer ID field in the Movies database
cursor = conn.cursor()
cursor.execute('Create table Movies.dbo.testEmp(ID int)')
conn.commit()

# Insert 2 records into the testEmp table
cursor.execute('Insert into Movies.dbo.testEmp(ID) values (1),(2)')
conn.commit()
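
If the values to insert come from a Python list (or from a DataFrame), cursor.executemany with a parameterised query inserts them in one call. A small sketch, assuming the same testEmp table; the IDs are just example values.

# Insert several IDs in one call using a parameterised query
ids = [(3,), (4,), (5,)]
cursor.executemany('Insert into Movies.dbo.testEmp(ID) values (?)', ids)
conn.commit()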

Now I log into the SQL Server database, look for the newly created table, and select its records.


Conclusion:

In this blog, we learned how to write Python code for loading and saving data. If you have any questions, please post them in the comment section.
