Seaborn Visualization for Beginners
Wondering how to create data visualizations using Python but not sure where to start? Maybe you’re not sure which library can generate the type of chart you want.
Hi, I’m Da Data Guy! Before I begin, if you are interested in Power BI, Python, Data visualization and SQL, please give me a follow on Medium and on my LinkedIn. I focus on writing quality articles breaking down each step in the process so you can follow along and learn too. At the bottom of this article will be my Github link and the embedded Python notebooks for you to use.
To follow along, I’m using data from Maven Analytics called “Pizza Place Sales”. This is part three of three where I provide a step-by-step tutorial of my Exploratory Data Analysis (EDA) using SQL, Python and Power BI. For this article, I’ll be using Seaborn and Python to create data visualizations that I found valuable for my analysis. If you are interested in the other two articles of this analysis, I’ll link them here:
Part 1: Optimizing SQL Temp Tables Using This Trick?
Part 2: Using Python To Transform Data From Multiple CSV Files
To start, I import several libraries into my Jupyter Notebook, set the Pandas library to display the maximum number of columns, and change the percent format to the common (0.00%) style. I prefer to see all the columns rather than have Pandas truncate them. I also preset the Seaborn settings for figure size, color palette and more.
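The setup described above might look roughly like the sketch below. The exact options are assumptions on my part (the original notebook cell is not shown here): I'm guessing that the percent format is set through the Pandas float display option and that the Seaborn presets go through `sns.set_theme`.

```python
import pandas as pd
import seaborn as sns

# Show every column instead of letting pandas truncate wide data frames
pd.set_option("display.max_columns", None)

# Assumption: display floats as percentages with two decimals (0.1234 -> 12.34%)
pd.set_option("display.float_format", "{:.2%}".format)

# Assumption: preset style, palette, and default figure size for all charts
sns.set_theme(style="whitegrid", palette="deep", rc={"figure.figsize": (12, 5)})
```

Note that the float format applies to every float column that gets displayed, so it is only a good fit when most of your floats really are ratios.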
At the same time, I start the timer to time how long it takes to load my four CSV files. I always use this method because sometimes the CSV files are very large and it’s important to know how long it takes to import.
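Here is a minimal sketch of that timing pattern. The in-memory CSV text and the names `csv_sources` and `frames` are hypothetical stand-ins so the example runs anywhere; in the notebook you would pass the real file paths to `pd.read_csv` instead.

```python
import io
import time

import pandas as pd

# Hypothetical stand-ins for the CSV files; swap in real file paths in practice
csv_sources = {
    "orders": io.StringIO("order_id,date,time\n1,2015-01-01,11:38:36\n"),
    "order_details": io.StringIO(
        "order_details_id,order_id,pizza_id,quantity\n1,1,hawaiian_m,1\n"
    ),
}

start = time.perf_counter()  # start the timer just before loading
frames = {name: pd.read_csv(src) for name, src in csv_sources.items()}
elapsed = time.perf_counter() - start  # seconds spent loading

print(f"Loaded {len(frames)} files in {elapsed:.3f} seconds")
```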
Now that everything has been loaded into the notebook, I check the columns and datatypes using:
df.info()
I notice that my date column is an “object”; I will need to change that to datetime64. I also notice that the size column will need to be changed to a “category” datatype with an explicit size order so the charts sort the sizes accordingly.
# Changing size into a category
df['size'] = df['size'].astype('category')
# Set and order the categories for size
df['size'] = df['size'].cat.set_categories(["S", "M", "L", "XL", "XXL"])
# This does not sort by the quantity within each size, but by the size order set above.
# Sort must be set to False.
df['size'].value_counts(sort=False)
# Output
S 14137
M 15385
L 18526
XL 544
XXL 28
Name: size, dtype: int64
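To see why the category order matters, here is a small sketch on toy data (not the pizza dataset). It uses `pd.CategoricalDtype` with `ordered=True`, which is a slightly different route from the two-step conversion above; I'm showing it as an alternative, not as what the original notebook does.

```python
import pandas as pd

sizes = pd.Series(["XL", "S", "M", "L", "S", "M"])  # toy data

# Plain strings sort alphabetically, which is not the size order we want
print(sorted(sizes.unique()))  # ['L', 'M', 'S', 'XL']

# An ordered categorical makes S < M < L < XL < XXL
size_type = pd.CategoricalDtype(["S", "M", "L", "XL", "XXL"], ordered=True)
sizes = sizes.astype(size_type)

print(sizes.sort_values().tolist())  # ['S', 'S', 'M', 'M', 'L', 'XL']

# sort=False keeps the category order, including unobserved categories
counts = sizes.value_counts(sort=False)
print(counts.index.tolist())  # ['S', 'M', 'L', 'XL', 'XXL']
```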
Now that I’ve correctly changed the data type into category and set the order for the size column, I’ll work on changing the date column into a date datatype.
# Taking date and time and joining them into one column
df['date'] = df[['date', 'time']].agg(' '.join, axis=1)
# Converting date from object into datetime dtype
df['date'] = pd.to_datetime(df['date'])
# Represent month in date field as its first day
df['date'] = df['date'].dt.year.astype('str') + '-' + df['date'].dt.month.astype('str') + '-01'
# Converting back into datetime
df['date'] = pd.to_datetime(df['date'])
This date conversion method is unusual but effective. I joined the date and time columns into the date column, then converted it into a datetime format.
I then chose to change all dates within each month to the first day of the month. This makes it easier to place the data into the charts for this entry-level analysis. If you wanted, you could instead keep two data frames: one with the original dates and one grouped by month.
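If the string round trip feels awkward, one alternative (my suggestion, not the method used in this analysis) is to go through a monthly period and back. A minimal sketch on toy timestamps:

```python
import pandas as pd

# Toy timestamps standing in for the combined date and time column
dates = pd.Series(pd.to_datetime(["2015-01-15 11:38:36", "2015-02-28 20:05:00"]))

# Convert to a monthly period, then back to a timestamp at the start of that month
month_start = dates.dt.to_period("M").dt.to_timestamp()

print(month_start.dt.strftime("%Y-%m-%d").tolist())  # ['2015-01-01', '2015-02-01']
```

The column stays a datetime the whole way through, so the second `pd.to_datetime` call is not needed.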
Creating New Data Frames to Chart
For each new data frame, I’m specifically picking features that I find important.
For this data frame, I’m counting the number of transactions for each month.
# Groupby date and count of transactions into a new data frame
ta = df.groupby('date').order_details_id.count().reset_index()
# Renaming the column
ta=ta.rename(columns={'order_details_id':'transactioncount'})
ta.head(2)
I then create another data frame that counts the number of pizzas sold for each month.
# Groupby date and sum of pizzas sold into a new dataframe
quantitysold = df.groupby('date').quantity.sum().reset_index()
quantitysold.head(2)
Finally, I create a temporary data frame called temp1 that counts the number of pizzas and orders by pizza size and month. This data frame is tricky to create, but when you break down the code, you can see how it works.
As a note, a backslash “\” at the end of a line lets you continue a Python statement on the next line without generating an error.
# Creating a new dataframe
temp1 = df.groupby([df['date'], 'size']).agg(pizzasizecount=('quantity', 'sum'), ordercount=('order_details_id', 'count'))\
.reset_index().sort_values(by=['date', 'size'], ascending=True)
# Reviewing the top two rows
temp1.head(2)
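A common alternative to the backslash is to wrap the whole chain in parentheses, which lets each step sit on its own line. Here is a sketch on toy rows; the data frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for the pizza data
toy = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-02-01"]),
    "size": ["S", "M", "S"],
    "quantity": [1, 2, 3],
    "order_details_id": [1, 2, 3],
})

# Parentheses let the chain span several lines with no trailing backslashes
summary = (
    toy.groupby(["date", "size"])
    .agg(pizzasizecount=("quantity", "sum"), ordercount=("order_details_id", "count"))
    .reset_index()
    .sort_values(by=["date", "size"])
)
print(summary)
```

Each keyword in `.agg()` names an output column, and the tuple says which input column to aggregate and how.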
Joining All of The Data Frames Created Above Into One Data Frame
Lastly, I merge the data frames created above into one data frame to generate my charts from. This makes creating the charts faster and the script runs faster as the data frame only contains what is needed.
# Merging the two dataframes created above into one dataframe.
df2 = ta.merge(quantitysold, how='left', on='date')
# Merging the temp1 dataframe with the new df2 dataframe.
df2 = df2.merge(temp1, how='left', on='date')
# Reviewing the head of the merged dataframe.
df2.head(2)
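Because the first two data frames have one row per month and temp1 has one row per month and size, the merged result ends up with one row per month and size. If you want Pandas to check that assumption for you, the `validate` argument of `merge` can help. The sketch below uses toy data with made-up numbers:

```python
import pandas as pd

dates = pd.to_datetime(["2015-01-01", "2015-02-01"])
ta = pd.DataFrame({"date": dates, "transactioncount": [10, 12]})
quantitysold = pd.DataFrame({"date": dates, "quantity": [11, 15]})
temp1 = pd.DataFrame({
    "date": dates.repeat(2),  # each month appears once per size
    "size": ["S", "M", "S", "M"],
    "pizzasizecount": [5, 6, 7, 8],
    "ordercount": [5, 5, 6, 6],
})

# validate raises a MergeError if the keys are not unique where expected
df2 = ta.merge(quantitysold, how="left", on="date", validate="one_to_one")
df2 = df2.merge(temp1, how="left", on="date", validate="one_to_many")

print(df2.shape)  # (4, 6)
```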
Next, I’ll reorder the columns and rename one of them.
# Reordering of the columns
df2 = df2[['date', 'quantity', 'transactioncount', 'size', 'pizzasizecount', 'ordercount']]
# Renaming the column
df2 = df2.rename(columns={'quantity': 'quantitysold'})
Creating Chart Visualizations
With Seaborn, there are many ways to create charts or modify multiple charts at the same time. For my examples, I’ve kept them simple and clear. As you become more comfortable, you will find your own method. Like all things with Python, you can chain things together to make mass updates or modifications.
To start, every Seaborn call begins with the library’s alias, which is typically “sns”. Next comes the chart type; for my first chart, I selected a line plot. Then I specify my data frame and the two columns I want for the X and Y axes.
# Creating the line plot and the size that works best
plt.figure(figsize=(12,5))
sns.lineplot(data=df2, x=df2["date"].dt.month_name(), y="transactioncount",marker='o',markersize=10)\
.set_title("Transaction Count by Month", fontsize=15)
As a best practice tip, I always write out the X and Y axis labels.
# Setting x and y labels
plt.margins(x=.05, y=.05) #changes width and height margins
plt.xlabel("Month", labelpad=10)
plt.ylabel("Transaction Count", labelpad=10)
Multiple Bar Chart Visualization
For the multiple bar chart, I will use the same template as above, except this one is called a bar plot. I also use a built-in parameter called “hue”, which plots each unique value of a column separately.
# Creating the bar plot and the size that works best
plt.figure(figsize=(12,5))
g = sns.barplot(y=df2['ordercount'], x=df2["date"].dt.month_name(), hue=df2['size'])
As a best practice tip, I always write out the X and Y axis labels and for this chart, I also included a legend that aligns with each bar.
# Setting up the plot title and legend spacing
g.margins(x=0.009, y=.05)
g.set_title("Pizza Sales Count by Month", fontsize=15)
g.set_xlabel("Months", labelpad=10)
g.set_ylabel("Pizza Sale Count", labelpad=10)
plt.legend(bbox_to_anchor=(1.1, 1), loc='upper right', borderaxespad=0)
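To make the effect of hue concrete, here is a small self-contained sketch with toy numbers. Each month gets one bar per size, and the legend lists the sizes. The data frame `toy` is made up, and the `Agg` backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy order counts per month and size (made-up numbers)
toy = pd.DataFrame({
    "month": ["January", "January", "February", "February"],
    "size": ["S", "M", "S", "M"],
    "ordercount": [1200, 1300, 1100, 1250],
})

fig, ax = plt.subplots(figsize=(12, 5))
# hue splits every month into one bar per unique value in the size column
sns.barplot(data=toy, x="month", y="ordercount", hue="size", ax=ax)

# Move the legend outside the plotting area so it does not cover the bars
ax.legend(title="size", bbox_to_anchor=(1.1, 1), loc="upper right", borderaxespad=0)
```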
Creating New Data Frames to View the Product Category Data
For these new data frames, I will take the category name and count how many orders are within each of the categories.
# order count by category
sm=df.groupby(df['category']).agg({'order_id':'count'}).reset_index().sort_values(by='category',ascending=True)
sm
Next, I create a new data frame that sums the number of pizzas sold by category and date.
# category sales by date
# need the \ after agg argument to continue with the larger function
dc = df.groupby([df['date'],'category']).agg(categorysum= ('quantity','sum'))\
.reset_index().sort_values(by=['date','category'],ascending=True)
dc['date'] = pd.to_datetime(dc['date'])  # change column to datetime64
dc.head(2)
Creating New Visualizations Based on The Two New Data Frames
The code below shows how to create a pie chart that has the category name labeled on the outside and the percentage on the inside, with each slice exploded to help your eyes see the slight differences between them.
# Creating a pie chart to visualize the percentage difference between the categories
plt.pie(data=sm, x='order_id', labels='category', autopct='%.0f%%', explode=(0.1, 0.1, 0.1, 0.1))
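If you want to try the pie chart without the dataset, the sketch below uses made-up counts that add up to 100, so the percentages are easy to check by eye. It also sizes `explode` from the data frame length, so the call keeps working if the number of categories changes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy order counts per category (made-up numbers that sum to 100)
sm = pd.DataFrame({
    "category": ["Chicken", "Classic", "Supreme", "Veggie"],
    "order_id": [30, 40, 20, 10],
})

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    sm["order_id"],
    labels=sm["category"],     # names drawn outside each slice
    autopct="%.0f%%",          # whole-number percentages inside each slice
    explode=[0.1] * len(sm),   # pull every slice out by the same amount
)

print([t.get_text() for t in autotexts])  # ['30%', '40%', '20%', '10%']
```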
Finally, my last chart displays the pizza sales by month and category. To do this, I use the new data frame that I created above and set my hue to ‘category’.
# Creating a multiple line chart to visualize the sales of pizza by pizza category
plt.figure(figsize=(12,5))
sns.lineplot(data=dc, x=dc["date"], y="categorysum",marker='o',markersize=10, hue = 'category')\
.set_title('Category of Pizza Sales by Month')
#setting x and y labels
plt.margins(x=.05, y=.05) #changes width and height margins
plt.xlabel("Month", labelpad=10)
plt.ylabel("Categories Ordered", labelpad=10)
plt.legend(bbox_to_anchor=(1.13, 1), loc='upper right', borderaxespad=0)
Summary
This completes my basic introduction to Seaborn and how to create multiple types of charts using specific data frames. Not all charts will require a new data frame, but I’ve often found it helpful. Remember, everyone has a preference and every situation is different. Don’t get discouraged because any process is better than no process.
Don’t forget to follow me and if you’re interested in learning more about bar charts and how to improve them, check it out — Improve Your Bar Chart Visuals With These Five Tips | by Da Data Guy | Medium
All of my code and CSV files can be found on my Github, click here — PizzaSalesEDA/Python at main · DaDataGuy/PizzaSalesEDA (github.com)