Optimizing SQL Temp Tables Using This Trick?
We’ve all wrote SQL queries where we either can’t use a windows function, such as COUNT, inside of a WHERE clause or using too many joins and your script is bloated. In this article, I’ll show you how I work around this issue while keeping my queries optimized.
Hi, I’m the Da Data Guy! Before I begin, if you are interested in Power BI, Python, Data visualization and SQL, please give me a follow. I focus on writing quality articles breaking down each step in the process so you can follow along and learn too. At the bottom of this article will be links and embedded SQL queries for you to use or review.
To follow along, I’m using data from Maven Analytics called “Pizza Place Sales”. This will be a three-part series where I provide a step-by-step tutorial of my Exploratory Data Analysis (EDA) using SQL, Python and Power BI. For this article, I’ll be using SQL to explain how I use temp tables to quickly gain information across the four Pizza Place Sales CSV files.
Creating a Pizza Sales Database
For this EDA, I’ll be using Microsoft SQL Server Management Studio (SSMS) and manually uploading the files into four tables. To start:
- Open SSMS and connect to your local server. My server is local on my computer.
- Right click on databases and select “New Database…” You will then be prompted to name the database, which I’ve named mine “Pizza Sales”.
3. Assuming you have downloaded the four CSV files from the link above, we will now upload them manually into the database you had created. Right click your Pizza Sales database and hover over tasks and then select import flat file.
4. This import wizard will guide you through the process on how to import your four CSV files. All of them will need to be done individually and they are essentially the same process for each.
5. When you reach the section where you can select the datatype for each column, you can modify as needed. I selected int for order IDs, small int for quantities, and unique values, such Order_Details_ID as my primary key for example.
6. Repeat step 4 and 5 until you have all 4 of the CSV files loaded. I opted to name them the same as they were originally labeled.
7. You may need to restart (close and open) your SSMS application for the new tables to be populated and for your intellisense to recognize the new columns.
Now that you’ve successfully created a database and uploaded the four files into the database as tables, it’s time we start writing a few SQL queries and temp tables!
Review Imported Data
To start off, select the top 1000 rows from the order_details table. Right click on the dbo.order_details and select top 1000 rows.
SELECT TOP (1000) [order_details_id]
,[order_id]
,[pizza_id]
,[quantity]FROM [Pizza Sales].[dbo].[order_details]
The results show us that everything has been added correctly and we can start writing our temp tables.
Creating a Local Temp Table
We are finally ready to start creating local temp tables. This first example will be creating a temp table that stores the unique dates of all orders placed. As a habit, I always start off every SQL query using “USE” and then the name of the database.
USE [Pizza Sales]
Next, I will add the IF OBJECT_ID statement to verify if the object exists or not. Within this statement, I will create a temp table and label it is using one hashtag sign, as it’s my local temp table. Using two hashtags means it's a global temp table. Notice that if the table is not null, it will drop the table.
This is a quick way to update your temp tables without manually dropping them.
IF OBJECT_ID('tempdb..#datetable') IS NOT NULLDROP TABLE #datetable
For the temp table select statement, I will only select [date] and insert it into the temp table.
SELECT DISTINCT
[date]
INTO #datetable
FROM [Pizza Sales].[dbo].[orders]
If you would like to test this query, go ahead and see if SSMS provides a message stating it’s updated 358 rows. If so, then you have done it successfully.
Lastly, I will select the dates from the temp table using the following query.
-------------------------------------------------
/* Selecting a specific count of pizzas*/
-------------------------------------------------
SELECT
[date]
FROM #datetable
ORDER BY [Date]
You should get 358 rows displaying all of the available dates inside the temp table.
Adding a Column and Using Window Functions in a Temp Table
To increase the complexity of the temp table, I’ll use a few window functions to perform an exploratory data analysis on the Order_Details table.
- Update the temp table to include Order_Id.
USE [Pizza Sales]
/* Creating temp table to hold all unique dates */
IF OBJECT_ID('tempdb..#datetable') IS NOT NULL
DROP TABLE #datetableSELECT DISTINCT
[date]
,[Order_id]
INTO #datetable
FROM [Pizza Sales].[dbo].[orders]
2. For counting the number of pizzas sold by date, I used the Count Over Partition By statement that counts the number of pizza ids by the date.
Select DISTINCT
EOMONTH(d.[date]) as [Date]
, UPPER(od.pizza_id) as pizza_id
, COUNT(od.pizza_id) OVER (PARTITION BY EOMONTH(d.[date]), od.Pizza_id) as PizzaCountbyPizzaIDFROM [Pizza Sales].[dbo].[order_details] as od
LEFT JOIN #datetable as d on (od.order_id = d.order_id)WHERE d.[date] BETWEEN '2015-06-01' and '2015-06-30'
ORDER BY EOMONTH(d.[date]), pizza_id
Using Two Temp Tables to Find Pizza Count Greater Than 5
My final example will show you how to use two temp tables and join them together for a final select statement where the count of pizza sales is greater than 5. You might be wondering, why can’t I write
“WHERE pizza count by order ID is > 5". That is because we cannot use window functions in the WHERE clause. See below for the error message.
To work around this error, I need to store all of the pizza counts in their own temp table.
I start off by creating two temp tables, the queries are roughly the same as above. You can copy and paste what you have already written.
USE [Pizza Sales]
/* Creating temp table to hold all unique dates */
IF OBJECT_ID('tempdb..#datetable') IS NOT NULL
DROP TABLE #datetableSELECT DISTINCT
[date]
,[Order_id]
INTO #datetable
FROM [Pizza Sales].[dbo].[orders]
-------------------------------------------------
/* Creating temp table to hold all pizza counts */
-------------------------------------------------
IF OBJECT_ID('tempdb..#pizzacount') IS NOT NULL
DROP TABLE #pizzacountSELECT DISTINCT
d.[date]
, od.order_id
, COUNT(od.pizza_id) OVER (PARTITION BY od.Order_id, EOMONTH(d.[date])) as PizzaCountbyOrderID
INTO #pizzacountFROM [Pizza Sales].[dbo].[order_details] as od
LEFT JOIN #datetable as d on (od.order_id = d.order_id)WHERE d.[date] BETWEEN '2015-06-01' and '2015-06-30'
My next step will be to query the pizzzacount table where I’ve stored all of my pizza counts in.
SELECT *
FROM #pizzacount
WHERE PizzaCountbyOrderID > 5ORDER BY PizzaCountbyOrderID DESC, [date] desc
As you can see, the query works! I’ve successfully used two temp tables and filtered by those who have sold more than 5 pizzas a day
We have successfully created an optimized and easy to use temp table. If you’re like me, I was pretty happy to learn how to use this specific way of creating a temp table that updated itself. I learned this snippet of code from my data mentor. He encouraged me to share our code to help others in the field and not bottle up our knowledge.
If you’ve found this beneficial, please give me a follow and a few claps. It’s likely that if you found this to be helpful, someone else will too and sharing this article helps spread knowledge to those who seek it.
Link to GitHub Pizza Sales EDA repository.