CommonLounge Archive

Writing Efficient and Dynamic Queries

July 04, 2018

Dynamic Queries

We have learned how to write queries to analyze data and perform simple calculations. In this section, we will take this a step further. We will use SQL functions to create custom reports based on our data and will learn how to analyze trends in our data.

Objectives

  • Use CASE statements to structure data and create new attributes.
  • Combine multiple subqueries into one using “AS.”
  • Temporary Tables and Subqueries.
  • Determining data trends through advanced reporting.

SQL Fiddle Recap

If you already remember how to use SQL Fiddle, skip this section.


SQL Fiddle is an online tool that let’s you run SQL Queries. It is a great tool to practice what we will learn in this article. Similar to database we will have tables in SQL and then write queries to extract data from those tables. The left panel is the Schema Panel where we will build table and add data to those tables. The right panel is where we will write SQL code to extract data from those tables and run calculations on those values. The bottom panel is where you will see the results of your query. Since we can’t run a query without tables, the right and lower panel will be grayed out until you build a schema aka create tables. Schema can also have other objects in addition to tables which we will cover later.

Loan Sales Table Summary

Assume you’re working for a bank’s division that sells loans. The database has information about all the loans that have been approved in the past. Below is an overview of the fields in the database:

Create Loan Sales Table

Copy and paste the following schema into the Schema panel and click on “Build Schema”. This will create the Loan Sales table to run all queries in this exercise.

CREATE TABLE Loan_Sales
    (`ID` int, `Funded_Amount` int, `Term` int, `Interest_Rate` decimal(6,4), `Grade` varchar(125), `Loan_Status` varchar(18), `State` varchar(2), `Employment_Length` varchar(9), `Home_Ownership` varchar(8), `Loan_Purpose` varchar(18), `Loan_Utilization` decimal(6,4))
;    
INSERT INTO Loan_Sales
    (`ID`, `Funded_Amount`, `Term`, `Interest_Rate`, `Grade`, `Loan_Status`, `State`, `Employment_Length`, `Home_Ownership`, `Loan_Purpose`, `Loan_Utilization`)
VALUES
    (1, 16000, 3, 0.0797, 'A', 'Current', 'TX', '10+ years', 'RENT', 'debt_consolidation', 0.473),
    (2, 4800, 3, 0.0532, 'A', 'Current', 'KY', '2 years', 'MORTGAGE', 'medical', 0.186),
    (3, 14875, 3, 0.0735, 'A', 'Current', 'MN', '2 years', 'OWN', 'credit_card', 0.32),
    (4, 9600, 3, 0.1199, 'B', 'Current', 'NC', '10+ years', 'MORTGAGE', 'debt_consolidation', 0.923),
    (5, 7800, 3, 0.1091, 'B', 'Current', 'MO', '< 1 year', 'RENT', 'debt_consolidation', 0.669),
    (6, 4000, 3, 0.2291, 'E', 'Current', 'NC', '5 years', 'RENT', 'debt_consolidation', 0.373),
    (7, 10000, 3, 0.0672, 'A', 'Current', 'PA', '2 years', 'MORTGAGE', 'small_business', 0.444),
    (8, 16000, 5, 0.1262, 'C', 'Current', 'CA', '7 years', 'MORTGAGE', 'debt_consolidation', 0.294),
    (9, 10000, 3, 0.1091, 'B', 'Current', 'MN', '10+ years', 'MORTGAGE', 'other', 0.726),
    (10, 3500, 3, 0.0797, 'A', 'Current', 'LA', '5 years', 'OWN', 'other', 0.278),
    (11, 20000, 5, 0.1262, 'C', 'Current', 'OH', '1 year', 'MORTGAGE', 'debt_consolidation', 0.614),
    (12, 3000, 3, 0.0672, 'A', 'Current', 'PA', 'n/a', 'RENT', 'debt_consolidation', 0.2),
    (13, 6000, 3, 0.0944, 'B', 'Current', 'CA', '10+ years', 'RENT', 'medical', 0.486),
    (14, 25000, 5, 0.1262, 'C', 'Current', 'NY', '1 year', 'RENT', 'credit_card', 0.494),
    (15, 4000, 3, 0.1042, 'B', 'Current', 'NY', '10+ years', 'RENT', 'credit_card', 0.337),
    (16, 12000, 3, 0.0672, 'A', 'Current', 'CT', '10+ years', 'OWN', 'vacation', 0.043),
    (17, 3000, 3, 0.0672, 'A', 'Current', 'KY', '2 years', 'RENT', 'other', 0.157),
    (18, 12000, 5, 0.2872, 'F', 'Current', 'NY', '6 years', 'RENT', 'debt_consolidation', 0.565),
    (19, 11200, 3, 0.0608, 'A', 'Current', 'NJ', '6 years', 'OWN', 'other', 0.19),
    (20, 8400, 3, 0.0735, 'A', 'Current', 'NC', '1 year', 'MORTGAGE', 'credit_card', 0.548),
    (21, 12000, 5, 0.1042, 'B', 'Current', 'FL', 'n/a', 'MORTGAGE', 'debt_consolidation', 0.241),
    (22, 8000, 3, 0.0944, 'B', 'Current', 'NY', '10+ years', 'RENT', 'major_purchase', 0.06),
    (23, 30000, 3, 0.0532, 'A', 'Current', 'TN', '10+ years', 'MORTGAGE', 'credit_card', 0.386),
    (24, 7500, 3, 0.0672, 'A', 'Current', 'MT', '10+ years', 'OWN', 'medical', 0.371),
    (25, 10000, 3, 0.1199, 'B', 'Current', 'TX', '2 years', 'MORTGAGE', 'debt_consolidation', 0.079),
    (26, 20000, 5, 0.3079, 'G', 'Current', 'VA', '10+ years', 'MORTGAGE', 'debt_consolidation', 0.79),
    (27, 10000, 3, 0.0797, 'A', 'In Grace Period', 'FL', '5 years', 'MORTGAGE', 'other', 0.336),
    (28, 20000, 5, 0.0735, 'C', 'Late (31-120 days)', 'VA', '10+ years', 'MORTGAGE', 'home_improvement', 0.218),
    (29, 10000, 3, 0.0944, 'A', 'Current', 'KS', '< 1 year', 'MORTGAGE', 'vacation', 0.177),
    (30, 19000, 3, 0.1199, 'A', 'Current', 'CA', '6 years', 'MORTGAGE', 'debt_consolidation', 0.374),
    (31, 8000, 3, 0.0993, 'B', 'Current', 'FL', 'n/a', 'OWN', 'other', 0.405),
    (32, 3200, 3, 0.2, 'B', 'Current', 'MI', '6 years', 'MORTGAGE', 'other', 0.656),
    (33, 10000, 3, 0.0608, 'B', 'Current', 'NC', '10+ years', 'MORTGAGE', 'debt_consolidation', 0.461),
    (34, 10000, 3, 0.0797, 'A', 'Fully Paid', 'NJ', '10+ years', 'MORTGAGE', 'debt_consolidation', 0.074),
    (35, 10000, 5, 0.1806, 'D', 'Current', 'MO', 'n/a', 'MORTGAGE', 'debt_consolidation', 0.671),
    (36, 20000, 3, 0.0672, 'A', 'Current', 'SC', '5 years', 'MORTGAGE', 'debt_consolidation', 0.204),
    (37, 24000, 3, 0.0993, 'A', 'Current', 'IL', '4 years', 'RENT', 'debt_consolidation', 0.067),
    (38, 20000, 5, 0.1505, 'D', 'Current', 'FL', '5 years', 'MORTGAGE', 'debt_consolidation', 0.552),
    (39, 20000, 3, 0.0532, 'A', 'Current', 'NC', '10+ years', 'MORTGAGE', 'credit_card', 0.44),
    (40, 20000, 5, 0.1806, 'B', 'Current', 'NY', '6 years', 'OWN', 'debt_consolidation', 0.317),
    (41, 13000, 5, 0.3079, 'C', 'Current', 'FL', '10+ years', 'RENT', 'debt_consolidation', 0.774),
    (42, 40000, 3, 0.1262, 'A', 'Current', 'CA', '< 1 year', 'MORTGAGE', 'debt_consolidation', 0.372),
    (43, 8400, 3, 0.1199, 'D', 'Current', 'GA', '4 years', 'MORTGAGE', 'debt_consolidation', 0.728),
    (44, 30000, 5, 0.1408, 'G', 'Current', 'NY', '10+ years', 'RENT', 'other', 0.303),
    (45, 10000, 5, 0.0608, 'C', 'Current', 'ME', '10+ years', 'RENT', 'home_improvement', 0.099),
    (46, 35000, 5, 0.0797, 'B', 'Current', 'CA', '< 1 year', 'RENT', 'debt_consolidation', 0.637),
    (47, 25000, 5, 0.0735, 'C', 'Current', 'MD', '< 1 year', 'RENT', 'credit_card', 0.888),
    (48, 10800, 5, 0.1042, 'B', 'Fully Paid', 'NC', '5 years', 'MORTGAGE', 'medical', 0.172),
    (49, 40000, 3, 0.1262, 'A', 'Current', 'TX', '6 years', 'MORTGAGE', 'other', 0.425),
    (50, 12000, 3, 0.0672, 'A', 'Current', 'MN', '< 1 year', 'MORTGAGE', 'home_improvement', 0.293),
    (51, 1200, 3, 0.0735, 'A', 'Current', 'IL', '3 years', 'RENT', 'debt_consolidation', 0.174),
    (52, 15000, 3, 0.2, 'B', 'Current', 'WA', '3 years', 'RENT', 'debt_consolidation', 0.737),
    (53, 11200, 5, 0.2388, 'F', 'Late (31-120 days)', 'AK', '4 years', 'MORTGAGE', 'medical', 0.711),
    (54, 28000, 5, 0.1709, 'C', 'Current', 'TX', '< 1 year', 'MORTGAGE', 'debt_consolidation', 0.575),
    (55, 20000, 3, 0.0993, 'A', 'Current', 'MI', '3 years', 'MORTGAGE', 'debt_consolidation', 0.415),
    (56, 40000, 3, 0.0993, 'A', 'Current', 'MO', '5 years', 'MORTGAGE', 'credit_card', 0.608),
    (57, 32000, 3, 0.1408, 'D', 'Current', 'NY', '10+ years', 'RENT', 'debt_consolidation', 0.577),
    (58, 10000, 5, 0.0944, 'E', 'Current', 'AR', '10+ years', 'RENT', 'credit_card', 0.795),
    (59, 8000, 3, 0.1091, 'D', 'Current', 'NY', '10+ years', 'RENT', 'other', 0.413),
    (60, 10000, 3, 0.0944, 'B', 'Current', 'GA', '8 years', 'OWN', 'home_improvement', 0.338),
    (61, 8000, 3, 0.0993, 'B', 'Current', 'NJ', '3 years', 'RENT', 'credit_card', 0.785),
    (62, 35000, 5, 0.1602, 'C', 'Current', 'CA', '7 years', 'RENT', 'debt_consolidation', 0.29),
    (63, 25600, 3, 0.1408, 'B', 'Current', 'VA', '10+ years', 'MORTGAGE', 'debt_consolidation', 0.454),
    (64, 22725, 3, 0.1042, 'B', 'Current', 'TN', '7 years', 'OWN', 'debt_consolidation', 0.875),
    (65, 10000, 3, 0.0993, 'B', 'Current', 'NJ', '3 years', 'RENT', 'debt_consolidation', 0.439),
    (66, 40000, 3, 0.1042, 'B', 'Current', 'WA', '3 years', 'RENT', 'small_business', 0.335),
    (67, 10000, 5, 0.2582, 'C', 'Current', 'CA', '4 years', 'MORTGAGE', 'home_improvement', 0.174),
    (68, 40000, 5, 0.0944, 'C', 'Current', 'PA', '10+ years', 'OWN', 'other', 0.559),
    (69, 15000, 3, 0.1359, 'B', 'Current', 'MN', '4 years', 'MORTGAGE', 'credit_card', 0.913),
    (70, 27000, 5, 0.1505, 'B', 'Current', 'TX', '3 years', 'RENT', 'debt_consolidation', 0.358),
    (71, 25000, 5, 0.001042, 'B', 'Current', 'TX', '10+ years', 'RENT', 'debt_consolidation', 0.524),
    (72, 16000, 5, 0.002582, 'E', 'Current', 'MS', '10+ years', 'RENT', 'other', 0.91),
    (73, 5000, 3, 0.000944, 'B', 'Current', 'AL', '8 years', 'OWN', 'small_business', 0.442),
    (74, 24000, 5, 0.001359, 'C', 'Current', 'NY', '9 years', 'RENT', 'debt_consolidation', 0.877),
    (75, 21000, 5, 0.001505, 'C', 'Current', 'FL', '2 years', 'MORTGAGE', 'debt_consolidation', 0.939)
;

To approve the loan, the bank will look at a borrower’s current employment, credit score (grade), home ownership status, and the purpose of their loan. They gather such details on the borrower to model the possibility of a default. If your credit score is low, the bank will assign a lower grade to you. For a bank, a low credit score means that the borrower is a high risk client and has a lower possibility of paying back the loan in time. This would mean that the bank would offer a high interest rate to offset the risk. Home ownership is another indicator. Risk is highest when you rent your home and lowest when you own a home. Such indicators are often gathered by the bank and stored in the database similar to the Loan Sales table above.

You may notice that unlike our previous examples, we don’t have client names in this dataset. Each client is only referenced using an ID. Most financial institutions will not have client names in each table. This is to ensure privacy and reduce risk of exposing the client’s personal information. Information like name, social security number and bank details will be saved in a separate table. Only a select few will have access to it. Each client will be referenced by ID in all other tables. For any project, it is extremely important that you take time to understand the data and how different tables interact with each other.

Let’s do a simple select statement to see the data.

Query:

select * from loan_sales;

Result:

We will go through several exercises to learn how to structure data and build advanced reports using case statements and subqueries. We will also practice preparing data analytics reports.

Case Statements

The best way to understand case statements is by example, so we’ll just jump right in to our first exercise and understand the motivation behind these statements.

Exercise 1: Loan Sales Report by Region:

You will have to categorize sales by the following regions: Northeast, Northwest, Southeast, Southwest and Far North. For example, if the state is Texas, then it should be put in the Southeast Region.

Mapping Table for Region

The best way to achieve this in SQL is using case statements. Case statements are a series of if-else statements. In our example, we have 5 categories. We will provide the list of states for each category. The second part to the case statement is the else statement. The ELSE clause is specified to provide a result when none of the conditions are met. Note that you must end each case statement by writing END.

Now, this statement will generate data that’s not in the table, so you can use AS column_name to give it a custom name.

Syntax:

CASE expression
    WHEN condition_1 THEN result_1
    WHEN condition_2 THEN result_2
   ...
    WHEN condition_N THEN result_N
    ELSE result
END

In the code below, you see that we’ve mentioned the list of states that map to each region. Each WHEN statement maps states to a region. If neither fits, then it will give out a message ‘Region is not specified’. The results will assign a region to each state per the mapping specified in the code.

Query:

select ID, state, 
CASE
  WHEN state in ('ME','IL','MI','KS','VA','NJ','CT','NY','OH','PA','MO','MN') THEN 'Northeast Region'
  WHEN state in ('AL','MS','AR','MD','GA','SC','TN','FL','LA','NC','KY','TX') THEN 'Southeast Region'
  WHEN state in ('CA','MT','WA') THEN 'Northwest Region'
  WHEN state in ('AZ','NM') THEN 'Southwest Region'
  WHEN state in ('AK') THEN 'Far North Region'
  ELSE 'Region is not specified'
END AS Region
from loan_sales;

Result:

Exercise 2: Assigning Risk Grade

Let’s assign a risk rating to each loan based in the grade in the table. Here is the mapping table:

Before you read further, think about how you would write the query taking help from our first exercise. The result should show the ID, Grade and the mapped Risk Rating. Each ‘WHEN’ statement maps grade from the loan sales table to a risk rating per the mapping table above.

Query:

select id, Grade,
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade'
END AS Risk_Rating
from loan_sales;

Result:

If grade is missing or is not between A-G, then Risk Rating will show as missing grade.

Exercise 3: Loan Amount by Region and Risk Grade

Here you will use multiple case statements in a single query. Let’s put it all together by showing client information along with assigned risk rating and mapped region. You can have multiple case statements in one query. Let’s create report only for loans that are still current.

Query:

select ID, concat('$',format(Funded_Amount,'###,###,###')) AS Loan_Amount, 
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade'
END AS Risk_Rating,
CASE
  WHEN State in ('ME','IL','MI','KS','VA','NJ','CT','NY','OH','PA','MO','MN') THEN 'Northeast Region'
  WHEN State in ('AL','MS','AR','MD','GA','SC','TN','FL','LA','NC','KY','TX') THEN 'Southeast Region'
  WHEN State in ('CA','MT','WA') THEN 'Northwest Region'
  WHEN State in ('AZ','NM') THEN 'Southwest Region'
  WHEN State in ('AK') THEN 'Far North Region'
  ELSE 'Region is not specified'
END AS Region,
CASE
  WHEN Term = 3 THEN 'Short Duration Loan'
  WHEN Term = 5 THEN 'Long Duration Loan'
  ELSE 'Not enough data to calculate loan term'
END AS Loan_Term,
Loan_Status
from loan_sales
where Loan_Status = 'Current';

Result:

Exercise 4: Using Case Statements to Update Table

Let’s assume that bank decides to no longer capture the grade of each loan and instead wants to use the Risk Rating method we used above. In order for the data to be consistent, you want to go back and update records. For example, if grade is A, you want to update the record to now show ‘Low Risk’. We will recycle the case statement sql code from exercise 7.

Review Querying Large Databases in SQL article for a refresher on update table. The logic is you want to specify function (update), table name (loan sales), field that will be updated (grade) and value it should be set to (Risk Rating Classification using case statement).

Query for Schema:

update loan_sales
set grade = 
(
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade'
  END
);

Note: If you get this error in the schema: ‘Request Entity too large’; delete around 20 rows from the table and try running above query. This is a SQL fiddle constraint.

Query for SQL Editor:

select * from loan_sales;

Result:

You have replaced the A-G Grade in the Table to the Risk Rating classification.

Exercises to Analyze Trends

Exercise 1: Average Interest Rate by Grade

Let’s learn to create interesting reports to understand the data presented in the table. You are asked to find what the average interest rate is by grade. The interesting thing to note here is that we perform both a calculation and concatenation when specifying the field that should be pulled into the report. Interest rate from the table needs to be multiplied by 100 to show as percentage. For example, 0.3 means 30%, so we need to multiply by 100. You will add % sign at the end using a concat to make it look more professional.

Query:

select Grade, concat(format(avg(Interest_Rate*100), 2),'%') as Interest_Rate from loan_sales
group by Grade;

As you can see in the result, a low grade translates high interest rate.

Result:

Exercise 2: Maximum and Minimum Interest Rate by Grade

In this report, you will use the max and min function to dig deeper into the data. You’ve already seen the average interest rates but now let’s see the minimum and maximum value for each grade.

Query:

select Grade, concat(format(min(Interest_Rate*100),2),'%') as Min_Interest_Rate,
concat(format(max(Interest_Rate*100),2),'%') as Max_Interest_Rate
from loan_sales
group by grade;

Here you see that the numbers have a more interesting story than we saw in the previous exercise. Interest rate can be lower for grade ‘B’ than grade ‘A’. This shows you that there are other factors in deciding the final interest rate other than just the grade. Other factors could be home ownership, loan purpose, loan utilization, and length of employment. This highlights the reason we need several reports to slice and dice data in different permutations to get a better understanding of it.

Result:

Exercise 3: Interest Rate Range by Risk Rating

Here you will show the maximum and minimum interest rate as a range using the concat function. You will also show interest rate not by each grade, but grouped by risk rating. This is a great exercise to use case statement, concat function, number formatting, basic calculation and group by clause. We achieved all of this is less than 10 lines of code.

Query:

select 
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade' 
END AS Risk_Rating, concat(format(min(Interest_Rate*100),2),'%',' - ',format(max(Interest_Rate*100),2),'%') as Interest_Rate_Range
from loan_sales
group by Risk_Rating

Result:

Exercise 4: Client Information

Let’s provide more client information from the loan table. Since ID is a number, you can use concat to customize the ID column. We will add 1000 to ID and add ‘Client’ before the ID number. For example, instead of showing ID as 1, the result table will show ID as Client1001. This technique is often used to customize reports while keeping ID as a number in the database. The reason we prefer to keep ID as a number and not string is to save space and make it easier to modify data in the future.

This report gives us more color on each client. Such reports are often run by both the sales and marketing teams to get a better understanding of the clientele and model a target audience.

Query:

select concat('Client',1000+ID) as Client_ID, Funded_Amount, State, Employment_Length, Home_Ownership, Loan_Purpose, Loan_Utilization
from Loan_Sales;

Result:

Subquery

A subquery is like a nested query. It is an extremely powerful tool and can be used to perform calculations in multiple steps. You can perform more data analysis and manipulations on the go without the need to create tables. At most companies, you would need approval from a database administrator to create a new table in the schema. This could take several days to weeks. Subqueries let you perform intermediate calculations without the need for extra tables.

Exercise 1: Aggregate Loan Amount by Region

Let’s aggregate the funded amount for each region. We will go over nested select statements in this section which is also referred to as a subquery.

The inner select statement will provide a result. The outer select statement will provide a result not from a table in the database but from the result of the inner select statement. You can also perform calculations in the outer select statement. The inner select statement is like a subset of the original table with or without extra fields and has to be given a name to differentiate it from the original table.

In our example, the inner select statement provides the ID, state, region (mapped using case statement) and funded amount. The outer select statement will perform a calculation on this result. It will sum all funded amount by region using a group by clause. An easy way to think of it is that we are replacing table_name in a select statement with an entire other simple statement. Here, we will give the inner select statement a name: Region_Table. Also note, that it’s always best practice to add commas to the loan numbers.

Query:

select Region, format(sum(Funded_Amount),'###,###,###') AS Funded_Amount
from 
(select ID, Funded_Amount, state, 
CASE
  WHEN state in ('ME','IL','MI','KS','VA','NJ','CT','NY','OH','PA','MO','MN') THEN 'Northeast Region'
  WHEN state in ('AL','MS','AR','MD','GA','SC','TN','FL','LA','NC','KY','TX') THEN 'Southeast Region'
  WHEN state in ('CA','MT','WA') THEN 'Northwest Region'
  WHEN state in ('AZ','NM') THEN 'Southwest Region'
  WHEN state in ('AK') THEN 'Far North Region'
  ELSE 'Region is not specified'
END AS Region
from loan_sales) Region_Table
group by region;

Result:

Exercise 2: Loan Utilization by Grade

Loan utilization is another metric used by banks. It is the amount of credit the client is currently using relative to all available revolving credit. For example. let’s assume the total credit available across all credit cards is $25,000 and your current total bill across all accounts is$2500. The loan utilization would be $\frac{2500}{25000} * 100 = 10\%$. In this query we are going to look at the relationship between the risk rating you’ve assigned and borrower’s current loan utilization.

To do that, we will use a subquery. Here we recommend you take a step-by-step approach to build the query. First, let’s write the subquery

Initial SubQuery:

select ID,
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade'
END AS Risk_Rating
from loan_sales

Initial Result:

Query:

select Rating_Table.Risk_Rating, (avg(loan_utilization)*100) AS Average_Loan_Utilization
from loan_sales,
-- SubQuery
(
select ID,
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade'
END AS Risk_Rating
from loan_sales
) Rating_Table
-- Always equate IDs when using subquery to ensure accurate results
where loan_sales.ID = Rating_Table.ID
group by Rating_Table.Risk_Rating

Result:

Make sure to equate the two IDs on line 16. What would happen if we didn’t do that? Let’s take a look.

Query:

select Rating_Table.Risk_Rating, (avg(loan_utilization)*100) AS Average_Loan_Utilization
from loan_sales,
-- SubQuery
(
select ID,
CASE
  WHEN grade in ('A','B') THEN 'Low Risk'
  WHEN grade in ('C','D') THEN 'High Risk'
  WHEN grade in ('E','F') THEN 'Very High Risk'
  WHEN grade = 'G' THEN 'Junk'
  ELSE 'Missing Grade'
END AS Risk_Rating
from loan_sales
) Rating_Table
-- Always equate IDs when using subquery to ensure accurate results
-- where loan_sales.ID = Rating_Table.ID
group by Rating_Table.Risk_Rating

Result:

As you can see, we would have gotten an incorrect result!

Temporary Table

All tables are created in the schema and queries are written in the SQL editor. A temporary table is a special type of table that allows you to store a temporary result set, which you can reuse several times in a single session. Below are some features of the temporary table:

  1. When writing an advanced query with multiple joins and sub queries it is handy to use a temporary table. If you notice, that there is a subquery that is being used multiple times, it is helpful to create a temporary table. You can simply reference the temporary table instead of the subquery. This will also improve query’s performance.
  2. It is deleted automatically once you close the session. Since online SQL editors don’t have sessions, you cannot experience it on SQL Fiddle.
  3. It is only available and accessible to the user who creates it. In a conventional database like Oracle SQL, there are several users and every time a user logs in a session is created. Any change made by a user is saved for everyone. For example, if you created a new table or updated data in a table. This addition/deletion is permanent for all users. Temporary tables are an exception. It’s only for the user who created it. Note, in the same session temporary tables can’t share a name.
  4. If you want to delete the temporary table in the same session it was created, you must use Drop Temporary Table Table_Name. It is best practice to delete temporary tables before exiting the session.

Exercise for Temporary table:

Let’s re-do example 2 using temporary table. Let’s put the case statement in a temporary table as shown below.

Syntax:

CREATE TEMPORARY TABLE TABLE_NAME
--followed by a select statement

Query for Schema:

CREATE TEMPORARY TABLE LOAN_SALES_TEMP
select ID, Funded_Amount, state, 
CASE
  WHEN state in ('ME','IL','MI','KS','VA','NJ','CT','NY','OH','PA','MO','MN') THEN 'Northeast Region'
  WHEN state in ('AL','MS','AR','MD','GA','SC','TN','FL','LA','NC','KY','TX') THEN 'Southeast Region'
  WHEN state in ('CA','MT','WA') THEN 'Northwest Region'
  WHEN state in ('AZ','NM') THEN 'Southwest Region'
  WHEN state in ('AK') THEN 'Far North Region'
  ELSE 'Region is not specified'
END AS Region
from loan_sales;

Query for SQL Editor:

Here, instead of writing a subquery, we simply refer to the new temporary table. Modified query below.

select Region, format(sum(Funded_Amount),'###,###,###') AS Funded_Amount
from loan_sales_temp
group by region;

Result:

It should be the same result as exercise 2.

SubQuery or Temporary Table?

While writing queries how can you decide whether you should use temporary tables or sub queries. It is not always an obvious choice. From our experience, temporary tables are most useful when creating quick ad-hoc reports or for debugging. If you are creating a query or report to use for regular reporting then it’s best to use subqueries. If you are creating a dataset for analysis or a quick ad-hoc report especially a complex report, a temporary table could be suitable.


© 2016-2022. All rights reserved.