Aggregating Data in SQL Part 2
July 04, 2018
In this tutorial, we will learn how to use SQL's mathematical functions to interpret data better, and how to create reports.
Objectives
- Clean data using SQL mathematical functions (ABS, SIGN, MOD, etc.)
- Learn how to write comments in SQL
- Create reports by applying aggregation methods
Mathematical operations
We’re going to let you in on a secret: SQL can perform most mathematical calculations for you, independent of any schema. When writing queries you will often need to do some quick calculations, and you can save time by doing them in SQL itself.
Multiplication
If you want to know what 23 times 49 is, you can simply ask SQL with a select statement. Here’s an example:
Query:
SELECT 23 * 49 as Result;
Result:
We will discuss some useful mathematical operations that you can perform in SQL and that will be very useful in either cleaning up or manipulating large datasets.
Modulus (MOD)
Modulus calculates the remainder after division of one number by another, so $234 \space \% \space 72 = 18$.
Query:
select 234 mod 72 as Result;
Result:
Absolute Value (ABS)
Query:
select abs(-1569) as Result;
Result:
SIGN
Query:
select sign(-47) as Result;
Result:
The result is -1 if the value is negative, 1 if the value is positive, and 0 if the value is zero.
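These functions are handy for cleaning data, not just for computing literals. As an illustrative sketch (the transactions table and amount column here are hypothetical), SIGN can flag refunds entered as negative amounts while ABS recovers their magnitude:

```sql
-- Hypothetical table and column: refunds were entered as negative amounts.
-- sign() flags the direction of each row; abs() recovers the magnitude.
select amount,
       sign(amount) as Direction,
       abs(amount) as Magnitude
from transactions;
```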
$e^x$ (EXP)
Query:
SELECT EXP(7) as Result;
Result:
Rounding (ROUND)
Query:
SELECT ROUND(54.312) as Result;
Result:
You can also pass an optional second argument that specifies the precision of the rounding. In the example below, we will round to the 2nd decimal place.
Query:
SELECT ROUND(54.31265, 2) as Result;
Result:
Exponentiation (POWER)
Let’s say we want to calculate $7^2$.
Query:
SELECT power(7, 2) as Result;
Result:
Random Number (RAND)
Query:
SELECT rand() as Result;
Result:
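RAND() returns a random value v with 0 <= v < 1. If you need a random integer in a range instead, a common trick is to scale and floor the value; here is a sketch of a die roll from 1 to 6:

```sql
-- rand() is in [0, 1), so 1 + rand() * 6 is in [1, 7);
-- floor() turns that into an integer from 1 to 6.
SELECT floor(1 + rand() * 6) as Result;
```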
Commenting
When writing long and complicated queries, it is highly recommended to add comments explaining your thought process. Comments can be added to a query in two ways:
Commenting One Line
Syntax: -- This is a single line comment
Query:
-- This is a single line comment
select * from sales_inquiry;
Result:
Commenting Multiple Lines
Syntax:
/* This comment
is multiple lines
long */
Query:
/* This comment
is multiple lines
long */
select * from sales_inquiry;
Result:
You will get the same result for both queries. Comments are not executed by SQL; their primary purpose is to explain the query’s logic. They are also used to troubleshoot issues or temporarily disable code.
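A common troubleshooting trick is to comment out part of a query instead of deleting it. In this sketch, a hypothetical status filter is temporarily disabled while the rest of the query still runs:

```sql
select * from sales_inquiry
where quantity > 3
-- and status = 'Order Completed' (condition disabled while troubleshooting)
;
```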
Creating Reports
Let’s write queries to create some useful reports that yield analytics from our data. Re-build the schema using the code below. Remember to copy and paste it into the left column on SQLFiddle and click ‘Build Schema’.
CREATE TABLE Sales_Inquiry
(`ID` int, `Client_Name` varchar(20), `City` varchar(13), `State` varchar(11), `Product` varchar(24), `Quantity` int, `Status` varchar(15), `Sales_Person` varchar(16), `Total_Sale` int, `Sale_Date` datetime)
;
INSERT INTO Sales_Inquiry
(`ID`, `Client_Name`, `City`, `State`, `Product`, `Quantity`, `Status`, `Sales_Person`, `Total_Sale`, `Sale_Date`)
VALUES
(1, 'The National', 'Elmhurst', 'New York', 'Eco Friendly Plate', 3, NULL, 'Joe Davis', 66, '2018-04-24 00:00:00'),
(2, 'The National', 'Elmhurst', 'New York', 'Eco Friendly Cutlery Set', 3, NULL, 'Joe Davis', 96, '2018-04-24 00:00:00'),
(3, 'The National', 'Elmhurst', 'New York', 'Oven Mitt', 1, NULL, 'Joe Davis', 25, '2018-04-24 00:00:00'),
(4, 'Butler', 'Stamford', 'Connecticut', 'Ketchup', 5, 'Order Shipped', 'Mike Hannigan', 35, '2018-03-15 00:00:00'),
(5, 'Butler', 'Stamford', 'Connecticut', 'Salt and Pepper', 5, 'Order Shipped', 'Mike Hannigan', 50, '2018-03-15 00:00:00'),
(6, 'Butler', 'Stamford', 'Connecticut', 'Mustard', 5, 'Order Shipped', 'Mike Hannigan', 45, '2018-03-15 00:00:00'),
(7, 'Butler', 'Stamford', 'Connecticut', 'Mayonnaise', 5, 'Order Shipped', 'Mike Hannigan', 55, '2018-03-15 00:00:00'),
(8, 'Butler', 'Stamford', 'Connecticut', 'Cutlery Set', 5, 'Order Shipped', 'Mike Hannigan', 65, '2018-03-15 00:00:00'),
(9, 'Carmines', 'San Diego', 'California', 'Oven Mitt', 2, 'Payment Pending', 'Robert Read', 50, '2018-02-24 00:00:00'),
(10, 'Tao', 'Hartford', 'Connecticut', 'Plastic Cup', 7, 'Order Completed', 'Robert Read', 252, '2018-01-03 00:00:00'),
(11, 'Tao', 'Hartford', 'Connecticut', 'Plastic Box', 7, 'Order Completed', 'Robert Read', 525, '2018-01-03 00:00:00'),
(12, 'Tao', 'Hartford', 'Connecticut', 'Plate', 7, 'Order Completed', 'Robert Read', 315, '2018-01-03 00:00:00'),
(13, 'Upland', 'Dallas', 'Texas', 'Mustard', 4, NULL, 'Cynthia Gitelson', 36, '2018-04-07 00:00:00'),
(14, 'Upland', 'Dallas', 'Texas', 'Mayonnaise', 4, NULL, 'Cynthia Gitelson', 44, '2018-04-07 00:00:00'),
(15, 'Alcala', 'New York', 'New York', 'Cheese Plate', 1, 'Order Completed', 'Joe Davis', 72, '2018-02-22 00:00:00'),
(16, 'Alcala', 'New York', 'New York', 'Eco Friendly Napkin', 5, 'Order Completed', 'Joe Davis', 60, '2018-02-22 00:00:00'),
(17, 'Haru Sushi', 'Los Angeles', 'California', 'Eco Friendly Box', 2, 'Order Completed', 'Robert Read', 74, '2018-01-18 00:00:00'),
(18, 'Haru Sushi', 'Los Angeles', 'California', 'Eco Friendly Plate', 3, 'Order Completed', 'Robert Read', 66, '2018-01-18 00:00:00'),
(19, 'Haru Sushi', 'Los Angeles', 'California', 'Eco Friendly Cutlery Set', 1, 'Order Completed', 'Robert Read', 32, '2018-01-18 00:00:00'),
(20, 'Gramercy Tavern', 'Milford', 'Connecticut', 'Cheese Plate', 1, 'Payment Pending', 'Mike Hannigan', 72, '2018-03-03 00:00:00'),
(21, 'Gramercy Tavern', 'Milford', 'Connecticut', 'Kitchen Knife', 1, 'Payment Pending', 'Mike Hannigan', 150, '2018-03-03 00:00:00'),
(22, 'Gramercy Tavern', 'Milford', 'Connecticut', 'Oven Mitt', 3, 'Payment Pending', 'Mike Hannigan', 75, '2018-03-03 00:00:00'),
(23, 'Momofuku Ice Cream', 'San Francisco', 'California', 'Take Out Bags', 1, 'Order Canceled', 'Robert Read', 7, '2018-05-02 00:00:00'),
(24, 'Holey Cream', 'Houston', 'Texas', 'Spoon', 7, NULL, 'Cynthia Gitelson', 119, '2018-04-17 00:00:00'),
(25, 'Holey Cream', 'Houston', 'Texas', 'Paper Napkin', 1, NULL, 'Cynthia Gitelson', 9, '2018-04-17 00:00:00'),
(26, 'Rice to Riches', 'Jamaica', 'New York', 'Oven Mitt', 5, 'Payment Pending', 'Joe Davis', 125, '2018-03-24 00:00:00'),
(27, 'Insomnia Cookies', 'New Haven', 'Connecticut', 'Take Out Bags', 1, 'Order Rejected', 'Mike Hannigan', 7, '2018-02-05 00:00:00'),
(28, 'Korean Steakhouse', 'Elmhurst', 'New York', 'Plastic Cup', 5, 'Order Completed', 'Joe Davis', 180, '2018-03-06 00:00:00'),
(29, 'Korean Steakhouse', 'Elmhurst', 'New York', 'Plastic Box', 6, 'Order Completed', 'Joe Davis', 450, '2018-03-25 00:00:00'),
(30, 'Korean Steakhouse', 'Elmhurst', 'New York', 'Salt and Pepper', 10, 'Order Completed', 'Joe Davis', 100, '2018-04-09 00:00:00'),
(31, 'Korean Steakhouse', 'Elmhurst', 'New York', 'Mustard', 10, 'Order Completed', 'Joe Davis', 90, '2018-04-24 00:00:00'),
(32, 'Panda Restaurant', 'Stamford', 'Connecticut', 'Plastic Box', 6, 'Order Completed', 'Mike Hannigan', 450, '2018-02-22 00:00:00'),
(33, 'Panda Restaurant', 'Stamford', 'Connecticut', 'Knife', 7, 'Order Completed', 'Mike Hannigan', 119, '2018-03-05 00:00:00'),
(34, 'Panda Restaurant', 'Stamford', 'Connecticut', 'Plate', 8, 'Order Completed', 'Mike Hannigan', 360, '2018-04-16 00:00:00'),
(35, 'Brooklyn Wholesalers', 'Stamford', 'Connecticut', 'Paper Napkin', 3, 'Order Completed', 'Mike Hannigan', 27, '2018-02-14 00:00:00'),
(36, 'Brooklyn Wholesalers', 'San Diego', 'California', 'Ketchup', 12, 'Order Completed', 'Robert Read', 84, '2018-03-08 00:00:00'),
(37, 'Brooklyn Wholesalers', 'San Diego', 'California', 'Plastic Cup', 15, 'Order Completed', 'Robert Read', 540, '2018-03-21 00:00:00'),
(38, 'Brooklyn Wholesalers', 'San Diego', 'California', 'Fork', 6, 'Order Completed', 'Robert Read', 102, '2018-04-22 00:00:00'),
(39, 'Chillhouse Coffe', 'Dallas', 'Texas', 'Plastic Cup', 15, 'Order Completed', 'Cynthia Gitelson', 540, '2018-02-05 00:00:00'),
(40, 'Chillhouse Coffe', 'Dallas', 'Texas', 'Knife', 11, 'Order Completed', 'Cynthia Gitelson', 187, '2018-02-16 00:00:00'),
(41, 'Chillhouse Coffe', 'Dallas', 'Texas', 'Plastic Box', 3, 'Order Completed', 'Cynthia Gitelson', 225, '2018-03-05 00:00:00'),
(42, 'Chillhouse Coffe', 'Dallas', 'Texas', 'Spoon', 14, 'Order Completed', 'Cynthia Gitelson', 238, '2018-03-16 00:00:00'),
(43, 'Chillhouse Coffe', 'Dallas', 'Texas', 'Plate', 17, 'Order Completed', 'Cynthia Gitelson', 765, '2018-04-05 00:00:00'),
(44, 'Homemade Ice Cream', 'Los Angeles', 'California', 'Salt and Pepper', 9, 'Order Completed', 'Robert Read', 90, '2018-02-13 00:00:00'),
(45, 'Homemade Ice Cream', 'Los Angeles', 'California', 'Plate', 11, 'Order Completed', 'Robert Read', 495, '2018-03-08 00:00:00'),
(46, 'Homemade Ice Cream', 'Los Angeles', 'California', 'Spoon', 3, 'Order Completed', 'Robert Read', 51, '2018-03-16 00:00:00'),
(47, 'Homemade Ice Cream', 'Los Angeles', 'California', 'Plastic Box', 16, 'Order Completed', 'Robert Read', 1200, '2018-04-02 00:00:00'),
(48, 'Mama Bakery', 'Milford', 'Connecticut', 'Mustard', 5, 'Order Completed', 'Mike Hannigan', 45, '2018-02-17 00:00:00'),
(49, 'Mama Bakery', 'Milford', 'Connecticut', 'Fork', 8, 'Order Completed', 'Mike Hannigan', 136, '2018-03-09 00:00:00'),
(50, 'Mama Bakery', 'Milford', 'Connecticut', 'Paper Napkin', 6, 'Order Completed', 'Mike Hannigan', 54, '2018-03-26 00:00:00'),
(51, 'Mama Bakery', 'Milford', 'Connecticut', 'Fork', 5, 'Order Completed', 'Mike Hannigan', 85, '2018-04-15 00:00:00')
;
CREATE TABLE Product
(`ID` int, `Product` varchar(24), `Pieces` int, `Cost` int)
;
INSERT INTO Product
(`ID`, `Product`, `Pieces`, `Cost`)
VALUES
(1, 'Plastic Cup', 600, 36),
(2, 'Plastic Box', 500, 75),
(3, 'Plate', 120, 45),
(4, 'Spoon', 1000, 17),
(5, 'Knife', 1000, 17),
(6, 'Fork', 1000, 17),
(7, 'Paper Napkin', 1000, 9),
(8, 'Ketchup', 500, 7),
(9, 'Salt and Pepper', 500, 10),
(10, 'Mustard', 500, 9),
(11, 'Mayonnaise', 500, 11),
(12, 'Cutlery Set', 250, 13),
(13, 'Take Out Bags', 1000, 7),
(14, 'Cheese Plate', 12, 72),
(15, 'Eco Friendly Box', 100, 37),
(16, 'Eco Friendly Plate', 100, 22),
(17, 'Eco Friendly Cutlery Set', 1000, 32),
(18, 'Eco Friendly Napkin', 1000, 12),
(19, 'Kitchen Knife', 15, 150),
(20, 'Oven Mitt', 10, 25)
;
The Sales_Inquiry table is very similar to the one created in part 1 of this tutorial, but it has more data and some of the columns have been renamed.
We also created a Product table with the following columns:
- ID
- Product
- Pieces
- Cost
Year to Date Client Billings by Sales Person
Let’s say that for each sales person, you want to calculate the total sales for this year. You use the sum function on the total_sale column and group by the sales person’s name to get the result below. We also added an order by clause so that it’s easier to see who the top salesperson this year is.
Query:
select sales_person as Name, concat('$',format(sum(total_sale),0)) as Total_Billing
from sales_inquiry
group by sales_person
order by sum(total_sale) DESC;
Result:
The report tells you that Robert Read has the highest billing so far this year.
Monthly Client Billings by Sales Person
We can further break down the above report by month. You simply add the month to both the select list and the group by clause to add another level of grouping. You can see that, for instance, Cynthia Gitelson has the highest billing for the month of February.
Query:
select sales_person as Name, monthname(sale_date) as Month, concat('$',format(sum(total_sale),0)) as Total_Billing
from sales_inquiry
group by sales_person, month(sale_date)
order by sales_person, month(sale_date);
Result:
Monthly Billings by State
To identify how the business is growing geographically, let’s group by state and compare the Total_Billing. Most importantly, this will help us understand which markets need more focus.
Query:
select state as State, monthname(sale_date) as Month, concat('$',format(sum(total_sale),0)) as Total_Billing
from sales_inquiry
group by state, month(sale_date);
Result:
Top 5 Products by Cost
You select the product and cost columns from the Product table to begin the query. If you want the top 5 products, you use a limit clause to restrict the result to 5 rows. Since you want the top 5, the cost should be sorted in descending order.
Query:
select product, cost from product
order by 2 desc
limit 5;
Result:
Top 5 Products by Cost per Piece
If 500 pieces of salt and pepper cost $10, then the cost per piece is 10/500 = $0.02. Since cost per piece is not a column in the product table, you can calculate it on the fly by writing cost/pieces as Cost_per_piece directly in the select statement.
Query:
select product, cost/pieces as Cost_per_piece from product
order by 2 desc
limit 5;
Result:
As you can see in the above queries, the list of top 5 products differs depending on whether you compare total cost or cost per piece. This is an important distinction that can help a marketing team decide which products to focus on. At first glance, plastic box might seem like a top revenue-generating product, but once you calculate the cost per piece you realize it may not be a top choice. Adding weight and profit margin to this data would further help identify the most profitable products.
Advanced Query
Let’s say we want to create a report of completed orders for the months of February through April. Additionally, you want to show the cost per piece of each of the products.
For this, we need to bring together data from both the product and sales_inquiry tables.
The first step when tying together data across tables is identifying the common field. In this case, the product name, labeled product, is the common field across both tables. Therefore, the where clause will equate product across both tables. This tells SQL how to match rows when pulling from both tables.
Note that, to simplify the query, both tables are given an alias: ‘p’ for the Product table and ‘s’ for the Sales_Inquiry table. When indicating which table a field should be pulled from in the select statement, it is easier to use the alias as a prefix: instead of writing sales_inquiry.client_name, you can simply write s.client_name.
Query:
select s.client_name, s.product, s.quantity, p.cost, p.pieces, p.cost/p.pieces as Cost_per_piece, s.total_sale, s.sale_date
from product p, sales_inquiry s
where p.product = s.product
and s.status = 'Order Completed'
and sale_date between '2018-02-01' and '2018-04-30'
order by 1;
Result:
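As a side note, the same rows can be produced with explicit join ... on syntax, which many people find easier to read than listing both tables in the from clause. Here is an equivalent sketch of the query above:

```sql
-- Same match on the common product field, written as an explicit inner join.
select s.client_name, s.product, s.quantity, p.cost, p.pieces,
       p.cost/p.pieces as Cost_per_piece, s.total_sale, s.sale_date
from sales_inquiry s
join product p on p.product = s.product
where s.status = 'Order Completed'
and s.sale_date between '2018-02-01' and '2018-04-30'
order by 1;
```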
Conclusion
You should now have a good understanding of how to write queries. In the following articles, we will discuss how to write more advanced queries to run analytics on large datasets.