CommonLounge Archive

Aggregating Data in SQL Part 1

July 04, 2018

In the previous article we learned how to write simple SQL queries. Now, we will expand our skillset to learn how to aggregate data using aggregation commands.

Objectives

  • Write Boolean statements with the IN SQL conditional operator and null functions (IS NULL, IFNULL, IS NOT NULL, and <>).
  • Summarize data sets by employing aggregation functions (Sum, Average, Count, etc.).

Let’s say that you have recently started working for a restaurant supply wholesaler. You are tasked with leveraging the sales inquiry table to manage client communication more efficiently. Additionally, you have to co-ordinate with the sales people to ensure all queries are answered. Let’s look at some examples.


SQL Fiddle Recap

If you already remember how to use SQL Fiddle, skip to the setup section below.


SQL Fiddle is an online tool that let’s you run SQL Queries. It is a great tool to practice what we will learn in this article. Similar to database we will have tables in SQL and then write queries to extract data from those tables. The left panel is the Schema Panel where we will build table and add data to those tables. The right panel is where we will write SQL code to extract data from those tables and run calculations on those values. The bottom panel is where you will see the results of your query. Since we can’t run a query without tables, the right and lower panel will be grayed out until you build a schema aka create tables. Schema can also have other objects in addition to tables which we will cover later.


Setup

Copy and paste the following schema into the Schema panel and click on “Build Schema”.

CREATE TABLE Sales_Inquiry
    (`ID` int, `Client_Name` varchar(18), `City` varchar(13), `State` varchar(11), `Product` varchar(24), `Quantity` int, `Status` varchar(15), `Sales_Person` varchar(16), `Product_Total` int, `Date` datetime)
;
INSERT INTO Sales_Inquiry
    (`ID`, `Client_Name`, `City`, `State`, `Product`, `Quantity`, `Status`, `Sales_Person`, `Product_Total`, `Date`)
VALUES
    (1, 'The National', 'Elmhurst', 'New York', 'Eco Friendly Plate', 3, NULL, 'Joe Davis', 66, '2018-04-24 00:00:00'),
    (2, 'The National', 'Elmhurst', 'New York', 'Eco Friendly Cutlery Set', 3, NULL, 'Joe Davis', 96, '2018-04-24 00:00:00'),
    (3, 'The National', 'Elmhurst', 'New York', 'Oven Mitt', 1, NULL, 'Joe Davis', 25, '2018-04-24 00:00:00'),
    (4, 'Butler', 'Stamford', 'Connecticut', 'Ketchup', 5, 'Order Shipped', 'Mike Hannigan', 35, '2018-03-15 00:00:00'),
    (5, 'Butler', 'Stamford', 'Connecticut', 'Salt and Pepper', 5, 'Order Shipped', 'Mike Hannigan', 50, '2018-03-15 00:00:00'),
    (6, 'Butler', 'Stamford', 'Connecticut', 'Mustard', 5, 'Order Shipped', 'Mike Hannigan', 45, '2018-03-15 00:00:00'),
    (7, 'Butler', 'Stamford', 'Connecticut', 'Mayonnaise', 5, 'Order Shipped', 'Mike Hannigan', 55, '2018-03-15 00:00:00'),
    (8, 'Butler', 'Stamford', 'Connecticut', 'Cutlery Set', 5, 'Order Shipped', 'Mike Hannigan', 65, '2018-03-15 00:00:00'),
    (9, 'Carmines', 'San Diego', 'California', 'Oven Mitt', 2, 'Payment Pending', 'Robert Read', 50, '2018-02-24 00:00:00'),
    (10, 'Tao', 'Hartford', 'Connecticut', 'Plastic Cup', 7, 'Order Completed', 'Robert Read', 252, '2018-01-03 00:00:00'),
    (11, 'Tao', 'Hartford', 'Connecticut', 'Plastic Box', 7, 'Order Completed', 'Robert Read', 525, '2018-01-03 00:00:00'),
    (12, 'Tao', 'Hartford', 'Connecticut', 'Plate', 7, 'Order Completed', 'Robert Read', 315, '2018-01-03 00:00:00'),
    (13, 'Upland', 'Dallas', 'Texas', 'Mustard', 4, NULL, 'Cynthia Gitelson', 36, '2018-04-07 00:00:00'),
    (14, 'Upland', 'Dallas', 'Texas', 'Mayonnaise', 4, NULL, 'Cynthia Gitelson', 44, '2018-04-07 00:00:00'),
    (15, 'Alcala', 'New York', 'New York', 'Cheese Plate', 1, 'Order Completed', 'Joe Davis', 72, '2018-02-22 00:00:00'),
    (16, 'Alcala', 'New York', 'New York', 'Eco Friendly Napkin', 5, 'Order Completed', 'Joe Davis', 60, '2018-02-22 00:00:00'),
    (17, 'Haru Sushi', 'Los Angeles', 'California', 'Eco Friendly Box', 2, 'Order Completed', 'Robert Read', 74, '2018-01-18 00:00:00'),
    (18, 'Haru Sushi', 'Los Angeles', 'California', 'Eco Friendly Plate', 3, 'Order Completed', 'Robert Read', 66, '2018-01-18 00:00:00'),
    (19, 'Haru Sushi', 'Los Angeles', 'California', 'Eco Friendly Cutlery Set', 1, 'Order Completed', 'Robert Read', 32, '2018-01-18 00:00:00'),
    (20, 'Gramercy Tavern', 'Milford', 'Connecticut', 'Cheese Plate', 1, 'Payment Pending', 'Mike Hannigan', 72, '2018-03-03 00:00:00'),
    (21, 'Gramercy Tavern', 'Milford', 'Connecticut', 'Kitchen Knife', 1, 'Payment Pending', 'Mike Hannigan', 150, '2018-03-03 00:00:00'),
    (22, 'Gramercy Tavern', 'Milford', 'Connecticut', 'Oven Mitt', 3, 'Payment Pending', 'Mike Hannigan', 75, '2018-03-03 00:00:00'),
    (23, 'Momofuku Ice Cream', 'San Francisco', 'California', 'Take Out Bags', 1, 'Order Canceled', 'Robert Read', 7, '2018-05-02 00:00:00'),
    (24, 'Holey Cream', 'Houston', 'Texas', 'Spoon', 7, NULL, 'Cynthia Gitelson', 119, '2018-04-17 00:00:00'),
    (25, 'Holey Cream', 'Houston', 'Texas', 'Paper Napkin', 1, NULL, 'Cynthia Gitelson', 9, '2018-04-17 00:00:00'),
    (26, 'Rice to Riches', 'Jamaica', 'New York', 'Oven Mitt', 5, 'Payment Pending', 'Joe Davis', 125, '2018-03-24 00:00:00'),
    (27, 'Insomnia Cookies', 'New Haven', 'Connecticut', 'Take Out Bags', 1, 'Order Rejected', 'Mike Hannigan', 7, '2018-02-05 00:00:00')
;

We created a Sales_Inquiry table that has the following columns:

  • ID
  • Client_Name
  • City
  • State
  • Product
  • Quantity
  • Status
  • Sales_Person
  • Product_Total
  • Date

Boolean Operators

On the right panel, let’s write the below queries to understand the data.

An interesting thing to note about the Sales_Inquiry table is that each client inquiry is broken down into multiple rows. Each row represents a product requested by client. In the screenshot below, you can see the The National has inquired about Eco Friendly Plate, Eco Friendly Cutlery Set and Oven Mitt. The table stores client’s city and state, quantity requested for each product, assigned sales person, status, total cost of each product requested and inquiry date.

Query:

 select * from sales_inquiry;

Result:


IS NULL

First priority is to find all inquiries that aren’t answered and forward them to the responsible sales person. You have to simply find all inquiries where status is empty (null). In SQL, you have to simply write is null in the where clause to filter for blank (null) values.

Query:

select * from sales_inquiry
where status is null;

Result:

Here, we can see there are 4 clients with outstanding queries. You can email Joe Davis and Cynthia Gitelson with their clients and ask them to follow up.


IFNULL

If you had to submit the above report to senior management, the status column with null values does not look professional. Instead of going back and updating the database, SQL let’s you replace null values with a specified number or character string. Let’s replace null with Client Follow-Up Required. Additionally, let’s sort by date to set a priority for the sales person. Therefore, in this example Cynthia should reach out to Upland before Holey cream.

Note that we’ll use the ifnull function to do this. If the first argument is null, SQL uses the second value in the output.

Query:

select Client_Name, Product, Quantity, ifnull(status,'Client Follow-Up Required') as Status, Sales_Person, Product_Total, Date
from sales_inquiry
where status is null
order by date;

Result:


IS NOT NULL and <> (Not Equals)

Alternatively, if you wanted a list of all active inquiries, we would use is not null in the where clause. Let’s say we want all inquiries that are in progress. We don’t want inquiries that have not been answered (‘is not null) or have been completed (i.e. status is Order Completed).

Query:

select * from sales_inquiry
where status is not null
and Status <> 'Order Completed';

Result:


IN

You get an urgent request from sales director that she needs all previous data for Butler and Carmines. She has scheduled meetings with them this afternoon and would like historical sales inquiries on hand to give her a better idea of which new products to pitch. When you want to specify multiple filters in the ‘where’ or ‘and’ clause you can use the ‘IN’ function. Separate each filter with commas as seen in the query below.

Query:

select * from sales_inquiry
where client_name in ('Butler','Carmines')
order by date;

Result:


NOT IN

Similarly, you may want to exclude clients from the report. To exclude one client use the != or <> mathematical operator. To exclude more than one client, use the not in function. Let’s see examples below.

Query:

select * from sales_inquiry
where client_name != 'Butler';

Result: Butler should not be a client in the below results.

Query:

select * from sales_inquiry
where client_name not in ('Butler','Carmines')
order by date;

Result: Result should not have Butler or Carmines in the client list.


You can use these functions for numeric, text or date columns. We recommend practicing these functions on all types of columns.

Aggregate Functions

Let’s take our queries a step further and learn how to summarize data using aggregate functions. We will primarily cover sum, average, count, min, and max. We will see several examples in this section.

SUM, FORMAT, and CONCAT

Let’s create a report with total sale for each client. You use the sum function to to add the product_total numbers for each client.

Let’s also format the sales values to look like financial numbers. Instead of simply showing 1092 we want to show $1,092. This is a two step process.

  • First we add commas to the total sale value by using the format function.
  • Next, we use the concat function (similar to Excel’s concatenate function) to add the character string ‘$’ to the number. Note you can add multiple strings in the concat function (just separate them with a comma). Since you’re simply adding a character, you can display the sale numbers in any currency.

Query:

select client_name, concat('$',format(sum(product_total),'###,###,###')) as Total_Sale from sales_inquiry
group by client_name;

Result:

The sum function should be paired with a group function to create these reports. If you don’t use a group function, it will simply sum the entire column. In our example, we want to sum product total for each client. If you see the example below, it simply sums the entire product total column and displays it with the first client. When writing queries, you want to avoid such situations where the result provides little value.

Query:

select client_name, concat('$',format(sum(product_total),'###,###,###')) as Total_Sale from sales_inquiry

Result:

COUNT

For each client inquiry, let’s summarize both the number of products ordered as well as total sale for each order. To calculate the number of products ordered you will use the count function. For Total_Sale, you will use the sum function. We will use the group function to ensure that both totals are aggregated by client.

Query:

select client_name, count(product) as Total_Product, sum(product_total) as Total_Sale from sales_inquiry
group by client_name;

Result:

AVG

If you wanted to calculate average sale every month, you can use the avg function. You specify which field you want to perform the average calculation on. Next, you provide a group by clause to ensure it aggregates correctly and doesn’t simply give you the average of the entire column.

Query:

select monthname(date) AS Sale_Month, avg(product_total) Average_Sale from sales_inquiry
GROUP BY month(date);

Result:

MAX

Maximum Value: The max function will provide the maximum value from the entire list.

Query:

 SELECT max(quantity) as Maximum from sales_inquiry;

Result:

MIN

Minimum Value: The min function will provide the minimum value from the entire list.

Query:

SELECT min(quantity) as Minimum from sales_inquiry;

Result:


© 2016-2022. All rights reserved.