- Filter data using advanced SQL commands (Group By, Having).
- Modify tables using Update, Drop, and Truncate
- Creating relationships between tables using Joins (Inner, outer, union)
For the following example queries, continue to use the WholeSale Database from the previous section on SqlFiddle.
Impressed by your quick turnaround, your boss wants you to send him all historical sales data per client. In this query we will learn how to give names to columns in the final report.
To do this, simply add
AS custom_column_name after the
column_name. In the query below, you can see that
client_name will be named as Client.
Next, we will perform our first calculation. All sales are under the
Total_Cost column in the sales table. The goal is to sum the
Total_Cost column by client. In order to do that you have to do two things:
- First, write
sum(column_name)to indicate which calculation needs to be performed on which column
- Next, after specifying which table to retrieve data from, write
group by column_nameto indicate which column should be aggregated to sum the cost.
In order to sort the result to show client with highest sale first, simply use
order by. As you can see, instead of specifying column name, you can simply put a number which in this case would mean to sort data by second column, i.e.
desc at the end indicates it should be sorted in the descending order.
SELECT client_name AS Client, sum(total_cost) AS Sale FROM sales GROUP BY client_name ORDER BY 2 DESC;
Let’s take the above example a step further.
We will now display total sale by client by month. Since our date of sale is stored as a regular date with day, month and year, we have to extract month from it.
Here we would like to discuss two functions:
- To get the numeric value of the month use the
- To get the name of the month, use the
For example, if date is ’03-26-2018’, the
month function will extract
3 from the date and the
monthname function will output
Next, since we are aggregating across two columns both of them should be mentioned in the
group by clause. Finally, the
order by clause is an excellent example of how you can you both column name and a number to indicate which column to sort and in what order. The reason we use month for
Order By clause is that you want to sort it by the month’s numeric value instead of name. If you used month name in the clause then it would be ordered as April, February and March instead of February, March and April.
SELECT client_name AS Client, monthname(sale_date) AS Sale_Month, sum(total_cost) AS Total_Sale FROM sales GROUP BY client_name, month(sale_date) ORDER BY month(sale_date), 3 DESC;
Let’s preview an advanced query. For this query you will pull data from both the
Sales table. The goal of this query is to view which product each client ordered and the total amount spent per product. First, in order to select data from multiple tables they should be specified after
from and separated by commas. Since tables can have columns that share a name, after select each column will be specified as
table_name.column_name. If you wanted to take all data from one table, you can write
table_name.*. This would pull all columns from that table.
As learned earlier, you can use
as to provide a custom name to any column for the final output.
Here is where ID is important. Similar to
vlookup in Excel, there should a column that’s a reference point across both tables. The
Product_ID column in the
Sales table are in fact IDs from the
ID column of the
Product table. In order to pull data from both tables, we write a where clause to indicate that
Product table should match
Product_ID of the Sales table.
SELECT sales.client_name, product.product, sum(sales.total_cost) as Total_Sale FROM Sales, Product WHERE Sales.Product_ID = Product.ID GROUP BY sales.client_name, product.product;
Now let’s say you only want to see records where the total sale was more than $200. You cannot use
and clause since the column you want to filter is an aggregate column. In this case, the
HAVING clause can be used. At the end of the above query, you simply add the condition that you only want results that satisfies the condition that total sale is greater than 200.
SELECT sales.client_name, product.product, sum(sales.total_cost) as Total_Sale FROM Sales, Product WHERE Sales.Product_ID = Product.ID GROUP BY sales.client_name, product.product HAVING Total_Sale > 200;
Your sales manager notified you that the database incorrectly shows that we sold 6 boxes to Panda Restaurant. He wants you to update the database to reflect that only 5 boxes were sold. In order to do that we will use the
Please note that code should be written in the schema (left) panel to add or change data. Since incorrect code can result in more or less updates than desired or worse case corrupts the database, best practice is to first write a select statement. Running the select statement will show you which records will be updated. In our example, we want to select the record where client is Panda Restaurant and Product ID is 2 since 2 corresponds to Box.
SELECT * FROM Sales WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
The select statement is referring to the correct entry. Since the
Quantity changes, it will also impact
Total_Cost. You want to update both the
Quantity to 5 and
Total_Cost to 250 (5 boxes at $50/box).
Instead of selecting, you will start the query with “UPDATE” and before using the where clause to define conditions you will specify the column names that need to change and their corresponding values.
UPDATE Table_Name SET Column_Name_1 = Updated Value_1, …., Column_Name_n = Updated Value_n, WHERE condition to get correct record(s);
On the Left Panel, type the following below all of the other
INSERT statements, and then hit “Build Schema”.
UPDATE Sales SET Quantity = 5, Total_Cost = 250 WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
Let’s run our
SELECT query again to make sure it worked. Type the following into the right Query Panel.
SELECT * FROM Sales WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
As the database grows there may be obsolete tables. An annual cleanup is recommended to ensure we are not wasting database spaces with old or archaic data which in turn affects performance of the code. Let’s delete the entire
DROP TABLE table_name;
DROP TABLE Sales;
Run the above in the left panel!
If you click Browser on the left panel, you can see that we only have the Product table now. Sales table has been deleted. Be extremely careful before you drop a table — make sure everything is right, i.e. the table name, etc.
There could be a scenario where you want to delete all data but still keep the table. When dealing with large datasets, database designers will often have a historical table. An example could be a current and historical revenue table. Current revenue table will have this year’s revenue data while a historical revenue table will have all revenue data since inception. After moving data from current to historical table, you will want to delete all data from current table. In order to that, you use the “TRUNCATE” function.
TRUNCATE TABLE table_name;
For practice, we will delete data from our Sales table. Since we deleted the Sales table above, let’s go ahead and rebuild the entire schema. Copy and paste the
INSERT statements from the setup section above into the left panel. Once it’s built, type below code on the left panel, after the above statements.
TRUNCATE TABLE Sales;
If you click Browser, you can see that we still have both tables.
Write the below query on the right panel. From the result you can see that table exists but it has no records.
Let’s summarize the key difference between Drop and Truncate function. The former deletes the table and all content while the latter only deletes the content of the table.
SQL can return data across multiple tables. Joins are used to specify which data is needed from the tables. Join has two parts to it. First you specify a common value that should match between tables to return value. Next, you specify which values to return. Below are the types of joins.
- Inner Join
- Right Join
- Left Join
- Full Outer Join
With inflation rising and after reviewing client reviews, the company has decided to raise prices for some products and add some new products too.
Let’s use a new schema for this portion. Copy the following into the left panel, and then click “Build Schema”.
CREATE TABLE Product_Old (`ID` int, `Product` varchar(128), `Cost` int) ; INSERT INTO Product_Old (`ID`, `Product`, `Cost`) VALUES (1, 'Cup', 10), (2, 'Box', 50), (3, 'Plate', 15), (4, 'Spoon', 7), (5, 'Knife', 7), (6, 'Fork', 7), (7, 'Paper Napkin', 15), (8, 'Ketchup', 9), (9, 'Salt', 2), (10, 'Pepper', 2) ; CREATE TABLE Product_New (`ID` int, `Product` varchar(128), `Cost` int) ; INSERT INTO Product_New (`ID`, `Product`, `Cost`) VALUES (1, 'Cup', 12), (2, 'Box', 55), (3, 'Plate', 15), (4, 'Spoon', 8), (5, 'Knife', 8), (6, 'Fork', 8), (7, 'Paper Napkin', 15), (8, 'Ketchup', 9), (9, 'Salt', 2), (10, 'Pepper', 2), (11, 'Mustard', 3), (12, 'Mayonnaise', 3), (13, 'Silver Spoon', 20), (14, 'Silver Fork', 20), (15, 'Star Wars napkin', 17) ;
It only returns values that match in both tables. This is similar to the intersection concept in Set theory.
SELECT * FROM Product_old INNER JOIN Product_new ON Product_old.ID = Product_new.ID;
If you wanted to see the list of products with a change in cost you can use a where clause learned earlier in the lesson. Using the not equal to,
<>, operator in the where clause, you can specify that the results should only include products where the cost doesn’t match.
SELECT * FROM Product_old INNER JOIN Product_new ON Product_old.ID = Product_new.ID where Product_old.cost <> Product_new.cost;
This will give you all values that match between both tables as well as any extra values in the second table.
SELECT * FROM Product_old RIGHT JOIN Product_New ON Product_old.ID = Product_new.ID;
Since we’ve added products to the new table, you can see that it includes all the values from the new table. Additionally, it shows null values for the old table to indicate that there are no matching values from old table.
This will give you all values that match between both tables as well as any extra values in the first table.
SELECT * FROM Product_old LEFT JOIN Product_New ON Product_old.ID = Product_new.ID;
Since there are no products in the old table that aren’t present in the new table, the query shows all products from both tables.
This will give you all values from both tables. This is similar to the union concept of set theory. MySQL doesn’t recognize this function so we will use an alternative way of getting all values. This operation is called is “UNION”. It combines the results from two select statements.
SELECT * FROM Product_old RIGHT JOIN Product_New ON Product_old.ID = Product_new.ID UNION SELECT * FROM Product_old LEFT JOIN Product_New ON Product_old.ID = Product_new.ID;
Joins are an excellent way to analyze data across tables. You can view which data points changed over time, get a more holistic view of your data and understand business drivers.
SQL can answer your data related question that can directly impact your business’ bottom line.
- Who are my Top 5 clients? This information can be used to offer incentives to keep your top clients loyal.
- What is the busiest week or busiest month? This can help manage staff and your time better.
- Send me a notification when my inventory for cups drops below 100 cases. This ensures you always have all your products in stock.
- Which product is high in stock and which clients are the most likely to buy them? Marketing division can send promotions for those products to those clients.
Leveraging data you collect to help you make important sales and marketing decisions without any prior programming knowledge — this is the reason SQL is used widely across industries.
We will learn how to write more queries with a larger database to get such answers in the next article. Stay tuned!