- Filter data using advanced SQL commands (Group By, Having).
- Modify tables using Update, Drop, and Truncate
- Creating relationships between tables using Joins (Inner, outer, union)
For the following example queries, continue to use the WholeSale Database from the previous section on SqlFiddle.
Impressed by your quick turnaround, your boss wants you to send him all historical sales data per client. In this query we will learn how to give names to columns in the final report.
To do this, simply add AS custom_column_name after the column_name. In the query below, you can see that client_name will be named as Client.
Next, we will perform our first calculation. All sales are under the Total_Cost column in the sales table. The goal is to sum the Total_Cost column by client. In order to do that you have to do two things:
- First, write sum(column_name) to indicate which calculation needs to be performed on which column
- Next, after specifying which table to retrieve data from, write group by column_name to indicate which column should be aggregated to sum the cost.
In order to sort the result to show client with highest sale first, simply use order by. As you can see, instead of specifying column name, you can simply put a number which in this case would mean to sort data by second column, i.e. Sale. Using desc at the end indicates it should be sorted in the descending order.
SELECT client_name AS Client, sum(total_cost) AS SaleFROM salesGROUP BY client_nameORDER BY 2 DESC;
Let’s take the above example a step further.
We will now display total sale by client by month. Since our date of sale is stored as a regular date with day, month and year, we have to extract month from it.
Here we would like to discuss two functions: month and monthname.
- To get the numeric value of the month use the month function.
- To get the name of the month, use the monthname function.
For example, if date is ’03-26-2018’, the month function will extract 3 from the date and the monthname function will output March.
Next, since we are aggregating across two columns both of them should be mentioned in the group by clause. Finally, the order by clause is an excellent example of how you can you both column name and a number to indicate which column to sort and in what order. The reason we use month for Order By clause is that you want to sort it by the month’s numeric value instead of name. If you used month name in the clause then it would be ordered as April, February and March instead of February, March and April.
SELECT client_name AS Client, monthname(sale_date) AS Sale_Month, sum(total_cost) AS Total_SaleFROM salesGROUP BY client_name, month(sale_date)ORDER BY month(sale_date), 3 DESC;
Let’s preview an advanced query. For this query you will pull data from both the Product and Sales table. The goal of this query is to view which product each client ordered and the total amount spent per product. First, in order to select data from multiple tables they should be specified after from and separated by commas. Since tables can have columns that share a name, after select each column will be specified as table_name.column_name. If you wanted to take all data from one table, you can write table_name.*. This would pull all columns from that table.
As learned earlier, you can use as to provide a custom name to any column for the final output.
Here is where ID is important. Similar to vlookup in Excel, there should a column that's a reference point across both tables. The Product_ID column in the Sales table are in fact IDs from the ID column of the Product table. In order to pull data from both tables, we write a where clause to indicate that ID from Product table should match Product_ID of the Sales table.
SELECT sales.client_name, product.product, sum(sales.total_cost) as Total_SaleFROM Sales, ProductWHERE Sales.Product_ID = Product.IDGROUP BY sales.client_name, product.product;
Now let's say you only want to see records where the total sale was more than $200. You cannot use where or and clause since the column you want to filter is an aggregate column. In this case, the HAVING clause can be used. At the end of the above query, you simply add the condition that you only want results that satisfies the condition that total sale is greater than 200.
SELECT sales.client_name, product.product, sum(sales.total_cost) as Total_SaleFROM Sales, ProductWHERE Sales.Product_ID = Product.IDGROUP BY sales.client_name, product.productHAVING Total_Sale > 200;
Your sales manager notified you that the database incorrectly shows that we sold 6 boxes to Panda Restaurant. He wants you to update the database to reflect that only 5 boxes were sold. In order to do that we will use the UPDATE function.
Please note that code should be written in the schema (left) panel to add or change data. Since incorrect code can result in more or less updates than desired or worse case corrupts the database, best practice is to first write a select statement. Running the select statement will show you which records will be updated. In our example, we want to select the record where client is Panda Restaurant and Product ID is 2 since 2 corresponds to Box.
SELECT * FROM SalesWHERE client_name = 'Panda Restaurant' and Product_ID = 2;
The select statement is referring to the correct entry. Since the Quantity changes, it will also impact Total_Cost. You want to update both the Quantity to 5 and Total_Cost to 250 (5 boxes at $50/box).
Instead of selecting, you will start the query with “UPDATE” and before using the where clause to define conditions you will specify the column names that need to change and their corresponding values.
UPDATE Table_NameSET Column_Name_1 = Updated Value_1, …., Column_Name_n = Updated Value_n,WHERE condition to get correct record(s);
On the Left Panel, type the following below all of the other CREATE and INSERT statements, and then hit "Build Schema".
UPDATE SalesSET Quantity = 5, Total_Cost = 250WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
Let's run our SELECT query again to make sure it worked. Type the following into the right Query Panel.
SELECT * FROM SalesWHERE client_name = 'Panda Restaurant' and Product_ID = 2;
As the database grows there may be obsolete tables. An annual cleanup is recommended to ensure we are not wasting database spaces with old or archaic data which in turn affects performance of the code. Let's delete the entire Sales table.
DROP TABLE table_name;
DROP TABLE Sales;
Run the above in the left panel!
If you click Browser on the left panel, you can see that we only have the Product table now. Sales table has been deleted. Be extremely careful before you drop a table — make sure everything is right, i.e. the table name, etc.
There could be a scenario where you want to delete all data but still keep the table. When dealing with large datasets, database designers will often have a historical table. An example could be a current and historical revenue table. Current revenue table will have this year’s revenue data while a historical revenue table will have all revenue data since inception. After moving data from current to historical table, you will want to delete all data from current table. In order to that, you use the “TRUNCATE” function.
TRUNCATE TABLE table_name;
For practice, we will delete data from our Sales table. Since we deleted the Sales table above, let’s go ahead and rebuild the entire schema. Copy and paste the CREATE and INSERT statements from the setup section above into the left panel. Once it’s built, type below code on the left panel, after the above statements.
TRUNCATE TABLE Sales;
If you click Browser, you can see that we still have both tables.
Write the below query on the right panel. From the result you can see that table exists but it has no records.
Let’s summarize the key difference between Drop and Truncate function. The former deletes the table and all content while the latter only deletes the content of the table.
SQL can return data across multiple tables. Joins are used to specify which data is needed from the tables. Join has two parts to it. First you specify a common value that should match between tables to return value. Next, you specify which values to return. Below are the types of joins.
- Inner Join
- Right Join
- Left Join
- Full Outer Join
With inflation rising and after reviewing client reviews, the company has decided to raise prices for some products and add some new products too.
Let's use a new schema for this portion. Copy the following into the left panel, and then click "Build Schema".
CREATE TABLE Product_Old(`ID` int, `Product` varchar(128), `Cost` int);INSERT INTO Product_Old(`ID`, `Product`, `Cost`)VALUES(1, 'Cup', 10),(2, 'Box', 50),(3, 'Plate', 15),(4, 'Spoon', 7),(5, 'Knife', 7),(6, 'Fork', 7),(7, 'Paper Napkin', 15),(8, 'Ketchup', 9),(9, 'Salt', 2),(10, 'Pepper', 2);CREATE TABLE Product_New(`ID` int, `Product` varchar(128), `Cost` int);INSERT INTO Product_New(`ID`, `Product`, `Cost`)VALUES(1, 'Cup', 12),(2, 'Box', 55),(3, 'Plate', 15),(4, 'Spoon', 8),(5, 'Knife', 8),(6, 'Fork', 8),(7, 'Paper Napkin', 15),(8, 'Ketchup', 9),(9, 'Salt', 2),(10, 'Pepper', 2),(11, 'Mustard', 3),(12, 'Mayonnaise', 3),(13, 'Silver Spoon', 20),(14, 'Silver Fork', 20),(15, 'Star Wars napkin', 17);
It only returns values that match in both tables. This is similar to the intersection concept in Set theory.
SELECT *FROM Product_oldINNER JOIN Product_new ON Product_old.ID = Product_new.ID;
If you wanted to see the list of products with a change in cost you can use a where clause learned earlier in the lesson. Using the not equal to, <>, operator in the where clause, you can specify that the results should only include products where the cost doesn’t match.
SELECT *FROM Product_oldINNER JOIN Product_new ON Product_old.ID = Product_new.IDwhere Product_old.cost <> Product_new.cost;
This will give you all values that match between both tables as well as any extra values in the second table.
SELECT *FROM Product_oldRIGHT JOIN Product_New ON Product_old.ID = Product_new.ID;
Since we’ve added products to the new table, you can see that it includes all the values from the new table. Additionally, it shows null values for the old table to indicate that there are no matching values from old table.
This will give you all values that match between both tables as well as any extra values in the first table.
SELECT *FROM Product_oldLEFT JOIN Product_New ON Product_old.ID = Product_new.ID;
Since there are no products in the old table that aren’t present in the new table, the query shows all products from both tables.
This will give you all values from both tables. This is similar to the union concept of set theory. MySQL doesn’t recognize this function so we will use an alternative way of getting all values. This operation is called is “UNION”. It combines the results from two select statements.
SELECT *FROM Product_oldRIGHT JOIN Product_New ON Product_old.ID = Product_new.IDUNIONSELECT *FROM Product_oldLEFT JOIN Product_New ON Product_old.ID = Product_new.ID;
Joins are an excellent way to analyze data across tables. You can view which data points changed over time, get a more holistic view of your data and understand business drivers.
SQL can answer your data related question that can directly impact your business’ bottom line.
- Who are my Top 5 clients? This information can be used to offer incentives to keep your top clients loyal.
- What is the busiest week or busiest month? This can help manage staff and your time better.
- Send me a notification when my inventory for cups drops below 100 cases. This ensures you always have all your products in stock.
- Which product is high in stock and which clients are the most likely to buy them? Marketing division can send promotions for those products to those clients.
Leveraging data you collect to help you make important sales and marketing decisions without any prior programming knowledge — this is the reason SQL is used widely across industries.
We will learn how to write more queries with a larger database to get such answers in the next article. Stay tuned!