SQL or Structured Query Language is a robust programming tool to run queries on large datasets. A query is just another word for code used to interact with the database. SQL is a highly sought after skillset across industries and work profiles because it can be implemented relatively easily and you don’t need a programming background to learn it. Most finance and marketing professionals with little or no programming experience are required to have an in-depth knowledge of it.
For large datasets, SQL is preferred since it processes data faster than excel or VBA. VBA is confined by Excel’s processing ability while SQL is confined by the database’s processing ability.
Databases, like a Microsoft Access database, are created to store large quantities of data — they process data faster and more efficiently. It is also a dynamic querying tool which means it is easier to update queries as your requirements change.
SQL is easy to learn, used in most industries and has a strong online community for troubleshooting. Lastly, with companies’ increasing focus on data analytics, it is the best tool to slice and dice data and generate automated reports as needed.
A database is a structured set of data that is organized so that it can be easily searched and manipulated. Each database has tables and each table can be thought of as an excel spreadsheet with finite rows and columns. You can write queries to extract and calculate information from one or more tables.
SQL can be used to create custom reports, perform complex calculations and manipulate multiple data points. In the following sections, we will give you sample code through multiple examples to get you started right away.
An important concept for Databases is Reference Identification which will henceforth be referred to as simply ID. We will have an ID column in all tables to distinguish each row. An ID column makes it easier to write SQL code, saves database space and is independent of changes made to the reference data.
For example, if we say Cups have an ID of 1. All our queries are written with the ID 1 instead of Cups. In the future, if we want to update the table to amend Cups to Paper Cups, we can do that without making any changes to the code, since our code refers it as 1 not Cups or Paper Cups. Additionally, when we reference it in other tables, 1 occupies less space than Cups. While it may seem insignificant now, it matters when you have several hundred thousand or even million rows in each table. As we go through the article, we will see more examples of the significance of IDs.
We wanted to introduce you to a few SQL basics that we'll be using in our queries below.
Number: A number can be referenced as is without additional formatting change.
Text and Date Format: Text and Date should always be in single quotation marks.
Mathematical operators: Below are the most commonly used mathematical operators to write queries.
SQL Fiddle is an online tool that lets you run SQL Queries. It is a great tool to practice what we will learn in this article.
Similar to databases, we will have tables in SQL and then write queries to extract data from those tables. The left panel is the Schema Panel where we will build table and add data to those tables. The right panel is the Query Panel where we will write SQL code to extract data from those tables and run calculations on those values. The bottom panel is where you will see the results of your query. Since we can’t run a query without tables, the right and lower panel will be grayed out until you build a schema aka create tables. Schema can also have other objects in addition to tables which we will cover later.
We will use a sample Wholesale database to write all our queries in this article. Consider a wholesaler who maintains a database of all products and daily sales.
- The Product table has a list of products, cost for each product and an ID associated with each product.
- The Sales table has Client Name, quantity of each product, Product ID (referencing Product from the Product table), Total Cost and Sale Date.
The database in this example has only a few rows so that it's easier for you to understand queries. You can easily use the same concepts for a database with hundreds of products and several years of sale data.
Go to SQL Fiddle above and copy the below schema into the left panel. This will create our sample database and allow us to run queries against it. When you're done pasting, click on the "Build Schema" button.
CREATE TABLE Sales(`ID` int, `Client_Name` varchar(20), `Quantity` int, `Product_ID` int, `Cost` int, `Total_Cost` int, `Sale_Date` datetime);INSERT INTO Sales(`ID`, `Client_Name`, `Quantity`, `Product_ID`, `Cost`, `Total_Cost`, `Sale_Date`)VALUES(1, 'Dave Consulting', 5, 1, 10, 50, '2018-03-06 00:00:00'),(2, 'Dave Consulting', 6, 2, 50, 300, '2018-03-25 00:00:00'),(3, 'Dave Consulting', 10, 9, 2, 20, '2018-04-09 00:00:00'),(4, 'Dave Consulting', 10, 10, 2, 20, '2018-04-24 00:00:00'),(5, 'Panda Restaurant', 6, 2, 50, 300, '2018-02-22 00:00:00'),(6, 'Panda Restaurant', 7, 5, 7, 49, '2018-03-05 00:00:00'),(7, 'Panda Restaurant', 8, 3, 15, 120, '2018-04-16 00:00:00'),(8, 'Brooklyn Wholesalers', 3, 7, 15, 45, '2018-02-14 00:00:00'),(9, 'Brooklyn Wholesalers', 12, 8, 9, 108, '2018-03-08 00:00:00'),(10, 'Brooklyn Wholesalers', 15, 1, 10, 150, '2018-03-21 00:00:00'),(11, 'Brooklyn Wholesalers', 6, 6, 7, 42, '2018-04-22 00:00:00'),(12, 'Chillhouse Coffe', 15, 1, 10, 150, '2018-02-05 00:00:00'),(13, 'Chillhouse Coffe', 11, 5, 7, 77, '2018-02-16 00:00:00'),(14, 'Chillhouse Coffe', 3, 2, 50, 150, '2018-03-05 00:00:00'),(15, 'Chillhouse Coffe', 14, 4, 7, 98, '2018-03-16 00:00:00'),(16, 'Chillhouse Coffe', 17, 3, 15, 255, '2018-04-05 00:00:00'),(17, 'Homemade Ice Cream', 9, 9, 2, 18, '2018-02-13 00:00:00'),(18, 'Homemade Ice Cream', 11, 3, 15, 165, '2018-03-08 00:00:00'),(19, 'Homemade Ice Cream', 3, 4, 7, 21, '2018-03-16 00:00:00'),(20, 'Homemade Ice Cream', 16, 2, 50, 800, '2018-04-02 00:00:00'),(21, 'Mama Bakery', 5, 10, 2, 10, '2018-02-17 00:00:00'),(22, 'Mama Bakery', 8, 6, 7, 56, '2018-03-09 00:00:00'),(23, 'Mama Bakery', 6, 7, 15, 90, '2018-03-26 00:00:00'),(24, 'Mama Bakery', 5, 6, 7, 35, '2018-04-15 00:00:00');CREATE TABLE Product(`ID` int, `Product` varchar(12), `Cost` int);INSERT INTO Product(`ID`, `Product`, `Cost`)VALUES(1, 'Cup', 10),(2, 'Box', 50),(3, 'Plate', 15),(4, 'Spoon', 7),(5, 'Knife', 7),(6, 'Fork', 7),(7, 'Paper Napkin', 15),(8, 'Ketchup', 9),(9, 'Salt', 2),(10, 'Pepper', 2);
After you've built the schema, you should get a "Schema Ready" message like the one below.
Click on the 'Browser' button:
You'll see that the above tables have been created:
To toggle back, click on DDL Editor:
You can use below button to view code for either panel in full screen.
Our objectives in this section are to:
- Collect data using standard SQL commands (Create, Select etc.).
- Filter data using basic SQL commands (Where, Order By, Distinct, Limit).
- Extract filtered data using SQL Boolean operators (AND and OR) and SQL conditional operators (=, >, <, and BETWEEN).
Let’s learn more about how we created the above tables in SQL. As we learned earlier, the left panel is used to create our schema. To create a table, you need to specify:
- The table name
- The column name
- The format for each of those column names.
CREATE TABLE Table_Name(‘Column_Name_1’ Column_Format_1, …. , ‘Column_Name_n’ Column_Format_n);
CREATE TABLE Product(`ID` int, `Product` varchar(128), `Cost` int);
This is how we created both the Product and Sales tables.
SQL Column Types
Below are the most commonly used column types in SQL:
Note that SQL keywords are case-insensitive so you may type in CREATE TABLE or Create Table, etc.
When we insert data, we specify:
- The table name
- The list of columns
- A list of rows, with each row specifying the value for each of the columns
INSERT INTO Table_Name(‘Column_Name_1’, ... , ‘Column_Name_n’)VALUES(Value for Column_Name_1, ..., Value for Column_Name_n),(Another Value for Column_Name_1, ..., Another Value for Column_Name_n),...;
INSERT INTO Product(`ID`, `Product`, `Cost`)VALUES(1, 'Cup', 10),(2, 'Box', 50),(3, 'Plate', 15),(4, 'Spoon', 7),(5, 'Knife', 7),(6, 'Fork', 7),(7, 'Paper Napkin', 15),(8, 'Ketchup', 9),(9, 'Salt', 2),(10, 'Pepper', 2);
As you can see above, when we created the Product table, we inserted 10 rows into it.
In all of the example queries that follow, simple type in the query in the Query Panel on the right and then click on "Run SQL" to see the output. The output should appear on the bottom.
Queries are structured in a straightforward way in SQL. If you want to select all data for a table in Excel, you would use the keyboard shortcut “Ctrl+A”.
To perform the same task in SQL you would use below query:
Select * from Table_Name;
Select * from Sales;
Let’s break this down.
Select indicates that you want to select data. To select all of the content of the specified table, we use an asterisk (*) in the query. To indicate where the data should be pulled from we end the query by writing from Table_Name. Finally, each query must end with a semi-colon (;) so the program knows where the code ends. Please note there should be no text after the semi-colon.
Try typing the above query into the Query panel.
Here is the result:
Next, if you wanted all the sale data for one client: Panda Restaurant. After specifying the you want all data from Sales table, you add a condition by using the where clause. In this case, you want all data where the client is Panda Restaurant.
Select * from Table_NameWhere column_name = ‘Name’;
Select * from salesWhere client_name = 'Panda Restaurant';
Next, if want to check if Panda Restaurant bought boxes in the past. If you look at the Product table, Product_ID for Box is 2. You want to select all rows where client is 'Panda Restaurant' and Product_ID is 2 for Box. Each additional condition after the first one is specified using and. You can have as many and clauses as you’d like in the query. Also note that you have quotation marks for client_name since it is a text string but none for Product_ID since it’s a number.
Select * from Table_NameWhere column_name = ‘Text’And column_name = Number;
Select * from saleswhere client_name = 'Panda Restaurant'and Product_ID = 2;
You should now have a basic understanding of how SQL queries are structured. For our example, let’s assume that you have recently joined this firm and are reviewing the database to gain some insight into their business. Below are the queries we will write to perform some routine tasks:
- List of Products
- List of All Clients
- Sales Data for April 2018
Let’s start with the first task — List of Products.
Earlier you learned that if you want the entire results of the table, you can use an asterisk *. If you want to select a particular column, then specify the column name instead of the asterisk. This will give all data from the Product column of the Product table.
SELECT ProductFROM Product;
Click ‘Run SQL’ from the Query Tools Ribbon. This should give you the list of all products.
Next, while perusing through the sales table, you decide you want a unique list of all clients. If we write a query on the Sales table like the Product table, we will get a list of all clients but it won’t be a unique list, since one client could have made multiple purchases. To get a unique list, add the word distinct before the column name. We want unique data from the client_name column from the Sales Table. This is similar to the ‘Remove Duplicates’ function in Microsoft Excel.
SELECT DISTINCT client_nameFROM sales;
Limiting the Result:
For large datasets, we have the option of limiting the number of records in the result by using the ‘LIMIT’ function. If the database has thousands of clients and you can use the limit function to only see 100. In the above example, let’s if learn how to return only 2 clients from the entire list.
SELECT DISTINCT client_nameFROM salesLIMIT 2;
While you’re looking though the database, your boss emails you that he wants you to send him all the sales data for April. To SQL or not to SQL?
Let’s write a query so you can save it now and run it every month! Reviewing the Sales Table, you see that the column used to store date of each sale is ‘sale_date’. You want all data (use *), from the sales table where date of sale is between April 1, 2018 and April 30, 2018. Let’s take it one step further and sort the dates in an ascending order. We will end the query with order by column_name to indicate which column should be used to sort the final result. By default, SQL will default to ascending order.
The date has to be enclosed in single quotation marks for the program to decipher it as a date. Refer to date formats for more information.
SELECT *FROM salesWHERE sale_date between '2018-04-01'and '2018-04-30'ORDER BY sale_date
Save this query, in the future all you have to do is change parameters for the sale_date column and you can get the results instantaneously. Isn’t it simpler than writing VBA code or performing the same tasks in Excel every month? Once you master SQL, you will save significant time writing queries.