As we’ve learned from the last section, within data lies valuable information that can be used to draw conclusions and drive relevant decisions. To get the most value out of your data, it’s important to prep it prior to starting any analysis.
When preparing data for analysis, the main goal is to structure it in a way that makes your subsequent analysis more efficient.
Generally, this includes making sure that:
- Each valuable variable to be measured has it's own, unique column
- Each of said variable columns in one standard data format
- A unique identifier for each row entry — meaning that at least one column has values that are unique to each row.
This last piece about a unique identifier becomes especially important when preparing the data for consumptions by other software, such as SQL or Tableau. Additionally, this is crucial for any in-depth analysis or use of structured, complex Excel formulas that reference multiple data tables.
The following basic structural logic applies to any data set to be used for analysis:
Each row entry contains at least one unique column value that that distinguishes it from the rest of the set
Let’s assume the desired result is a bar chart representing total sales, rounded to the nearest dollar, for each of the 12 months.
The data structure will require a column for Month, along with a column representing the Monthly Total Sales for each of the 12 months.
Shortcut Tip: Type out “January” in the first cell, and hover over the lower right corner of the highlighted cell until a four-point cursor comes up. Drag down until you see “December” and release
For a list of Excel mouse pointers and shortcut commands, see more here: Mouse Pointes & Cursor Movement Guide.
In this step, we will be formatting our data points to round them to the nearest dollar amount.
Find a drop-down menu with a “General” value – this is usually the standard selection. Look for the blue arrow in the image below.
Click the down arrow to bring up a short list of available formats.
However, the short drop down does not include our needed format. Luckily, we can create custom formats and find more obscure preset formats by navigating to the format drop-down menu, and selecting “More Number Formats”
This brings up the Format Cells ribbon, which is divided into several sections representing different format groupings, such as:
General: Standard format, just as the data was entered; No specific format applied
Number: Integers; Just rounded or with certain amount of decimal points
Allows us to choose the number of decimal places, the way negative values are displayed, and whether a 1000 comma separator is applied
Currency: Currency symbols is added to standard number format
Allows us to choose the amount of decimal places, which currency symbol to apply, and how to display negative values
Accounting: Accounting formats line up the currency symbols and the decimal points in a column
Date: Several presets for Weekday, Day, Month, and Year display. Allows to choose between global regions
Time: Several presets for displaying time. Allows to choose between global regions
Percentage: Displays data as a percentage. Allows to choose amount of decimal places
Fraction: Displays data as a fraction. Allows to choose method of rounding the remainder
Scientific: Displays data as an exponent. Allows to choose amount of decimal places
Text: Cell is displayed exactly as entered. Numbers are treated as text
Special: Includes special formats that are useful, such as Zip Code and SSN
Custom: Allows custom format to be entered
For a guide on custom format structure, visit: Custom Excel Number Format Guide
Within the Format Cells ribbon, Navigate to the Currency Section – these sets of format presents add the currency sign to our data. The default currency preset will round to 2 decimals, but we want our totals rounded to the nearest dollar.
Click the down arrow twice to bring the Decimal Places selection to 0, and click “OK” at the bottom of the ribbon.
Having your data in uniform format is important for functionality and efficiency, especially when using formulas.
In this case, each Month value in the Month column serves as the unique identifier for each Total Sales value.
The resulting data has unique identifiers and is in uniform format – it is now prepped for analysis.
In our previous example, we assumed that the structured data set is created from scratch. But what if we’re working with an existing data set. Let’s now assume that the provided data set looks like this:
Although we will still be able to graph this data, it is not optimally structured for presentation or further analysis. You’ll notice that we have a few issues at hand:
- 12 variable fields and only 1 measure across those 12 variables
- Variable fields are out of order
- Format doesn’t reflect rounded to the nearest dollar
To fix this, follow these steps:
- Stack rows for Total Sales in one column, side by side with their corresponding Month
- Create designated “Month” column for the stacked months
- Select Total Sales values, navigate to the “Numbers” section of the “Home” ribbon
- Select “More Number Formats”, “Currency” and slide Decimal Places down to 0
- Click “OK”