In this post, we will learn how to create a data extract in Tableau. For this tutorial, as in previous tutorials, we use the Sales Superstore data that can be downloaded from the link given here.
First, a little primer on data extracts and their advantages. Data Extracts are simply saved subsets of data. When we create an extract of our data, we reduce the total amount of data by using filters and configuring other limits.
Data extracts are typically used in the following circumstances:
- Live connection is not possible
- Live connection is too slow
- Record volume is too large to handle
- Want to avoid exposing too much data
- Not support certain calculations
- Unable to handle complexity
- Multiple file-based sources need to be put together
- Need to materialize expressions
- Access to cloud based data sources
- Option to publish to Tableau public
There are situations however, when a Live connection is preferred such as instances where:
- Real time data is needed
- OLAP (Online Analytical Processing) sources are inaccessible
- Underlying data structure changes
- We encounter incremental loads, slowly changing dimensions, and updates to existing rows
- RAWSQL, CustomSQL
- Slow to refresh
- No user-level security
To create a data extract, we need the workbook to be created in Tableau Desktop before it can be published to Tableau Public profiles. This version is a premium and paid version of Tableau and is commonly available at subsidized rates or free for students. Check with your educational licensing departments. The procedure is simple, so follow along if using the free version too.
Open the Tableau Desktop application. As seen here, there are more file formats compatible as data sources in this version.
Connect to the Sales data and use the Orders dataset.
Notice how the top radio buttons toggle between two options. We can use either a Live connection or a data extract.
Select Extract from the options.
Click on Add button to assign Filters to the data in the extract. This will include only portion of the data as per our specified criteria.
Click on Add button. A pop-up appears.
We can make a selection based on the filtering conditions we wish to specify. Scroll through the fields and select ‘Ship Mode’.
Upon choosing Ship Mode and clicking OK, we see the values by which we can filter. We choose First Class and Same Day from the options.
Once done, click on OK. We see the filter has been assigned as a condition on the Data Source.
Click on OK and the filter is applied to the Extract and can be viewed on the data sheet as follows.
Now, we have to save this extract for creating the visualization. Even if you do try to work on the visualization by clicking on the ‘Sheet 1’, Tableau will present the pop-up to save the extract. Name the extract file with an intuitive name and save it in the folder of choice. The extension in Tableau Desktop 10.5 is ‘.hyper’. (Please note that extensions might be different based on versions and data sources.)
Another way to create extracts is via the actual visualization sheet. On the source connection, select the Live option. The data extract is made through the options within Tableau. This is the preferred method as more options are available within the worksheet.
Go to the worksheet from the tab on the bottom of this sheet. (Highlighted in the image below). From the Data tab on the header section, select the datasheet and click on the Extract option as below.
Now, a pop up window appears asking for details on which to extract the data.
The steps from here on are similar to the steps followed previously.
Click on the ‘Add’ button in Fields section. Another pop-up container appears with names of fields from the data.
Select the Ship Mode field. Include only Same Day and First Class records.
Click on OK. In this mode, we can see that additional options are available for specifying our business and data constraints. For instance, Rows can specify the limit on the number of rows to be included in the extract.
We can also see Historical data as in the previously assigned constraints.
Tableau will ask to save the extract in its data source. This is identical to the procedure explained above for saving the extract.
On completing this step, Tableau will automatically use this data extract for the viz worksheet . Verify the same by clicking on Data and selecting the worksheet as before.
Notice how the option ‘Use Extract’ is selected. The Extract option carries choices like Refresh and so on. This enables the option to refresh data in future. Whenever the data source is updated, we can automatically use the fresh version in our extract by choosing this option.
Please note that since we save our extracts with filenames, we can select specific extracts to refresh with. This also ensures that in case of multiple extracts, we can choose particular extracts.
Here, we shall create a simple visualization to see the categorization and shipment modes. Since our data is limited to only two shipping modes, the exercise will be an additional confirmation of the data extract.
Drag the fields Ship Mode, Category, Sub-Category to the Rows section. Add Sales to the column section. Not how only First Class and Same Day appear under the Ship Mode segment.
Tableau will automatically create a Horizontal bar chart with these selected values.
In the Columns section, click on down arrow next to SUM(Sales). Change the aggregation to Average.
The viz changes as follows.
Rename the sheet as Average Sales.
This also changes the title of the sheet. But, we can alter the title of the viz to be different than the name of the sheet.
Double click on the title section. A pop-up window appears that indicates that the name of the sheet is picked up by default to be the title of the viz.
Delete it and change the title as required. Here, we rename as shown below:
Click on the icon labeled ‘T’ on the top header. This will add labels to the chart.
The viz now displays the actual numeric values for average sales in each individual category.
Save this viz. As this is on Tableau Desktop, the worksheet can be saved locally as a workbook in .twb format.
Once the workbook is saved in the location of your choice, it can be run again with updated Data Extract or published to Tableau Public profile or shares as needed.
Congratulations, you just created your first data extract and successfully made a visualization from it.
However, there is still another step left in the entire process: using an updated extract. Select the data sheet from where you saved it.
Now duplicate any single record where Ship Mode is First Class. Change on the first name on the copied record. In my example, I have replicated records with order ID CA-2013-117590 and altered first name (originally Gene Hale) to Genevieve Hale and Genelia Hale in the Excel sheet.
Now, go to the data source in Tableau and select Edit Connection. Select the updated sheet from the list.
You will see that the worksheet and data is updated.
View the actual data loaded into Tableau by sorting on customer name. The newly added records should be visible.
Go to the visualization and click on data tab to refresh the source.
A message appears asking to confirm the data Refresh process.
Select ‘Yes’ to continue.
Now view the visualization. As we are only displaying average values, the figures and chart have not changed in this particular instance. But based on extracts refresh and additional business criteria that might be placed, the visualizations get altered.
Great! Now, you can successfully create, update and refresh a data extract.