How to Automate a Data Transformation

This example project implements a custom document tool that opens an information link and performs two transformations: a predefined pivot followed by a predefined unpivot.

Overview

A data transformation is used when data needs to be transformed before it is loaded into Spotfire. The import process can be automated in a Spotfire extension. This document tool automates the process of transforming data loaded from an information link data source.

Step-wise data transformation flow

The data will first be transformed with a pivot transformation and then an unpivot transformation will be applied. The settings for the transformations are based on metadata for the columns.

The graphics in this example are based on the following actions, performed using the demo information links (which require an online connection to the server):

  1. Select File > Open From > Information Link... and open the Film information link.
  2. Select Tools > Transformation Tool Example.
    Running the Transformation Tool Example
Background Information
  • Creating a Tool
    A tool performs an analytic action. It operates in the context it is defined for.
  • Creating a Transformation
    Transformations are applied to data when loading it into Spotfire, shaping the data to the desired form before analyzing it.
Prerequisites
  • Spotfire SDK\Examples\Extensions\SpotfireDeveloper.TransformationToolExample

The Transformation Automation Example

The tool is implemented in the SpotfireDeveloper.TransformationToolExample\TransformationTool.cs file. The ExecuteCore method performs the following actions:

Setting Up Progress

To indicate to the user that the tool is running, the tool is executed with a progress window. The execution with progress is created from a progress service; a sketch of these steps follows the list below.

  1. Retrieve progress service.
    The progress service is retrieved from a GetService<ProgressService> request.
  2. Start the execution.
    The execution is started with a title and a description that will be shown in the progress window. The progress information can be updated using subtasks.
  3. Retrieve current progress.
    To execute a subtask, the current progress must be retrieved. Through the reference to the current progress, subtasks can be executed and cancelation of the execution can be checked.
  4. Execute subtask.
    From the current progress a subtask is executed with the description Creating data source.
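
A minimal sketch of these steps, assuming the tool derives from CustomTool<Document> and retrieves the progress service through the document context; the member names are taken from the Spotfire SDK, but the class name, captions, and overall structure here are illustrative rather than the exact contents of TransformationTool.cs:

    using Spotfire.Dxp.Application;
    using Spotfire.Dxp.Application.Extension;
    using Spotfire.Dxp.Framework.ApplicationModel;

    public sealed class TransformationToolSketch : CustomTool<Document>
    {
        public TransformationToolSketch() : base("Transformation Tool Example") { }

        protected override void ExecuteCore(Document context)
        {
            // 1. Retrieve the progress service.
            ProgressService progressService = context.GetService<ProgressService>();

            // 2. Start the execution; the title and description are shown in the
            //    progress window.
            progressService.ExecuteWithProgress(
                "Transformation Tool Example",
                "Transforming data from an information link",
                () =>
                {
                    // 3. Retrieve the current progress; it is used to execute
                    //    subtasks and to check for cancelation.
                    Progress progress = ProgressService.CurrentProgress;

                    // 4. Execute a subtask with the description "Creating data source".
                    progress.ExecuteSubtask("Creating data source", () =>
                    {
                        // ... create and connect the data source (next section).
                    });
                });
        }
    }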

Creating the Data Source

The data can be retrieved from different types of data sources based on the data format.

  1. Create an information link data source.
    The InformationLinkDataSource loads data from the library. The id of the link must be specified when the data source is created.
  2. Connect to the data source.
    After the data source is created, a connection must be established. If the import requires settings from the user, prompting must be allowed for the connection. Here, prompting is suppressed with the parameter DataSourcePromptMode.None.
  3. Read from the data source.
    When the connection is established, the query for the connection must be executed. The execution returns a dataset, and the data and metadata from the data source can then be accessed from the dataset using a DataRowReader, as shown in the sketch below.
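
A sketch of these steps, assumed to run inside the progress operation above. The Guid is a placeholder for the library id of the information link, the document context is assumed to act as the service provider for the connection, and the query-execution member name reflects my reading of the Data API rather than something confirmed by this example:

    using System;
    using Spotfire.Dxp.Data;
    using Spotfire.Dxp.Data.Import;

    // 1. Create an information link data source; the id of the link must be supplied.
    //    The Guid below is a placeholder, not the id of the demo Film link.
    var linkId = new Guid("00000000-0000-0000-0000-000000000000");
    var dataSource = new InformationLinkDataSource(linkId);

    // 2. Connect to the data source. No user settings are required here, so
    //    prompting is suppressed with DataSourcePromptMode.None.
    DataSourceConnection connection = dataSource.Connect(context, DataSourcePromptMode.None);

    // 3. Execute the query for the connection; the returned DataRowReader exposes
    //    the data and the column metadata used to configure the transformations.
    DataRowReader reader = connection.ExecuteQuery2();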

Creating the Data Flow

The flow from the data source to the transformed data is represented by a data flow. A data flow is constructed by a data flow builder, which starts with the data source and then adds the transformations; a sketch follows the list below.

  1. Create a data flow builder.
    When the DataFlowBuilder is created, a data source must be supplied.
    Next, any number of transformation steps may be added through the DataFlowBuilder.
  2. Start new subtask.
    A new subtask is executed with the description Creating pivot transform.
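
A sketch of the builder setup, assuming the DataFlowBuilder can be constructed directly from the data source created above and that transformations are added with an AddTransformation call (check the DataFlowBuilder reference for the exact constructor and member names):

    using Spotfire.Dxp.Data;

    // 1. Create a data flow builder; the data source is supplied when the builder
    //    is created. Transformations are added later, in the order they should be
    //    applied: first the pivot, then the unpivot.
    var flowBuilder = new DataFlowBuilder(dataSource);

    // 2. Start a new subtask for the next stage.
    progress.ExecuteSubtask("Creating pivot transform", () =>
    {
        // ... configure the pivot transformation (next section).
    });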

Creating the Pivot Transformation

A data transformation may require settings that control how the data is transformed. The settings can be retrieved from the user by requiring prompting when the transformation is executed. If the settings are configured in the code, or no user settings are required for the transformation, prompting can be suppressed when the transformation is executed. A sketch of the steps below follows the list.

  1. Create the pivot transformation.
    The PivotTransformation transformation reshapes tall-skinny data into short-wide data.
  2. Assign identity columns.
    The identity columns of the pivot transformation are passed through the transformation and identify each row of the result data. The columns can be specified directly by name or by index in the dataset, or by searching the column metadata. Here, columns that have gender or group in their names are selected.
  3. Assign category columns.
    There will be an aggregated result data column for each distinct combination of values of the category columns. The category columns are found here by searching the metadata for columns with location in their names.
  4. Assign value columns.
    Value column data are aggregated over the identity columns and the category columns. The aggregation method must be specified for each column. The value columns are defined as columns of type real whose names do not contain age, average or total.
    In this example all the value columns have the aggregation method Sum. Examples of other aggregation methods are average (Avg), standard deviation (StdDev), and median (Median).
  5. Add the pivot transformation to the data flow.
    When the pivot transformation is defined, it must be added to the DataFlowBuilder. Transformations must be added in the order they are intended to be applied; here, the pivot transformation will be applied first to the source data.
  6. Execute the pivot transformation.
    To get a result from a part of the transformation chain, the transformations must be executed from the DataFlowBuilder. The execution of the pivot transformation gives a result dataset, which is used here to define the next transformation (the unpivot transformation).
  7. Check for progress cancelation.
    If the progress is canceled, the execution shall be stopped. The current progress is used to check whether the progress is canceled.
  8. Start new subtask.
    A new subtask is executed with the description Creating unpivot transform.
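
A sketch of the pivot stage. Creating the transformation and adding it to the builder follow the steps above; the metadata-driven column selection is shown as plain LINQ over Name and DataType properties that the reader's columns are assumed to expose, and the assignment of the selections to the PivotTransformation's identity, category, and value column properties (as well as the exact way to execute the builder) is left to the API reference:

    using System;
    using System.Linq;
    using Spotfire.Dxp.Data;
    using Spotfire.Dxp.Data.Transformations;

    // 1. Create the pivot transformation (tall-skinny to short-wide).
    var pivot = new PivotTransformation();

    // Case-insensitive "name contains" helper used by the selection rules below.
    Func<string, string, bool> nameContains = (name, part) =>
        name.IndexOf(part, StringComparison.OrdinalIgnoreCase) >= 0;

    // 2. Identity columns: columns whose names contain "gender" or "group".
    var identityNames = reader.Columns
        .Where(c => nameContains(c.Name, "gender") || nameContains(c.Name, "group"))
        .Select(c => c.Name)
        .ToList();

    // 3. Category columns: columns whose names contain "location".
    var categoryNames = reader.Columns
        .Where(c => nameContains(c.Name, "location"))
        .Select(c => c.Name)
        .ToList();

    // 4. Value columns: real-typed columns whose names do not contain "age",
    //    "average" or "total"; each is aggregated with Sum (Avg, StdDev and
    //    Median are examples of other aggregation methods).
    var valueNames = reader.Columns
        .Where(c => c.DataType == DataType.Real &&
                    !nameContains(c.Name, "age") &&
                    !nameContains(c.Name, "average") &&
                    !nameContains(c.Name, "total"))
        .Select(c => c.Name)
        .ToList();

    // Assign identityNames, categoryNames and valueNames (with the Sum aggregation)
    // to the corresponding PivotTransformation properties; see the API reference
    // for the exact property names and element types.

    // 5. Add the pivot to the builder; transformations are applied in the order
    //    they are added, so the pivot runs first on the source data.
    flowBuilder.AddTransformation(pivot);

    // 6. Execute the flow built so far to obtain an intermediate result dataset;
    //    its metadata drives the unpivot configuration in the next section.
    //    (The exact Execute overload is per the DataFlowBuilder reference.)
    // DataRowReader pivotResult = flowBuilder.Execute(...);

    // 7. Stop if the user has canceled the progress dialog.
    progress.CheckCancel();

    // 8. Start a new subtask for the next stage.
    progress.ExecuteSubtask("Creating unpivot transform", () =>
    {
        // ... configure the unpivot transformation (next section).
    });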

Creating the Unpivot Transformation

The unpivot transformation reshapes data into tall-skinny format. All settings for the unpivot transformation are configured in the code; they are based on the metadata from the result of the pivot transformation. A sketch follows the list below.

  1. Create the unpivot transformation.
    The UnpivotTransformation transformation reshapes short-wide data into tall-skinny data.
  2. Assign pass-through columns.
    Pass-through columns are passed through the transformation and repeated for each unpivoted value. In this example, the pass-through columns are all columns with names that contain age or gender.
  3. Assign value columns.
    Value column data are merged into one result column, and the original column names are returned in a category column. If the value columns are not set, the unpivot transformation is invalid and raises an exception, while an empty collection of value columns gives an empty result. Here, all columns of type real with names not containing age are selected as value columns.
  4. Add unpivot transformation to the data flow.
    When the unpivot transformation is defined, it must be added to the DataFlowBuilder. The data flow now consists of the data source, the pivot transformation, and the unpivot transformation.
  5. Check for progress cancelation.
    If the progress is canceled, the execution shall be stopped. The current progress is used to check whether the progress is canceled.
  6. Start new subtask.
    A new subtask is executed with the description Building data flow.
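
A sketch of the unpivot stage along the same lines. Here pivotResult stands for the DataRowReader obtained by executing the builder in step 6 of the previous section; the selection predicates mirror the description above, and the properties used to assign the pass-through columns, the value columns, and the names of the category and value result columns are left to the UnpivotTransformation reference:

    using System;
    using System.Linq;
    using Spotfire.Dxp.Data;
    using Spotfire.Dxp.Data.Transformations;

    // 1. Create the unpivot transformation (short-wide to tall-skinny).
    var unpivot = new UnpivotTransformation();

    // 2. Pass-through columns: columns of the pivot result whose names contain
    //    "age" or "gender"; they are repeated for each unpivoted value.
    var passThroughNames = pivotResult.Columns
        .Where(c => nameContains(c.Name, "age") || nameContains(c.Name, "gender"))
        .Select(c => c.Name)
        .ToList();

    // 3. Value columns: real-typed columns whose names do not contain "age";
    //    their data is merged into one result column, and the original column
    //    names are returned in a category column.
    var unpivotValueNames = pivotResult.Columns
        .Where(c => c.DataType == DataType.Real && !nameContains(c.Name, "age"))
        .Select(c => c.Name)
        .ToList();

    // Assign passThroughNames and unpivotValueNames, plus the category and value
    // column names, to the corresponding UnpivotTransformation properties.

    // 4. Add the unpivot to the builder; the data flow is now
    //    data source -> pivot -> unpivot.
    flowBuilder.AddTransformation(unpivot);

    // 5. Stop if the user has canceled the progress dialog.
    progress.CheckCancel();

    // 6. Start a new subtask for the final stage.
    progress.ExecuteSubtask("Building data flow", () =>
    {
        // ... build the data flow (next section).
    });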

Building the Data Flow

When all transformations have been added to the data flow builder, the resulting data flow logically acts as a data source; a sketch of these steps follows the list below.

Building the transformation chain
  1. Build the data flow.
    The DataFlowBuilder.Build method returns the DataFlow.
  2. Check for progress cancelation.
    If the progress is canceled, the execution shall be stopped. The current progress is used to check whether the progress is canceled.
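
A sketch of these two steps, continuing from the builder and progress objects above:

    // 1. Build the data flow; the returned DataFlow can now be used wherever a
    //    data source is expected.
    DataFlow dataFlow = flowBuilder.Build();

    // 2. Stop if the user has canceled the progress dialog.
    progress.CheckCancel();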

Adding Data

The data flow is used as the data source to create a new table in Spotfire (a sketch of these steps follows the list):

  1. Create a data table from the data flow.
    Since the context is the document, the Tables collection is easily accessible: context.Data.Tables.
    Resulting table
  2. Add a new page.
    A new page with the title Result of Transformation Tool is added to view the result of the transformation and the page is set to be the active page of the document.
  3. Create a table view.
    To view the result of the data flow a table plot is added to the result page.
  4. Create a bar chart.
    Create and add a bar chart to the result page.
  5. Configure the bar chart.
    Configure the bar chart to show the sum of values for each category in the unpivoted data.
Resulting analysis
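
A sketch of these steps. The table name and the Category and Value column names produced by the unpivot are illustrative assumptions; the page title and the visuals calls follow the description above:

    using Spotfire.Dxp.Application;
    using Spotfire.Dxp.Application.Visuals;
    using Spotfire.Dxp.Data;

    // 1. Create a data table from the data flow; the flow acts as the data source.
    //    The table name is illustrative.
    DataTable table = context.Data.Tables.Add("Transformed Film Data", dataFlow);

    // 2. Add a new page for the result and make it the active page.
    Page page = context.Pages.AddNew("Result of Transformation Tool");
    context.ActivePageReference = page;

    // 3. Add a table plot showing the transformed data.
    TablePlot tablePlot = page.Visuals.AddNew<TablePlot>();
    tablePlot.Data.DataTableReference = table;

    // 4. Add a bar chart to the same page.
    BarChart barChart = page.Visuals.AddNew<BarChart>();
    barChart.Data.DataTableReference = table;

    // 5. Configure the bar chart: the sum of values for each category of the
    //    unpivoted data. "Category" and "Value" are assumed result column names.
    barChart.XAxis.Expression = "<[Category]>";
    barChart.YAxis.Expression = "Sum([Value])";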