The Data Manager makes up a substantial part of the analysis document. It holds and handles data. It is the top node and entry point to data management.
Representing Data
Data is represented by data tables. Each table in the table node collection
has a unique and fixed id as well as a unique but editable name. A particular node
can be retrieved by either name or id. The data tables are kept in memory, but if
memory is scarce, the least-used data can be paged to disk.
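The id and name rules can be illustrated with a minimal sketch. This is not the Spotfire API: the class `TableCollection` and its methods are hypothetical names, and the use of a UUID as the fixed id is an assumption made for the example.

```python
import uuid


class TableCollection:
    """Hypothetical registry: ids are fixed and unique, names are unique but editable."""

    def __init__(self):
        self._name_by_id = {}
        self._id_by_name = {}

    def add(self, name):
        if name in self._id_by_name:
            raise ValueError(f"table name already in use: {name!r}")
        table_id = uuid.uuid4()  # assigned once and never changed afterwards
        self._name_by_id[table_id] = name
        self._id_by_name[name] = table_id
        return table_id

    def rename(self, table_id, new_name):
        # A new name must not collide with the name of a different table.
        owner = self._id_by_name.get(new_name)
        if owner is not None and owner != table_id:
            raise ValueError(f"table name already in use: {new_name!r}")
        del self._id_by_name[self._name_by_id[table_id]]
        self._name_by_id[table_id] = new_name
        self._id_by_name[new_name] = table_id

    def id_for_name(self, name):
        return self._id_by_name[name]

    def name_for_id(self, table_id):
        return self._name_by_id[table_id]


tables = TableCollection()
sales_id = tables.add("Sales")
tables.rename(sales_id, "Sales 2024")  # the name changes, the id stays the same
```

Code that must keep a reference to a table across renames would hold on to the id, while code driven by user input would typically look the table up by name.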
Tables come in several flavors:
- Source tables are initiated and populated once by reading from a data source.
- On demand tables are created on the fly based on an analytic action. They
typically retrieve additional data from an external database using a query defined
by, for instance, the selection in a visualization. The lifetime of the table is
usually the lifetime of the artifact it supports, such as a details-on-demand view.
- Calculated tables are derived from source tables for a specific purpose.
Once created, a calculated table exists until it is removed. It may have to be manually
updated to correctly reflect a change in the underlying data. A calculated table
may contain data describing the relationships between columns of a source table.
- Derived tables are created from source tables. They define dynamic views
on the source data, typically an aggregation of a filtered subset of data. They
are automatically updated, but are not accessible from the data manager and are
usually not persisted.
A data column consists of metadata and row values. As indicated in the figure,
there is a document node for each column. The data type of a column is Integer,
Real, String, Date, Time, DateTime, Currency, or Binary. The data representation
is column-based in the sense that data values are associated with columns. The actual
data values, however, are stored in specialized internal data structures.
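As a rough illustration of a column-based layout, the sketch below keeps metadata and row values together per column and assembles a row only on request. The names `DataType`, `Column`, and `Table` are hypothetical and do not mirror the platform's classes, and a plain Python list stands in for the specialized internal storage.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataType(Enum):
    # The eight column data types listed above.
    INTEGER = "Integer"
    REAL = "Real"
    STRING = "String"
    DATE = "Date"
    TIME = "Time"
    DATETIME = "DateTime"
    CURRENCY = "Currency"
    BINARY = "Binary"


@dataclass
class Column:
    """Hypothetical column: metadata (name, type) plus its row values."""
    name: str
    data_type: DataType
    values: list = field(default_factory=list)


@dataclass
class Table:
    """Hypothetical table: a mapping from column name to column."""
    columns: dict = field(default_factory=dict)

    def add_column(self, column):
        self.columns[column.name] = column

    def row(self, index):
        # A row is assembled by reading the same index from every column.
        return {name: col.values[index] for name, col in self.columns.items()}


table = Table()
table.add_column(Column("Region", DataType.STRING, ["North", "South"]))
table.add_column(Column("Units", DataType.INTEGER, [12, 7]))
second_row = table.row(1)
```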
See Spotfire Data for further class references and code samples.
Loading Data
When an analysis file is saved to disk, the data for a table can be stored directly
in the file. This mode is known as embedded data. Alternatively, data can
be reloaded from the external data source when the file is opened. This mode is
known as linked data. Metadata about the table and its columns is stored
in the file for both embedded and linked data.
- Spotfire Text Data Format (STDF): a tabular data format, the common file format for Spotfire products. It is strict, unforgiving, easy to parse efficiently, and particularly useful if data is both formatted and parsed by Spotfire products. Otherwise, a more flexible format might be preferable.
A data source is an object that contains a reference to external data, such
as a file or database. There are many kinds of data sources, and custom data sources
may be created. Furthermore, a data source may be included in a data flow,
a pipeline that reads data from a data source and processes the data through a sequence
of transformations. The output usually ends up in a data table. A data transformation
can perform anything from simple tasks, like data cleaning, to complex operations,
like pivoting. It is possible to extend the platform by creating custom transformations.
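The data flow idea can be sketched as a source followed by an ordered list of transformations. Everything in this example is an assumption made for illustration: the function names, the CSV input, and the representation of rows as dictionaries are not taken from the platform.

```python
import csv
import io


def read_csv_source(text):
    """Hypothetical data source: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_incomplete_rows(rows):
    """Simple cleaning transformation: discard rows that have an empty value."""
    return [row for row in rows if all(value != "" for value in row.values())]


def pivot(rows, key, column, value):
    """Pivot transformation: one output row per key, one output column per `column` value."""
    result = {}
    for row in rows:
        result.setdefault(row[key], {key: row[key]})[row[column]] = row[value]
    return list(result.values())


def run_flow(source, transformations):
    """Read from the source, then apply each transformation in order."""
    rows = source()
    for transform in transformations:
        rows = transform(rows)
    return rows


RAW = "region,quarter,units\nNorth,Q1,10\nNorth,Q2,12\nSouth,Q1,\nSouth,Q2,7\n"

table = run_flow(
    lambda: read_csv_source(RAW),
    [drop_incomplete_rows, lambda rows: pivot(rows, "region", "quarter", "units")],
)
```

Because every transformation takes rows and returns rows, a custom transformation in this sketch is just another function added to the list.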
Handling Data
A data relation defines a connection between two tables, usually by declaring
that a column in the first table corresponds to a column in the second table. Data
relations are used for translating a row selection in one table to a row selection
in the other table. The set of all data relations implicitly defines a grouping of
the tables. All tables in such a table group are related to each other.
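Both ideas can be sketched in a few lines: translating a row selection through matching column values, and deriving table groups as the connected components of the relation graph. The function names and the sample tables are hypothetical, and treating table groups as connected components is an interpretation of the text above rather than a documented algorithm.

```python
def translate_selection(selected_rows, from_keys, to_keys):
    """Map selected row indices in one table to row indices in a related table.

    from_keys and to_keys are the related columns, given as lists of values.
    """
    selected_values = {from_keys[i] for i in selected_rows}
    return {j for j, value in enumerate(to_keys) if value in selected_values}


def table_groups(tables, relations):
    """Group tables so that tables connected by relations end up together."""
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for left, right in relations:
        parent[find(left)] = find(right)

    groups = {}
    for t in tables:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


# Hypothetical tables: Customers.id relates to Orders.customer_id.
customer_ids = [1, 2, 3]
order_customer_ids = [1, 2, 1, 3]

# Selecting the first customer selects that customer's orders.
orders_selected = translate_selection({0}, customer_ids, order_customer_ids)

groups = table_groups(
    ["Orders", "Customers", "Weather"],
    [("Orders", "Customers")],
)
```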
A row selection defines a subset of the rows in a table. Multiple row selections
for the same table can be combined using set operations such as taking the intersection.
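A minimal sketch of combining row selections, assuming a selection is modeled as a set of row indices. The helper `select` and the sample columns are hypothetical.

```python
regions = ["North", "South", "North", "South", "North"]
units = [10, 0, 12, 7, 3]


def select(values, predicate):
    """Return the indices of the rows whose value satisfies the predicate."""
    return {i for i, value in enumerate(values) if predicate(value)}


north = select(regions, lambda region: region == "North")
large = select(units, lambda count: count >= 7)

north_and_large = north & large   # intersection
north_or_large = north | large    # union
north_not_large = north - large   # difference
```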
A data selection is conceptually similar to a row selection, but it is more
complex since it is applicable to all the tables. You can set the selection relative
to one table and then retrieve the selection translated to another table. There
are two specific types of data selections. A marking selection defines a set of
marked rows, while a filtering selection defines a set of filtered rows. The two
types turn out to be more different than one might expect at first.
A data view is a table that is derived from another table, typically by
aggregating the rows. Data views are used extensively by the visualization framework.
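As an illustration only, a data view can be thought of as a function of the underlying rows and the current filtering, recomputed whenever either changes. The function `aggregate_view` and the sample rows are hypothetical, and summation is just one possible aggregation.

```python
from collections import defaultdict


def aggregate_view(rows, filtered_rows, group_by, value):
    """Sum `value` per `group_by` category over the filtered rows only."""
    totals = defaultdict(int)
    for index in sorted(filtered_rows):
        row = rows[index]
        totals[row[group_by]] += row[value]
    return dict(totals)


rows = [
    {"region": "North", "units": 10},
    {"region": "South", "units": 7},
    {"region": "North", "units": 12},
]

all_rows_view = aggregate_view(rows, {0, 1, 2}, "region", "units")
north_only_view = aggregate_view(rows, {0, 2}, "region", "units")
```

A bar chart showing units per region would read from such a view rather than from the source rows directly.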