The Summarization Framework is a set of workflow components introduced in Aspire 5.0.3. The framework processes and profiles tabular data such as RDB tables or Parquet files. For each tabular document received, the framework extracts each row and processes every column value in that row to generate a profile report of the whole table. At the end of the process, the generated data is added to the table document as additional metadata.

Aspire Tools is a section in Aspire with utilities to update components and applications, and to import and export the configurations of Aspire objects (Seeds, Workflows, Credentials, Connections, Connector Instances, Policies, Schedules). It also includes a section to upload files used by Aspire components, a guided tour of Aspire 5, the Aspire Javadoc, and tools to create and run Groovy scripts or to write and test DXF.


Sections


Debug Console



Extension Manager



Resource Manager

Components

The framework is split into two kinds of components: executors and summarizers.

Summarizers


The summarizers are the components that process each row and column, building the data profile that is added to the table document at the end of the process. Each summarizer specializes in gathering a different kind of information, such as samples of the processed data or statistics like the minimum or maximum value of a numeric column.
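As an illustration of what a summarizer does, here is a minimal Python sketch of two of them: one that tracks the minimum and maximum of numeric columns and one that keeps a few sample values per column. The class names, the `process_row` and `profile` methods, and the shape of the output are assumptions made for this example; they are not the Aspire API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class MinMaxSummarizer:
    """Hypothetical summarizer: min and max of every numeric column."""

    def __init__(self) -> None:
        self.stats: Dict[str, Dict[str, float]] = {}

    def process_row(self, row: Row) -> None:
        for column, value in row.items():
            # bool is a subclass of int in Python, so skip it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            entry = self.stats.setdefault(column, {"min": value, "max": value})
            entry["min"] = min(entry["min"], value)
            entry["max"] = max(entry["max"], value)

    def profile(self) -> Dict[str, Any]:
        return {"minmax": self.stats}


class SampleSummarizer:
    """Hypothetical summarizer: keeps the first `limit` values of each column."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.samples: Dict[str, List[Any]] = {}

    def process_row(self, row: Row) -> None:
        for column, value in row.items():
            values = self.samples.setdefault(column, [])
            if len(values) < self.limit:
                values.append(value)

    def profile(self) -> Dict[str, Any]:
        return {"samples": self.samples}
```

Each summarizer sees one row at a time and only reports its result when `profile()` is called, which mirrors the row-by-row processing described above.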

Executors

Import


The executors are the components that know how to extract the rows from the table document, and the schema that describes how the table is structured, depending on the document type (RDB, Parquet, SAS, etc.). For each extracted row, the executor calls each of the configured summarizers.
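The executor's role can be sketched for the RDB case using Python's built-in `sqlite3` module as a stand-in database. The function names `read_rdb_table` and `execute` are hypothetical, and the summarizers are modeled as plain callables that receive one row; a real executor would also handle Parquet, SAS, and other document types.

```python
import sqlite3
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def read_rdb_table(conn: sqlite3.Connection, table: str) -> Tuple[List[str], Iterator[Row]]:
    """Return the column names (the schema) and a lazy row iterator for one table."""
    # The table name is interpolated directly only because this sketch
    # assumes a trusted name; identifiers cannot be bound as parameters.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    schema = [column[0] for column in cursor.description]
    rows = (dict(zip(schema, values)) for values in cursor)
    return schema, rows


def execute(conn: sqlite3.Connection, table: str,
            summarizers: List[Callable[[Row], None]]) -> List[str]:
    """Fetch schema and rows, then call every summarizer once per row."""
    schema, rows = read_rdb_table(conn, table)
    for row in rows:
        for summarize in summarizers:
            summarize(row)
    return schema
```

Keeping the row extraction in one place means the summarizers never need to know which document type they are profiling.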

How they work

For the summarizers and executors to work, they must be configured in a specific order in the workflow: every summarizer to be used must be added before the executor component.
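The ordering rule can be expressed as a small check over a list of stage names. This is only a sketch: the function name and the stage names used in the example are hypothetical, and Aspire workflows are not configured as Python lists.

```python
from typing import List


def summarizers_precede_executor(stages: List[str], executor: str,
                                 summarizers: List[str]) -> bool:
    """True when every summarizer is present and placed before the executor."""
    if executor not in stages:
        return False
    executor_index = stages.index(executor)
    return all(
        name in stages and stages.index(name) < executor_index
        for name in summarizers
    )
```

For example, with stages `["minMax", "samples", "rdbExecutor"]` the check passes, while `["minMax", "rdbExecutor", "samples"]` fails because `samples` comes after the executor.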


Framework workflow steps

1. Each summarizer in the workflow attaches itself to each document received, creating a chain of attached summarizers.
2. The executor fetches the table rows and the schema.
3. For each row of the table, the executor calls the attached summarizers.
4. The summarizers process each row received, gathering information for the table profile.
5. When all rows are processed, the summarizers return their profile to the executor.
6. The executor merges the results from all summarizers and adds them to the table document. 
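The six steps above can be sketched end to end in Python. Everything here is an assumption made for illustration: the document is a plain dictionary, the `attachedSummarizers` and `profile` keys are invented, and the in-memory `rows` list stands in for the executor fetching rows and schema from a real source.

```python
from typing import Any, Callable, Dict

Document = Dict[str, Any]
Row = Dict[str, Any]


class RowCountSummarizer:
    """Hypothetical summarizer: counts the rows it receives."""

    def __init__(self) -> None:
        self.count = 0

    def process_row(self, row: Row) -> None:
        self.count += 1

    def profile(self) -> Dict[str, Any]:
        return {"rowCount": self.count}


class ColumnMaxSummarizer:
    """Hypothetical summarizer: largest value seen in one column."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.maximum = None

    def process_row(self, row: Row) -> None:
        value = row.get(self.column)
        if value is not None and (self.maximum is None or value > self.maximum):
            self.maximum = value

    def profile(self) -> Dict[str, Any]:
        return {f"max.{self.column}": self.maximum}


def attach(doc: Document, factory: Callable[[], Any]) -> Document:
    # Step 1: a fresh summarizer instance joins the chain on this document.
    doc.setdefault("attachedSummarizers", []).append(factory())
    return doc


def run_executor(doc: Document) -> Document:
    rows = doc["rows"]                         # Step 2: stand-in for fetching rows and schema
    chain = doc.pop("attachedSummarizers", [])
    for row in rows:                           # Step 3: call the chain for each row
        for summarizer in chain:
            summarizer.process_row(row)        # Step 4: gather profile information
    merged: Dict[str, Any] = {}
    for summarizer in chain:
        merged.update(summarizer.profile())    # Step 5: collect each profile
    doc["profile"] = merged                    # Step 6: add merged result to the document
    return doc
```

A short usage example: attach `RowCountSummarizer` and `lambda: ColumnMaxSummarizer("price")` to a document, then call `run_executor(doc)`; the merged profile appears under `doc["profile"]`.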

Export



Groovy Playground



DXF Playground



Javadoc



Guided Tour

