The framework is split into two kinds of components: executors and summarizers.
Summarizers
The summarizers are the components that process each row and column, building the data profile that is added to the table document at the end of the process. Each summarizer specializes in gathering one kind of information, such as samples of the processed data or statistics like the minimum and maximum value of a numerical column.
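To make the two kinds of information concrete, here is a minimal Python sketch of one summarizer that tracks the minimum and maximum of a numerical column and another that keeps a few sample rows. The class names, the `process_row` and `profile` methods, and the dictionary-based row format are assumptions made for illustration; they are not the framework's actual API.

```python
from typing import Any, Dict, List, Optional


class MinMaxSummarizer:
    """Hypothetical summarizer: tracks the min and max of one numerical column."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.minimum: Optional[float] = None
        self.maximum: Optional[float] = None

    def process_row(self, row: Dict[str, Any]) -> None:
        value = row.get(self.column)
        if value is None:
            return  # missing values do not affect the statistics
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    def profile(self) -> Dict[str, Any]:
        return {self.column: {"min": self.minimum, "max": self.maximum}}


class SampleSummarizer:
    """Hypothetical summarizer: keeps the first `limit` rows as samples."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.samples: List[Dict[str, Any]] = []

    def process_row(self, row: Dict[str, Any]) -> None:
        if len(self.samples) < self.limit:
            self.samples.append(dict(row))  # copy so later changes do not leak in

    def profile(self) -> Dict[str, Any]:
        return {"samples": self.samples}


# Illustrative rows; in the framework these would come from an executor.
rows = [{"age": 31}, {"age": 19}, {"age": None}, {"age": 47}]
min_max = MinMaxSummarizer("age")
sampler = SampleSummarizer(limit=2)
for row in rows:
    min_max.process_row(row)
    sampler.process_row(row)

print(min_max.profile())
print(sampler.profile())
```

Each summarizer only sees one row at a time and keeps a small running state, which is what lets the profile be built in a single pass over the table.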
Executors
The executors are the components that know how to extract the rows and the schema from the table document, depending on the document type (RDB, Parquet, SAS, etc.). For each extracted row, the executor calls each of the configured summarizers.
For the summarizers and executors to work together, they must be configured in a specific order in the workflow: every summarizer to be used must be added before the executor component. Processing then proceeds as follows:
1. Each summarizer in the workflow attaches itself to each document it receives, creating a chain of attached summarizers.
2. The executor fetches the table rows and the schema.
3. For each row of the table, the executor calls the attached summarizers.
4. The summarizers process each row received, gathering information for the table profile.
5. When all rows are processed, each summarizer returns its profile to the executor.
6. The executor merges the results from all summarizers and adds them to the table document.
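The six steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `TableDocument`, `RowCountSummarizer`, `NullCountSummarizer`, `InMemoryExecutor`, and their methods are hypothetical names, the rows are held in memory instead of being read from an RDB, Parquet, or SAS source, and the schema is fetched but not otherwise used.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


class TableDocument:
    """Hypothetical table document carrying its rows, schema, and profile."""

    def __init__(self, rows: List[Row], schema: Dict[str, str]) -> None:
        self.rows = rows
        self.schema = schema
        self.attached: List[Any] = []      # chain of attached summarizers
        self.profile: Dict[str, Any] = {}


class RowCountSummarizer:
    name = "row_count"

    def attach(self, document: TableDocument) -> None:
        self.count = 0                      # state is reset for each document
        document.attached.append(self)      # step 1: join the chain

    def process_row(self, row: Row) -> None:
        self.count += 1                     # step 4

    def result(self) -> int:
        return self.count                   # step 5


class NullCountSummarizer:
    name = "null_count"

    def attach(self, document: TableDocument) -> None:
        self.nulls: Dict[str, int] = {}
        document.attached.append(self)

    def process_row(self, row: Row) -> None:
        for column, value in row.items():
            if value is None:
                self.nulls[column] = self.nulls.get(column, 0) + 1

    def result(self) -> Dict[str, int]:
        return dict(self.nulls)


class InMemoryExecutor:
    """Hypothetical executor; a real one would read the rows from the source."""

    def fetch(self, document: TableDocument) -> Tuple[List[Row], Dict[str, str]]:
        return document.rows, document.schema

    def execute(self, document: TableDocument) -> TableDocument:
        rows, schema = self.fetch(document)             # step 2
        for row in rows:                                # step 3
            for summarizer in document.attached:
                summarizer.process_row(row)
        # steps 5 and 6: collect each profile and merge it into the document
        document.profile = {s.name: s.result() for s in document.attached}
        return document


document = TableDocument(
    rows=[{"id": 1, "city": "Lyon"}, {"id": 2, "city": None}, {"id": 3, "city": None}],
    schema={"id": "int", "city": "str"},
)

# Order matters: every summarizer comes before the executor.
workflow = [RowCountSummarizer(), NullCountSummarizer(), InMemoryExecutor()]
for component in workflow:
    if hasattr(component, "attach"):
        component.attach(document)
    else:
        component.execute(document)

print(document.profile)
```

If the executor were placed before a summarizer in `workflow`, that summarizer would not yet be attached when the rows are processed, so its results would be missing from the profile. This is why the ordering requirement exists.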