Introduction

The Job Summarizer Executor can process the table data contained in an Aspire job and fetch the associated rows from an Elasticsearch index. Each extracted row will be processed by the summarizers attached to the job.

Job-based summarization

The Job Summarizer Executor summarizes data based on the table structure contained in a job.

Example of supported table structure:

{
  "container": {
    "repItemType": "aspire/folder",
    "seed": {
      "description": "s3",
      "id": "a8c0c88a-d3b4-42fb-b27d-57137ab85154",
      "type": "s3",
      "properties": {
        "tag1": "value1",
        "seed": "/qa-s3-storage/test-level1/split container/",
        "processSplitFiles": "true",
        "usePrefixesForSplitCheck": "true",
        "splitCheckPrefix": "part-"
      },
      "tags": [
        "darwin"
      ]
    },
    "isContainer": "TYPE-NOT-PROVIDED",
    "connectorSpecific": {
      "skippedRows": "0",
      "rowCount": "32622",
      "childId": [
        "/qa-s3-storage/test-level1/split container/part-00000-d91360fd-0995-4af2-9998-39454c778297-c000.parquet",
        "/qa-s3-storage/test-level1/split container/part-00002-d91360fd-0995-4af2-9998-39454c778297-c000.parquet",
        "/qa-s3-storage/test-level1/split container/part-00001-d91360fd-0995-4af2-9998-39454c778297-c000.parquet"
      ]
    },
    "title": "split container",
    "url": "/qa-s3-storage/test-level1/split container/",
    "samples": [{
        "Column1": "text",
        "Column2": null,
        "Column3": 5,
        "Column4": "text",
        "Column5": "745286400000000"
      }
    ],
    "displayurl": "/qa-s3-storage/test-level1/split container/",
    "crawlStart": "2022-06-07T19:58:20Z",
    "ingestionEnd": "2022-06-07T19:58:54Z",
    "submitTime": "2022-06-07T19:58:55+0000",
    "ingestionStart": "2022-06-07T19:58:50Z",
    "dataProfile": {
      "columns": [{
          "technical_tags": "OPTIONAL",
          "nullCount": "0",
          "column_type": "STRING",
          "columnName": "Column1",
          "uniqueCount": "50"
        }, {
          "technical_tags": "OPTIONAL",
          "nullCount": "8472",
          "column_type": "STRING",
          "columnName": "Column2",
          "uniqueCount": "154"
        }, {
          "technical_tags": "OPTIONAL",
          "minValue": "0.0",
          "maxValue": "33.0",
          "meanValue": "11.41498260725533",
          "nullCount": "8472",
          "column_type": "INT32",
          "stdDev": "3.785881246274845",
          "columnName": "Column3",
          "uniqueCount": "30"
        }, {
          "technical_tags": "OPTIONAL",
          "nullCount": "0",
          "column_type": "STRING",
          "columnName": "Column4",
          "uniqueCount": "3"
        }, {
          "technical_tags": [
            "OPTIONAL",
            "AdjustedToUTC",
            "MICROS"
          ],
          "column_type": "TIMESTAMP",
          "columnName": "Column5"
        }
      ]
    }
  },
  "name": "data-container"
}

The table structure must describe each column, including at least its name and type (the columnName and column_type fields under dataProfile.columns in the example above).
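
As an illustration of the column information involved, the sketch below reads the name and type of each column from a trimmed-down copy of the structure above. It is not the executor's own code; the helper name column_schema is hypothetical, and the field paths are simply taken from the example.

```python
from typing import Any


def column_schema(job_doc: dict[str, Any]) -> dict[str, str]:
    """Map each column name to its declared type.

    Hypothetical helper: the field paths follow the example structure,
    not a documented Aspire API.
    """
    columns = job_doc["container"]["dataProfile"]["columns"]
    return {col["columnName"]: col["column_type"] for col in columns}


# Trimmed-down version of the table structure shown above.
job_doc = {
    "container": {
        "dataProfile": {
            "columns": [
                {"columnName": "Column1", "column_type": "STRING"},
                {"columnName": "Column3", "column_type": "INT32"},
                {"columnName": "Column5", "column_type": "TIMESTAMP"},
            ]
        }
    },
    "name": "data-container",
}

schema = column_schema(job_doc)
print(schema)
```

A structure without the dataProfile.columns list would raise a KeyError here, which is one way to surface a table that lacks the required column information.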

Fetch rows from Elasticsearch

The table rows are extracted from an Elasticsearch index. Two row formats are supported:

Based on published unique values:

{
  "name": "column-value",
  "value": {
    "pctg": "0.04966430607927895",
    "seedId": "a8c0c88a-d3b4-42fb-b27d-57137ab85154",
    "count": "1620",
    "tableId": "/qa-s3-storage/test-level1/split container/",
    "value": "text",
    "columnName": "Column1"
  }
}

Single level key-value objects:

{
  "Column1": "text",
  "Column2": null,
  "Column3": 5,
  "Column4": "text",
  "Column5": "745286400000000"
}
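
The difference between the two formats can be sketched as follows. This is only an illustration of how the two shapes might be told apart and flattened into (column, value) pairs; the function names classify_row and to_cells are hypothetical, and the executor's actual detection logic is not described on this page.

```python
from typing import Any


def classify_row(doc: dict[str, Any]) -> str:
    """Return "unique-value" or "key-value" for a fetched row document.

    Assumption: a published unique value is recognised by
    name == "column-value" plus a nested "value" object. Single-level
    rows never contain nested objects, so they fall through.
    """
    value = doc.get("value")
    if doc.get("name") == "column-value" and isinstance(value, dict):
        return "unique-value"
    return "key-value"


def to_cells(doc: dict[str, Any]) -> list[tuple[str, Any]]:
    """Flatten either format into (column name, cell value) pairs."""
    if classify_row(doc) == "unique-value":
        inner = doc["value"]
        return [(inner["columnName"], inner["value"])]
    return list(doc.items())


unique_value_doc = {
    "name": "column-value",
    "value": {"columnName": "Column1", "value": "text", "count": "1620"},
}
key_value_doc = {"Column1": "text", "Column2": None, "Column3": 5}

print(to_cells(unique_value_doc))
print(to_cells(key_value_doc))
```

Note that a unique-value document carries a single column value, while a key-value document carries one value per column.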

Rows Filtering

The Job Summarizer Executor can be configured with a Groovy script that filters which rows are processed. The script is evaluated for each row and must return a boolean.

Example:

// This script must return a boolean.
// The references of the job, doc, component, row and table objects are available.
// Javadoc references 
// Row (row) - http://{manager}/javadocs/com/accenture/aspire/services/summarization/Row.html
// Table (table) - http://{manager}/javadocs/com/accenture/aspire/services/summarization/Table.html
row.getBoolean("sensitive") == true
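
The effect of such a filter can be pictured with the following Python sketch, in which plain dictionaries stand in for the Row objects. It assumes that a row is processed when the script returns true and skipped otherwise; the names keep_row and rows are hypothetical, and the real script runs as Groovy against the Row and Table classes referenced above.

```python
from typing import Any


def keep_row(row: dict[str, Any]) -> bool:
    """Python stand-in for the Groovy filter above.

    Keeps a row only when its "sensitive" field is the boolean True.
    How the real Row.getBoolean treats missing or non-boolean values
    is not covered here.
    """
    return row.get("sensitive") is True


# Hypothetical rows; only the "sensitive" flag matters for this filter.
rows = [
    {"Column1": "a", "sensitive": True},
    {"Column1": "b", "sensitive": False},
    {"Column1": "c"},
]

selected = [row for row in rows if keep_row(row)]
print(selected)
```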
