Step 1. Download Saga Parser
Step 2. Copy the .jar files to Aspire
Step 3. Configure Aspire 5's settings.json
Add both Saga Parser bundles to the bundleVersions section:
"bundleVersions": {
  "bundle": [
    { "@artifactId": "app-saga-parser", "@groupId": "com.accenture.aspire", "@version": "5.0.3.132100" },
    { "@artifactId": "aspire-saga-parser", "@groupId": "com.accenture.aspire", "@version": "5.0.3.132100" }
  ]
}
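If you prefer to script this edit rather than change settings.json by hand, the following is a minimal Python sketch. It assumes that bundleVersions.bundle is a JSON array, as in the fragment above, and that entries are identified by @groupId plus @artifactId. The function name add_saga_bundles is hypothetical and is not part of Aspire.

```python
import json

# The two Saga Parser bundles listed in Step 3.
SAGA_ARTIFACTS = ("app-saga-parser", "aspire-saga-parser")
GROUP_ID = "com.accenture.aspire"


def add_saga_bundles(settings: dict, version: str) -> dict:
    """Hypothetical helper: add or update the Saga Parser entries under bundleVersions.bundle.

    Assumes "bundle" is a list of objects with @artifactId/@groupId/@version keys.
    """
    bundles = settings.setdefault("bundleVersions", {}).setdefault("bundle", [])
    for artifact in SAGA_ARTIFACTS:
        # Look for an existing entry for this artifact so we never add duplicates.
        entry = next(
            (b for b in bundles
             if b.get("@artifactId") == artifact and b.get("@groupId") == GROUP_ID),
            None,
        )
        if entry is None:
            bundles.append({"@artifactId": artifact, "@groupId": GROUP_ID, "@version": version})
        else:
            entry["@version"] = version  # keep both bundles on the same version
    return settings


if __name__ == "__main__":
    # In-memory example; for a real file, json.load it, call the helper, and json.dump it back.
    example = {"bundleVersions": {"bundle": []}}
    print(json.dumps(add_saga_bundles(example, "5.0.3.132100"), indent=2))
```

Running the helper twice with the same version leaves exactly two entries, so it is safe to re-run after an upgrade.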
Step 4. Add the Saga config to Aspire 5
Create a config.json file with the following content:
{
  "config": {
    "libraryJars": ["./lib"],
    "tagManager": { "resource": "saga-provider:saga_tags" },
    "pipelineManager": { "resource": "saga-provider:saga_pipelines" },
    "providers": [
      { "name": "filesystem-provider", "type": "FileSystem", "baseDir": "./config" },
      {
        "name": "saga-provider",
        "type": "Elastic",
        "nodeUrls": ["http://localhost:9200"],
        "timestamp": "updatedAt",
        "authentication": "none",
        "indexName": "saga",
        "exclude": ["updatedAt", "createdAt"],
        "maxResults": 2000000
      }
    ]
  }
}
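In the config above, tagManager and pipelineManager point to a resource inside a declared provider. For example, "saga-provider:saga_tags" refers to the provider named saga-provider. The following minimal Python sketch checks that cross-reference before you start Aspire. The "<provider>:<resource>" format is inferred from this example and is not taken from an official specification. The function name check_saga_config is hypothetical.

```python
import json


def check_saga_config(config_doc: dict) -> list[str]:
    """Hypothetical checker: return problems found in a Saga config.json (empty list if none)."""
    problems = []
    config = config_doc.get("config", {})
    provider_names = {p.get("name") for p in config.get("providers", [])}
    for manager in ("tagManager", "pipelineManager"):
        resource = config.get(manager, {}).get("resource")
        if not resource:
            problems.append(f"{manager}.resource is missing")
            continue
        # Assumed format "<provider-name>:<resource>", e.g. "saga-provider:saga_tags".
        provider, sep, _ = resource.partition(":")
        if not sep:
            problems.append(f"{manager}.resource '{resource}' has no provider prefix")
        elif provider not in provider_names:
            problems.append(f"{manager}.resource refers to unknown provider '{provider}'")
    return problems


if __name__ == "__main__":
    example = json.loads("""{"config": {
        "tagManager": {"resource": "saga-provider:saga_tags"},
        "pipelineManager": {"resource": "saga-provider:saga_pipelines"},
        "providers": [{"name": "saga-provider", "type": "Elastic"}]}}""")
    print(check_saga_config(example) or "config looks consistent")
```

To check the real file, pass the result of json.load on your config.json to check_saga_config.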
Step 5. Run Aspire 5
Step 6. Add Saga Parser in the Extension Manager
Type name - Name of the Extension
Extension type - Choose application
Maven Coordinates - com.accenture.aspire:app-saga-parser:{Saga Parser Version}
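Maven coordinates take the form groupId:artifactId:version. The following Python sketch builds the value for this field from the app-saga-parser entry in settings.json. It assumes that the version you enter here should match the version pinned in Step 3, which the steps above do not state explicitly. The function name saga_parser_coordinates is hypothetical.

```python
import json


def saga_parser_coordinates(settings: dict, artifact_id: str = "app-saga-parser") -> str:
    """Hypothetical helper: build groupId:artifactId:version from a bundleVersions entry."""
    for bundle in settings.get("bundleVersions", {}).get("bundle", []):
        if bundle.get("@artifactId") == artifact_id:
            return f'{bundle["@groupId"]}:{bundle["@artifactId"]}:{bundle["@version"]}'
    raise KeyError(f"{artifact_id} is not listed in bundleVersions.bundle")


if __name__ == "__main__":
    # The Step 3 fragment, wrapped in braces so it parses as a JSON object.
    fragment = json.loads("""{"bundleVersions": {"bundle": [
        {"@artifactId": "app-saga-parser", "@groupId": "com.accenture.aspire", "@version": "5.0.3.132100"},
        {"@artifactId": "aspire-saga-parser", "@groupId": "com.accenture.aspire", "@version": "5.0.3.132100"}]}}""")
    print(saga_parser_coordinates(fragment))  # com.accenture.aspire:app-saga-parser:5.0.3.132100
```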
Step 7. Add Saga Parser to your workflow
Step 8. Configure Saga Parser in Aspire
Config Path: Location of the config.json created in Step 4.
Create Python Bridge per engine: Create and start a separate Python bridge for each SAGA engine used.
Python Bridge path: Folder path of the Python bridge to spawn. It MUST already have its venv created, with all the requirements installed.
Match Type: Type of SAGA output match (Match Extraction or Analytics).
Match Extraction: This response type returns an array with all the Semantic Tag matches.
Analytics: This response type returns an array with any non-token matches.
Process fields: Path of the content you want to process inside the AspireObject.
Engine Pool Size: Number of SAGA engines in the pool.
Create Engines Beforehand: Create the engines before crawling starts instead of at the time of the actual crawls.
Tags/Processors: Select whether to use SAGA tags or a specific processor (pipeline stage).
Tags: List of SAGA tags you want to process. At least one tag is required.
Use Exact Tags: Match tags by their exact names. If you use container tags, you probably want to disable this.
Processor: The specific processor from a pipeline that you want to use.
Include Flags: Names of the flags you want to use. The default is SEMANTIC_TAG, and this option cannot be empty.
Exclude Flags: Flags you want to skip and not add to the final output.
Cache Results: Enabling this caches the most frequently used results to improve performance.
Debug: Enable debug log messages.
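The constraints stated above can be summarized in code. There are two match types, Tags mode needs at least one tag, a specific processor is selected in Processor mode, and Include Flags defaults to SEMANTIC_TAG and cannot be empty. The following Python model is a sketch only. Its field names do not reflect Aspire's real configuration keys. The "pool size must be at least 1" rule and the "a bridge path is required when bridges are created" rule are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class MatchType(Enum):
    MATCH_EXTRACTION = "Match Extraction"
    ANALYTICS = "Analytics"


@dataclass
class SagaParserSettings:
    """Hypothetical model of the Step 8 options; field names are illustrative only."""
    config_path: str
    process_fields: list[str]
    match_type: MatchType
    engine_pool_size: int
    create_python_bridge_per_engine: bool = False
    python_bridge_path: str = ""
    use_tags: bool = True  # True: Tags mode, False: Processor mode
    tags: list[str] = field(default_factory=list)
    processor: str = ""
    include_flags: list[str] = field(default_factory=lambda: ["SEMANTIC_TAG"])
    exclude_flags: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a documented (or assumed) constraint is violated."""
        if self.engine_pool_size < 1:  # assumption: at least one engine
            raise ValueError("Engine Pool Size must be at least 1")
        if self.create_python_bridge_per_engine and not self.python_bridge_path:  # assumption
            raise ValueError("Python Bridge path is required when creating bridges")
        if self.use_tags and not self.tags:
            raise ValueError("Tags mode needs at least one tag")
        if not self.use_tags and not self.processor:
            raise ValueError("Processor mode needs a processor")
        if not self.include_flags:
            raise ValueError("Include Flags cannot be empty")
```

Validating these rules before saving catches an empty tag list or an empty Include Flags list early, rather than at crawl time.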
Step 9. Save the configuration