
The mappings for the index aspire-identitycache changed in version 5.2. The action you must take depends on whether you need to preserve existing data:

No need to preserve existing data
Convert existing data

All requests in the code blocks below are expected to be sent to Elasticsearch from Kibana.

No need to preserve existing data

In this case, simply delete the existing index after stopping Aspire. When you start Aspire 5.2, the index is created automatically with the new mapping.

Code Block
DELETE aspire-identitycache

Convert existing data

Create the new index aspire-identitycache_new with the new 5.2 mapping:

Code Block
PUT aspire-identitycache_new
{
  "mappings": {
    "properties": {
      "identity.seedId": {
        "type": "keyword"
      },
      "identity.timestamp": {
        "type": "keyword"
      }
    },
    "dynamic_templates": [
      {
        "all_as_keyword": {

The connector template migration script is written in Python 3.11.

The script helps migrate connectors from Aspire 4.0 to Aspire 5.0.

It is designed to be robust and easily extensible, using templates with transformation matrices.

It can currently migrate the following connector types: Filesystem, SMB, SharePoint Online.

Workflows are currently ignored and not migrated.

The following features are implemented:

ability to change the version of the template
ability to pass connectionId, connectorId, workflowId, policyId to seed, connection, credential
ability to pass a list of tags
default values for properties which are not in 4.0
multiple Starting Points are split into multiple seeds and connections; the script reads the file and creates a seed for each line
ability to split a URL into connection and seed
ability to pass a suffix of the HTTP object description as a parameter
ability to work with relative paths
ability to use the crawl path/folder as the Seed name and the host/URL as the Connection name by using an argument

Repository

aspire-migration-component

Installation

For Developers

Install Python 3.11
Install PyCharm Community

In PyCharm go to File → Settings → Python Interpreter

Set the path to the Python interpreter

Install the missing libraries urllib3, wheel, xmltodict, orjson using the "+" button

Windows cmd

Install Python 3.11

Add the following directories to the Path environment variable:

C:\Users\#{username}\AppData\Local\Programs\Python\Python311\

C:\Users\#{username}\AppData\Local\Programs\Python\Python311\Scripts

Install pip

...

Install virtualenv

pip install virtualenv --trusted-host pypi.org

...

virtualenv venv

Activate the virtual environment

...

Install necessary libraries

...

Python libraries used

os.path, json, urllib.parse (urlparse), xmltodict, urllib3, zipfile, sys, datetime, logging, argparse

Step-by-step guide

1. Run Aspire 4.0 and Aspire 5.0
2. Download the zip file of the connector from Aspire 4.0
3. Set the Aspire 5.0 URL in the file __init__.py. If you are using private certificates, the certificates must be set up as well.

Code Block

URL_ASPIRE = "http://localhost:50505/aspire/_api" 
# --- https with certificates
# CA_CERTS = ('client-2048.crt', 'client-2048.key')
# http = urllib3.PoolManager(cert_reqs='REQUIRED', ca_certs=CA_CERTS)

4. Run script

Code Block
python __init__.py -p smb_connector-content-source.zip

5. After a few seconds the connector is migrated to Aspire 5.0 and the objects "seed", "connection", "connector", "credential" are created with the description "migration-{date}".

You can also check the migration report "migration-{date}.log" in the directory "reports" to see exactly what happened.

Help

Information about the script is available with the command:

Code Block
python __init__.py -h

Output

...

          "match_mapping_type": "*",

...

Ability to change version of template

Templates are in the directory "migration_template".

The project structure looks similar to the one below.

-- migration

-- migration_template

-- 4.1

-- default

-- aspire-filesystem-source

-- aspire-sharepointonline-source

-- aspire-smb-source

-- type_transform_matrix.json

Connectors can have different properties depending on their version (4.0, 4.1, 5.0, 5.1, etc.). To work with a version other than "default", create a new directory (for example "4.1") and copy everything from the directory "default" into it. Then make the changes in "*_transform_matrix.json" for the connector in question.

After the changes are done, run the script with the additional parameter "-v [name of new directory]".

Code Block
python __init__.py -v 4.1 -p smb_connector-content-source.zip
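The copy step described above could be scripted as follows. This is a minimal sketch, not part of the migration script: the function name create_template_version is hypothetical, and it assumes the templates live under "migration_template/default" as in the structure shown earlier.

```python
import shutil
from pathlib import Path


def create_template_version(template_root: Path, version: str) -> Path:
    """Copy the "default" templates into a new version directory."""
    source = template_root / "default"
    target = template_root / version
    # copytree raises FileExistsError if the target already exists,
    # which protects templates that were already edited.
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    new_dir = create_template_version(Path("migration_template"), "4.1")
    print(f"Now edit the *_transform_matrix.json files in {new_dir}")
```

After the copy, the transform matrices in the new directory still have to be edited by hand before running the script with "-v".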

Ability to pass IDs to seed, connection, credential

Some objects may already exist in Aspire 5, and we may want to use them for the newly migrated connector. In that case, create a JSON file with the structure shown in the example below.

File "ids.json" content:

...

            "path_match": "identity.attributes.*",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "date_to_string": {
            "match_mapping_type": "date",
            "mapping": {
              "type": "text"
            }
          }
        },
        {
          "boolean_to_string": {
            "match_mapping_type": "boolean",
            "mapping": {
              "type": "text"
            }
          }

...

This file content only illustrates which IDs can be set. In a real situation, "connection" in "Seed" must be deleted if the connection is also being set, and the same applies to "credential" in "connection", because these objects will be replaced rather than created by the script.

After the file is ready, run the script with the additional parameter "-i [filepath]".

Code Block
python __init__.py -i ids.json -p smb_connector-content-source.zip

Output

...


        },
        {
          "long_to_string": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "text"
            }
          }
        },
        {
          "double_to_string": {
            "match_mapping_type": "double",
            "mapping": {

Ability to pass a list of tags

Please see the article below about tags.

Routing Policies - Aspire 5.0 - Confluence (accenture.com)

The script must be run with the additional parameter "-t [tag1, tag2, ..]".

...

Output

...

              "type": "text"
            }
          }
        },
        {

...

Default values for properties which are not in 4.0.

Default values are part of *_transform_matrix.json and can be added for seed, connection, connector, and credential.

Code Block

connection_transform_matrix.json

{"default":{
  "scanRecursively": true,
  "stopCrawlOnScannerError": true,
  "filterNoCrawl": false,
  "azureADSeed": ""
},

...
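How such defaults might be applied can be sketched as below. This is an illustration only: the function apply_defaults is hypothetical, and it assumes that values carried over from the 4.0 configuration take precedence, so that the "default" section only fills in properties that 4.0 did not have.

```python
import json

# Copy of the "default" section shown above, closed so that it parses.
MATRIX = json.loads("""
{"default": {
  "scanRecursively": true,
  "stopCrawlOnScannerError": true,
  "filterNoCrawl": false,
  "azureADSeed": ""
}}
""")


def apply_defaults(migrated: dict, matrix: dict) -> dict:
    """Fill in properties that are missing from the migrated 4.0 values."""
    # Later keys win in a dict merge, so migrated values override defaults
    # (assumption about the intended precedence).
    return {**matrix.get("default", {}), **migrated}


if __name__ == "__main__":
    print(apply_defaults({"scanRecursively": False}, MATRIX))
```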

Multiple Starting Points

Multiple starting points are a feature of connectors in version 4.0. Connectors in version 5.0 do not have this feature because the functionality is now provided by seeds and connections.

In version 4.0 there are two ways to add multiple starting points: by UI and by file.

By UI: the user adds URLs in the UI and they appear in the file content-source.xml.

Code Block

# sharepoint connector
<siteCollectionsToCrawl>
    <siteCollectionUrl>https://cao365.sharepoint.com/sites/qasite/davidgtest2</siteCollectionUrl>
    <siteCollectionUrl>https://cao365.sharepoint.com/sites/qasite/davidgtest</siteCollectionUrl>
</siteCollectionsToCrawl>

#smb connector
<urls>smb://10.89.26.106:445/</urls>
<urls>smb://10.89.26.105:445/</urls>

#filesystem connector
<urls>c:\tmp</urls>
<urls>c:\Users</urls>

The XML tags for URLs are not standardized across connectors. A mechanism in connection_transform_matrix.json deals with this.

Code Block

# sharepoint connector 
"transform":{
  "conn_url_prop_name": "serverUrl",
  "source_url_prop_name": "siteCollectionsToCrawl:siteCollectionUrl",
...

#smb connector
"transform":{
  "conn_url_prop_name": "host",
  "source_url_prop_name": "urls",
..

#file system connector
"transform":{
  "conn_url_prop_name": "url",
  "source_url_prop_name": "urls",
..

The keys "conn_url_prop_name" and "source_url_prop_name" point to the JSON connection property name and to the XML tag in content-source.xml, respectively.
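The lookup driven by "source_url_prop_name" can be sketched as follows. This is not the script's own code: it uses the standard library's xml.etree.ElementTree instead of xmltodict, the function name collect_start_urls and the `<connector>` wrapper element are hypothetical, and it assumes that a colon in the property name (as in "siteCollectionsToCrawl:siteCollectionUrl") separates a parent tag from its child tag.

```python
import xml.etree.ElementTree as ET

# Sample based on the SharePoint snippet above; the <connector> root
# element is a hypothetical wrapper so that the fragment parses.
SAMPLE_XML = """
<connector>
  <siteCollectionsToCrawl>
    <siteCollectionUrl>https://cao365.sharepoint.com/sites/qasite/davidgtest2</siteCollectionUrl>
    <siteCollectionUrl>https://cao365.sharepoint.com/sites/qasite/davidgtest</siteCollectionUrl>
  </siteCollectionsToCrawl>
</connector>
"""


def collect_start_urls(xml_text: str, source_url_prop_name: str) -> list[str]:
    """Return all start URLs found under the configured XML tag."""
    # "parent:child" becomes the ElementTree path ".//parent/child"
    # (assumption about the meaning of the colon).
    path = ".//" + "/".join(source_url_prop_name.split(":"))
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall(path) if el.text and el.text.strip()]


if __name__ == "__main__":
    for url in collect_start_urls(SAMPLE_XML, "siteCollectionsToCrawl:siteCollectionUrl"):
        print(url)
```

For the SMB and Filesystem connectors the same function would be called with "urls", which has no colon and therefore matches the tag directly.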

By file:

The user enters only the path to a txt file in the UI, and the file content-source.xml contains a file path XML tag.

Code Block
# sharepoint connector
<seedsFilePath>${aspire.config.dir}/${app.name}/urls.txt</seedsFilePath>

#smb connector
<fileUrl>C:\tmp\ups.txt</fileUrl>

#filesystem connector
<fileUrl>C:\tmp\ups.txt</fileUrl>

The file path XML tag is not standardized across connectors either, so a mechanism similar to the one for URLs deals with that.

Code Block

# sharepoint connector
"transform":{
  "source_fileurl_prop_name": "seedsFilePath",
...
#smb connector
"transform":{
  "source_fileurl_prop_name": "fileUrl",
...
#filesystem connector
"transform":{
  "source_fileurl_prop_name": "fileUrl",
...

The script reads URLs from the "content-source.xml" file or from the txt file and creates the corresponding connections and seeds through HTTP API calls.
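Reading the txt file one starting point per line can be sketched as below. The function name parse_url_lines is hypothetical, and skipping blank lines is an assumption; how the actual script treats empty or malformed lines is not documented here.

```python
from collections.abc import Iterable


def parse_url_lines(lines: Iterable[str]) -> list[str]:
    """Return one start URL per non-empty line, in file order."""
    urls = []
    for line in lines:
        url = line.strip()
        if url:  # skip blank lines (assumption)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # The path matches the <fileUrl> example above.
    with open(r"C:\tmp\ups.txt", encoding="utf-8") as handle:
        for start_url in parse_url_lines(handle):
            print("would create a seed for", start_url)
```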

Ability to split a URL into the connection and seed

The script reads a URL, splits it into a connection part and a seed part, and creates the connection and the seed.

For example, the Filesystem URL C:\tmp/abc is split into the drive part C:\ (connection) and the directory part /tmp/abc (seed).

The SharePoint URL https://cao365.sharepoint.com/sites/qasite/davidgtest is split into the scheme part https://, the hostname part cao365.sharepoint.com, and the path part /sites/qasite/davidgtest. The scheme and hostname go into the connection, and the path part goes into the seed.

The SMB URL smb://10.89.26.106:445/ is split in almost the same way, except that the scheme is not needed and is therefore not used. To control whether the scheme is used, set the following property in the file connection_transform_matrix.json:

"url_scheme_included":"false".
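The splitting rules above can be sketched with the standard library. This is an illustration under assumptions, not the script's implementation: the function name split_url and its scheme_included parameter (standing in for "url_scheme_included") are hypothetical.

```python
import re
from urllib.parse import urlsplit


def split_url(url: str, scheme_included: bool = True) -> tuple[str, str]:
    """Return (connection_part, seed_part) for one starting point."""
    # Windows drive paths such as C:\tmp/abc: the drive goes to the
    # connection, the directory (with forward slashes) to the seed.
    drive = re.match(r"^([A-Za-z]:)[\\/]*(.*)$", url)
    if drive:
        directory = drive.group(2).replace("\\", "/")
        return drive.group(1) + "\\", "/" + directory

    parts = urlsplit(url)
    if scheme_included:
        connection = f"{parts.scheme}://{parts.netloc}"
    else:
        connection = parts.netloc
    # An empty path is treated as the root (assumption).
    return connection, parts.path or "/"


if __name__ == "__main__":
    print(split_url("https://cao365.sharepoint.com/sites/qasite/davidgtest"))
    print(split_url("smb://10.89.26.106:445/", scheme_included=False))
    print(split_url("C:\\tmp/abc"))
```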

Ability to pass as parameter a suffix of http object description

The script must be run with the additional parameter "-d [migr]".

Code Block
python __init__.py -d migr -p smb_connector-content-source.zip

Ability to work with relative path

Migration XML files can contain relative paths to other configuration files. The script can automatically change a relative path to an absolute path, but it must be run with the parameter "-a [absolute path to aspire 4.0 distribution]".

Code Block
python __init__.py -a C:\aspire-4.0 -p smb_connector-content-source.zip
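The conversion can be pictured as joining the relative path onto the directory passed with "-a". The sketch below is an assumption about that behavior, not the script's code: the function name to_absolute and the sample file names are hypothetical. It uses ntpath so that Windows-style paths are handled the same way on any operating system.

```python
import ntpath


def to_absolute(path: str, aspire4_home: str) -> str:
    """Resolve a path from a 4.0 configuration file against the 4.0 home."""
    if ntpath.isabs(path):
        # Already absolute: only normalize the separators.
        return ntpath.normpath(path)
    return ntpath.normpath(ntpath.join(aspire4_home, path))


if __name__ == "__main__":
    # "config/workflow.xml" is a made-up example path.
    print(to_absolute("config/workflow.xml", "C:\\aspire-4.0"))
```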

Ability to use the crawl path/folder as the Seed name and the host/URL as the Connection name by using an argument

The script can change the description to the path/folder and the host/URL when it is run with the parameter "-u".
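For orientation, the command-line options mentioned on this page can be collected in one argparse sketch. It is a reconstruction from the examples above, not the script's actual parser: every dest name and help text is hypothetical, making "-p" required and defaulting "-v" to "default" are assumptions, and so is the comma-separated format of "-t".

```python
import argparse


def parse_tags(value: str) -> list[str]:
    """Split "tag1, tag2" into ["tag1", "tag2"] (assumed input format)."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Migrate a connector from Aspire 4.0 to Aspire 5.0 (sketch)")
    parser.add_argument("-p", dest="package", required=True,
                        help="connector zip file downloaded from Aspire 4.0")
    parser.add_argument("-v", dest="template_version", default="default",
                        help="directory name under migration_template")
    parser.add_argument("-i", dest="ids_file",
                        help="JSON file with IDs of existing Aspire 5 objects")
    parser.add_argument("-t", dest="tags", type=parse_tags, default=[],
                        help="list of tags")
    parser.add_argument("-d", dest="description_suffix",
                        help="suffix of the object description")
    parser.add_argument("-a", dest="aspire4_home",
                        help="absolute path to the Aspire 4.0 distribution")
    parser.add_argument("-u", dest="use_url_names", action="store_true",
                        help="use path/folder and host/URL in descriptions")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```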

...

          "binary_to_string": {
            "match_mapping_type": "binary",
            "mapping": {
              "type": "text"
            }
          }
        }
      ]
    }
  }
