...
Code Block |
---|
{"default":{ "scanRecursively": true, "stopCrawlOnScannerError": true, "filterNoCrawl": false, "azureADSeed": "" }, ... |
Multiple starting points are a feature of connectors version 4.0. Connectors version 5.0 do not have this feature, because the same functionality is now provided by seeds and connections.
In connectors version 4.0 there are two ways to add multiple starting points: through the UI and through a file.
By UI: the user adds URLs in the UI, and they appear in the file content-source.xml:
Code Block |
---|
# sharepoint connector
<siteCollectionsToCrawl>
<siteCollectionUrl>https://cao365.sharepoint.com/sites/qasite/davidgtest2</siteCollectionUrl>
<siteCollectionUrl>https://cao365.sharepoint.com/sites/qasite/davidgtest2</siteCollectionUrl>
</siteCollectionsToCrawl>
# smb connector
<urls>smb://10.89.26.106:445/</urls>
<urls>smb://10.89.26.105:445/</urls>
# filesystem connector
<urls>c:\tmp</urls>
<urls>c:\Users</urls>
|
The XML tags that hold the URLs are not standardized across connectors. A mechanism in *_transform_matrix.json was created to deal with this:
Code Block |
---|
# sharepoint connector
"transform":{
"conn_url_prop_name": "serverUrl",
"source_url_prop_name": "siteCollectionsToCrawl:siteCollectionUrl",
...
# smb connector
"transform":{
"conn_url_prop_name": "host",
"source_url_prop_name": "urls",
# filesystem connector
"transform":{
"conn_url_prop_name": "url",
"source_url_prop_name": "urls", |
The keys "conn_url_prop_name" and "source_url_prop_name" point, respectively, to the JSON connection property name and to the XML tag in content-source.xml that holds the URLs.
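As an illustration of how "source_url_prop_name" could be used, the following Python sketch collects the URLs from a content-source.xml fragment. It assumes that a colon in the value separates a parent tag from its child tag (as in "siteCollectionsToCrawl:siteCollectionUrl") and that the tags sit somewhere below a root element. The function name `extract_source_urls` and the `<source>` root element in the usage note are hypothetical and are not taken from the migration script.

```python
import xml.etree.ElementTree as ET


def extract_source_urls(xml_text, source_url_prop_name):
    """Return the URL strings stored under the tag path in source_url_prop_name.

    Assumption: ':' separates nested tag names, so
    "siteCollectionsToCrawl:siteCollectionUrl" means
    <siteCollectionsToCrawl><siteCollectionUrl>...</siteCollectionUrl>.
    A plain name such as "urls" matches those tags directly.
    """
    root = ET.fromstring(xml_text)
    tags = source_url_prop_name.split(":")
    # Look for the first tag anywhere below the root, then descend tag by tag.
    xpath = ".//" + "/".join(tags)
    return [
        el.text.strip()
        for el in root.findall(xpath)
        if el.text and el.text.strip()
    ]
```

For example, calling `extract_source_urls(xml_text, "urls")` on `<source><urls>smb://10.89.26.106:445/</urls><urls>smb://10.89.26.105:445/</urls></source>` returns both smb URLs in document order.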
By file:
The user enters only the path to the file in the UI, and content-source.xml then contains the XML tag:
Code Block |
---|
<fileUrl>C:\tmp\ups.txt</fileUrl> |
The script reads the URLs from the file, splits each one into a connection part and a seed part, and creates the connections and seeds through API calls.
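The split step might look roughly like the Python sketch below. It rests on several assumptions that the page does not confirm: that the file holds one URL per line, that the connection part is the scheme plus host and port, and that the seed part is the remaining path. The function names are hypothetical, the API calls are left out, and plain file-system paths such as `c:\tmp` are not handled.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (connection part, seed part).

    Assumption: connection = scheme://host[:port], seed = path.
    """
    parts = urlsplit(url)
    connection = f"{parts.scheme}://{parts.netloc}"
    seed = parts.path or "/"  # an empty path is treated as the root
    return connection, seed


def read_starting_points(text):
    """Parse the file content (assumed: one URL per line) into split pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pairs.append(split_url(line))
    return pairs
```

Under these assumptions, `split_url("https://cao365.sharepoint.com/sites/qasite/davidgtest2")` yields `("https://cao365.sharepoint.com", "/sites/qasite/davidgtest2")`, so URLs that share a host would map to the same connection with different seeds.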
...