Create Connector

Field	Optional	Default	Multiple	Notes	Example
type	No	-	No	The value must be the same as the type of the seed that will use this connector.	"filesystem"
description	No	-	No	Name of the connector object.	"MyFileSystemConnector"
artifact	No	-	No	The Maven coordinates of the connector.	"com.accenture.aspire:aspire-filesystem-source"
properties	No	-	No	Configuration object containing the connector settings.
debug	Yes	false	No	Enables debug messages.	true / false
wDebug	Yes	false	No	Enables job logging.	true / false
enableStatistics	Yes	false	No	Gathers pipeline job statistics in the debug console.	true / false
infoCacheSize	No	100	No	The size of the Source Info cache used by the connector.	200
mapCacheSize	No	100	No	The number of Storage maps kept in memory per seed.	200
setCacheSize	No	100	No	The number of Sets kept in memory per seed.	200
identityCacheSize	No	100	No	The number of identities kept in memory per seed.	200
enableFetcher	No	true	No	Enables document fetching for the seeds that use this connector.	true / false
enableTextExtract	No	true	No	Enables text extraction. By default, connectors use Apache Tika to extract text from downloaded documents. To apply special text processing to a downloaded document in the workflow, disable text extraction. The downloaded document is then available as a content stream.	true / false
extractTextMaxSize	No	20971520	No	Maximum size of the extracted text, in characters, or "unlimited". Does not apply if the HTML Output option is enabled.	10000
extractTimeout	No	180000	No	Maximum time (in ms) to wait for the text extraction.	18000
xmlMaxDepth	No	2147483647	No	Maximum nesting depth allowed in a file's inner structure. Can be used to guard against denial-of-service attacks or corrupted files.	2147483647
structuredText	No	false	No	Include formatting in output (in HTML) instead of plain text.	true / false
tikaConfig	No	-	No	Path to the Apache Tika configuration file. Pass an empty string to use the default configuration.	"/path/to/tikaConfig.xml" / ""
pdfParserProperties	No	false	No	Enable to override the default PDFBox parser properties.	true / false
enableAutoSpace	No	true	No	If enabled, the parser should estimate where spaces should be inserted between words. For many PDFs this is necessary as they do not include explicit whitespace characters.	true / false
suppressDuplicateOverlappingText	No	false	No	If enabled, the parser tries to remove duplicated text over the same region. This is needed for some PDFs that achieve bolding by re-writing the same text in the same area. Note that this can slow down extraction substantially (PDFBOX-956) and sometimes remove characters that were not in fact duplicated (PDFBOX-1155).	true / false
extractAnnotationText	No	true	No	If enabled, text in annotations will be extracted.	true / false
sortByPosition	No	false	No	If enabled, sort text tokens by their x/y position before extracting text. This may be necessary for some PDFs (if the text tokens are not rendered "in order"), while for other PDFs it can produce the wrong result (for example, if there are two columns, the text will be interleaved).	true / false
extractAcroFormContent	No	true	No	If enabled, extract content from AcroForms at the end of the document.	true / false
extractInlineImages	No	false	No	If enabled, extract inline embedded images. Beware: some PDF documents of modest size (~4MB) can contain thousands of embedded images totaling > 2.5 GB. Also, at least as of PDFBox 1.8.5, there can be surprisingly large memory consumption and/or out of memory errors. Set to true with caution.	true / false
extractUniqueInlineImagesOnly	No	true	No	Multiple pages within a PDF file might refer to the same underlying image. If extractUniqueInlineImagesOnly is set to false, the parser will call the EmbeddedExtractor each time the image appears on a page. This might be desired for some use cases. However, to avoid duplication of extracted images, set this to true.	true / false
enable-non-text-filter	No	false	No	Enable to filter out non-text documents.	true / false
enableFetchForNonText	No	true	No	Enable if the workflow needs to stream the non-text documents.	true / false
non-text-document	No	false	No	Enable to filter by document extension (nonTextDocumentsExtensions); disable to filter by the regex file (nonTextDocuments).	true / false
nonTextDocumentsExtensions	No	jpg,jpeg,gif,png,tif,mp3,mp4,mpg,mpeg,avi,mkv,wav,bmp,swf,war,rar,tgz,dll,exe,class	No	Comma-separated list of non-text document extensions. Used when non-text-document is enabled.	"jpg,jpeg,gif,png"
nonTextDocuments	No	-	No	Path to a file containing regular expressions that match non-text documents, one expression per line. Used when non-text-document is disabled.	"config/nonTextDocuments.txt"
metadataMap	Yes	[ ]	Yes	Settings for mapping extracted fields to a destination field.
from	No	-	No	Field to be mapped	"fieldA"
to	No	-	No	Field where the value will be mapped	"fieldB"
entity	Yes	"user"	No	Entity (user / group) represented by the static ACL.	"user" / "group"
access	Yes	"allow"	No	Access (allow / deny) granted by the ACL.	"allow" / "deny"
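
The interaction between enable-non-text-filter, non-text-document, nonTextDocumentsExtensions and nonTextDocuments can be sketched in Python as follows. The helper names (parse_extensions, load_patterns, is_non_text) are hypothetical, and the matching rules (case-insensitive suffix comparison in extension mode, a regex search anywhere in the document path in regex mode) are assumptions rather than the connector's documented behavior.

```python
import re
from pathlib import PurePosixPath

# Hypothetical sketch of the non-text filter settings. The matching rules
# (case-insensitive suffix comparison; regex searched anywhere in the
# document path) are assumptions, not documented connector behavior.

DEFAULT_EXTENSIONS = ("jpg,jpeg,gif,png,tif,mp3,mp4,mpg,mpeg,avi,mkv,"
                      "wav,bmp,swf,war,rar,tgz,dll,exe,class")


def parse_extensions(csv: str) -> set[str]:
    """Split a comma-separated extension list, ignoring spaces and case."""
    return {ext.strip().lower() for ext in csv.split(",") if ext.strip()}


def load_patterns(lines: list[str]) -> list[re.Pattern]:
    """Compile one regex per non-empty line, as in a nonTextDocuments file."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def is_non_text(path: str, filter_enabled: bool, use_extensions: bool,
                extensions: set[str], patterns: list[re.Pattern]) -> bool:
    if not filter_enabled:
        # enable-non-text-filter = false: nothing is treated as non-text.
        return False
    if use_extensions:
        # non-text-document = true: compare the file suffix without the dot.
        suffix = PurePosixPath(path).suffix.lstrip(".").lower()
        return suffix in extensions
    # non-text-document = false: any matching regex flags the document.
    return any(p.search(path) for p in patterns)


exts = parse_extensions(DEFAULT_EXTENSIONS)
print(is_non_text("/data/photo.JPG", True, True, exts, []))       # True
print(is_non_text("/data/report.pdf", True, True, exts, []))      # False
pats = load_patterns([r"\.zip$", ""])
print(is_non_text("/data/archive.zip", True, False, exts, pats))  # True
```

Keeping the two modes in one function mirrors the single non-text-document switch: exactly one of the extension list or the regex file is consulted for a given configuration.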

Example
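
A minimal Python sketch of a request body for this call, built from the field table above. The endpoint and HTTP method are not given in this section, so the sketch only assembles and prints the JSON. It assumes that every field listed after properties is a key inside the properties object and that metadataMap is a list of from/to objects; entity and access are left out because the table does not show where they belong. Values are the table defaults, except where the Example column supplies one.

```python
import json

# Hypothetical "Create Connector" request body assembled from the field
# table. Assumptions (not confirmed by the page): every row after
# "properties" is a key inside the "properties" object, and
# "metadataMap" is a list of {"from": ..., "to": ...} objects.
connector = {
    "type": "filesystem",  # must match the type of the seed that uses it
    "description": "MyFileSystemConnector",
    "artifact": "com.accenture.aspire:aspire-filesystem-source",
    "properties": {
        "debug": False,
        "wDebug": False,
        "enableStatistics": False,
        "infoCacheSize": 100,
        "mapCacheSize": 100,
        "setCacheSize": 100,
        "identityCacheSize": 100,
        "enableFetcher": True,
        "enableTextExtract": True,
        "extractTextMaxSize": 20971520,  # characters, or "unlimited"
        "extractTimeout": 180000,        # milliseconds
        "xmlMaxDepth": 2147483647,
        "structuredText": False,
        "tikaConfig": "",                # empty string: default Tika config
        "pdfParserProperties": False,
        "enableAutoSpace": True,
        "suppressDuplicateOverlappingText": False,
        "extractAnnotationText": True,
        "sortByPosition": False,
        "extractAcroFormContent": True,
        "extractInlineImages": False,
        "extractUniqueInlineImagesOnly": True,
        "enable-non-text-filter": False,
        "enableFetchForNonText": True,
        "non-text-document": False,
        "nonTextDocumentsExtensions": "jpg,jpeg,gif,png",
        "metadataMap": [{"from": "fieldA", "to": "fieldB"}],
    },
}

# Serialize to the JSON text that would be sent as the request body.
body = json.dumps(connector, indent=2)
print(body)
```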

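The metadataMap setting pairs a source field (from) with a destination field (to). The Python sketch below shows one plausible reading; apply_metadata_map is a hypothetical helper, and whether the connector copies or moves the value, and how it treats a missing source field, are assumptions. This sketch copies the value and skips missing fields.

```python
from typing import Any


def apply_metadata_map(doc: dict[str, Any],
                       mappings: list[dict[str, str]]) -> dict[str, Any]:
    """Copy each "from" field's value into its "to" field (hypothetical)."""
    out = dict(doc)  # shallow copy: the input document is not modified
    for m in mappings:
        src, dst = m["from"], m["to"]
        # Read from the original document so one mapping's output does not
        # feed another mapping; missing source fields are skipped.
        if src in doc:
            out[dst] = doc[src]
    return out


mapped = apply_metadata_map({"fieldA": "Quarterly report"},
                            [{"from": "fieldA", "to": "fieldB"}])
print(mapped)  # {'fieldA': 'Quarterly report', 'fieldB': 'Quarterly report'}
```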

Contact Us: [email protected]