Each Seed entity requires a reference to a Connector in order to be created. This page details how to create a Connector using the REST API.

Create Connector


| Field | Optional | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| type | No | - | No | The value must be the same as the type of the seed that will use this connector. | "filesystem" |
| description | No | - | No | Name of the connector object. | "MyFileSystemConnector" |
| artifact | No | - | No | The mvn coordinates of the connector. | "com.accenture.aspire:aspire-filesystem-source" |
| properties | No | - | No | Configuration object | |
| debug | Yes | false | No | Enables the debug messages | true / false |
| wDebug | Yes | false | No | Enables job logging. | true / false |
| enableStatistics | Yes | false | No | Enable to gather pipeline job statistics in the debug console | true / false |
| infoCacheSize | No | 100 | No | The size of the Source Info cache used by the connector | 200 |
| mapCacheSize | No | 100 | No | The number of Storage maps kept in memory per seed | 200 |
| setCacheSize | No | 100 | No | The number of Sets kept in memory per seed | 200 |
| identityCacheSize | No | 100 | No | The number of identities kept in memory per seed. | 200 |
| enableFetcher | No | true | No | Enables document fetching for the seeds that use this connector. | true / false |
| enableTextExtract | No | true | No | Enables text extraction. By default, connectors use Apache Tika to extract text from downloaded documents. To apply special text processing to a downloaded document in the workflow, disable text extraction. The downloaded document is then available as a content stream. | true / false |
| extractTextMaxSize | No | 20971520 | No | Maximum extract text size in number of characters or "unlimited". Doesn't apply if HTML Output option is enabled. | 10000 |
| extractTimeout | No | 180000 | No | Maximum time (in ms) to wait for the text extraction. | 18000 |
| xmlMaxDepth | No | 2147483647 | No | The max depth level for a file inner structure. Can be used to block denial of service attacks or corrupted files. | 2147483647 |
| structuredText | No | false | No | Include formatting in output (in HTML) instead of plain text. | true / false |
| tikaConfig | No | - | No | Path for Apache Tika configuration file. It can be passed as empty to use the default configuration | "/path/to/tikaConfig.xml" / "" |
| pdfParserProperties | No | false | No | Enable to change the default PDFBox properties | true / false |
| enableAutoSpace | No | true | No | If enabled, the parser should estimate where spaces should be inserted between words. For many PDFs this is necessary as they do not include explicit whitespace characters. | true / false |
| suppressDuplicateOverlappingText | No | false | No | If enabled the parser should try to remove duplicated text over the same region. This is needed for some PDFs that achieve bolding by re-writing the same text in the same area. Note that this can slow down extraction substantially (PDFBOX-956) and sometimes remove characters that were not in fact duplicated (PDFBOX-1155) | true / false |
| extractAnnotationText | No | true | No | If enabled, text in annotations will be extracted. | true / false |
| sortByPosition | No | false | No | If enabled, sort text tokens by their x/y position before extracting text. This may be necessary for some PDFs (if the text tokens are not rendered "in order"), while for other PDFs it can produce the wrong result (for example if there are 2 columns, the text will be interleaved). | true / false |
| extractAcroFormContent | No | true | No | If enabled, extract content from AcroForms at the end of the document | true / false |
| extractInlineImages | No | false | No | If enabled, extract inline embedded OBXImages. Beware: some PDF documents of modest size (~4MB) can contain thousands of embedded images totaling > 2.5 GB. Also, at least as of PDFBox 1.8.5, there can be surprisingly large memory consumption and/or out of memory errors. Set to true with caution. | true / false |
| extractUniqueInlineImagesOnly | No | true | No | Multiple pages within a PDF file might refer to the same underlying image. If extractUniqueInlineImagesOnly is set to false, the parser will call the EmbeddedExtractor each time the image appears on a page. This might be desired for some use cases. However, to avoid duplication of extracted images, set this to true. | true / false |
| enable-non-text-filter | No | false | No | Enable to filter non text documents. | true / false |
| enableFetchForNonText | No | true | No | Enable if the workflow needs to stream the non-text documents. | true / false |
| non-text-document | No | false | No | Enable to filter using document extensions, disable to | true / false |
| nonTextDocumentsExtensions | No | jpg,jpeg,gif,png,tif,mp3,mp4,mpg,mpeg,avi,mkv,wav,bmp,swf,war,rar,tgz,dll,exe,class | No | Comma separated list of non-text document extensions. Used based on the non-text-document value | "jpg,jpeg,gif,png" |
| nonTextDocuments | No | - | No | Path to a file containing a list of regex that matches the non-text documents, one regex expression per line. Used based on the non-text-document value | "config/nonTextDocuments.txt" |
| metadataMap | Yes | [ ] | Yes | Settings for mapping extracted fields to a destination field. | |
| from | No | - | No | Field to be mapped | "fieldA" |
| to | No | - | No | Field where the value will be mapped | "fieldB" |
| entity | Yes | "user" | No | Entity (user / group) represented by the static ACL. | "user" / "group" |
| access | Yes | "allow" | No | Access (allow / deny) granted by the ACL. | "allow" / "deny" |
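
The table lists `type`, `description`, `artifact`, and `properties` as non-optional fields with no default. A client could check for them before sending a request, roughly as in the sketch below. It assumes the request body is held as a plain Python dict. The helper name `missing_required` is hypothetical and is not part of the Aspire API.

```python
# Hypothetical client-side helper; not part of the Aspire REST API.
# The field names come from the table rows marked Optional = No, Default = "-".
REQUIRED_FIELDS = ("type", "description", "artifact", "properties")


def missing_required(payload):
    """Return the required top-level fields that are absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]
```

An empty list means all four fields are present. The check does not validate the values themselves.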

Example
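
A request body for creating a connector might be assembled as in the following sketch. The field names and sample values are taken from the table above. Nesting the remaining options under `properties`, and writing `metadataMap` as a list of `from`/`to` objects, are assumptions based on the "Configuration object" note and the `[ ]` default. The function name `build_connector_payload` is hypothetical. This page does not state the endpoint URL, so no HTTP call is shown.

```python
import json


def build_connector_payload(connector_type, description, artifact, properties=None):
    """Assemble the four non-optional top-level fields into one dict."""
    return {
        "type": connector_type,
        "description": description,
        "artifact": artifact,
        # Assumption: connector options live inside the "properties" object.
        "properties": dict(properties or {}),
    }


payload = build_connector_payload(
    "filesystem",
    "MyFileSystemConnector",
    "com.accenture.aspire:aspire-filesystem-source",
    {
        "debug": False,
        "enableTextExtract": True,
        "extractTimeout": 180000,
        # Assumption: one object per mapping, using the "from" and "to" fields.
        "metadataMap": [{"from": "fieldA", "to": "fieldB"}],
    },
)

# Serialize to the JSON text that would be sent as the request body.
body = json.dumps(payload, indent=2)
print(body)
```

Options left out of `properties` would presumably fall back to the defaults listed in the table.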

