In the "Connector" tab, specify the connection information needed to crawl Box.
- In the "Server" field in the "Content Source Properties" section, enter the Server URL to crawl.
- In the "Server API" field in the "Content Source Properties" section, enter the URL of the Box API (for version 2.0, the current URL is https://api.box.com).
The URL must include the respective protocol, http or https.
- API Version: the connector was developed and tested with version 2.0.
- Enter the Client ID, Client_Secret, and Redirect_Url. The Box API uses OAuth 2 with JWT for authentication, so to connect you need to register a Box application and obtain these values.
- Enter the Username and Password of the Box.com account (Admin account).
Note: The password will be automatically encrypted by Aspire.
- Enter the page size, which sets the number of documents or folders returned by each API call.
- Exclude extensions from text extraction: enter a comma-separated list of the file extensions whose text you do not want extracted, for instance dll or exe.
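To illustrate what the page size setting controls, here is a minimal sketch of offset-based paging. It is not the connector's implementation: `iter_items`, `fake_fetch`, and the in-memory `ENTRIES` list are hypothetical stand-ins for a remote API that returns at most `page_size` entries per call.

```python
from typing import Callable, Iterator, List


def iter_items(fetch_page: Callable[[int, int], List[str]],
               page_size: int) -> Iterator[str]:
    """Yield every entry by requesting at most `page_size` entries per call."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # a short (or empty) page means nothing is left
            break
        offset += page_size


# Hypothetical stand-in for the remote API: a folder holding 7 entries.
ENTRIES = [f"file-{i}" for i in range(7)]
calls = []


def fake_fetch(offset: int, limit: int) -> List[str]:
    calls.append((offset, limit))  # record each simulated API call
    return ENTRIES[offset:offset + limit]


items = list(iter_items(fake_fetch, page_size=3))
print(len(items), "entries fetched in", len(calls), "calls")
```

A larger page size means fewer calls per folder, at the cost of larger responses per call.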
Check the other options as needed:
- Impersonate users?: Impersonate each user in order to crawl all shared and private content. If unchecked, only shared content accessible by the crawling account will be crawled.
- Index folders?: Index subfolders as items. If unchecked, only files will be indexed.
- Scan subfolders?: Scan through each subfolder's child nodes.
- Exclude Sub Folders: Enter a folder name (or a list of folders) to be excluded from the crawl.
Include/Exclude patterns: Enter regex patterns to include or exclude files/folders based on URL matches.
- To specify include patterns, click the 'add new' button for include patterns and enter the regex pattern. Aspire will then crawl only URLs that match the specified pattern.
- To specify exclude patterns, click the 'add new' button for exclude patterns and enter the regex pattern. Aspire will then skip URLs that match the specified pattern.
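The include/exclude pattern behavior described above can be sketched as follows. This is an illustration only: the function `should_crawl`, the example URLs, and the patterns are hypothetical. It also assumes that a pattern may match anywhere in the URL and that exclude patterns take precedence over include patterns; Aspire's actual matching rules may differ.

```python
import re
from typing import Iterable


def should_crawl(url: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Decide whether a URL passes the include/exclude filters."""
    include = list(include)
    # Assumption: exclude patterns win, so any match removes the URL.
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # Assumption: with no include patterns, everything not excluded is crawled.
    if not include:
        return True
    return any(re.search(pattern, url) for pattern in include)


# Hypothetical patterns: crawl only the "reports" folder, skip temporary files.
include_patterns = [r"/reports/"]
exclude_patterns = [r"\.tmp$"]

for url in ("https://example.invalid/reports/q1.pdf",
            "https://example.invalid/reports/draft.tmp",
            "https://example.invalid/archive/q1.pdf"):
    print(url, should_crawl(url, include_patterns, exclude_patterns))
```

Testing patterns against a handful of sample URLs in this way before starting a crawl helps catch a pattern that is too broad or too narrow.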
- For JWT authentication, provide the config.json file with the private key information. If you do not use config.json, enter instead the public key ID, the encrypted private key (remove the header "-----BEGIN ENCRYPTED PRIVATE KEY-----" and the footer "-----END ENCRYPTED PRIVATE KEY-----" and use only the encrypted value), the password for the private key, and the enterprise ID.
- With "Impersonate users?" checked, you can also add the login accounts of a subset of users that you want to impersonate.
- Backoff on Error: If true, the connector backs off when the server returns the specified error.
- Backoff error pattern: The regex used to match the error messages that should trigger a backoff.
- Backoff in minutes: Time to wait when a backoff error is encountered.
- Backoff max retries: Number of retries with backoff when the error is encountered.
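How the three backoff settings interact can be sketched like this. It is a simplified illustration, not the connector's code: `call_with_backoff`, `flaky`, and the error text are hypothetical, and the sketch assumes a fixed wait between retries and an error that surfaces as an exception message.

```python
import re
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(action: Callable[[], T],
                      error_pattern: str,
                      backoff_minutes: float,
                      max_retries: int,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `action` while its error message matches `error_pattern`."""
    retries = 0
    while True:
        try:
            return action()
        except Exception as exc:
            matches = re.search(error_pattern, str(exc)) is not None
            # Non-matching errors, or exhausted retries, are re-raised as-is.
            if not matches or retries >= max_retries:
                raise
            retries += 1
            sleep(backoff_minutes * 60)  # wait before the next attempt


# Hypothetical server call that fails twice with a rate-limit error.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


waits = []  # collect the waits instead of really sleeping
result = call_with_backoff(flaky, r"429|rate limit", backoff_minutes=1,
                           max_retries=5, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without waiting; in real use the default `time.sleep` applies.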
Click the Advanced Configuration option and verify the Box Working directory value. You will need to create the Box.accesstoken and Box.refreshtoken files with the corresponding values from the previous process (Box Prerequisites) and save them in the working directory.
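Creating the two token files can be scripted. The sketch below assumes that each file holds only the raw token value as plain text, which this page does not confirm; `write_token_files`, the directory path, and the token strings are hypothetical placeholders.

```python
from pathlib import Path


def write_token_files(working_dir: str, access_token: str, refresh_token: str) -> None:
    """Write Box.accesstoken and Box.refreshtoken into the working directory."""
    directory = Path(working_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: each file contains just the token value, with no extra whitespace.
    (directory / "Box.accesstoken").write_text(access_token.strip(), encoding="utf-8")
    (directory / "Box.refreshtoken").write_text(refresh_token.strip(), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder values: substitute the working directory shown in
    # Advanced Configuration and the tokens from Box Prerequisites.
    write_token_files("box-working-dir", "ACCESS_TOKEN_VALUE", "REFRESH_TOKEN_VALUE")
```

Because the files contain credentials, restrict read access to the account that runs Aspire.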