Step 3: Add a new JIVE Content Source
Add new source
To specify exactly which JIVE community to crawl, create a new "Content Source".
To create a new content source:
- From the Aspire Home page, click the "Add Source" button.
- Click on "Jive Connector".
General Configuration Tab
In the "General" tab in the Add New Content Source window, specify the following basic information:
- Enter a descriptive content source name in the "Name" field.
This can be any name you find useful. It is displayed on the content source page, in error messages, etc.
- Click on the "Active?" checkbox to add a checkmark.
Unchecking the "Active?" option lets you configure a content source without enabling it. This is useful if the community will be under maintenance and no crawls should run during that period.
- Click on the "Schedule" drop-down list and select one of the following: Manually, Periodically, Daily, or Weekly.
Aspire can automatically schedule content sources to be crawled on a set schedule, such as once a day, several times a week, or periodically (every N minutes or hours). For the purposes of this tutorial, you may want to select Manually and then set up a regular crawling schedule later.
- After selecting a Schedule type, specify the details, if applicable:
- Manually: No additional options.
- Periodically: Specify the "Run every:" options by entering the number of "hours" and "minutes."
- Daily: Specify the "Start time:" by clicking on the hours and minutes drop-down lists and selecting options.
- Weekly: Specify the "Start time:" by clicking on the hours and minutes drop-down lists and selecting options, then clicking on the day checkboxes to specify days of the week to run the crawl.
- Advanced: Enter a custom CRON expression (e.g. 0 0 0 ? * *).
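To make the example CRON expression above easier to read, the following minimal sketch labels each field. It assumes the Quartz-style field order (seconds first, optional year last). The `?` placeholder in the example suggests that syntax, but this guide does not name the scheduler, so treat the field names as an assumption. The helper `label_cron_fields` is hypothetical and is not part of Aspire.

```python
# Field order ASSUMED to follow Quartz-style cron expressions;
# this guide does not state which cron dialect Aspire uses.
FIELD_NAMES = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_cron_fields(expression: str) -> dict:
    """Pair each whitespace-separated cron field with its assumed name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so the optional year is
    # only included when the expression actually has a 7th field.
    return dict(zip(FIELD_NAMES, parts))


# Print the labelled fields of the example expression from this guide.
for name, value in label_cron_fields("0 0 0 ? * *").items():
    print(f"{name:>12}: {value}")
```

Under that reading, the example expression has seconds, minutes and hours all set to 0, which would correspond to a crawl at midnight every day.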
Connector Configuration Tab
In the "Connector" tab, specify the connection information used to crawl the JIVE community.
- Enter the community URL you want to crawl.
- Enter the username of the user with Full Control.
- Enter the password of the user.
Set the other options as needed:
- Page Size: Indicates the maximum number of elements to retrieve per call. The maximum is 100 and the minimum is 25 due to API limitations.
- Use Progressive Retries?: Check if you want to manage connection retries and timeouts.
- Min Wait: This is the minimum amount of time the system will wait before retrying a failed crawl. This time is set in seconds.
- Increment: This is the amount of time added each time a crawl fails. You can specify this time in seconds, minutes or as a multiplier.
- Max Wait: This is the maximum amount of time allowed by the system to make a retry. Once the wait time for a retry has exceeded this value the operation will be aborted. This time is set in minutes.
- Connection Timeout: Time in seconds to wait before a connection attempt times out.
- Connection Retries: Number of reconnection attempts to make per connection if the connection fails.
- MapDB's Directory: Directory path where the MapDB files will be stored.
- Use Creation Date Filter for Crawls: Check to use date filtering to improve performance for full crawls (Jive 8 or greater).
- Fetch Document Level Security?: Check if you want to fetch the document level security. These are the security properties you set when creating content in Jive.
- Security ACLs for places?: Check if you want to fetch the access control information (ACLs) for the places (spaces, blogs and groups).
- Fetch Security ACLs for places with security plugin: This option is used for on-premise installations of Jive, and the Jive Security Mapper Plug-in must be installed.
- Use Entitlement API for place ACLs (Jive cloud): This feature is only used for Jive cloud-based instances.
- Index Specific Endpoints?: Check if you want to specify which endpoints to crawl. If this option is not selected, the connector will crawl all four endpoint types.
- People: Check if you want to crawl all the people information.
- Places: Check if you want to crawl all the places (spaces, groups, blogs and projects).
- Contents: Check if you want to crawl all the standard or custom contents (documents, files, posts, polls, updates, ideas, ...).
- Announcements: Check if you want to crawl all the announcements.
- Incremental crawling type: Allows you to select the type of incremental crawling you want to perform. The selection of settings will be dependent on the type of Jive instance being crawled, on-premise or cloud-based.
- For on-premise instances select:
- Normal Incremental: Select this option to use normal snapshot file based incremental crawling.
- Activity Incremental: Check if you want to do a low-impact incremental crawl that only picks up the major changes registered in the Jive community. NOTE: This crawl doesn't replace the normal incremental, because Activity Incremental doesn't detect deletes.
- Activity Count: Number of activity crawls performed before a normal incremental is executed.
- Timestamp Directory: Directory path where the timestamp will be stored.
- Set Manual Timestamp: Check this option if you want to overwrite the system timestamp with a custom timestamp.
- Timestamp: Manual timestamp that must be in the following format "2014-01-01T00:00:00.000-0000" (yyyy-MM-dd'T'HH:mm:ss.SSSZ).
- For cloud-based instances:
- Analytic API Incremental: Select this option to use the Analytic/Data Export Service (DES) API.
- Creating an add-on for the Analytic service: To use the Analytic API, you must create an add-on for analytic services in your Jive instance.
- To create an add-on, click on your profile icon in the top left hand corner and click on 'Add-Ons'.
- Then, on the add-on page, click on 'Analytics Services' in the menu on the left.
- Enter the necessary information and create the add-on.
- Click on the action icon of the created add-on and click on 'view client ID and secret'.
- Copy the client ID and secret values. You will need those values later.
- Jive API URL: API URL of your Jive instance.
- API Version: The version of the API. This feature was developed and tested using API version v2. For example, if you enter v2 as the version, your API requests will be in the format https://<jive_api_url>/analytics/v2/export/activity. For more information about DES API v2, refer to https://community.jivesoftware.com/docs/DOC-99916
- Client Id: This is the client id that you received in the previous step when creating the add-on.
- Client Secret: This is the client secret that you received in the previous step when creating the add-on.
- Authorization key validity period: The authorization code for the Analytic API is only valid for a limited period. When the code is no longer valid, the connector regenerates it with the given credentials.
- Note: Jive has an option to register an on-premise Jive instance so that it uploads activity data to a cloud-based instance that supports DES. If this option is enabled in Jive, the Analytic API Incremental option described above can be used. For more information on registering an on-premise instance with the DES cloud, refer to https://community.jivesoftware.com/docs/DOC-99916
- Custom Metadata Options: Check this option if you want to deselect metadata calls to the API, to improve performance.
- Fetch ModifiedBy for Documents and Files?: Uncheck this option if you don't want to make an extra call to the API to fetch the last person who modified the documents and files.
- Fetch Owner for Tasks?: Uncheck this option if you don't want to make an extra call to the API to fetch the owner of a task.
- Fetch Discussion's Replies?: Uncheck this option if you don't want to make extra calls to the API to add the replies of a discussion.
- Fetch Comments?: Uncheck this option if you don't want to make extra calls to the API to add the comments on a content item.
- Include/Exclude patterns: Enter regex patterns to include or exclude files/folders based on URL matches.
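The manual Timestamp option above expects values in the pattern yyyy-MM-dd'T'HH:mm:ss.SSSZ, for example "2014-01-01T00:00:00.000-0000". As a minimal sketch, a value in that shape could be generated like this before pasting it into the Timestamp field. The function name `to_manual_timestamp` is hypothetical and is not part of Aspire or Jive.

```python
from datetime import datetime, timezone


def to_manual_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as yyyy-MM-dd'T'HH:mm:ss.SSSZ."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    # strftime's %f yields microseconds (6 digits); the pattern in this
    # guide uses milliseconds (3 digits), so compute them explicitly.
    millis = moment.microsecond // 1000
    date_part = moment.strftime("%Y-%m-%dT%H:%M:%S")
    offset = moment.strftime("%z")  # e.g. +0000, without a colon
    return f"{date_part}.{millis:03d}{offset}"


# Midnight UTC on 1 January 2014, the instant used in the guide's example.
print(to_manual_timestamp(datetime(2014, 1, 1, tzinfo=timezone.utc)))
# 2014-01-01T00:00:00.000+0000
```

Note that Python writes a zero offset as +0000, whereas the example in this guide shows -0000. Both denote a zero offset from UTC.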
Step 3b.1: Group Expansion (Optional)
In 'Advanced Connector Properties', under 'Group Expansion', you can set up Group Expansion to expand Security Groups and Social Groups (this requires the Security Plugin for on-premise instances).
- Select the schedule type and start date.
- Enter the community URL you want to crawl.
- Enter the username of the user with Full Control.
- Enter the password of the user.
- Expand Security Groups?: Check if you want to expand the Security Groups from the JIVE community. This requires the Jive Security Mapper Plug-in to be installed in your community.
Workflow Configuration Tab
In the "Workflow" tab, specify the workflow steps for the jobs that come out of the crawl. Drag and drop rules to determine which steps an item should follow after being crawled. These rules could specify where to publish the document or which transformations the data needs before it is sent to a search engine. See Workflow for more information.
- For the purpose of this tutorial, drag and drop the Publish To File rule found under the Publishers tab to the onPublish Workflow tree.
- Specify a Name and Description for the Publisher.
- Click Add.
After completing these steps, click on the Save button and you'll be sent back to the Home Page.