The Box Connector will crawl content from a Box.com repository.
Introduction
The Box connector crawls content from a Box repository. It retrieves the supported elements through the Box RESTful Content API (version 2.0). Authentication goes through the Box API, which uses OAuth 2 with a JWT configuration.
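To make the JWT-based OAuth 2 flow more concrete, the sketch below builds the claim set that a Box JWT assertion usually carries. It is an illustration under stated assumptions, not the connector's implementation: the helper name `build_box_jwt_claims`, its parameters, and the `BOX_TOKEN_URL` constant are hypothetical, and the claim names follow Box's public JWT documentation and should be verified against it. Signing the claims with the application's private key and exchanging the assertion for an access token are left out because they need a third-party JWT library.

```python
import secrets
import time

# Assumption: the standard Box OAuth 2 token endpoint, used as the JWT audience.
BOX_TOKEN_URL = "https://api.box.com/oauth2/token"


def build_box_jwt_claims(client_id, enterprise_id, lifetime_seconds=45, now=None):
    """Return the claim set for a Box JWT assertion (hypothetical helper)."""
    if not 0 < lifetime_seconds <= 60:
        # Assumption: Box only accepts assertions that expire within 60 seconds.
        raise ValueError("lifetime_seconds must be between 1 and 60")
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": client_id,              # the application's client ID
        "sub": enterprise_id,          # the enterprise the connector acts for
        "box_sub_type": "enterprise",  # ask for an enterprise-level token
        "aud": BOX_TOKEN_URL,
        "jti": secrets.token_hex(32),  # random ID so an assertion cannot be replayed
        "exp": issued_at + lifetime_seconds,
    }
```

The resulting dictionary would then be signed (RS256 or similar) and posted to the token endpoint together with the client ID and client secret.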
Environment and Access Requirements
Repository Support
The OneDrive connector supports crawling the following repositories:
Repository | Version | Connector Version |
---|---|---|
OneDrive | All | 5.1 |
User Account Requirements
To access OneDrive, a user account with sufficient privileges must be supplied.
Azure AD Application
To access OneDrive APIs (Microsoft Graph), you'll need to create an Azure AD Application.
Create the Azure AD application
- See Microsoft's “Use portal to create an Azure Active Directory application and service principal that can access resources” for the steps to create an Application ID and its key (Client Secret). Write down your Application Key at the time of creation, because it will not be shown again after you exit the portal.
Assign permissions to the application
- Log into the Azure Management Portal.
- Click on the “Azure Active Directory” option.
- Select the “App registrations” option and then select your client application.
- Select “API permissions” > “Add a permission” > “Microsoft APIs”.
- Select the “Microsoft Graph” option.
- On Application Permissions, select the following:
  - Read files in all site collections (Files.Read.All)
  - Read and write files in all site collections (Files.ReadWrite.All)
  - Read all users’ full profiles (User.Read.All)
  - Read directory data (Directory.Read.All)
  - Read all groups (Group.Read.All)
  - Read and write items in all site collections (Sites.ReadWrite.All)
  - Read items in all site collections (Sites.Read.All)
- Click on “Save”.
- Click on “Grant admin consent for . . . ”
- Click on “Yes” when prompted
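Once the application has its permissions and admin consent, it can request a token with the OAuth 2 client-credentials grant. The sketch below only builds the request and does not send it. The helper name `build_graph_token_request` and its parameters are hypothetical, and the tenant ID, Application ID, and Client Secret are the values from the Azure portal described above.

```python
import urllib.parse
import urllib.request


def build_graph_token_request(tenant_id, client_id, client_secret):
    """Build, without sending, a client-credentials token request (hypothetical helper)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # the Application ID
        "client_secret": client_secret,  # the Application Key (Client Secret)
        # ".default" asks for every application permission already granted.
        "scope": "https://graph.microsoft.com/.default",
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the returned request to `urllib.request.urlopen` would send it; a successful JSON response carries the token in its `access_token` field.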
Framework and Connector Features
Framework Features
Name | Supported |
---|---|
Content Crawling | Yes |
Identity Crawling | Yes |
Snapshot-based Incrementals | No |
Non-snapshot-based Incrementals | Yes |
Document Hierarchy | Yes |
Connector Features
Some of the features of the Box connector include:
- Full or incremental crawling. Incremental crawls use the stream_position value so that only new or updated documents are indexed.
- Ability to exclude a folder, or a set of folders, together with their content.
- Ability to include or exclude elements (folders or files) by file name or folder name using regular expressions (regex patterns).
- Metadata extraction
- Search engine independent
- Runs from any machine with access to the given Box account
- Fetches access control lists (ACLs)
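The regex-based include and exclude filtering listed above can be pictured with the short sketch below. It is not the connector's code: the function name `should_crawl` is hypothetical, and two behaviours are assumptions that the source does not state, namely that an exclude match takes precedence over an include match and that a pattern may match anywhere in the name.

```python
import re


def should_crawl(name, include_patterns=(), exclude_patterns=()):
    """Decide whether a file or folder name passes the filters (hypothetical helper)."""
    # Assumption: an exclude match always wins over an include match.
    if any(re.search(pattern, name) for pattern in exclude_patterns):
        return False
    # With no include patterns, everything that is not excluded is crawled.
    if not include_patterns:
        return True
    return any(re.search(pattern, name) for pattern in include_patterns)
```

For example, `should_crawl("tmp_report.pdf", [r"\.pdf$"], [r"^tmp_"])` rejects the file because the exclude pattern matches, even though the include pattern matches too.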
Content Crawled
The Box connector retrieves several types of objects:
Name | Type | Relevant Metadata | Content Fetch and Extraction | Description |
---|---|---|---|---|
Folder | container | | Yes | The directories that hold the files. Each directory is scanned to retrieve its subfolders and documents, as well as its collaborators. |
File | document | | Yes | Files stored in folders and subfolders, including tasks and comments among other metadata fields. |
Box Note | document | | Yes | Type of document |
Bookmark | | | | |
Google Doc | | | | |
Google spreadsheet | | | | |
Word doc | | | | |
Powerpoint doc | | | | |
Excel spreadsheet | | | | |
Include
- Folder’s collaborations
- Bookmark
- Google Doc
- Google Spreadsheet
- Word document
- PowerPoint document
- Excel Spreadsheet
- File’s comments
- File’s tasks
- Task’s assignments
- Users and Groups (memberships)
- Events (for incremental crawls)
  - ITEM_CREATE
  - ITEM_UPLOAD
  - COMMENT_CREATE
  - ITEM_MOVE
  - ITEM_COPY
  - TASK_ASSIGNMENT_CREATE
  - ITEM_TRASH
  - ITEM_RENAME
  - ITEM_MODIFY
  - TASK_CREATE
  - ITEM_UNDELETE_VIA_TRASH
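An incremental crawl driven by these events can be sketched as a loop that pages through the event stream from the last saved stream_position and keeps only the listed event types. This is a minimal sketch, not the connector's implementation. `collect_events` and the `fetch_page` callback are hypothetical names, the event type names use the underscore spelling of the Box API, and the response fields `entries` and `next_stream_position` are assumed from the Box events endpoint.

```python
# Event types from the list above, spelled as the Box API names them.
CRAWLED_EVENT_TYPES = frozenset({
    "ITEM_CREATE", "ITEM_UPLOAD", "COMMENT_CREATE", "ITEM_MOVE", "ITEM_COPY",
    "TASK_ASSIGNMENT_CREATE", "ITEM_TRASH", "ITEM_RENAME", "ITEM_MODIFY",
    "TASK_CREATE", "ITEM_UNDELETE_VIA_TRASH",
})


def collect_events(fetch_page, stream_position):
    """Drain the event stream; return (relevant_events, new_stream_position).

    fetch_page is a hypothetical callback that takes a stream position and
    returns one decoded page of the events response as a dict.
    """
    collected = []
    while True:
        page = fetch_page(stream_position)
        entries = page.get("entries", [])
        # Keep only the event types the crawl reacts to.
        collected.extend(
            event for event in entries
            if event.get("event_type") in CRAWLED_EVENT_TYPES
        )
        next_position = page.get("next_stream_position", stream_position)
        # Stop when the page is empty or the position no longer advances.
        if not entries or next_position == stream_position:
            return collected, next_position
        stream_position = next_position
```

The returned position would be stored and passed back in on the next incremental crawl, so that only events after it are fetched.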
Exclude