The Box Connector will crawl content from a Box.com repository.

Introduction

The Box connector crawls content from a Box repository. It retrieves the supported elements through the Box RESTful API (Content API 2.0) and authenticates through the Box authentication API, which uses OAuth 2.0, with a JWT configuration.

Environment and Access Requirements

Repository Support

The Box connector supports crawling the following repositories:

Repository	Version	Connector Version
Box.com	All	5.3

User Account Requirements

To access the Box repository, a user account with sufficient privileges must be supplied.

Box Application

To access the Box APIs, you'll need to create a Box Application (Client Id and Client Secret).

Create the Azure AD application

See Microsoft's "Use portal to create an Azure Active Directory application and service principal that can access resources" for the steps to create an Application ID and its key (Client Secret). Make sure to write down your Application Key at the time of creation. It will not be shown again after you exit the portal.

Assign permissions to the application

Log into the Azure Management Portal.
Click on the “Azure Active Directory” option.
Select the “App registrations” option and then select your client application.
Select "API permissions" > "Add a permission" > "Microsoft APIs".
Select the “Microsoft Graph” option.
On Application Permissions, select the following:
- Read files in all site collections (Files.Read.All)
- Read and write files in all site collections (Files.ReadWrite.All)
- Read all users’ full profiles (User.Read.All)
- Read directory data (Directory.Read.All)
- Read all groups (Group.Read.All)
- Read and write items in all site collections (Sites.ReadWrite.All)
- Read items in all site collections (Sites.Read.All)
Click on “Save”.
Click on “Grant admin consent for . . . ”
- Click on “Yes” when prompted

Authentication

The connector supports two types of authentication: a Box Application's Client Id and Client Secret (Client Credential), or a JSON Web Token (JWT).

Client Credential Grant

This server-side authentication does not require end-user interaction and, if granted the proper privileges, can be used to act on behalf of any user in an enterprise. This is important for extracting all of the content. Identity is validated using the application's client ID and client secret.
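As an illustration of the grant described above, the following minimal sketch builds the form-encoded body of a token request. The function name and the sample credentials are hypothetical, and the field names reflect the Box Client Credentials Grant parameters as we understand them; check them against the Box documentation before relying on them. The sketch only builds the request body and does not send it.

```python
from urllib.parse import urlencode

# Box OAuth 2.0 token endpoint (the request body below would be POSTed here).
TOKEN_URL = "https://api.box.com/oauth2/token"


def build_ccg_request_body(client_id: str, client_secret: str, enterprise_id: str) -> str:
    """Return the form-encoded body for a Client Credentials Grant token request.

    Hypothetical helper for illustration; not part of the connector.
    """
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumption: authenticate as the enterprise rather than as a single user.
        "box_subject_type": "enterprise",
        "box_subject_id": enterprise_id,
    }
    return urlencode(fields)


# Placeholder values, not real credentials.
body = build_ccg_request_body("my-client-id", "my-client-secret", "123456")
```

The resulting string would be sent as an application/x-www-form-urlencoded POST to TOKEN_URL. The response is expected to carry the access token used for the crawl.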

JWT

Server-side authentication using JSON Web Tokens (JWT) does not require end-user interaction and, if granted the proper privileges, can be used to act on behalf of any user in an enterprise. This is important for extracting all of the content. Identity is validated using a JWT assertion and a public/private key pair.
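To make the JWT assertion more concrete, here is a minimal sketch of the claims such an assertion might carry. The function name and sample values are hypothetical, and the claim names follow the Box JWT authentication parameters as we understand them. Signing the claims with the application's private key is left out because it requires a third-party cryptography library.

```python
import secrets
import time

# The token endpoint doubles as the audience ("aud") of the assertion.
TOKEN_URL = "https://api.box.com/oauth2/token"


def build_jwt_claims(client_id, enterprise_id, now=None):
    """Return the claim set for a Box JWT assertion (unsigned).

    Hypothetical helper for illustration; not part of the connector.
    """
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": client_id,              # the Box Application's Client Id
        "sub": enterprise_id,          # the enterprise the assertion acts for
        "box_sub_type": "enterprise",  # assumption: enterprise-level access
        "aud": TOKEN_URL,
        "jti": secrets.token_hex(32),  # random 64-character identifier
        "exp": issued_at + 45,         # short-lived: 45 seconds from now
    }


claims = build_jwt_claims("my-client-id", "123456", now=1000)
```

Once signed, the assertion would be exchanged at TOKEN_URL for an access token. A short expiry is chosen here because the assertion is only needed for that single exchange.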

Framework and Connector Features

Framework Features

Name	Supported
Content Crawling	Yes
Identity Crawling	Yes
Snapshot-based Incrementals	No
Non-snapshot-based Incrementals	Yes
Document Hierarchy	Yes

Connector Features

Some of the features of the Box connector include:

Performs either full or incremental crawling (so that only new or updated documents are indexed), using stream_position
Can exclude a folder or a set of folders and their content
Can exclude or include elements (folders or files) by file name or folder name using regular expressions (regex patterns)
Extracts metadata
Is search engine independent
Runs from any machine with access to the given Box account
Fetches access control lists (ACLs)
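The name-based include and exclude filtering listed above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the precedence used here (exclude patterns win over include patterns, and an empty include list accepts everything) is an assumption, since the connector's exact matching rules are not described on this page.

```python
import re


def should_crawl(name, include_patterns, exclude_patterns):
    """Decide whether a file or folder name passes the regex filters.

    Hypothetical helper; the precedence rules are assumptions.
    """
    # Assumption: any matching exclude pattern rejects the item outright.
    if any(re.search(pattern, name) for pattern in exclude_patterns):
        return False
    # Assumption: when include patterns exist, at least one must match.
    if include_patterns:
        return any(re.search(pattern, name) for pattern in include_patterns)
    # No include patterns configured: accept everything not excluded.
    return True


include = [r"\.pdf$"]
exclude = [r"^draft"]
decisions = {name: should_crawl(name, include, exclude)
             for name in ["report.pdf", "draft.pdf", "notes.txt"]}
```

With these sample patterns, only report.pdf passes: draft.pdf is rejected by the exclude pattern and notes.txt matches no include pattern.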

Content Crawled

The Box connector retrieves several types of objects:

Name	Type	Content Fetch and Extraction	Description
Folder	container	Yes	The directories that hold the files. Each directory is scanned to retrieve further subfolders or documents. Collaborators are included as ACLs (Access Control Lists).
File	document	Yes	Files stored in folders or subfolders. Content, tasks, and comments are extracted as metadata fields.
Box Note	document	Yes	Type of document
Bookmark	document	Yes	Web Links
Google Doc	document	Yes	Type of document
Google spreadsheet	document	Yes	Type of document
Word doc	document	Yes	Type of document
Powerpoint doc	document	Yes	Type of document
Excel spreadsheet	document	Yes	Type of document

Incremental Crawl

For incremental crawls, the connector uses the stream_position value, so only the changes reported between that position and the current moment are crawled.

The following events are supported:

ITEM_CREATE
ITEM_UPLOAD
COMMENT_CREATE
ITEM_MOVE
ITEM_COPY
TASK_ASSIGNMENT_CREATE
ITEM_TRASH
ITEM_RENAME
ITEM_MODIFY
TASK_CREATE
ITEM_UNDELETE_VIA_TRASH
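The incremental crawl described above can be sketched as a loop that reads event pages starting at a saved stream_position and keeps only the supported event types. The function name and the fetch_page callback are hypothetical stand-ins for a call to the Box events endpoint. The response shape (an entries list plus next_stream_position) is how we understand that endpoint to behave, not a description of the connector's internals.

```python
# Event types listed above, written as API identifiers.
SUPPORTED_EVENTS = {
    "ITEM_CREATE", "ITEM_UPLOAD", "COMMENT_CREATE", "ITEM_MOVE", "ITEM_COPY",
    "TASK_ASSIGNMENT_CREATE", "ITEM_TRASH", "ITEM_RENAME", "ITEM_MODIFY",
    "TASK_CREATE", "ITEM_UNDELETE_VIA_TRASH",
}


def collect_changes(fetch_page, stream_position):
    """Collect supported events from stream_position onward.

    fetch_page is a hypothetical callback that takes a stream position and
    returns a dict with 'entries' and 'next_stream_position'.
    """
    changes = []
    while True:
        page = fetch_page(stream_position)
        entries = page.get("entries", [])
        # Keep only the event types the connector reacts to.
        changes.extend(e for e in entries if e.get("event_type") in SUPPORTED_EVENTS)
        stream_position = page["next_stream_position"]
        if not entries:  # an empty page means the stream is caught up
            break
    return changes, stream_position


# Stubbed pages standing in for real API responses.
pages = {
    "0": {"entries": [{"event_type": "ITEM_UPLOAD"}, {"event_type": "ITEM_PREVIEW"}],
          "next_stream_position": "10"},
    "10": {"entries": [], "next_stream_position": "10"},
}
changes, position = collect_changes(lambda pos: pages[pos], "0")
```

The returned position would be stored and passed to the next incremental crawl, so that only later events are read.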

Identity Crawl

Folder collaborators, group memberships, and users are part of the ACLs and Identity Fetcher values, which make up the security of the content.
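As a rough illustration of how folder collaborators could become ACL entries, the sketch below maps collaboration records to simple ACL dictionaries. The function name and the output format are hypothetical. The input shape (an accessible_by object with type and id, plus a role) follows Box collaboration records as we understand them and may not match what the connector actually emits.

```python
def collaborations_to_acls(collaborations):
    """Map folder collaboration records to simple ACL entries.

    Hypothetical helper; the ACL dictionary layout is an assumption.
    """
    acls = []
    for collab in collaborations:
        principal = collab.get("accessible_by")
        # Skip records that carry no principal to grant access to.
        if not principal:
            continue
        acls.append({
            "type": principal["type"],  # "user" or "group"
            "id": principal["id"],
            "role": collab.get("role"),
        })
    return acls


# Sample records, not real data.
sample = [
    {"accessible_by": {"type": "user", "id": "111"}, "role": "editor"},
    {"accessible_by": {"type": "group", "id": "222"}, "role": "viewer"},
    {"accessible_by": None, "role": "viewer"},
]
acls = collaborations_to_acls(sample)
```

Group entries would still need to be expanded through the crawled group memberships to resolve the individual users they cover.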

Contact Us: [email protected]
