The Box Connector will crawl content from a Box.com repository.



Introduction


The Box connector will crawl content from a Box repository. The connector retrieves the supported elements using the RESTful API (Content API Basics 2.0 version). For authentication it uses the Box API (which uses OAuth 2) with a JWT configuration.

Environment and Access Requirements


Repository Support

The Box connector supports crawling the following repositories:

Repository | Version | Connector Version
Box.com | All | 5.13

User Account Requirements

To access the Box repository, a user account with sufficient privileges must be supplied.

Box Application

To access the Box APIs, you'll need to create a Box Application (Client Id and Client Secret).


    Authentication 

    The connector supports two types of authentication: a Box Application's Client Id and Client Secret (Client Credentials Grant), or JSON Web Token (JWT).

    Client Credential Grant 

    This server-side authentication does not require end-user interaction and, if granted the proper privileges, can be used to act on behalf of any user in an enterprise. This is important for extracting all the content. Identity is validated using the application's client ID and client secret.

    See the Box page https://developer.box.com/guides/authentication/client-credentials/client-credentials-setup/ for the steps on how to configure a Custom App, with the following configuration:

    • Select 'Server Authentication (Client Credentials Grant)' as the authentication method.
    • In Application Access, choose App Access + Enterprise Access.
    • For Application Scopes, select 'Read all files and folders stored in Box', 'Manage users', 'Manage groups', 'Manage enterprise properties'.
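As an illustration of what this grant looks like on the wire, the sketch below builds (but does not send) a token request for the Client Credentials Grant against Box's public token endpoint. It is not the connector's own code; the function name and the placeholder credentials are assumptions made for the example, and the enterprise ID is assumed to be the subject the application acts as.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Box OAuth 2 token endpoint.
TOKEN_URL = "https://api.box.com/oauth2/token"


def build_ccg_token_request(client_id: str, client_secret: str, enterprise_id: str) -> Request:
    """Build (but do not send) a Client Credentials Grant token request.

    Hypothetical helper for illustration only.
    """
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Authenticate as the enterprise's service account.
        "box_subject_type": "enterprise",
        "box_subject_id": enterprise_id,
    }
    return Request(
        TOKEN_URL,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; real ones come from the Box Developer Console.
req = build_ccg_token_request("my-client-id", "my-client-secret", "123456")
```

Sending the request (for example with `urllib.request.urlopen(req)`) should return a JSON body containing an access token, provided the application has been authorized by an enterprise admin.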


    JWT

    Server-side authentication using JSON Web Tokens (JWT) does not require end-user interaction and, if granted the proper privileges, can be used to act on behalf of any user in an enterprise. This is important for extracting all the content. Identity is validated using a JWT assertion and a public/private keypair.

    To create a custom app, follow the JWT configuration steps at https://developer.box.com/guides/authentication/jwt/jwt-setup/.

    For application authentication, select Server Authentication (With JWT).

    Generate a keypair. Make sure to write down your Application Key at the time of creation. It will not be shown again after you exit the portal. 
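To make the JWT assertion more concrete, here is a minimal sketch of the header and claims that Box expects, assembled into the unsigned `header.payload` signing input. Signing with the private key (RS256) is deliberately left out because it needs a third-party cryptography library. The function name, the 45-second lifetime, and the sample IDs are assumptions for the example and do not come from the connector.

```python
import base64
import json
import secrets
import time

TOKEN_URL = "https://api.box.com/oauth2/token"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_signing_input(client_id, enterprise_id, public_key_id, now=None):
    """Return (claims, 'header.payload') for a Box JWT assertion (unsigned)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT", "kid": public_key_id}
    claims = {
        "iss": client_id,              # the application's client ID
        "sub": enterprise_id,          # the enterprise to authenticate as
        "box_sub_type": "enterprise",
        "aud": TOKEN_URL,
        "jti": secrets.token_hex(32),  # unique ID for this assertion
        "exp": now + 45,               # short-lived; assumed 45 seconds here
    }
    signing_input = (
        b64url(json.dumps(header).encode("utf-8"))
        + "."
        + b64url(json.dumps(claims).encode("utf-8"))
    )
    return claims, signing_input


claims, signing_input = build_signing_input("my-client-id", "123456", "abcd1234", now=1000)
```

The signing input would then be signed with the private key from the generated keypair, and the resulting assertion posted to the token endpoint with the `urn:ietf:params:oauth:grant-type:jwt-bearer` grant type.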




    Framework and Connector Features


    Framework Features

    Name | Supported
    Content Crawling | Yes
    Identity Crawling | Yes
    Snapshot-based Incrementals | No
    Non-snapshot-based Incrementals | Yes
    Document Hierarchy | Yes

    Connector Features

    Some of the features of the Box connector include:

    • Performs incremental crawling (so that only new/updated documents are indexed), using stream_position
    • Possibility to exclude a folder or a set of folders and their content
    • Possibility to exclude or include elements (folders or files) by file name or folder name using regular expressions (regex patterns)
    • Metadata extraction
    • Is search engine independent
    • Runs from any machine with access to the given Box account
    • Fetches access control lists (ACLs)
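The include/exclude behaviour can be pictured with a small sketch. This is not the connector's implementation: the function name is hypothetical, and it assumes that exclude patterns take precedence and that an empty include list means "include everything".

```python
import re


def should_crawl(name, include_patterns, exclude_patterns):
    """Decide whether a file or folder name passes the regex filters.

    Assumptions (not confirmed by the connector documentation):
    exclude patterns win over include patterns, and an empty
    include list accepts every name that is not excluded.
    """
    if any(re.search(p, name) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(re.search(p, name) for p in include_patterns)
    return True


# Only PDFs are included, and anything under "archive/" is excluded.
print(should_crawl("reports/q1.pdf", [r"\.pdf$"], [r"^archive/"]))   # True
print(should_crawl("archive/q1.pdf", [r"\.pdf$"], [r"^archive/"]))   # False
```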


    Content Crawled

    The Box connector retrieves several types of objects:

    Name | Type | Relevant Metadata | Content Fetch and Extraction | Description
    Folder | container | N/A | Yes | The directories of the files. Each directory will be scanned to retrieve more subfolders or documents. The collaborators are also included as ACLs (Access Control Lists).
    File | document | | Yes | Files stored in folders/subfolders ... , content, tasks and comments are part of the metadata fields that are extracted.
    Box Note | document | | Yes | Type of document
    Bookmark | document | | Yes | Web Links
    Google Doc | document | | Yes | Type of document
    Google spreadsheet | document | | Yes | Type of document
    Word doc | document | | Yes | Type of document
    Powerpoint doc | document | | Yes | Type of document
    Excel spreadsheet | document | | Yes | Type of document





    Incremental Crawl

    For incremental crawls, the connector uses the event stream position value, so only the changes reported between that position and the current moment will be crawled.

    The following events are supported:

    • ITEM_CREATE
    • ITEM_UPLOAD
    • COMMENT_CREATE
    • ITEM_MOVE
    • ITEM_COPY
    • TASK_ASSIGNMENT_CREATE
    • ITEM_TRASH
    • ITEM_RENAME
    • ITEM_MODIFY
    • TASK_CREATE
    • ITEM_UNDELETE_VIA_TRASH
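The stream-position mechanism can be sketched as a simple paging loop. The example below is an assumption-laden illustration, not the connector's code: `fetch_page` is a hypothetical callable standing in for a call to the Box events endpoint, and the response shape (`entries`, `next_stream_position`, `event_type`) follows the public Box events API.

```python
SUPPORTED_EVENTS = {
    "ITEM_CREATE", "ITEM_UPLOAD", "COMMENT_CREATE", "ITEM_MOVE", "ITEM_COPY",
    "TASK_ASSIGNMENT_CREATE", "ITEM_TRASH", "ITEM_RENAME", "ITEM_MODIFY",
    "TASK_CREATE", "ITEM_UNDELETE_VIA_TRASH",
}


def collect_changes(fetch_page, stream_position):
    """Page through events from stream_position until no entries remain.

    fetch_page(position) is a hypothetical callable returning one page of
    the events response as a dict. Returns the supported events and the
    position to store for the next incremental crawl.
    """
    changes = []
    while True:
        page = fetch_page(stream_position)
        entries = page.get("entries", [])
        stream_position = str(page.get("next_stream_position", stream_position))
        if not entries:
            return changes, stream_position
        changes.extend(e for e in entries if e.get("event_type") in SUPPORTED_EVENTS)


# Fake pages standing in for real API responses.
pages = {
    "100": {
        "entries": [{"event_type": "ITEM_UPLOAD"}, {"event_type": "ITEM_PREVIEW"}],
        "next_stream_position": "200",
    },
    "200": {"entries": [], "next_stream_position": "200"},
}
changes, position = collect_changes(pages.__getitem__, "100")
```

Here the unsupported `ITEM_PREVIEW` event is skipped, and `position` is the value a later incremental crawl would start from.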

    Identity Crawl

    Folder collaborators, group memberships, and users are part of the ACLs and Identity Fetcher values, which are part of the security of the content.
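As a rough picture of how collaborators and group memberships combine into the set of users who can see a folder, consider the sketch below. The data shapes and the function name are hypothetical and chosen only for illustration; the connector's actual ACL and identity formats are not described here.

```python
def readers_for_folder(collaborators, group_members):
    """Expand a folder's collaborators into the set of user logins.

    collaborators: list of dicts like {"type": "user", "login": ...}
                   or {"type": "group", "name": ...} (hypothetical shape).
    group_members: dict mapping group name to a list of user logins.
    """
    users = set()
    for collaborator in collaborators:
        if collaborator["type"] == "user":
            users.add(collaborator["login"])
        elif collaborator["type"] == "group":
            users.update(group_members.get(collaborator["name"], ()))
    return users


collaborators = [
    {"type": "user", "login": "ana@example.com"},
    {"type": "group", "name": "engineering"},
]
group_members = {"engineering": ["bo@example.com", "cy@example.com"]}
readers = readers_for_folder(collaborators, group_members)
```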