In the "Connector" tab, specify the connection information needed to crawl the Gremlin database.
- Endpoint: The URI from the Azure database account in the "Keys" tab (be sure to use the .NET SDK URI, not the Gremlin Endpoint).
- Key: Primary key for the database account.
- You can either check "Crawl all databases" or specify the databases/collections you want to crawl.
- Crawl Specific databases/collections:
- All collections per database: Enter the database or databases whose containers you want to crawl.
- Specific database and collection: This is more precise, because you choose both the database and the collection. The two must be associated; in other words, the collection has to be inside the database.
- The "Specific Database and Collection" crawl type includes a checkbox that enables a SQL query field for retrieving specific vertex features:
When this option is enabled, do not use the "Limit rows" and "Performance sampling" options. If you need that behavior, add it to the query, e.g. 'SELECT c._etag FROM c ORDER BY c._rid OFFSET 1 LIMIT 3'. If your query selects specific values, it must also select the 'id' field, which every vertex contains. So if the field to query is "firstName", your query should look like this: 'SELECT table.id, table.firstName FROM table'.
IMPORTANT: If the query does not include the 'id' field, the connector fails with an AspireException indicating that the id field is required. Alternatively, you can use the symbol '*' instead of the 'id' field. The query should then look like this: SELECT * FROM tableName.
- Limit extracted rows: Enables the fields that limit how many rows are selected from the table.
- Limit: Number of rows to be selected.
- Perform sampling: The retrieved records are picked randomly. If you check this option, keep in mind that performance may be affected when querying large tables.
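
The rules for the custom SQL query field (select 'id' or '*', and put any row limit in the query itself) can be checked before pasting a query into the connector. The following is a minimal Python sketch, not part of the connector: the function names `selects_id_or_star` and `with_row_limit` are hypothetical, and the check is a rough text match on the SELECT list, so the connector's own validation may behave differently.

```python
import re


def selects_id_or_star(query: str) -> bool:
    """Rough check: does the SELECT list contain '*' or a field named 'id'?"""
    match = re.match(r"\s*SELECT\s+(.*?)\s+FROM\s", query, re.IGNORECASE | re.DOTALL)
    if match is None:
        return False
    for item in match.group(1).split(","):
        item = item.strip()
        if item == "*":
            return True
        # Drop an alias if present, e.g. "c.id AS key" -> "c.id".
        expr = re.split(r"\s+AS\s+", item, flags=re.IGNORECASE)[0].strip()
        # Compare the last path segment exactly; property names are case-sensitive.
        if expr.split(".")[-1] == "id":
            return True
    return False


def with_row_limit(query: str, limit: int, offset: int = 0) -> str:
    """Append an OFFSET ... LIMIT ... clause instead of using the UI limit option."""
    return f"{query.strip()} OFFSET {offset} LIMIT {limit}"


if __name__ == "__main__":
    for q in (
        "SELECT table.id, table.firstName FROM table",
        "SELECT * FROM tableName",
        "SELECT table.firstName FROM table",
    ):
        print(selects_id_or_star(q), "-", q)
    print(with_row_limit("SELECT c.id FROM c ORDER BY c._rid", limit=3, offset=1))
```

A query for which `selects_id_or_star` returns False is likely to trigger the AspireException described above, so add 'id' (or use '*') before running the crawl.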