Launch Aspire (if it's not already running). See:
To add a Publish to Microsoft Search rule, drag the Microsoft Search publisher rule from the Workflow Library and drop it onto the Workflow Tree at the position where you want it. This automatically opens the Microsoft Search publisher window, where you configure the publisher.
In the publisher window, specify the connection information used to publish to Microsoft Search.
Microsoft Search supports one of two schemas: the fixed ExternalFile schema or the custom ExternalItem schema. ExternalItem gives limited freedom to define the properties expected from crawled items. In either case, the output of the Groovy transformation file must match the expected schema.
The fixed external file schema expects the following information:
This is a sample document in JSON format as expected by the REST API:
```json
{
  "@odata.type": "microsoft.graph.externalFile",
  "acl": [
    {
      "type": "user",
      "value": "d411eb08-42e2-4316-aab5-2df8e9d9c21b",
      "accessType": "grant",
      "identitySource": "Azure Active Directory"
    }
  ],
  "createdDateTime": "2017-11-08T19:06:17Z",
  "modifiedDateTime": "2017-11-08T19:06:17Z",
  "createdBy": "empty",
  "lastModifiedBy": "empty",
  "title": "sample document",
  "url": "http://the.url.com",
  "name": "name.txt",
  "extension": "txt",
  "size": 10,
  "content": "the content/n"
}
```
As mentioned before, the ExternalItem schema has limited customization capabilities. It expects the following information:
When configured to use a custom schema, the publisher component expects a text file with the following structure:
```json
{
  "properties": [
    {
      "name": "propertyName",
      "type": "String",
      "isSearchable": "true",
      "isRetrievable": "true",
      "isQueryable": "true"
    }
  ]
}
```
Where:
This is the default schema file provided with the component (schemaProperties.json):
```json
{
  "properties": [
    { "name": "id", "type": "String" },
    { "name": "name", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "extension", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "size", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "createdBy", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "lastModifiedBy", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "createdDateTime", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "modifiedDateTime", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" },
    { "name": "title", "type": "String", "isSearchable": "true", "isRetrievable": "true" },
    { "name": "url", "type": "String", "isSearchable": "true", "isRetrievable": "true", "isQueryable": "true" }
  ]
}
```
The Groovy transformation file lets you tailor the output to the client's needs while keeping it in a form that can be sent safely to Microsoft Search through the REST API. The output of the transformation must match the expected schema structure.
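One way to catch a mismatch early is to compare the field names a transformation emits against the property names declared in the schema file. The following is a minimal diagnostic sketch, not part of the publisher: the class `SchemaCoverage` and the method `missingProperties` are hypothetical names, and the sample values are illustrative. Whether every declared property must be present in each item is not stated here, so treat the result as a hint rather than a validation rule.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not part of Aspire or the Microsoft Search publisher.
class SchemaCoverage {

    // Returns the declared schema property names that are absent from the
    // transformation output, keeping the order in which the schema lists them.
    static Set<String> missingProperties(List<String> schemaNames, Map<String, ?> outputFields) {
        Set<String> missing = new LinkedHashSet<>(schemaNames);
        missing.removeAll(outputFields.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // A subset of the property names from the default schema file.
        List<String> schema = List.of("id", "name", "extension", "size", "title", "url");

        // Illustrative output of a transformation for an item without a file name or size.
        Map<String, Object> output = Map.of(
                "id", "abc",
                "title", "sample document",
                "url", "http://the.url.com");

        System.out.println(missingProperties(schema, output)); // prints [name, extension, size]
    }
}
```

A check like this is most useful when a custom schema and a customized transformation are edited separately and can drift apart.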
This is the default Groovy transformation file that is provided with the component (aspireToMicrosoftSearchBulk.groovy):
```groovy
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.nio.charset.Charset;

def connectorSpecificMap = [
  'isContainer':'is_container'
]

// Truncates very large content: when the encoded size exceeds 16 MiB,
// keeps the first 10,485,760 characters and appends "...".
def getContent(String content) {
  try {
    if(content.getBytes().length > 16777216L) {
      return content.substring(0,10485760) + "...";
    } else {
      return content
    }
  } catch(Throwable t) {
    return "";
  }
}

// Returns the MD5 hash of the given id as a hexadecimal string.
// Note: BigInteger.toString(16) omits leading zeros, so the result
// can be shorter than 32 characters.
def getMD5(String id) {
  MessageDigest digest = MessageDigest.getInstance("MD5")
  String md5name = new BigInteger(1, digest.digest(id.getBytes())).toString(16)
  return md5name;
}

// Function that processes the children of a connector specific field
def getChildren(name, parent) {
  builder."$name"() {
    parent.getChildren().each() { val ->
      def attr = val.getName();
      //if it has other children
      if(val.getChildren().size() > 0) {
        getChildren(attr, val);
      } else {
        builder."$attr"() {
          //All the attributes
          val.getAttributeNames().each() { attrName ->
            "@$attrName" val.getAttribute(attrName);
          }
          //Main content
          if (val?.getText() != null) {
            '_$' val?.getText();
          }
        }
      }
    }
  }
}

//***************************************************
//
// Main routine
//
// Action of the job
String action = doc.action.getText();
if ((action == "add") || (action == "update")) {
  /*****************
   * Add or Update *
   *****************/
  builder.$object() {
    '@search.action' "upload"

    // Get ID
    String newId = "";
    if (doc.id != null) {
      newId = getMD5(doc.id.getText())
    } else if (doc.fetchUrl != null) {
      newId = getMD5(doc.fetchUrl.getText())
    } else if (doc.url != null) {
      newId = getMD5(doc.url.getText())
    } else if (doc.displayUrl != null) {
      newId = getMD5(doc.displayUrl.getText())
    } else {
      newId = "ID-NOT-PROVIDED"
    }
    'id' newId

    // name (null-safe: doc.url may be absent)
    String nameOfTheFile = doc.url?.getText()
    if(nameOfTheFile != null) {
      String[] urlItems = nameOfTheFile.split('/')
      nameOfTheFile = urlItems[urlItems.length - 1]
      name nameOfTheFile
      String[] fileNameItems = nameOfTheFile.split(/\./)
      if(fileNameItems.length > 1) {
        extension fileNameItems[fileNameItems.length - 1]
      } else {
        extension '[empty]'
      }
    }

    // Size
    if(doc.size != null) {
      size doc.size
    }

    // createdBy
    if(doc.author != null) {
      createdBy doc.author
    } else {
      createdBy "empty"
    }

    // lastModifiedBy
    if(doc.lastModifiedBy != null) {
      lastModifiedBy doc.lastModifiedBy
    } else {
      lastModifiedBy "empty"
    }

    //createdDateTime
    if(doc.createDate != null) {
      createdDateTime doc.createDate
    } else if(doc.lastModified != null) {
      createdDateTime doc.lastModified
    } else {
      createdDateTime (new Date())
    }

    //modifiedDateTime
    if(doc.lastModified != null) {
      modifiedDateTime doc.lastModified
    } else {
      modifiedDateTime (new Date())
    }

    // title
    if(doc.title != null) {
      title doc.title
    } else if(nameOfTheFile != null) {
      title nameOfTheFile
    } else {
      title '[empty title]'
    }

    // url
    if (doc.displayUrl != null) {
      url doc.displayUrl
    } else if (doc.url != null) {
      url doc.url
    } else if (doc.fetchUrl != null){
      url doc.fetchUrl
    } else {
      url "URL-NOT-PROVIDED"
    }

    // content
    if(doc.content != null) {
      content doc.content?.getText()
    }

    // ACLs
    if (doc.acls != null) {
      builder.acls() {
        $list {
          doc.acls.getChildren().each() { val ->
            $object() {
              name val.getAttribute("name")
              access val.getAttribute("access")
              entity val.getAttribute("entity")
            }
          }
        }
      }
      //END
    }
  }
} else {
  /**********
   * Delete *
   **********/
  builder.$object() {
    '@search.action' "delete"
    String delId = "";
    // Get ID
    if (doc.id != null) {
      delId = getMD5(doc.id.getText())
    } else if (doc.fetchUrl != null) {
      delId = getMD5(doc.fetchUrl.getText())
    } else if (doc.url != null) {
      delId = getMD5(doc.url.getText())
    } else if (doc.displayUrl != null) {
      delId = getMD5(doc.displayUrl.getText())
    } else {
      delId = "ID-NOT-PROVIDED"
    }
    'id' delId
  }
}
```
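Two helpers in the script are worth checking when you customize it: the MD5-based ID and the content truncation. The sketch below reproduces both in plain Java so their behavior can be tested outside Aspire. The class `TransformHelpers` and its method names are hypothetical, and it assumes UTF-8 encoding, whereas the script uses the platform default charset. It shows that `BigInteger.toString(16)` drops leading zeros, so some IDs are shorter than 32 characters, and that a fixed-length `substring` needs a bound when multi-byte text exceeds the byte limit with fewer characters than the cut-off.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class: not part of Aspire or the publisher.
class TransformHelpers {

    // Same computation as getMD5 in the script, with UTF-8 assumed as the charset.
    static String md5Unpadded(String id) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(id.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, hash).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Variant that always returns 32 hexadecimal characters.
    static String md5Padded(String id) {
        return String.format("%32s", md5Unpadded(id)).replace(' ', '0');
    }

    // Truncation in the spirit of getContent: if the encoded content exceeds
    // maxBytes, keep at most keepChars characters and append "...".
    // Math.min guards against content that has fewer than keepChars characters.
    static String truncate(String content, int maxBytes, int keepChars) {
        if (content == null) {
            return "";
        }
        if (content.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
            return content.substring(0, Math.min(keepChars, content.length())) + "...";
        }
        return content;
    }

    public static void main(String[] args) {
        System.out.println(md5Unpadded("a")); // prints cc175b9c0f1b6a831c399e269772661
        System.out.println(md5Padded("a"));   // prints 0cc175b9c0f1b6a831c399e269772661
        System.out.println(truncate("abcdef", 3, 2)); // prints ab...
    }
}
```

If items have already been published with unpadded IDs, switching to the padded form changes the ID of every item whose hash starts with a zero, so the choice should be made before the first crawl.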
Once you've clicked on the Add button, it will take a moment for Aspire to download all of the necessary components (the Jar files) from the Maven repository and load them into Aspire. Once that's done, the publisher will appear in the Workflow Tree.
For details on using the Workflow section, please refer to Workflow introduction.