The Generative AI client is designed to run custom Groovy scripts. Scripts can perform many different tasks, but they are most useful for calling Generative AI services on crawled documents and storing the results in the document for later indexing. Groovy scripts can use the following binding variables to simplify AI-related tasks:
Name in script | Description | Aspire type | Init script | Process script |
---|---|---|---|---|
doc | Crawled document | AspireObject | false | true |
component | Aspire workflow component running Groovy scripts | ComponentImpl | true | true |
connection.client | REST client component for making AI calls | GenAIRestRequester | true | true |
utilities.azure.embeddings utilities.google.embeddings | Methods related to "embeddings" processing | Embeddings | true | true |
job | Job containing the crawled document | Job | false | true |
secrets | Map of secrets provided in UI | Map<String,String> | true | true |
template | Map of selected script template variables | Map<String,String> | true | true |
utilities.azure.prompts utilities.google.prompts | Methods related to "prompts" processing | Prompts | true | true |
utilities.textSplitter | Method related to text splitting | TextSplitterComponent | true | true |
variables | Map of variables provided in initialize script | Map<String,Object> | true | false |
utilities | Various helper methods | Utils | true | true |
The crawled document (doc) gives access to the document's metadata and content, and can also store new metadata obtained from AI:
```groovy
doc.add(embeddings.toAspireObject());
```
The component can typically be used as a logger:
```groovy
component.info(" %s","${doc.id}: Got embeddings for sentence: ${currentSentence}")
```
The REST client is available via the connection object and can be used to make requests to AI services.
connection.client |
---|
The REST client is configured automatically from the UI DXF configuration when it is initialized. When the authentication method "NONE" is selected (the default), authentication must be handled in the initialization script. Our examples typically do this by adding an "apiKey" header field:
```groovy
connection.client.addHeader("apiKey", "${secrets.apiKey}");
```
Many methods are available; all are listed in the Javadoc of com.accenture.aspire.genaiclient.scriptsupport.rest.GenAIRestRequester. The methods most likely to be used in AI-related scripts are:
Method | Syntax | Init script | Process script |
---|---|---|---|
execute POST | HttpResponse<?> executePost(String url, AspireObject httpBody) | false | true |
HttpResponse: most likely an AspireObjectResponse (it depends on the UI configuration field "responseFactory"). It can be converted with utilities methods such as utilities.azure.embeddings.convertResponse to get the desired output (see the Embeddings and Prompts documentation on this page). url: the AI service URL. httpBody: the body of the POST. It can also be created with a utilities method such as utilities.azure.embeddings.createPostBody, which simplifies Embeddings and Prompts requests (see the Embeddings and Prompts documentation on this page). | |||
execute GET | HttpResponse<?> executeGet(String url) | false | true |
HttpResponse: most likely an AspireObjectResponse (it depends on the UI configuration field "responseFactory"). url: the AI service URL. | |||
Add header | addHeader(String name, String value); | true | false |
name: header name. value: header value. | |||
```groovy
...
// url
endpointEmbeddings = "${template.endpoint}/openai/deployments/${template.model}/embeddings?api-version=${template.apiVersion}"
....
def getEmbeddingsFromSentence(endpointEmbeddings, sentence) {
    response = connection.client.executePost(endpointEmbeddings, utilities.azure.embeddings.createPostBody(sentence));
    def embeddings = utilities.azure.embeddings.convertResponse(sentence, response)
    return embeddings
}
```
The 429 policy and related throttling can be configured using the UI DXF field Policy429. If a value is selected, the connection is throttled automatically.
You can also handle throttling manually in a Groovy script. For example, here is how to use seed blocking:
```groovy
def resp = connection.client.executeGet("url");
if (resp.getStatusCode() == 429) {
    // Assumes Retry-After holds a number of seconds; pause the seed until that time has passed
    long pauseSeedUntil = System.currentTimeMillis() + (Integer.valueOf(resp.getHeaders().get("Retry-After")) * 1000);
    throw new com.accenture.aspire.services.ThrottlingNotificationException(pauseSeedUntil);
}
```
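The manual throttling example above assumes that the Retry-After header is always present and numeric. As a minimal sketch in plain Groovy, the pause time could be computed more defensively. The helper name `pauseUntilMillis` and the 30-second fallback are assumptions made for illustration; they are not part of the Aspire API.

```groovy
// Hypothetical helper (not part of the Aspire API): converts a Retry-After
// header value into an absolute "pause until" timestamp in milliseconds.
long pauseUntilMillis(String retryAfter, long nowMillis, long defaultSeconds = 30) {
    long seconds = defaultSeconds
    // Only a plain number of seconds is handled here; a missing header or an
    // HTTP-date value falls back to the assumed default
    if (retryAfter != null && retryAfter.trim().isLong()) {
        // Clamp negative values to zero
        seconds = Math.max(0L, retryAfter.trim().toLong())
    }
    return nowMillis + seconds * 1000L
}

// Example: a header value of "5" pauses for five seconds from now
long pauseSeedUntil = pauseUntilMillis("5", System.currentTimeMillis())
```

In a process script the resulting value could then be passed to ThrottlingNotificationException in the same way as in the example above.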
utilities.azure.embeddings, utilities.google.embeddings (aiService = azure or google (Palm)) |
---|
Method | Syntax | Init script | Process script |
---|---|---|---|
Initialize Azure. Use it if you want to use the "process" method described below | void utilities.azure.embeddings.initialize(AspireObject config) | true | false |
config:
| |||
Initialize Google Palm. Use it if you want to use the "process" method described below | void utilities.google.embeddings.initialize(AspireObject config) | true | false |
config:
| |||
Process. Creates embeddings for each text chunk provided in the list. It must be initialized first via "initialize" | VectorEmbeddingResult utilities.aiService.embeddings.process(List<String> splitText) VectorEmbeddingResult: see the format below. All vectors are present. splitText: text chunks for creating embeddings | false | true |
Convert response. Converts the response from an AI embeddings call. The response format can differ slightly between AI providers. The result can be converted to an AspireObject and stored in the document. | VectorEmbeddingsResult utilities.aiService.embeddings.convertResponse(String text, AspireObjectResponse response) VectorEmbeddingsResult:
response: HTTP response to convert | |||
Create POST body. Creates the POST body for calling the AI embeddings service | AspireObject utilities.aiService.embeddings.createPostBody(String text) text: text to be converted to the POST body | |||
Create sub document. It can be used when each embeddings chunk is to be posted as a separate sub job | AspireObject utilities.azure.embeddings.createSubDoc(VectorEmbeddingsResult vectorEmbeddingsResult, AspireObject doc, int chunkCount) vectorEmbeddingsResult: previously created embedding object doc: the current document chunkCount: the current text chunk number (see the example below) |
Example of an initialization script for using the complex embeddings "process" method in the process script:
```groovy
import com.accenture.aspire.services.AspireObject;

utilities.textSplitter.initialize(getTextSplitterConfig("sentence"))
utilities.azure.embeddings.initialize(getEmbeddingsConfig())

def getEmbeddingsConfig() {
    AspireObject returnValue = new AspireObject("config");
    returnValue.add("endpoint", "${template.endpoint}");
    returnValue.add("model", "${template.model}");
    returnValue.add("apiVersion", "${template.apiVersion}");
    returnValue.add("apiKey", "${secrets.apiKey}");
    return returnValue;
}

def getTextSplitterConfig(String splitType) {
    .....
}
```
Example of process script using complex "process" method:
```groovy
def sentences = utilities.textSplitter.process(doc);
embeddings = utilities.azure.embeddings.process(sentences);
doc.add(embeddings.toAspireObject());
```
Example of process script publishing sub jobs for each embedding chunk:
```groovy
import com.accenture.aspire.services.AspireException

// split field "content" and create "sentences"
def sentences = utilities.textSplitter.process(doc);

// url
endpointEmbeddings = "${template.endpoint}/openai/deployments/${template.model}/embeddings?api-version=${template.apiVersion}"

// generate and publish embeddings
sentences.eachWithIndex { currentSentence, sentencesCount ->
    embeddingVector = getEmbeddingsFromSentence(endpointEmbeddings, currentSentence)
    subJobAO = utilities.azure.embeddings.createSubDoc(embeddingVector, doc, sentencesCount);
    utilities.createSubJob(job, subJobAO)
}

def getEmbeddingsFromSentence(endpointEmbeddings, sentence) {
    response = connection.client.executePost(endpointEmbeddings, utilities.azure.embeddings.createPostBody(sentence));
    def embeddings = utilities.azure.embeddings.convertResponse(sentence, response)
    return embeddings
}
```
The job can be passed as a parameter to other methods when required:
```groovy
utilities.createSubJob(job, subJobAO)
```
Secrets defined in the UI are stored encrypted and can be accessed in scripts. They are decrypted automatically before use.
```groovy
connection.client.addHeader("api-key", "${secrets.apiKey}");
```
If a template script with properties has been selected in the UI, those properties can be accessed in the script:
```groovy
def getEmbeddingsConfig() {
    AspireObject returnValue = new AspireObject("config");
    returnValue.add("endpoint", "${template.endpoint}");
    ....
}
```
```groovy
// url
endpointEmbeddings = "${template.endpoint}/openai/deployments/${template.model}/embeddings?api-version=${template.apiVersion}"
```
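Because the endpoint URL is assembled from several template properties, a missing property silently produces a URL containing "null". The following is a minimal sketch in plain Groovy of a check that fails fast instead. The helper `buildEmbeddingsUrl` is hypothetical, and the map below is only a stand-in for the `template` binding with made-up placeholder values.

```groovy
// Stand-in for the "template" binding; in a real script Aspire provides it.
// All values below are made-up placeholders.
def template = [endpoint: "https://example.invalid", model: "my-model", apiVersion: "2023-05-15"]

// Hypothetical helper: fails fast when a required template property is missing
String buildEmbeddingsUrl(Map<String, String> t) {
    ["endpoint", "model", "apiVersion"].each { key ->
        if (!t[key]) {
            throw new IllegalArgumentException("Missing template property: ${key}")
        }
    }
    return "${t.endpoint}/openai/deployments/${t.model}/embeddings?api-version=${t.apiVersion}"
}

def endpointEmbeddings = buildEmbeddingsUrl(template)
```

Running such a check once in the initialization script would surface a misconfigured template before any document is processed.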
// TODO
utilities.azure.prompts, utilities.google.prompts (aiService = azure or google (Palm)) |
---|
Method | Syntax | Init script | Process script |
---|---|---|---|
Initialize Azure. Use it if you want to use the "process" method described below | void utilities.azure.prompts.initialize(AspireObject config) | true | false |
config:
| |||
Initialize Google Palm. Use it if you want to use the "process" method described below | void utilities.google.prompts.initialize(AspireObject config) | true | false |
config:
| |||
Process. Runs the prompts configured during initialization against the given document. It must be initialized first via "initialize" | PromptsResult utilities.aiService.prompts.process(AspireObject doc) PromptsResult: see the format below. doc: the current document | false | true |
Convert response. Converts the response from an AI prompts call. The response format can differ slightly between AI providers. The result can be converted to an AspireObject and stored in the document. | PromptsResult utilities.aiService.prompts.convertResponse(AspireObjectResponse response) PromptsResult:
response: HTTP response to convert | |||
Create POST body. Creates the POST body for calling the AI prompts service | AspireObject utilities.aiService.prompts.createPostBody(Map<String, String> map) map: map to be converted to the POST body |
Example of an initialization script for using the complex prompts "process" method in the process script:
```groovy
import com.accenture.aspire.services.AspireObject;

utilities.azure.prompts.initialize(getPromptsConfig())

private AspireObject getPromptsConfig() {
    AspireObject configAO = new AspireObject("config");
    configAO.add("endpoint", "${template.endpoint}");
    configAO.add("model", "${template.model}");
    configAO.add("apiVersion", "${template.apiVersion}");
    configAO.add("apiKey", "${secrets.apiKey}");
    configAO.add("temperature", "${template.temperature}");
    configAO.add(getPromptsList());
    return configAO;
}

private AspireObject getPromptsList(){
    AspireObject prompts = new AspireObject("prompts");
    def returnValue = new ArrayList();
    AspireObject prompt = AspireObject.createFromJSON("prompt", "{\"prompt\":{\"promptType\":\"user\",\"useGroovy\":false,\"promptText\":\"Describe the paws of a polar bear named \\\"Thunder\\\"\"}}", false);
    returnValue.add(prompt);
    prompt = AspireObject.createFromJSON("prompt", "{\"prompt\":{\"promptType\":\"system\",\"useGroovy\":true,\"promptText\":\"return \\\"Describe it like you are \\\"+doc.getText(\\\"character\\\")+\\\"\\\";\"}}", false);
    returnValue.add(prompt);
    prompt = AspireObject.createFromJSON("prompt", "{\"prompt\":{\"promptType\":\"system\",\"useGroovy\":true,\"promptText\":\"return \\\"Describe it like it lived in the planet \\\"+doc.getText(\\\"planet\\\");\"}}", false);
    returnValue.add(prompt);
    prompts.add(returnValue);
    return prompts;
}
```
Example of process script using complex "process" method:
```groovy
def sentences = utilities.textSplitter.process(doc);
promptsResponse = utilities.azure.prompts.process(doc);
doc.add(promptsResponse.toAspireObject());
```
Example of a process script that calls the chat completions endpoint directly to generate a summary and keyphrases:
```groovy
import com.accenture.aspire.services.AspireObject;

// url
endpointPrompts = "${template.endpoint}/openai/deployments/${template.model}/chat/completions?api-version=${template.apiVersion}"
...
for (String paragraph : small_piece_list) {
    answer = generateSummaryOfSummaries(endpointPrompts, paragraph.take(MODEL_MAX_MESSAGE_SIZE - MODEL_MAX_TOKENS))
    if (doc.get("summarizationError")?.getContent()) {
        component.info("%s", "summarization error detected on document: ${doc.get('summarizationError')}")
        searchFields.add("summarizationError", doc.getContent("summarizationError"))
        searchFields.add("generatedSummary", "")
        return
    } else {
        summaries.add(answer["summary"])
        keyphrases.addAll(answer["keyphrases"])
    }
}

def generateSummaryOfSummaries(endpointSummary, article) {
    def requestRetries = 0
    body = [
        "messages"         : [[
            "role"   : "system",
            "content": "You are a system that, given a text, extract a summary from it, and also a list of important keywords from it, based on the user input"],
            [
            "role"   : "user",
            "content": "CONTENT={${article}}\
1.Clean [CONTENT] by removing formatting, special characters, and non-alphanumeric symbols.\
2. Read through the entire document to grasp its main points and arguments.\
3. Identify the key topics and supporting details presented in the document.\
4. Create an outline for the summary, noting the main sections or topics covered.\
5. Summarize each main section or topic in a clear and concise manner, using your own words, focusing on presenting the most significant and relevant information while leaving out unnecessary details.Aim for a summary length of 3-5 lines or a paragraph, depending on the document's size and complexity.\
6. Review the summary for accuracy and coherence with the original document, checking that the summary conveys the main points and ideas accurately.Respond as follows:\
SUMMARY:summary\
7. Provide the final list of max 100 important keyphrases without considering the frequency. Include all the abbreviations in the list, and do not repeat any keyphrases. Respond as follows:\
KEYPHRASES:comma separated list of keyphrases"
            ]
        ],
        "temperature"      : "${template.temperature}",
        "max_tokens"       : 1500,
        "top_p"            : 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty" : 0.0
    ];
    response = connection.client.executePost(endpointSummary, utilities.azure.prompts.createPostBody(body));
    def request_result = getSummaryFromResponse(response)
    if (!request_result["isError"]) {
        return request_result
    } else {
        responseHeaders = response.getHeaders()
    }
    doc.add("summarizationError", "Errors on request for summary and keyphrases.")
    return ["isError": true, "summary": "", "keyphrases": []]
}

def getSummaryFromResponse(response) {
    def isError = true
    def summary = ""
    def keyphrases = []
    if (response.getStatusCode() == 200) {
        def content = response.getContent();
        def choices = content.get("choices");
        if (choices != null) {
            def finish_reason = choices.getText("finish_reason");
            if (finish_reason == "content_filter") {
                component.info("%s", "${doc.id}: Unable to generate summary. Document has content that was blocked by Azure content filter. Setting summarization error")
                doc.add("summarizationError", "Unable to generate summary due to Content Filtering Policy")
                summary = "";
                keyphrases = []
                isError = false
            } else {
                def message = choices.get("message");
                component.info("%s", "${doc.id}: message content: ${message}")
                def content_openai = message != null ? message.getText("content") : "";
                // The AI response has two parts, SUMMARY and KEYPHRASES. Parse the message to extract both and store them in the return value.
                // In a slashy string a single backslash is enough: \s matches whitespace
                def finder = (content_openai =~ /(SUMMARY|KEYPHRASES):\s*(.+)/)
                finder.each { match ->
                    if (match.size() == 3) {
                        if (match[1] == "SUMMARY") {
                            summary = match[2]
                            component.info("%s", "${doc.id}: summary piece: ${summary}")
                        }
                        if (match[1] == "KEYPHRASES") {
                            keyphrases = match[2].split(",")
                            component.info("%s", "${doc.id}: keyphrases piece: ${keyphrases}")
                        }
                    }
                }
                isError = false
            }
        }
    } else {
        ....
    }
    return ["isError": isError, "summary": summary, "keyphrases": keyphrases]
}
```
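The parsing step at the end of the example above can be tried in isolation. The following is a minimal sketch in plain Groovy, with no Aspire classes involved; the helper name `parseSummaryReply` and the sample reply text are assumptions made for illustration. It uses the same SUMMARY/KEYPHRASES pattern and additionally trims whitespace around each keyphrase.

```groovy
// Hypothetical helper: extracts the SUMMARY and KEYPHRASES parts from a model reply
Map parseSummaryReply(String reply) {
    def result = [summary: "", keyphrases: []]
    // Each match covers one "SUMMARY: ..." or "KEYPHRASES: ..." line
    def matcher = reply =~ /(SUMMARY|KEYPHRASES):\s*(.+)/
    matcher.each { match ->
        // match is a list: [whole match, marker, text after the marker]
        if (match[1] == "SUMMARY") {
            result.summary = match[2].trim()
        } else {
            // Split on commas, trim each entry and drop empty ones
            result.keyphrases = match[2].split(",").collect { it.trim() }.findAll { it }
        }
    }
    return result
}

// Made-up reply text in the format requested by the prompt
def parsed = parseSummaryReply("SUMMARY: Polar bears live in the Arctic.\nKEYPHRASES: polar bear, Arctic , sea ice")
```

A reply without either marker leaves the summary empty and the keyphrase list empty, which a caller could treat as an error condition.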
// TODO
Text splitter | utilities.textSplitter |
---|
Method | Syntax | Init script | Process script | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Initialize | void utilities.textSplitter.initialize(AspireObject config) | true | false | |||||||||||||||
config:
| ||||||||||||||||||
Process | List<String> utilities.textSplitter.process(AspireObject doc) List<String>: TODO doc: TODO | false | true |
Example of initialization script:
```groovy
import com.accenture.aspire.services.AspireObject;

utilities.textSplitter.initialize(getTextSplitterConfig("sentence"))

def getTextSplitterConfig(String splitType) {
    AspireObject returnValue = new AspireObject("config");
    returnValue.add("splitType", splitType);
    returnValue.add("fieldsToSplit", "content");
    returnValue.add("customSplitRegex", "\\|+");
    returnValue.add("characterThreshold", 4);
    return returnValue;
}
```
Example script:
```groovy
def sentences = utilities.textSplitter.process(doc);
```
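To get a feel for what a regex-based split with a character threshold might produce, here is a plain-Groovy approximation. It is not the Aspire TextSplitterComponent, and the semantics are assumed: the text is split on the regex and chunks shorter than the threshold are dropped. The real component may behave differently, and the helper name `splitText` is hypothetical.

```groovy
// Plain-Groovy approximation, NOT the Aspire TextSplitterComponent.
// Assumption: chunks shorter than characterThreshold are dropped.
List<String> splitText(String text, String splitRegex, int characterThreshold) {
    // Split on the regex and trim surrounding whitespace from each piece
    def pieces = text.split(splitRegex).collect { it.trim() }
    // Keep only pieces that reach the assumed minimum length
    return pieces.findAll { it.length() >= characterThreshold }
}

// Same regex and threshold values as in the initialization example
def chunks = splitText("First part|Second part||ab|Third", "\\|+", 4)
```

With this input, the two-character piece "ab" falls below the threshold and is left out of the result.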
// TODO
// TODO