The overall structure of a Saga program is shown in the diagram below:
Every token in a piece of text could have multiple interpretations. An "interpretation graph" is an efficient method for showing all possible interpretations of a piece of text.
As an example, the interpretation graph of "Abe Lincoln likes the iPhone-8" might look like this:
In this example, we see that:
The diagram above does not show confidence factors, which are attached to every interpretation.
It is this "node and edge" structure which makes this an interpretation graph.
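The node-and-edge idea can be sketched in a few lines of Python. This is only an illustration, not Saga's actual API: the names `Edge`, `InterpretationGraph`, `add`, and `leaving` are hypothetical, as are the `{person}` and `{product}` tags and their confidence values. The sketch assumes that vertices sit at the boundaries between tokens and that each edge is one interpretation of the text it spans.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One interpretation of the text between two vertices (hypothetical type)."""
    start: int         # vertex where the interpretation begins
    end: int           # vertex where the interpretation ends
    label: str         # the token text or a semantic tag
    confidence: float  # confidence factor attached to this interpretation


@dataclass
class InterpretationGraph:
    """Vertices are token boundaries; edges are interpretations."""
    edges: list = field(default_factory=list)

    def add(self, start, end, label, confidence=1.0):
        edge = Edge(start, end, label, confidence)
        self.edges.append(edge)
        return edge

    def leaving(self, vertex):
        """All interpretations that begin at the given vertex."""
        return [e for e in self.edges if e.start == vertex]


graph = InterpretationGraph()

# Plain text tokens: one edge between each pair of adjacent vertices.
words = ["Abe", "Lincoln", "likes", "the", "iPhone-8"]
for i, word in enumerate(words):
    graph.add(i, i + 1, word)

# Alternative interpretations sit alongside the tokens they span.
# The tags and confidence values here are made up for illustration.
graph.add(0, 2, "{person}", confidence=0.9)   # spans "Abe Lincoln"
graph.add(4, 5, "{product}", confidence=0.8)  # spans "iPhone-8"

print([e.label for e in graph.leaving(0)])
```

Because the `{person}` edge and the "Abe" edge both leave vertex 0, a consumer walking the graph sees both readings of the same text and can decide later which one to follow.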
Information can only be added to an interpretation graph. It can never be removed or changed. By this we mean:
This rule comes from hard experience: we have discovered that, ultimately, "all interpretations are possible". When we implemented such toolkits in the past, we had to make hard choices. For example: Which punctuation splits a token? Is upper case important? Do we need to save the original variation, or is the root word enough? In almost all cases the answer is "sometimes" or, occasionally, "almost always".
And so, we never actually remove any interpretations from the graph. Instead, all interpretations are kept at all times, and disambiguation is used to choose the interpretation that is most likely to be correct for the application.
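As a rough sketch of how "add only, then disambiguate" could work, the Python below keeps every edge and picks the most confident route through the graph at read time. It is not Saga's implementation: `AppendOnlyGraph` and `best_path` are hypothetical names, the confidence values are invented, and scoring a path by the product of its edge confidences is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int
    end: int
    label: str
    confidence: float


class AppendOnlyGraph:
    """Hypothetical graph that offers add() but no way to remove or change edges."""

    def __init__(self):
        self._edges = []

    def add(self, start, end, label, confidence):
        if start >= end:
            raise ValueError("edges must point forward through the text")
        self._edges.append(Edge(start, end, label, confidence))

    @property
    def edges(self):
        # Return an immutable snapshot so callers cannot mutate the graph.
        return tuple(self._edges)


def best_path(graph, first, last):
    """Return the edges of the highest-scoring path from `first` to `last`.

    Assumption: a path's score is the product of its edge confidences.
    Edges always point forward, so visiting them in order of their start
    vertex guarantees each vertex is final before it is extended.
    """
    best = {first: (1.0, [])}  # vertex -> (score, edges used to reach it)
    for edge in sorted(graph.edges, key=lambda e: e.start):
        if edge.start not in best:
            continue  # not reachable from `first`
        score, path = best[edge.start]
        candidate = (score * edge.confidence, path + [edge])
        if edge.end not in best or candidate[0] > best[edge.end][0]:
            best[edge.end] = candidate
    return best[last][1]


graph = AppendOnlyGraph()
graph.add(0, 1, "Abe", 0.6)       # confidence values are illustrative only
graph.add(1, 2, "Lincoln", 0.6)
graph.add(0, 2, "{person}", 0.9)  # competing interpretation of the same text

chosen = best_path(graph, 0, 2)
print([e.label for e in chosen])
print(len(graph.edges))  # every interpretation is still in the graph
```

Selection happens when the graph is read, not when it is written, so a different application could apply a different scoring rule to the same graph and reach a different choice without any information having been lost.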