
This stage identifies sequences of tokens that look like URLs and flags them as "URL".


Operates On: Lexical items with the TOKEN flag and possibly other flags, as specified below.

Currently handles the following situations:

  • HTTP and HTTPS protocols
  • Domains
  • IP addresses (with protocol), e.g. http://235.156.13.10/
  • Ports
  • Paths
  • Query strings (parameters) and anchors
  • URL encoding (percent-encoded characters such as %20)
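To make the list of supported cases concrete, here is a minimal regular-expression sketch that accepts each of them: HTTP/HTTPS, a domain or IPv4 address, an optional port, path segments, a query string, an anchor, and percent-encoded characters. URL_PATTERN and is_url are hypothetical names; this is not the recognizer's actual implementation, and the pattern is deliberately loose (IPv4 octets are not range-checked).

```python
import re

# Hypothetical pattern covering the cases listed above. Loose sketch only:
# IPv4 octets are not range-checked and only HTTP/HTTPS are accepted.
URL_PATTERN = re.compile(
    r"""
    https?://                                        # HTTP or HTTPS protocol
    (?:
        (?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}   # domain
      | (?:\d{1,3}\.){3}\d{1,3}                      # IPv4 address
    )
    (?::\d{1,5})?                                    # optional port
    (?:/(?:[A-Za-z0-9._~!$&'()*+,;=:@-]|%[0-9A-Fa-f]{2})*)*     # path segments
    (?:\?(?:[A-Za-z0-9._~!$&'()*+,;=:@/?-]|%[0-9A-Fa-f]{2})*)?  # query string
    (?:\#(?:[A-Za-z0-9._~!$&'()*+,;=:@/?-]|%[0-9A-Fa-f]{2})*)?  # anchor
    """,
    re.VERBOSE,
)


def is_url(text: str) -> bool:
    """Return True if the whole string looks like an HTTP(S) URL."""
    return URL_PATTERN.fullmatch(text) is not None
```

Because fullmatch is used, a trailing sentence period makes the test fail, so "http://www.notaproblem.com." is rejected while "http://www.notaproblem.com" is accepted.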

...

Generic Configuration Parameters

...

  • boundaryFlags: text block split
  • requiredFlags: token
  • skipFlags: skip
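The requiredFlags and skipFlags settings can be read as a filter on which lexical items the stage looks at. The sketch below models only those two settings and leaves boundaryFlags out. LexicalItem and should_process are hypothetical names, the upper-case flag spellings are assumed, and the rule (every required flag present, no skip flag present) is an assumption about how the generic parameters behave.

```python
from dataclasses import dataclass, field

# Hypothetical values mirroring the settings above; the upper-case flag
# spellings are an assumption, not taken from the Saga source.
REQUIRED_FLAGS = {"TOKEN"}
SKIP_FLAGS = {"SKIP"}


@dataclass
class LexicalItem:
    """Minimal stand-in for a lexical item: its text plus a set of flags."""
    text: str
    flags: set[str] = field(default_factory=set)


def should_process(item: LexicalItem) -> bool:
    """True if the item carries every required flag and none of the skip flags."""
    return REQUIRED_FLAGS <= item.flags and not (SKIP_FLAGS & item.flags)
```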

Example Output

V----------------------------[All the answers in http://www.notaproblem.com.]-----------------------------V 
^--------------------------[All the answers in http://www.notaproblem.com]---------------------------V-[]-^ 
^-[All]-V-[the]-V-[answers]-V-[in]-V------------------[http://www.notaproblem.com]-------------------^      
^-[all]-^                          ^-[http]-V-[:]-V-[//]-V-[www]-V-[.]-V-[notaproblem]-V-[.]-V-[com]-^      
                                   ^----------------------------[{URL}]------------------------------^      
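In the graph, the {URL} item spans the adjacent tokens from "http" to "com", and the sentence-final period stays outside it. A minimal sketch of that idea follows: starting at each token, it extends over tokens that touch one another (no whitespace between them) and keeps the longest run whose text passes a URL test. TOKEN_RE, URL_RE and find_url_spans are hypothetical names; the tokenizer only approximates the token split shown above, and URL_RE is a simplified stand-in for the stage's real matching.

```python
import re

# Hypothetical, simplified URL test: protocol, then non-space characters,
# not ending in sentence punctuation.
URL_RE = re.compile(r"https?://\S*[^\s.,;:!?]")

# Hypothetical tokenizer approximating the token split shown in the graph.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|//|[^\sA-Za-z0-9]")


def find_url_spans(text: str) -> list[tuple[int, int, str]]:
    """Return (first_token, last_token, url) for each URL made of adjacent tokens."""
    tokens = [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    spans = []
    i = 0
    while i < len(tokens):
        best = None
        j = i
        # Extend the candidate while each token starts where the previous one ended.
        while j < len(tokens):
            if j > i and tokens[j][0] != tokens[j - 1][1]:
                break
            if URL_RE.fullmatch(text[tokens[i][0]:tokens[j][1]]):
                best = j  # remember the longest run that still looks like a URL
            j += 1
        if best is None:
            i += 1
        else:
            spans.append((i, best, text[tokens[i][0]:tokens[best][1]]))
            i = best + 1
    return spans
```

For the example sentence, the run from "http" (token 4) to "com" (token 11) is reported as "http://www.notaproblem.com"; adding the final "." would end the candidate in punctuation, so it is not included.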

...

  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
  • URL - Identifies the lexical item as a URL.

Vertex Flags:

No vertices are created in this stage.