Co-occurrence or cooccurrence is a linguistics term that can either mean concurrence / coincidence or, in a more specific sense, the above-chance frequent occurrence of two terms from a text corpus alongside each other in a certain order. Co-occurrence in this linguistic sense can be interpreted as an indicator of semantic proximity or an idiomatic expression. In contrast to collocation, co-occurrence assumes interdependency of the two terms. A co-occurrence restriction is identified when linguistic elements never occur together. (Source: Wikipedia)
The co-occurrence or collocation of words into short phrases (2-4 words) can be useful for tagging content and for query enhancement: it adds a level of meaning to these phrases and thereby improves the relevancy of result sets.
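As a rough illustration of the idea, the following Python sketch collects candidate phrases of 2 to 4 words from a list of tokens and scores an ordered word pair with pointwise mutual information (PMI), one common way to check for above-chance co-occurrence. It is not the implementation used by the components in this section: the function names (`ngrams`, `candidate_phrases`, `pmi`), the whitespace tokenisation, and the choice of PMI as the measure are all assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_phrases(tokens, min_len=2, max_len=4):
    """Count every phrase of min_len..max_len consecutive words (hypothetical helper)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        counts.update(ngrams(tokens, n))
    return counts


def pmi(pair, unigram_counts, bigram_counts):
    """Pointwise mutual information of an ordered word pair.

    A value above 0 means the pair occurs together more often than
    chance would predict from the individual word frequencies.
    """
    if bigram_counts[pair] == 0:
        return float("-inf")  # the pair never occurs in this order
    p_xy = bigram_counts[pair] / sum(bigram_counts.values())
    p_x = unigram_counts[pair[0]] / sum(unigram_counts.values())
    p_y = unigram_counts[pair[1]] / sum(unigram_counts.values())
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Toy corpus; a real run would read tokens from the indexed content.
    tokens = "new york is a big city and new york is busy".split()
    phrases = candidate_phrases(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(ngrams(tokens, 2))
    print(phrases.most_common(3))
    print(pmi(("new", "york"), unigrams, bigrams))
```

On a toy corpus like this one, a pair such as ("new", "york") scores above zero because it appears together more often than its individual word frequencies would suggest. PMI tends to overrate rare pairs, so a real pipeline would normally also apply a minimum frequency threshold.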
The components described in this section use Wikipedia as a source of phrases, and DBpedia and Wikilinks to add semantic meaning to those phrases. The basic architecture is as follows:
After the Aspire HDFS feed:
Generate the Master Dictionary
Once the Master Dictionary is complete: