...

For example, when iterating over the senses of the word 'casas' (Spanish for 'houses'), we get this from the JWKTL entry: "{{f.s.p|casa}}"

...

Therefore, the dump parser tool will create a plural-type relationship between the word 'casas' and its root 'casa'. SAGA understands this relationship entry and uses it to reduce the word 'casas' to 'casa' in the Lemmatize stage.
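The mapping described above can be sketched in a few lines of Java. This is only an illustration, not the tool's actual SenseParser code: the class name PluralTemplateSketch and the method pluralRoot are hypothetical, and the sketch assumes that the only template of interest has the exact form "{{f.s.p|root}}" shown in the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of JWKTL or the dump parser tool.
public class PluralTemplateSketch {

    // Matches a template of the form {{f.s.p|root}} and captures the root word.
    // Assumption: only this single template name is handled.
    private static final Pattern PLURAL_TEMPLATE =
            Pattern.compile("\\{\\{f\\.s\\.p\\|([^}|]+)\\}\\}");

    /** Returns the root word named by the template, or empty if none is found. */
    public static Optional<String> pluralRoot(String senseText) {
        Matcher m = PLURAL_TEMPLATE.matcher(senseText);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String word = "casas";
        String senseText = "{{f.s.p|casa}}";
        // Prints the relationship a parser could record for this sense.
        pluralRoot(senseText).ifPresent(root ->
                System.out.println("plural: " + word + " -> " + root));
    }
}
```

A real language implementation will likely need to handle several templates, since each Wiktionary edition uses its own conventions.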


4.1 Steps to add your new language

1. Get the code from Git: https://source.digital.accenture.com/projects/ST/repos/saga-wiktionary-dump-parser/browse

2. Add a new folder and its corresponding SenseParser and RelationNormalizer files for your language. As an example, for Spanish we have:

...

Info

Review the existing files for both English and Spanish to get an idea of how to implement these two files for your new language. The implementation will depend heavily on the specific templates used in your language's Wiktionary.

 

3. Add the page parser instantiation for your language to the 'GetPageParser' method of the class in '\src\main\java\com\searchtechnologies\wiktionary\WiktionaryParser.java':

...

4.2 Using the Dump Parser Tool

The tool is a command-line tool. If you run it without any parameters, it prints help information.

You need to run the tool three times:

1. Do the first run with the -parse option to parse the Wiktionary dump file and create an index:

...

Example:   -parse file=c:/temp/wiktionary.xml output=c:/temp/index


2. Do a second run with the -mongo option to read the index and create entries in MongoDB (make sure you have a MongoDB server instance running):

...

-mongo lang=spa indexDir=c:/temp/index host=localhost port=27017 db=dictionary collection=wiktionary


3. Do a third and last run with the -dict option to read the MongoDB collection and produce the JSON file that SAGA will eventually use:

...

 -dict lang=spa indexDir=c:/temp/index outputDir=c:/temp/saga host=localhost port=27017 db=dictionary collection=wiktionary


Step 5: Add the Wiktionary file to the SAGA Library

...

First, rename the created file to "wiktionary-XXX", where XXX is the ISO three-letter code of your language. For example, if your new language is German, the file should be named 'wiktionary-DEU'.
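As a small aid for the naming rule above, the following sketch derives the file name from a two-letter language tag using Java's built-in locale data. The class name DictionaryFileNameSketch and the method sagaFileName are hypothetical. It is an assumption that the three-letter codes returned by Java match the ones SAGA expects; the source confirms only the German example.

```java
import java.util.Locale;

// Hypothetical helper: not part of SAGA or the dump parser tool.
public class DictionaryFileNameSketch {

    /** Builds "wiktionary-XXX" from a language tag such as "de" or "es". */
    public static String sagaFileName(String languageTag) {
        // getISO3Language() returns the lowercase three-letter code, e.g. "deu".
        String iso3 = Locale.forLanguageTag(languageTag).getISO3Language();
        return "wiktionary-" + iso3.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(sagaFileName("de")); // prints wiktionary-DEU
        System.out.println(sagaFileName("es")); // prints wiktionary-SPA
    }
}
```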

...