Java Wiktionary Library (JWKTL) is an open source project that aims to ease the parsing of Wiktionary data. Check out the project site here: https://dkpro.github.io/dkpro-jwktl/

By default, the source downloaded from the project site currently supports only English, German, and Russian.

At Accenture, we added Spanish support and you can get the source code from Git: https://source.digital.accenture.com/projects/ST/repos/saga-jwktl/browse

For Spanish, we focused on the bare minimum SAGA needs to work. If you want to do the same, it may be a good idea to base your new language on the Spanish parser (copy, paste, and rename the files). If you want to implement a more complete version of the parser, the English parser is a better starting point.

The following image shows the structure of the JWKTL project. Notice that there is a folder for each language. You'll need to add a new folder for your desired language.


[Image: structure of the JWKTL project, with one folder per language]


3.1 How JWKTL works

  1. JWKTL detects the language of the dump file and selects the parser for that language.
  2. For each entry in the dump file:
    1. For each line in the entry:
      1. It iterates over the list of handlers, looking for one that can handle the current line. For example, if the current line matches the pattern of a new section and the section is Etymology, the EtymologyHandler processes the line and extracts its information.
      2. A section usually consists of several lines. There is code in place to determine whether the next line belongs to the same section and should go to the same handler, or whether a new section has started and a new handler must be found.
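The loop described above can be sketched roughly as follows. This is a simplified illustration, not JWKTL's actual code: the names SectionHandler, RecordingHandler, and EntryDispatcher are hypothetical, and it assumes a section header is a line of the form ==Name==.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handler contract; JWKTL's real interfaces differ in detail. */
interface SectionHandler {
    boolean canHandle(String line);   // does this line open a section we own?
    void processLine(String line);    // consume one body line of that section
}

/** Toy handler that simply records the body lines of one named section. */
class RecordingHandler implements SectionHandler {
    private final String sectionName;
    final List<String> lines = new ArrayList<>();

    RecordingHandler(String sectionName) {
        this.sectionName = sectionName;
    }

    @Override
    public boolean canHandle(String line) {
        return line.equals("==" + sectionName + "==");
    }

    @Override
    public void processLine(String line) {
        lines.add(line);
    }
}

/** Walks the lines of one entry and routes each line to the active handler. */
class EntryDispatcher {
    private final List<? extends SectionHandler> handlers;

    EntryDispatcher(List<? extends SectionHandler> handlers) {
        this.handlers = handlers;
    }

    void parse(List<String> entryLines) {
        SectionHandler current = null;
        for (String line : entryLines) {
            SectionHandler next = findHandler(line);
            if (next != null) {
                current = next;             // a known section starts here
            } else if (line.startsWith("==")) {
                current = null;             // unknown section: skip its body
            } else if (current != null) {
                current.processLine(line);  // same section, same handler
            }
        }
    }

    /** Returns the first registered handler that accepts the line, or null. */
    private SectionHandler findHandler(String line) {
        for (SectionHandler h : handlers) {
            if (h.canHandle(line)) {
                return h;
            }
        }
        return null;
    }
}
```

In this sketch the dispatcher keeps using the current handler until a line that looks like a new header appears, which mirrors the "same section or new section" decision in the last step.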


Info

Handlers are registered in the WiktionaryEntryParser for each language. The registration order is important; for example, SenseHandler needs to be the last one. We recommend following the order defined by the English parser for the handlers you implement.
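To see why registration order can matter, here is a minimal sketch of a first-match-wins lookup. It assumes, purely for illustration, that one handler accepts a broad set of headers; the classes NamedHandler and HandlerRegistry are hypothetical and are not part of JWKTL.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical handler: a name plus a rule for which headers it accepts. */
class NamedHandler {
    final String name;
    final Predicate<String> accepts;

    NamedHandler(String name, Predicate<String> accepts) {
        this.name = name;
        this.accepts = accepts;
    }
}

class HandlerRegistry {
    /** Returns the name of the first handler that accepts the header, or null. */
    static String firstMatch(List<NamedHandler> handlers, String header) {
        for (NamedHandler h : handlers) {
            if (h.accepts.test(header)) {
                return h.name;
            }
        }
        return null;
    }
}
```

With a specific handler that accepts only "Etymology" and a broad handler that accepts everything, registering the specific one first routes "Etymology" to it. Registering the broad one first shadows the specific handler, which is the kind of problem the recommended ordering avoids.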

3.2 Additional needed changes

In addition to adding a new folder and handlers for your new language, you need to make the following changes:

  1. Add your new language as a new static field in this class: src/main/java/de/tudarmstadt/ukp/jwktl/api/util/Language.java  

[Image: static language fields in Language.java]

  2. Add your new language parser instantiation in the onSiteInfoComplete method in the class: src/main/java/de/tudarmstadt/ukp/jwktl/parser/WiktionaryArticleParser.java

[Image: parser instantiation in onSiteInfoComplete]

Then, based on another language, add the supporting classes accordingly.
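The two changes fit together as sketched below: a language constant, and a lookup from that constant to the parser to instantiate. This is a self-contained illustration under stated assumptions. Lang, EntryParser, and ParserSelector are hypothetical stand-ins, Italian is only an example language, and the real Language class and onSiteInfoComplete method may be organized differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a class holding one static field per language. */
final class Lang {
    static final Lang ENGLISH = new Lang("eng");
    static final Lang SPANISH = new Lang("spa");
    static final Lang ITALIAN = new Lang("ita"); // step 1: the new language

    final String code;

    private Lang(String code) {
        this.code = code;
    }
}

/** Hypothetical parser type; stands in for a language-specific entry parser. */
interface EntryParser {
    String describe();
}

/** Chooses the parser for the language detected in the dump file. */
class ParserSelector {
    private final Map<Lang, Supplier<EntryParser>> parsers = new LinkedHashMap<>();

    /** Step 2: associate a language with the code that instantiates its parser. */
    void register(Lang lang, Supplier<EntryParser> factory) {
        parsers.put(lang, factory);
    }

    EntryParser forLanguage(Lang lang) {
        Supplier<EntryParser> factory = parsers.get(lang);
        if (factory == null) {
            throw new IllegalArgumentException(
                "No parser registered for language: " + lang.code);
        }
        return factory.get();
    }
}
```

The sketch fails loudly when a language has no parser, which makes a forgotten registration easy to spot while you are adding a new language.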






 
