
Introduction


The Language Detector component detects the language of the text in a configured field of the Aspire object and writes the result to the configured output fields.
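As a rough illustration of this field-in, field-out flow, the Python sketch below models the Aspire object as a plain dictionary. The function `process_document`, the default field names `content` and `language`, and the pluggable `detect` callable are hypothetical stand-ins, not Aspire's actual API or configuration keys.

```python
from typing import Callable, Optional


def process_document(doc: dict,
                     detect: Callable[[str], Optional[str]],
                     source_field: str = "content",
                     target_field: str = "language") -> dict:
    """Read text from source_field, detect its language, write it to target_field.

    `doc` is a hypothetical stand-in for an Aspire object; the field names
    are illustrative defaults, not Aspire configuration keys.
    """
    text = doc.get(source_field)
    # Only run detection when the configured source field holds non-empty text.
    if isinstance(text, str) and text.strip():
        lang = detect(text)
        # Leave the document untouched if the detector cannot decide.
        if lang is not None:
            doc[target_field] = lang
    return doc
```

Keeping the detector as a parameter separates the "which fields" configuration from the "how to detect" logic, so either can change independently.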

Features

  • Identifies the language in which a piece of text is written.
  • Attempts to detect the language of very short words and phrases.
  • Multiple languages are supported.
  • Can be built from all supported languages.
  • Can include or exclude specific languages from the decision process.
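The include/exclude behavior can be sketched with a deliberately tiny stopword-counting detector. This toy technique is chosen only for illustration (practical detectors typically use statistical models such as character n-gram profiles), and the `PROFILES` table and `detect_language` function are hypothetical, not part of Aspire.

```python
import re

# Hypothetical, tiny stopword profiles; a real detector uses far richer models.
PROFILES = {
    "en": {"the", "and", "is", "of", "to", "in", "it", "this"},
    "es": {"el", "la", "y", "es", "de", "en", "que", "los"},
    "fr": {"le", "la", "et", "est", "de", "en", "les", "des"},
    "de": {"der", "die", "und", "ist", "das", "in", "zu", "nicht"},
}


def detect_language(text, include=None, exclude=None):
    """Return the best-scoring language code, or None if nothing matches.

    include: if given, only these languages take part in the decision.
    exclude: languages removed from the decision process.
    """
    candidates = set(PROFILES) if include is None else set(include) & set(PROFILES)
    candidates -= set(exclude or ())
    words = re.findall(r"\w+", text.lower())
    best, best_score = None, 0
    # Iterate in sorted order so ties are broken deterministically.
    for lang in sorted(candidates):
        score = sum(1 for w in words if w in PROFILES[lang])
        if score > best_score:
            best, best_score = lang, score
    return best
```

For example, `detect_language("el gato y la casa")` returns `"es"`, while passing `exclude=["es"]` makes it fall back to `"fr"`, since only "la" still matches a remaining profile.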

...

The Slide Extractor component detects PPTX (PowerPoint) files and uses Apache Tika to parse them and extract their slides.
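The component itself relies on Apache Tika for parsing. To show what detecting a PPTX file and extracting its slides involves at the file level, the sketch below uses only the Python standard library instead: a PPTX file is a ZIP archive containing `ppt/presentation.xml`, with each slide stored as `ppt/slides/slideN.xml` and its text held in DrawingML `<a:t>` elements. The helpers `is_pptx` and `read_slides` are hypothetical, and ordering slides by file number is a simplification, since the real slide order is defined in `ppt/presentation.xml`.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def is_pptx(path_or_file) -> bool:
    """Heuristic check: a ZIP archive that contains ppt/presentation.xml."""
    if not zipfile.is_zipfile(path_or_file):
        return False
    with zipfile.ZipFile(path_or_file) as pptx:
        return "ppt/presentation.xml" in pptx.namelist()


def read_slides(path_or_file) -> list[str]:
    """Return the plain text of each slide, ordered by slide file number."""
    with zipfile.ZipFile(path_or_file) as pptx:
        slides = []
        for name in pptx.namelist():
            m = SLIDE_RE.match(name)
            if m:
                slides.append((int(m.group(1)), name))
        texts = []
        # Sort numerically so slide10 comes after slide2.
        for _, name in sorted(slides):
            root = ET.fromstring(pptx.read(name))
            runs = [t.text or "" for t in root.iter(A_NS + "t")]
            texts.append(" ".join(runs))
        return texts
```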

Features

  • Extracts text content from PPTX presentations.
  • Slide splitting can be enabled or disabled.
  • Extracts each slide's content as a separate job.
  • Extracts metadata such as slide title, author, created date, and modified date.
  • Configurable maximum number of characters to extract, for processing large PPTX files.
  • Configurable timeout for the parsing process.
  • Configurable memory allocation for each Tika process.
  • Embedded presentations can be included in or excluded from the content.
  • Removes HTML tags from the content field.
  • Cleans embedded items, master layouts, and relationships out of the content.
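The splitting, tag-removal, and size-limit options can be approximated as a post-processing step over already-extracted slide text. In this sketch, `Job`, `clean`, `build_jobs`, `split_slides`, and `max_chars` are hypothetical names rather than the component's real configuration, and the regex-based tag stripping is intentionally naive.

```python
import re
from dataclasses import dataclass, field

TAG_RE = re.compile(r"<[^>]+>")


@dataclass
class Job:
    """Hypothetical stand-in for a job emitted by the extractor."""
    doc_id: str
    content: str
    metadata: dict = field(default_factory=dict)


def clean(text: str) -> str:
    # Naive tag stripping: replace tags with spaces, then collapse whitespace.
    # It does not handle '>' inside attribute values.
    return " ".join(TAG_RE.sub(" ", text).split())


def build_jobs(doc_id, slides, split_slides=True, max_chars=100_000):
    """Turn a list of slide texts into jobs.

    split_slides=True: one job per slide; False: a single job for the deck.
    max_chars caps the total number of slide-text characters kept.
    """
    remaining = max_chars
    cleaned = []
    for text in slides:
        if remaining <= 0:
            break  # Character budget exhausted; skip the remaining slides.
        piece = clean(text)[:remaining]
        remaining -= len(piece)
        cleaned.append(piece)
    if not split_slides:
        return [Job(doc_id, "\n".join(cleaned), {"slides": len(cleaned)})]
    return [Job(f"{doc_id}#slide{i}", text, {"parent": doc_id, "slide": i})
            for i, text in enumerate(cleaned, start=1)]
```

With `split_slides=True`, each slide becomes its own job whose id and metadata point back to the parent document; with `False`, all slide text stays in a single job.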