Introduction


The Slide Extractor is a component that detects PowerPoint (PPTX) files and extracts their slides using Apache Tika.

Features

  • Extracting text content from PPTX slides.
  • Extracting metadata such as slide title, author, created date, and modified date.
  • Configurable maximum character limit for processing large PPTX files.
  • Configurable timeout for the parsing process.
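
The character limit and the parsing timeout listed above could work roughly as in the following sketch. It is not the component's actual implementation: the class name `SlideExtractionLimits`, the method names, and the parameters `maxChars` and `timeoutMillis` are all hypothetical, and the parse step is a stand-in `Callable` so that the example runs with the Java standard library alone. In a real setup the `Callable` would presumably wrap the Apache Tika parse call.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Slide Extractor or of Apache Tika.
public class SlideExtractionLimits {

    /** Truncates extracted text to at most maxChars characters; a negative limit means "no limit". */
    static String applyCharLimit(String text, int maxChars) {
        if (maxChars < 0 || text.length() <= maxChars) {
            return text;
        }
        return text.substring(0, maxChars);
    }

    /**
     * Runs the parse task on a worker thread and waits at most timeoutMillis.
     * Returns the (possibly truncated) text, or an empty Optional if the task timed out.
     */
    static Optional<String> parseWithTimeout(Callable<String> parseTask,
                                             long timeoutMillis,
                                             int maxChars) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(parseTask);
        try {
            String text = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Optional.of(applyCharLimit(text, maxChars));
        } catch (TimeoutException e) {
            // Interrupt the worker so a stuck parse does not keep running.
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real parse call; the slide text here is made up for illustration.
        Callable<String> fakeParse = () -> "Title slide. Agenda. Summary.";
        Optional<String> result = parseWithTimeout(fakeParse, 1000, 12);
        System.out.println(result.orElse("<timed out>"));
    }
}
```

Running the parse on a separate thread lets the caller give up on a slow or malformed file without blocking, and truncating after extraction keeps the example simple. A Tika-based version might instead cap the output during parsing, for example through the write limit accepted by `BodyContentHandler`.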