The Slide Extractor component detects PowerPoint (PPTX) files and extracts their slides using Apache Tika. It provides the following features:
- Extract text content from PPTX presentations.
- Enable or disable slide splitting.
- Extract slide content in separate jobs.
- Extract metadata such as slide title, author, created date, and modified date.
- Configure the maximum size, in characters, for processing large PPTX files.
- Configure a timeout for the parsing process.
- Set the memory allocated to each Tika process.
- Include/exclude embedded presentations as part of the content.
- Remove HTML tags from the content field.
- Strip embedded items, master layouts, and relationships from the content.
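
To make slide splitting and the character limit more concrete, here is a minimal Python sketch. It does not use Apache Tika and is not the component's implementation. It reads the PPTX package directly, relying on the fact that a PPTX file is a ZIP archive whose slides are stored as `ppt/slides/slideN.xml`, with text runs in DrawingML `<a:t>` elements. The names `extract_slides`, `split_slides`, `max_chars`, and `make_demo_pptx` are hypothetical and chosen only for illustration.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs in a slide live in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PATH = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def extract_slides(pptx_bytes, split_slides=True, max_chars=None):
    """Return a list of text chunks: one per slide, or one merged chunk.

    split_slides and max_chars are hypothetical parameter names, not the
    component's real settings.
    """
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        # Collect slide parts and order them by slide number, not by name,
        # so that slide10.xml comes after slide2.xml.
        numbered = []
        for name in package.namelist():
            match = SLIDE_PATH.match(name)
            if match:
                numbered.append((int(match.group(1)), name))
        slides = []
        for _, name in sorted(numbered):
            root = ET.fromstring(package.read(name))
            runs = [node.text or "" for node in root.iter(A_NS + "t")]
            slides.append(" ".join(runs))
    # With splitting disabled, all slides are merged into a single chunk.
    chunks = slides if split_slides else ["\n".join(slides)]
    if max_chars is not None:
        # Truncate each chunk to the configured character limit.
        chunks = [chunk[:max_chars] for chunk in chunks]
    return chunks


def make_demo_pptx(texts):
    """Build an in-memory ZIP that mimics only the slide parts of a PPTX."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for index, text in enumerate(texts, start=1):
            xml = (
                '<p:sld xmlns:p="http://schemas.openxmlformats.org/'
                'presentationml/2006/main" '
                'xmlns:a="http://schemas.openxmlformats.org/'
                'drawingml/2006/main">'
                "<p:cSld><p:spTree><p:sp><p:txBody><a:p><a:r>"
                f"<a:t>{text}</a:t>"
                "</a:r></a:p></p:txBody></p:sp></p:spTree></p:cSld></p:sld>"
            )
            package.writestr(f"ppt/slides/slide{index}.xml", xml)
    return buffer.getvalue()
```

Calling `extract_slides(make_demo_pptx(["Intro", "Details here"]))` returns one string per slide, while passing `split_slides=False` returns a single merged string. The demo archive contains only slide parts, so it is suitable for exercising this sketch but is not a valid presentation that PowerPoint or Tika would open.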