The Slide Extractor is a component that detects PowerPoint (PPTX) files and uses Apache Tika to parse them and extract their slides. It supports:
- Extract text content from PPTX presentations.
- Enable or disable splitting into individual slides.
- Extract each slide's content as a separate job.
- Extract metadata such as slide title, author, created date, and modified date.
- Configure the maximum number of characters (file size limit) when processing large PPTX files.
- Configure a timeout for the parsing process.
- Set the memory allocated to each Tika process.
- Include or exclude embedded presentations in the content.
- Remove HTML tags from the content field.
- Clean embedded items, master layouts, and relationships out of the content.
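The slide splitting and character limit described above can be illustrated with a minimal Python sketch. It is not the component's implementation and does not call Apache Tika. Instead it assumes the standard Office Open XML package layout, where each slide is stored as `ppt/slides/slideN.xml` and its text sits in DrawingML `<a:t>` runs. The function name `extract_slides` and its `split_slides` and `max_chars` parameters are hypothetical stand-ins for the component's settings. Slides are ordered by the number in their part name, which is an approximation; the authoritative order is the slide list in `ppt/presentation.xml`.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for paragraphs (<a:p>) and text runs (<a:t>).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Matches slide parts only; masters, layouts, embeddings and .rels files are skipped.
SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def extract_slides(source, split_slides=True, max_chars=None):
    """Hypothetical sketch: return slide text from a PPTX path or file object.

    split_slides=True  -> one string per slide (e.g. one job per slide)
    split_slides=False -> a single string for the whole presentation
    max_chars          -> optional cap on characters kept per returned string
    """
    with zipfile.ZipFile(source) as pptx:
        # Sort by slide number rather than lexically, so slide10 follows slide2.
        slides = sorted(
            (int(m.group(1)), name)
            for name in pptx.namelist()
            if (m := SLIDE_PART.match(name))
        )
        texts = []
        for _, name in slides:
            root = ET.fromstring(pptx.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # Runs inside one paragraph are concatenated without separators.
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    paragraphs.append(text)
            texts.append("\n".join(paragraphs))

    if not split_slides:
        texts = ["\n\n".join(texts)]
    if max_chars is not None:
        texts = [t[:max_chars] for t in texts]
    return texts
```

Because only `ppt/slides/` parts are read, slide masters, layouts, and embedded objects, which live in other parts of the package, never reach the extracted text. Each string in the returned list could then be handed to its own job when splitting is enabled.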