Crawl Zip File Process

While the crawl is running, when the scanner finds a zip file, it gets the file's metadata and processes the file to extract information about every entry it contains.
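As a rough illustration of what "getting the info of every entry" can look like, the sketch below lists per-entry metadata for a ZIP archive using Python's standard `zipfile` module. This is not the Aspire implementation; the function names `list_zip_entries` and `build_sample_zip` and the record fields are hypothetical, chosen only for this example.

```python
import io
import zipfile


def list_zip_entries(data):
    """Return one metadata record per entry in a ZIP archive held in memory."""
    records = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            records.append({
                "name": info.filename,            # path of the entry inside the archive
                "is_folder": info.is_dir(),       # folder entries end with "/"
                "size": info.file_size,           # uncompressed size in bytes
                "compressed_size": info.compress_size,
            })
    return records


def build_sample_zip():
    """Build a small in-memory ZIP so the example is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docs/readme.txt", "hello")
        archive.writestr("report.csv", "a,b\n1,2\n")
    return buffer.getvalue()


if __name__ == "__main__":
    for record in list_zip_entries(build_sample_zip()):
        print(record)
```

Each record here stands in for the kind of per-entry item a scanner could emit; the actual fields produced by the connector may differ.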

Connectors

Currently, the zip file process is available only for these three connectors:

  • File System
  • CIFS
  • Lotus

File types

The process is able to extract and process these file types:

  • ZIP
  • AR
  • ARJ
  • CPIO
  • JAR
  • DUMP
  • TAR

Known limitations

  • RAR is a proprietary format and is not included in this version.
  • 7z archives cannot be opened as a stream, so the format is excluded from this version.
  • If the ZIP files are excluded from the crawl, the Scan Excluded Items option will not work.

Configuration parameters

  • extractFolders (same as the scanner)
  • scanRecursive (same as the scanner)
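The page only states that these two parameters behave the same as in the scanner. The sketch below shows one plausible reading, applied to archive entry names: it assumes that scanRecursive controls whether entries inside subfolders are visited and that extractFolders controls whether folder entries themselves are emitted. Both semantics, and the function name `select_entries`, are assumptions made for illustration, not confirmed Aspire behavior.

```python
def select_entries(names, extract_folders, scan_recursive):
    """Filter archive entry names under the ASSUMED parameter semantics.

    extract_folders: if False, folder entries (names ending in "/") are skipped.
    scan_recursive:  if False, only top-level entries are kept.
    """
    selected = []
    for name in names:
        is_folder = name.endswith("/")
        depth = name.rstrip("/").count("/")  # 0 means top level of the archive
        if depth > 0 and not scan_recursive:
            continue
        if is_folder and not extract_folders:
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    sample = ["a.txt", "docs/", "docs/b.txt", "docs/sub/", "docs/sub/c.txt"]
    print(select_entries(sample, extract_folders=False, scan_recursive=True))
```

Check the scanner documentation for the authoritative meaning of each parameter before relying on this reading.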

We plan to make this process available for all connectors by moving zip file processing into a separate component placed in the pipeline after the scanner. This is planned for a future Aspire release.
