
Refactored PDF & ZIP export features for images. Add reCAPTCHA or Shib...

Sean Aery requested to merge DDR-2252-pdf-zip-export into main

Refactored PDF & ZIP export features for images. Add reCAPTCHA or Shib verification to thwart bots. Use derived_image JPGs from filesystem as source instead of image server. Replace rubyzip with zip_tricks for streaming. Replace prawn with hexapdf CLI for better RAM usage. Closes DDR-2252.


This is a complete refactor of the PDF & ZIP export features for DDR image items, done to support exporting items with 100+ pages. Summary of changes:

For either ZIP or PDF

  • uses the new derived_image JPG derivatives stored on the filesystem as the source files instead of hitting the image server with HTTP requests
  • removes deprecated code for exports relying on the image server (e.g., absolute_urls_for_img_component_jpgs)
  • adds reCAPTCHA or Shib verification to prevent bots from triggering exports
  • adds an animated loading state to the link that indicates to a user when the export is processing
  • moves logic out of the CatalogController and into a new ExportFilesController
  • updates ddr-core to 1.6.5 in order to use SolrDocument.derived_image_file_path
  • adds SolrDocument.derived_image_file_paths -- an array of component JPG paths for an item
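
As a rough illustration of the last bullet, here is a minimal sketch of collecting component JPG paths for an item. `Component` and `ItemSketch` are hypothetical stand-ins, not the actual SolrDocument code in ddr-core, and skipping components that have no derived image is an assumption made for this sketch.

```ruby
# Hypothetical stand-in for an image component's Solr document.
Component = Struct.new(:id, :derived_image_file_path)

# Hypothetical stand-in for an item; the real SolrDocument lives in ddr-core.
class ItemSketch
  def initialize(components)
    @components = components
  end

  # Array of component JPG paths for the item, in component order.
  # Assumption: components with no derived image path are skipped.
  def derived_image_file_paths
    @components.map(&:derived_image_file_path).compact
  end
end

item = ItemSketch.new([
  Component.new("c1", "/data/derived/c1.jpg"),
  Component.new("c2", nil),
  Component.new("c3", "/data/derived/c3.jpg")
])
paths = item.derived_image_file_paths
```

Both export formats can then read straight from this array of filesystem paths rather than requesting each page from the image server.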

ZIP exports

  • replaces RubyZIP gem with ZipTricks to enable streaming the ZIP file as it builds

PDF exports

  • replaces Prawn with HexaPDF gem to create PDFs; uses HexaPDF's command-line utility image2pdf
  • this HexaPDF approach uses only 1/3 of the system memory that Prawn used during a PDF export
  • now writes the PDF to a Tempfile (and removes it afterward)
  • no longer downscales the PDF to 1000px (this was reducing the page size but not the file size); PDF pages are now the same pixel dimensions as the source image
  • removes fastimage gem
  • note that the prawn gem has not been removed since it is presently still used for exporting AV caption files (WebVTT) as PDF
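
The Tempfile-plus-CLI flow described above might look roughly like the sketch below. The helper names (`image2pdf_command`, `export_pdf`) and the injectable `runner` are assumptions for illustration, not the code in this MR. The argument order (images first, output PDF last) follows the `hexapdf image2pdf` usage as I understand it.

```ruby
require "tempfile"

# Build the argv for HexaPDF's image2pdf command: input images first, output PDF last.
def image2pdf_command(image_paths, pdf_path)
  ["hexapdf", "image2pdf", *image_paths, pdf_path]
end

# Write the PDF to a Tempfile, yield its path to the caller, then remove the file.
# `runner` is injectable (an assumption of this sketch) so the flow can be
# exercised without HexaPDF installed; by default it shells out via Kernel#system.
# Note: the Tempfile already exists when the command runs, so the CLI's
# overwrite option may be needed; it is left out here, check `hexapdf help`.
def export_pdf(image_paths, runner: ->(argv) { system(*argv) })
  raise ArgumentError, "no images to export" if image_paths.empty?

  tempfile = Tempfile.new(["export", ".pdf"])
  begin
    ok = runner.call(image2pdf_command(image_paths, tempfile.path))
    raise "image2pdf failed" unless ok

    yield tempfile.path
  ensure
    # Always clean up, whether the export succeeded or raised.
    tempfile.close
    tempfile.unlink
  end
end
```

A controller action could call `export_pdf(paths) { |pdf_path| ... }` and send the file from inside the block, since the Tempfile is deleted as soon as the block returns. Running the conversion in a separate process also keeps the page images out of the Rails process's memory.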

Requirements

  • Update ddr-admin to at least 1.13.4 to capture SolrDocument.derived_image_file_path during indexing.
  • Ensure image components have derived_image JPG files; run rake ddr:derived_images to retroactively create them.
  • Reindex image components.

Add these two environment variables; see https://duke.app.box.com/notes/830416801166 for more info:

RECAPTCHA_SITE_KEY
RECAPTCHA_SECRET_KEY
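
For context on how the secret key is typically used, here is a minimal sketch of server-side token verification against Google's `siteverify` endpoint. The helper name `recaptcha_passed?` and the injectable `poster` are assumptions for illustration. The app may well delegate this to a gem instead, and this is not the code in this MR.

```ruby
require "json"
require "net/http"
require "uri"

RECAPTCHA_VERIFY_URI = URI("https://www.google.com/recaptcha/api/siteverify")

# Default poster: form-encoded POST that returns the raw response body.
DEFAULT_POSTER = ->(uri, params) { Net::HTTP.post_form(uri, params).body }

# Returns true only when the verification response reports success for the token.
# `poster` is injectable (an assumption of this sketch) so the logic can be
# exercised without network access.
def recaptcha_passed?(token, secret: ENV.fetch("RECAPTCHA_SECRET_KEY"), poster: DEFAULT_POSTER)
  return false if token.nil? || token.empty?

  body = poster.call(RECAPTCHA_VERIFY_URI, { "secret" => secret, "response" => token })
  JSON.parse(body)["success"] == true
rescue JSON::ParserError
  # Treat an unparseable response as a failed verification.
  false
end
```

The site key is the public half used when rendering the widget in the browser; the secret key stays on the server and is only sent in the verification request.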

Screenshots

Options render in the Download menu:

Screen_Shot_2021-07-14_at_9.47.54_AM

Prompts for verification:

Screen_Shot_2021-07-14_at_9.50.46_AM

reCAPTCHA -- sometimes checking the box is sufficient:

Screen_Shot_2021-07-14_at_10.11.55_AM

reCAPTCHA -- sometimes the user has to select images, e.g.:

Screen_Shot_2021-07-14_at_9.51.48_AM

Passed reCAPTCHA:

Screen_Shot_2021-07-14_at_9.54.33_AM

Failed reCAPTCHA:

Screen_Shot_2021-07-14_at_10.46.50_AM

Export in Progress:

Screen_Shot_2021-07-14_at_9.48.57_AM

Edited by Sean Aery
