Maximize usability and minimize file size of PDFs using Acrobat’s Optical Character Recognition

Problems with scanned documents

  • Unable to perform a word search
  • Unable to copy text
  • Unable to edit text
  • Too large to email
  • Too large to archive

Acrobat’s Optical Character Recognition (OCR) can fix these problems!

As an example, I’ve scanned a 129-page proposal at 300 dpi on a Sharp MX6240, which resulted in a 119.3 MB PDF.

The first step is to adjust your OCR settings. In Acrobat Pro DC’s Tools search, type “ocr”, then select Enhance Scans/Recognize Text.

Within the Enhance Scans/Recognize Text toolbar, select Settings. The primary setting to adjust is Output, which offers three options. Using my 129-page sample proposal, I ran OCR with each of the three:

  • Searchable Image: This option deskews scanned pages for a plumb and true document; recognizes text and places an invisible text layer on top; recognizes images; and discards whitespace.
    • 49 percent reduction in size to 60.7 MB
    • 7:08 minutes to complete
  • Searchable Image (Exact): This option skips the deskew step, as well as the discarding of whitespace.
    • No reduction in size
    • 4:08 minutes to complete
  • Editable Text and Images: This option, which was called “ClearScan” in previous versions of Acrobat, offers the best results. It synthesizes a new Type 3 font that is close to the original, places it on an invisible text layer, then downsamples the original background.
    • 73 percent reduction in size to 31.8 MB
    • 8:12 minutes to complete (roughly 7 minutes to scan and 1 minute to process)
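
The size reductions quoted above are easy to check with a little arithmetic. The Python sketch below recomputes them from the reported file sizes. The helper name percent_reduction is my own for illustration; only the sizes come from the test run.

```python
# Recompute the size reductions from the file sizes reported above.
# percent_reduction is an illustrative helper, not part of Acrobat.

ORIGINAL_MB = 119.3  # size of the 129-page scan before OCR


def percent_reduction(original_mb: float, result_mb: float) -> float:
    """Return how much smaller result_mb is than original_mb, in percent."""
    return (original_mb - result_mb) / original_mb * 100


# Output sizes reported for the two options that shrank the file.
results_mb = {
    "Searchable Image": 60.7,
    "Editable Text and Images": 31.8,
}

for option, size_mb in results_mb.items():
    reduction = percent_reduction(ORIGINAL_MB, size_mb)
    print(f"{option}: {reduction:.0f} percent smaller")
```

Running it prints 49 percent for Searchable Image and 73 percent for Editable Text and Images, matching the figures in the list.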

Additional Options

  • Downsample To: When using the Searchable Image option, this sets the downsample rule for content that Acrobat detects as an image. In my opinion, most of the file size reduction with Searchable Image comes from discarding whitespace from the scanned image. Using Downsample To, you can reduce the file size further. Consider leaving this setting at its default of 600 dpi and using the Editable Text and Images (ClearScan) option instead.
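
To get a feel for why downsampling shrinks image content, it helps to count pixels. The Python sketch below assumes a US Letter page (8.5 x 11 inches), which my test above does not specify, and the helper name pixels_per_page is my own. It counts raw pixels only; actual PDF sizes also depend on compression, so treat the ratio as a rough guide and not as a prediction.

```python
# Raw pixel count of one scanned page at a given resolution.
# Assumption: US Letter page, 8.5 x 11 inches (not stated in the test run).

PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def pixels_per_page(dpi: int) -> int:
    """Return the number of pixels in one page image at the given dpi."""
    width_px = round(PAGE_WIDTH_IN * dpi)
    height_px = round(PAGE_HEIGHT_IN * dpi)
    return width_px * height_px


for dpi in (600, 300, 150):
    print(f"{dpi} dpi: {pixels_per_page(dpi):,} pixels per page")
```

Each halving of the resolution cuts the pixel count to a quarter, because both the width and the height are halved.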

Acrobat Pro DC has integrated OCR into its Edit PDF tool, which makes it an on-demand feature when you are only interested in a portion of the document.

OCR was completed on a 2.5 GHz 6-core Intel Xeon E5 Mac Pro; your time may vary.