Optical Character Recognition (OCR) - Getting Started: Adobe Acrobat

Basic overview of several tools (both open source such as Tesseract and commercial such as Adobe Acrobat) that perform optical character recognition (OCR).

Demo

Tool Overview

Adobe Acrobat is a commercial tool available for download on KU workstations campus-wide. KU IT provides information on capabilities, access, and support for Adobe Acrobat across campus. KU students can also purchase Acrobat at a discounted rate for use on personal computers. Although it is no longer available for download commercially, Adobe Acrobat XI Pro is the version of Acrobat supported at KU. There are several versions of Acrobat with varying OCR capabilities, which are outlined here.

Using Acrobat Pro

1. The OCR option is somewhat hidden: open the 'Tools' menu and select 'Enhance Scans.' From there, it's easy to select a single file or batch process multiple files.

image of home screen for Adobe Acrobat Pro with tool upload image selected

2. Following the prompts, select one or more files (or one or more folders).

  • Acrobat can handle PDFs or image files (tested with JPEG and TIFF) 
  • Note that if a PDF already has OCR applied, Acrobat will not do anything with that file.
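Before pointing Acrobat at a folder, it can help to see which files in it are likely candidates for OCR. The following is a minimal sketch in Python, not part of Acrobat itself. The function name `find_ocr_candidates` and the extension list are assumptions for illustration, based only on the formats tested above (PDF, JPEG, TIFF).

```python
from pathlib import Path

# Assumed extension list, based on the formats tested above (PDF, JPEG, TIFF).
OCR_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_ocr_candidates(folder):
    """Return a sorted list of files under `folder` (including subfolders)
    whose extension suggests they could be opened by Acrobat for OCR."""
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in OCR_EXTENSIONS
    )


if __name__ == "__main__":
    # "scans" is a hypothetical folder name; replace it with your own.
    for path in find_ocr_candidates("scans"):
        print(path)
```

This only lists files by extension. It does not check whether a PDF already has OCR text, so some listed PDFs may be skipped by Acrobat.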

3. Choose the settings for the output files (location, renaming, whether to edit the original files, etc.)

  • Output is always PDF with the OCR text embedded 
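If you choose to rename output files rather than overwrite the originals, a consistent naming scheme makes it easier to match each OCR result to its source scan. The sketch below shows one possible scheme in Python. The function name `plan_output_path`, the `_ocr` suffix, and the folder names are all hypothetical, and Acrobat's own renaming options may differ.

```python
from pathlib import Path


def plan_output_path(source, output_dir, suffix="_ocr"):
    """Build an output path in `output_dir` that keeps the source's base name,
    appends `suffix`, and always ends in .pdf (the output is always PDF)."""
    source = Path(source)
    return Path(output_dir) / f"{source.stem}{suffix}.pdf"


if __name__ == "__main__":
    # Hypothetical file and folder names, for illustration only.
    print(plan_output_path("scans/page01.tif", "ocr_output"))
```

Writing to a separate folder with a suffix leaves the original scans untouched, which is useful if you later want to rerun OCR with different settings.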

Output Options for Adobe Acrobat Pro

4. Additional settings for what Acrobat will do to the resultant file: 

Screenshot of Adobe Acrobat Pro text recognition options

  • 'Searchable Image' will apply OCR and resample the actual image
  • ‘Searchable Image (Exact)’ will apply OCR as an invisible layer over the top of the untouched image 
  • ‘Editable Text and Images’ creates a new image that will let you overwrite the actual scan

side by side comparison of a page from Esquire magazine (1941) showing how the text has been edited, but the image remains intact

5. To review the text, you have a couple of options:

  • Note: these options appear to work on a single file only. I did not find a batch option…
  • Open the output PDF in Acrobat and select ‘Tools’ then ‘Export PDF’

Screenshot of Adobe Acrobat file output options

  • With the file open under ‘Enhance Scans’, select ‘Recognize Text’ then ‘Correct Recognized Text’. Acrobat will show sections that it recognized as text but could not read with confidence. This text can be edited in place.

example of scanned text allowing word-by-word correction

  • The ‘Review recognized text’ option will show the overlaid text 

image of "suspect" text allowing editing