This tutorial is adapted from the Caltech Handprint repository, by Cody Carvel, Brown University.
If you have any questions or suggestions, please email me.

RECOGNIZING AND EXTRACTING HANDWRITTEN TEXT USING PYTHON AND HANDPRINT

This tutorial will help you use two of the most capable cloud computer vision services available, Microsoft’s Azure and/or Google’s Cloud Platform (GCP), for Handwritten Text Recognition (HTR): recognizing and extracting handwritten text from image or PDF files. (Note: The Handprint package can also make use of Amazon’s Rekognition and Textract services; we focus on Azure and GCP because, anecdotally, they have proven more accurate.)

This tutorial requires the use of Terminal in macOS or PowerShell in Windows. Linux-based systems will likely work using most of the same commands.

We will be using a Python package called Handprint, developed by the Caltech Library, to perform text recognition and extraction on image files or PDF documents that contain handwriting.
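As a rough preview of what a Handprint run looks like, the sketch below builds the command line from Python. It assumes Handprint's `-s` option accepts a comma-separated list of service names such as `microsoft` and `google`; confirm this against `handprint -h` for your installed version. The helper name `build_handprint_command` and the path `scans/letter.png` are hypothetical and used only for illustration.

```python
import subprocess


def build_handprint_command(paths, services=("microsoft", "google")):
    """Build the argument list for a Handprint run (hypothetical helper)."""
    if not paths:
        raise ValueError("at least one file or directory is required")
    # Assumption: -s takes a comma-separated list of service names.
    return ["handprint", "-s", ",".join(services), *paths]


if __name__ == "__main__":
    command = build_handprint_command(["scans/letter.png"])
    print(" ".join(command))
    # Uncomment once Handprint is installed and credentials are configured:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.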

The goals of this tutorial are:
If you are using a Mac, start here:
⓵. Set up the Homebrew package manager to add Python 3 and PIP 3 to your command line (Terminal) using pyenv;

If you are using Windows, start here:
⓵ⓐ. Set up the Chocolatey package manager to add Python 3 and PIP 3 to your command line (PowerShell) using pyenv-win;

⓶ⓐ. Set up Microsoft Azure Cloud for text recognition and extraction;

⓶ⓑ. Set up Google Cloud Platform for text recognition and extraction;

⓷. Install and run Handprint, a Python package for Handwritten Text Recognition.
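Once the steps above are complete, a quick check like the following can confirm that the pieces are in place. This is a minimal sketch, not part of Handprint: the function name `check_environment` is hypothetical, and it assumes the tools are exposed on your PATH under the names `pip3` and `handprint`, which may differ on your system.

```python
import shutil
import sys


def check_environment(tools=("pip3", "handprint")):
    """Report whether Python 3 is running and which command-line tools are on PATH."""
    report = {"python3": sys.version_info.major >= 3}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If `handprint` is reported as missing after installation, the directory where pip places scripts is probably not on your PATH.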


Content is modified from material originally written by Michael Hucka for Handprint and made available under the terms of a CC BY-SA 4.0 license.