Vision API: Detect Handwriting (OCR) Python Code Implementation

In the previous article, I explained how to install the Google Vision API. In this article, we will explore one of the features of the Vision API: detecting handwriting in an image.

According to the Google documentation, Document Text Detection performs Optical Character Recognition (OCR). This feature detects dense document text, including handwriting, in an image.

Prerequisite:

For this article, the code is executed in a Jupyter notebook running on a Google Cloud Platform VM instance. To learn how to create a VM instance, please refer to this article.

OCR using Google Vision API

Open a new Jupyter notebook. First, import the following packages:

# import packages
import PIL
print("PIL version:", PIL.__version__)
import PIL.Image as Image
from google.cloud import vision
import io

After the imports succeed, define the following function. It takes the path of an input image containing text and prints each block, paragraph, word, and symbol together with its confidence value.

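def detect_document(path):
    """Detects document text in an image and prints it with confidence values."""
    client = vision.ImageAnnotatorClient()

    # Read the image file as raw bytes
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # google-cloud-vision 2.x and later expose Image directly on the vision module
    # (1.x releases used vision.types.Image, which was removed in 2.0)
    image = vision.Image(content=content)
    # The response is an AnnotateImageResponse object; the REST API returns the same structure as JSON
    response = client.document_text_detection(image=image)

    # Report API-side errors instead of silently printing nothing
    if response.error.message:
        raise RuntimeError(response.error.message)

    # Print the hierarchy page -> block -> paragraph -> word -> symbol with confidence values
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    # A word's text is the concatenation of its symbols
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))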
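The function above only prints what it finds. If you want to work with the results in code, for example to review the words Vision was less sure about, a variant can return them instead. This is a sketch under stated assumptions, not part of the original code: collect_words and fake_word are hypothetical names, and the SimpleNamespace objects only imitate the pages, blocks, paragraphs, words, and symbols attributes that the loop above reads, so the example runs without calling the API.

```python
from types import SimpleNamespace


def collect_words(full_text_annotation, threshold=None):
    """Return (word_text, confidence) pairs from a full_text_annotation-like object.

    If threshold is given, only words whose confidence is below it are returned.
    """
    words = []
    for page in full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word's text is the concatenation of its symbols' text
                    text = ''.join(symbol.text for symbol in word.symbols)
                    # Keep every word, or only those under the threshold if one is set
                    if threshold is None or word.confidence < threshold:
                        words.append((text, word.confidence))
    return words


def fake_word(text, confidence):
    # Hypothetical stand-in for a Vision Word object, for offline testing only
    return SimpleNamespace(symbols=[SimpleNamespace(text=ch) for ch in text],
                           confidence=confidence)


paragraph = SimpleNamespace(words=[fake_word('ML', 0.94), fake_word('model', 0.99)])
annotation = SimpleNamespace(
    pages=[SimpleNamespace(blocks=[SimpleNamespace(paragraphs=[paragraph])])])

print(collect_words(annotation))        # [('ML', 0.94), ('model', 0.99)]
print(collect_words(annotation, 0.95))  # [('ML', 0.94)]
```

With a real response, detect_document would also need to return response (a small change to the original), after which collect_words(response.full_text_annotation, 0.95) would list the words below 95% confidence.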

When calling the function, pass the path of the image file you want to read. The underlying REST API returns its response as JSON (an excerpt is shown below); the Python client library exposes the same structure as an object whose attributes use snake_case names, such as full_text_annotation. Vision reads the individual characters (symbols) of the words in each block. Whenever it identifies a block, paragraph, word, character, or language, it assigns a confidence value between 0 and 1 indicating how confident it is in that recognition. For example, a confidence of 0.99 for the character 'M' means Vision is 99% confident that it recognized the character 'M' of a word in that block correctly. Note that it also identifies the language of the text and can assign a confidence value to it.

{
  "cropHintsAnnotation": {...},
  "fullTextAnnotation": {
    "pages": [
      {
        "blocks": [
          {
            "blockType": "TEXT",
            "boundingBox": {
              "vertices": [...]
            },
            "confidence": 0.99,
            "paragraphs": [
              {
                "boundingBox": {
                  "vertices": [...,
                        "confidence": 0.99,
                        "property": {
                          "detectedLanguages": [
                            {
                              "languageCode": "en"
                            }
                          ]
                        },
                        "text": "c"
                      },
                        ............
}
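Because the REST API returns this same hierarchy as JSON, you can also walk it without the client library, for example when a response has been saved to a file. A minimal sketch, assuming the camelCase field names visible in the excerpt (fullTextAnnotation, pages, blocks, paragraphs, words, symbols, property, detectedLanguages, languageCode). The function name words_from_rest_json and the sample response are hypothetical, and reading detectedLanguages at the word level is a choice of this sketch (the excerpt above shows it on a symbol).

```python
import json


def words_from_rest_json(response_json):
    """Return (word_text, confidence, language_codes) for each word in a REST-style response.

    Accepts either a JSON string or an already-parsed dict. Missing fields fall
    back to empty values because the API omits fields that have no content.
    """
    data = json.loads(response_json) if isinstance(response_json, str) else response_json
    results = []
    annotation = data.get('fullTextAnnotation', {})
    for page in annotation.get('pages', []):
        for block in page.get('blocks', []):
            for paragraph in block.get('paragraphs', []):
                for word in paragraph.get('words', []):
                    # Join the symbols' text to rebuild the word
                    text = ''.join(s.get('text', '') for s in word.get('symbols', []))
                    # Collect any language codes detected for this word
                    langs = [lang.get('languageCode')
                             for lang in word.get('property', {}).get('detectedLanguages', [])]
                    results.append((text, word.get('confidence', 0.0), langs))
    return results


# Hypothetical, heavily trimmed response containing a single word
sample = '''
{"fullTextAnnotation": {"pages": [{"blocks": [{"blockType": "TEXT", "confidence": 0.99,
  "paragraphs": [{"confidence": 0.99, "words": [
    {"confidence": 0.99,
     "property": {"detectedLanguages": [{"languageCode": "en"}]},
     "symbols": [{"text": "D", "confidence": 0.99}, {"text": "a", "confidence": 1.0},
                 {"text": "t", "confidence": 1.0}, {"text": "a", "confidence": 1.0}]}
  ]}]}]}]}}
'''

print(words_from_rest_json(sample))  # [('Data', 0.99, ['en'])]
```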

Now let's understand what exactly a block is. Take a look at the following image, which is given as input to the Vision API; Vision identifies the blocks and text in it as follows:

For the above image, Vision identified the following blocks and text:

Block 1:
Machine Learning Model creation

Block 2:
Data

Block 3:
Prepare data for input into ML model

Block 4:
Training, testing, optimizing ML model

Block 5:
ML model or models

Block 6:
Image created by Yogesh Gadade for the explaination purpose only

Block 7:
Using trained model/models

Block 8:
Predict or Classify

Block 9:
Results

Block 10:
Unseen Data

Block 11:
Prepare data for input into ML model
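Each block's text is just its words strung together, and each word is its symbols strung together. A minimal sketch of that reconstruction, assuming the same object shape the detect_document loop reads. block_texts and fake_block are hypothetical names, and the SimpleNamespace objects stand in for real Vision results so the example runs offline. As a simplification, it always puts one space between words and ignores the break information Vision attaches to symbols, so punctuation such as "," comes out as a separate token.

```python
from types import SimpleNamespace


def block_texts(full_text_annotation):
    """Return one string per block: symbols joined into words, words joined by spaces."""
    texts = []
    for page in full_text_annotation.pages:
        for block in page.blocks:
            # Flatten every paragraph of the block into a list of word strings
            words = [''.join(symbol.text for symbol in word.symbols)
                     for paragraph in block.paragraphs
                     for word in paragraph.words]
            texts.append(' '.join(words))
    return texts


def fake_block(*words):
    # Hypothetical stand-in for a Vision Block with a single paragraph
    paragraph = SimpleNamespace(words=[
        SimpleNamespace(symbols=[SimpleNamespace(text=ch) for ch in w]) for w in words])
    return SimpleNamespace(paragraphs=[paragraph])


annotation = SimpleNamespace(pages=[SimpleNamespace(blocks=[
    fake_block('Data'),
    fake_block('Predict', 'or', 'Classify'),
])])

print(block_texts(annotation))  # ['Data', 'Predict or Classify']
```

If you only need the plain text of the whole image, full_text_annotation also has a text field that already contains the complete detected text.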

The actual output printed by our Python function for this input image is as follows:

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Machine (confidence: 0.9900000095367432)
	Symbol: M (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: c (confidence: 0.9900000095367432)
	Symbol: h (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: e (confidence: 1.0)
Word text: Learning (confidence: 0.9900000095367432)
	Symbol: L (confidence: 0.9900000095367432)
	Symbol: e (confidence: 1.0)
	Symbol: a (confidence: 1.0)
	Symbol: r (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: g (confidence: 1.0)
Word text: Model (confidence: 0.9900000095367432)
	Symbol: M (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)
Word text: creation (confidence: 0.9900000095367432)
	Symbol: c (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: i (confidence: 1.0)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: n (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Data (confidence: 0.9900000095367432)
	Symbol: D (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
	Symbol: t (confidence: 1.0)
	Symbol: a (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Prepare (confidence: 0.9900000095367432)
	Symbol: P (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: p (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
	Symbol: r (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
Word text: data (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: t (confidence: 1.0)
	Symbol: a (confidence: 1.0)
Word text: for (confidence: 0.9900000095367432)
	Symbol: f (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: r (confidence: 1.0)
Word text: input (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9900000095367432)
	Symbol: n (confidence: 1.0)
	Symbol: p (confidence: 0.9900000095367432)
	Symbol: u (confidence: 1.0)
	Symbol: t (confidence: 1.0)
Word text: into (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9900000095367432)
	Symbol: n (confidence: 0.9900000095367432)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
Word text: ML (confidence: 0.9900000095367432)
	Symbol: M (confidence: 0.9900000095367432)
	Symbol: L (confidence: 0.9900000095367432)
Word text: model (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: d (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Training (confidence: 0.9900000095367432)
	Symbol: T (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9800000190734863)
	Symbol: a (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: g (confidence: 1.0)
Word text: , (confidence: 0.9900000095367432)
	Symbol: , (confidence: 0.9900000095367432)
Word text: testing (confidence: 0.9900000095367432)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: s (confidence: 0.9900000095367432)
	Symbol: t (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 0.9900000095367432)
	Symbol: g (confidence: 1.0)
Word text: , (confidence: 1.0)
	Symbol: , (confidence: 1.0)
Word text: optimizing (confidence: 0.9800000190734863)
	Symbol: o (confidence: 0.9700000286102295)
	Symbol: p (confidence: 0.9800000190734863)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: i (confidence: 1.0)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9900000095367432)
	Symbol: z (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9800000190734863)
	Symbol: n (confidence: 0.9900000095367432)
	Symbol: g (confidence: 1.0)
Word text: ML (confidence: 0.9900000095367432)
	Symbol: M (confidence: 0.9900000095367432)
	Symbol: L (confidence: 0.9900000095367432)
Word text: model (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: d (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: ML (confidence: 0.9800000190734863)
	Symbol: M (confidence: 0.9900000095367432)
	Symbol: L (confidence: 0.9700000286102295)
Word text: model (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: e (confidence: 1.0)
	Symbol: l (confidence: 1.0)
Word text: or (confidence: 0.9700000286102295)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9599999785423279)
Word text: models (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: e (confidence: 1.0)
	Symbol: l (confidence: 1.0)
	Symbol: s (confidence: 0.9900000095367432)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Image (confidence: 0.9900000095367432)
	Symbol: I (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9800000190734863)
	Symbol: a (confidence: 1.0)
	Symbol: g (confidence: 1.0)
	Symbol: e (confidence: 1.0)
Word text: created (confidence: 0.9900000095367432)
	Symbol: c (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9900000095367432)
	Symbol: e (confidence: 1.0)
	Symbol: a (confidence: 1.0)
	Symbol: t (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: d (confidence: 1.0)
Word text: by (confidence: 0.9900000095367432)
	Symbol: b (confidence: 0.9900000095367432)
	Symbol: y (confidence: 0.9900000095367432)
Word text: Yogesh (confidence: 0.9900000095367432)
	Symbol: Y (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: g (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: s (confidence: 1.0)
	Symbol: h (confidence: 1.0)
Word text: Gadade (confidence: 0.9900000095367432)
	Symbol: G (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
Word text: for (confidence: 0.9900000095367432)
	Symbol: f (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: r (confidence: 1.0)
Word text: the (confidence: 0.9900000095367432)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: h (confidence: 1.0)
	Symbol: e (confidence: 1.0)
Word text: explaination (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: x (confidence: 0.9900000095367432)
	Symbol: p (confidence: 1.0)
	Symbol: l (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: t (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: o (confidence: 1.0)
	Symbol: n (confidence: 1.0)
Word text: purpose (confidence: 0.9900000095367432)
	Symbol: p (confidence: 0.9900000095367432)
	Symbol: u (confidence: 0.9900000095367432)
	Symbol: r (confidence: 1.0)
	Symbol: p (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: s (confidence: 1.0)
	Symbol: e (confidence: 1.0)
Word text: only (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
	Symbol: n (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)
	Symbol: y (confidence: 0.9900000095367432)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Using (confidence: 0.9900000095367432)
	Symbol: U (confidence: 0.9900000095367432)
	Symbol: s (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9900000095367432)
	Symbol: n (confidence: 1.0)
	Symbol: g (confidence: 1.0)
Word text: trained (confidence: 0.9900000095367432)
	Symbol: t (confidence: 1.0)
	Symbol: r (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: n (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
Word text: model (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)
Word text: / (confidence: 0.9900000095367432)
	Symbol: / (confidence: 0.9900000095367432)
Word text: models (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: d (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)
	Symbol: s (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Predict (confidence: 0.9900000095367432)
	Symbol: P (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9900000095367432)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: i (confidence: 1.0)
	Symbol: c (confidence: 0.9900000095367432)
	Symbol: t (confidence: 1.0)
Word text: or (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: r (confidence: 0.9900000095367432)
Word text: Classify (confidence: 0.9900000095367432)
	Symbol: C (confidence: 0.9900000095367432)
	Symbol: l (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: s (confidence: 1.0)
	Symbol: s (confidence: 1.0)
	Symbol: i (confidence: 1.0)
	Symbol: f (confidence: 0.9900000095367432)
	Symbol: y (confidence: 0.9900000095367432)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Results (confidence: 0.9900000095367432)
	Symbol: R (confidence: 0.9900000095367432)
	Symbol: e (confidence: 1.0)
	Symbol: s (confidence: 1.0)
	Symbol: u (confidence: 1.0)
	Symbol: l (confidence: 1.0)
	Symbol: t (confidence: 1.0)
	Symbol: s (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Unseen (confidence: 0.9900000095367432)
	Symbol: U (confidence: 0.9900000095367432)
	Symbol: n (confidence: 0.9900000095367432)
	Symbol: s (confidence: 0.9900000095367432)
	Symbol: e (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: n (confidence: 1.0)
Word text: Data (confidence: 0.9900000095367432)
	Symbol: D (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)

Block confidence: 0.9900000095367432

Paragraph confidence: 0.9900000095367432
Word text: Prepare (confidence: 0.9900000095367432)
	Symbol: P (confidence: 0.9900000095367432)
	Symbol: r (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: p (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
	Symbol: r (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
Word text: data (confidence: 0.9900000095367432)
	Symbol: d (confidence: 0.9900000095367432)
	Symbol: a (confidence: 0.9900000095367432)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: a (confidence: 1.0)
Word text: for (confidence: 0.9900000095367432)
	Symbol: f (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: r (confidence: 1.0)
Word text: input (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9900000095367432)
	Symbol: n (confidence: 1.0)
	Symbol: p (confidence: 0.9900000095367432)
	Symbol: u (confidence: 1.0)
	Symbol: t (confidence: 1.0)
Word text: into (confidence: 0.9900000095367432)
	Symbol: i (confidence: 0.9900000095367432)
	Symbol: n (confidence: 0.9900000095367432)
	Symbol: t (confidence: 0.9900000095367432)
	Symbol: o (confidence: 1.0)
Word text: ML (confidence: 0.9399999976158142)
	Symbol: M (confidence: 0.9900000095367432)
	Symbol: L (confidence: 0.8999999761581421)
Word text: model (confidence: 0.9900000095367432)
	Symbol: m (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: d (confidence: 1.0)
	Symbol: e (confidence: 0.9900000095367432)
	Symbol: l (confidence: 1.0)
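The confidence values print as 0.9900000095367432 rather than 0.99. A likely explanation is that the confidence fields are declared as 32-bit floats in the Vision protobuf definitions, and Python displays the exact 64-bit value of that float32. The sketch below reproduces the numbers from the output using only the standard struct module; as_float32 is a hypothetical helper name.

```python
import struct


def as_float32(value):
    """Round-trip a Python float through 32-bit storage, as a float32 field would."""
    return struct.unpack('f', struct.pack('f', value))[0]


for reported in (0.99, 0.98, 0.94, 0.9):
    print(reported, '->', as_float32(reported))
# 0.99 -> 0.9900000095367432
# 0.98 -> 0.9800000190734863
# 0.94 -> 0.9399999976158142
# 0.9 -> 0.8999999761581421
```

For display, round(confidence, 2) or '{:.2f}'.format(confidence) gives the shorter form.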

For the code, please refer to this GitHub repository. I have uploaded the code file and input images containing both English and Hindi text.

For more details on the Google Vision API, please refer to this link. That's it for this article. If you have any questions, feel free to ask in the comment section below, and please like and subscribe to my blog 🙂
