Random Programmer

How to Apply Image Processing Techniques to Extract Data From ID Cards

The complete guide: how to detect objects in an image and make Tesseract OCR read and parse data from ID cards.
Segmented image of an ID card

Machine learning algorithms have come a long way in recent years, and one of the most popular data sources for today's algorithms is images. However, the way you process images before feeding them to a machine learning algorithm can significantly improve the accuracy of the solution you're trying to build. In this article, we'll guide you through how to detect and prepare images for the purpose of extracting valuable information from ID cards and certificates before handing the image over to AI for further parsing. To demonstrate, we'll be using image processing to extract information from an Armenian national vehicle registration certificate.

The Task at Hand: Extracting Data From a Vehicle Certificate

In our example, we'll be retrieving information from an Armenian national vehicle registration certificate; however, the procedure we use can be followed by anyone looking to extract data from any type of ID card or certificate (e.g., a health insurance card, birth certificate, driver's license, or passport).

The certificate at hand contains information about a vehicle owner, like name and address, and our goal is to have our system accurately extract this information. Our solution should parse the image into data strings and correctly identify which field each data string belongs to.

[Image: Armenian national vehicle registration certificate]

Challenges Faced with Image Processing

As you can see from the image above, the card contains text in two languages, Armenian and English, and we would like to parse both of them. Very few public OCRs support the Armenian language; this is something to be aware of if you're trying to process text in multiple languages within images.

The vehicle registration card will be placed in the center of the camera's image field, but even then, it will have some borders and will likely not be aligned horizontally. The certificate we're working with also presents some challenging physical characteristics: it's made of plastic, has many watermarks, and is very glittery. As a result, unless you set up favorable lighting before taking the picture, some parts of the text will likely be brighter or harder to read than others, depending on the lighting in the user's environment.

Card Detection

First of all, we need to establish the card's bounding box within the image to confirm that the card is in the correct position. To make our task easier, we'll ask the user to position the card at the center of the camera's field of view and get as close to the card as they can without cutting any of it off.

[Image: card positioned at the center of the camera's field of view]

To correctly determine fields within the image you're processing, you'll need to locate the card's coordinates very precisely. To do this, you'll need to find at least two points on the certificate/ID that have a consistent, known position from card to card. Those two points can be any part of the card, but they must be elements distinct enough that they won't be confused with other elements on the card. We can then use the template matching technique to find these unique parts of the card in the image. In our example with the Armenian vehicle registration certificate, two such unique parts are the "AM" and "ARMENIA" texts on the card.

[Image: "AM" template crop]

[Image: "ARMENIA" template crop]

These two points are far enough apart that a mismatch of a couple of pixels would not affect position recognition. They are also not part of the card's watermark, so they look the same in any lighting, unlike the coat of arms, whose center changes appearance depending on the light.

OpenCV has a built-in matchTemplate function, which we can use to find the templates in the image. This function supports multiple template matching methods, but our tests show that TM_CCOEFF_NORMED works best for our specific case.

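Here's a minimal sketch of that lookup, assuming the templates are small grayscale crops saved from a reference photo (the file names are placeholders):

```python
import cv2

# Load the photo and one of the template crops (e.g. the "ARMENIA" text) in grayscale.
image = cv2.imread("card_photo.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("armenia_template.png", cv2.IMREAD_GRAYSCALE)

# Score the template against every possible position in the image.
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

# For TM_CCOEFF_NORMED, the best match is the position with the highest score.
_, best_score, _, top_left = cv2.minMaxLoc(scores)

h, w = template.shape
center = (top_left[0] + w // 2, top_left[1] + h // 2)
print(f"match score: {best_score:.2f}, template center: {center}")
```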

Now we can use the templates' positions to determine the card's position. Under the hood is a coordinate-system change formula, with some modifications to handle scale changes based on the distance between the templates. The full version of the code we used is in the GitHub repository linked in the conclusion.
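The idea can be sketched as follows: given the two matched anchor points, the scale comes from the ratio of the distances between them, the rotation from the angle between the two anchor vectors, and every reference coordinate can then be mapped into the photo. The reference measurements below are illustrative, not the real values from the project:

```python
import numpy as np

# Anchor positions and card corners measured once on a straight reference scan.
# These numbers are illustrative, not the actual measurements from the project.
REF_AM = np.array([40.0, 35.0])
REF_ARMENIA = np.array([420.0, 35.0])
REF_CORNERS = np.array([[0, 0], [480, 0], [480, 300], [0, 300]], dtype=np.float64)

def card_corners(found_am, found_armenia):
    """Map the reference card corners into the photo using the two matched anchors."""
    ref_vec = REF_ARMENIA - REF_AM
    img_vec = np.asarray(found_armenia, float) - np.asarray(found_am, float)

    # Scale change: anchor distance in the photo vs. on the reference scan.
    scale = np.linalg.norm(img_vec) / np.linalg.norm(ref_vec)
    # Rotation: angle between the photo's anchor vector and the reference one.
    angle = np.arctan2(img_vec[1], img_vec[0]) - np.arctan2(ref_vec[1], ref_vec[0])

    c, s = np.cos(angle), np.sin(angle)
    rot = scale * np.array([[c, -s], [s, c]])

    # Rotate and scale each corner around the "AM" anchor, then translate.
    return (REF_CORNERS - REF_AM) @ rot.T + np.asarray(found_am, float)
```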

Image Processing

There are several adjustments that we need to make to our image before we forward it to an Optical Character Recognition tool to extract the required data.

Image Segmentation

Now that we know the card’s position, it’s easy to identify the position of text fields within the certificate and separate them. For that, we’ll use the same technique that we used to determine the card coordinates.

[Image: segmented text fields]
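One simple way to sketch this, slightly simplifying the flow above: if the whole card is first warped into a canonical upright view using the four detected corners, every field becomes a fixed crop. The field names and boxes below are made up for illustration:

```python
import cv2
import numpy as np

# Field boxes (x, y, width, height) on a hypothetical 480x300 canonical card view.
FIELDS = {
    "owner_name": (150, 60, 300, 30),
    "address": (150, 100, 300, 30),
    "plate_number": (150, 140, 160, 30),
}

def segment_fields(image, corners):
    """Warp the card into a canonical view, then crop each known field."""
    dst = np.array([[0, 0], [480, 0], [480, 300], [0, 300]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(np.asarray(corners, np.float32), dst)
    card = cv2.warpPerspective(image, matrix, (480, 300))
    return {name: card[y:y + h, x:x + w] for name, (x, y, w, h) in FIELDS.items()}
```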

Field Angle Correction

Next, we must process the text within the image to correct the field angle. Because OCRs work more accurately when the text is horizontal, we should warp each rhombus-shaped field into a rectangle. Below is what our text fields look like after field angle correction.

[Image: text fields after angle correction]
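One way to do this correction with OpenCV is a four-point perspective warp, roughly like the sketch below (the corner ordering is an assumption of this example):

```python
import cv2
import numpy as np

def straighten_field(image, quad, width, height):
    """Warp a four-cornered (possibly rhombus-shaped) field into an upright rectangle.

    quad: the field's corners in the photo, ordered top-left, top-right,
    bottom-right, bottom-left.
    """
    src = np.asarray(quad, dtype=np.float32)
    dst = np.array([[0, 0], [width, 0], [width, height], [0, height]],
                   dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (width, height))
```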

Noise Reduction

At this point, we've cropped the text fields we want to parse; to parse them, we'll use one of the OCR libraries that supports both required languages, Armenian and English. Before that, however, we have to reduce the noise in the image. Here's what we're left with if we don't:

[Image: text field before noise reduction]

First, let’s use OpenCV’s fastNlMeansDenoisingColored function to reduce the noise.
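A minimal call could look like this; the filter strengths are illustrative defaults, and you'd tune them for your own images:

```python
import cv2

field = cv2.imread("field_crop.jpg")  # a cropped text field (placeholder file name)

# Arguments: source, destination, h (luminance filter strength), hColor,
# template window size, search window size.
denoised = cv2.fastNlMeansDenoisingColored(field, None, 10, 10, 7, 21)
```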

[Image: text field after denoising]

Image Thresholding

Next, we will apply OpenCV's adaptive threshold in order to get a clean, easily legible black-and-white image.
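For instance, something along these lines; the block size and constant are tuning parameters, not values from the original project:

```python
import cv2

# The denoised field crop from the previous step (placeholder file name).
field = cv2.imread("field_denoised.png")
gray = cv2.cvtColor(field, cv2.COLOR_BGR2GRAY)

# Each pixel is thresholded against a Gaussian-weighted mean of its 31x31
# neighborhood, which handles uneven lighting far better than one global threshold.
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 10)
```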

[Image: field after adaptive thresholding]

Optical Character Recognition (OCR)

After processing our image, we can feed it to an OCR to extract the text information we're interested in. In our example, we used Tesseract OCR, one of the most accurate open source tools, to extract the text from the vehicle registration certificate. Based on the metadata the OCR returns for each parsed word, we can also spot text fragments that are too small to be parsed correctly; removing them reduces the error rate even further.
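As a sketch of that filtering step: pytesseract's image_to_data returns per-word confidence scores and bounding boxes, which can be used to drop fragments that are too small or too uncertain. The thresholds here are illustrative:

```python
import cv2
import pytesseract
from pytesseract import Output

# A preprocessed field crop (placeholder file name); requires the "hye"
# (Armenian) and "eng" traineddata files to be installed for Tesseract.
field = cv2.imread("field_binary.png")
data = pytesseract.image_to_data(field, lang="hye+eng", output_type=Output.DICT)

words = []
for text, conf, height in zip(data["text"], data["conf"], data["height"]):
    # Skip empty results, low-confidence words, and boxes too small to be real text.
    if text.strip() and float(conf) > 40 and height > 8:
        words.append(text)

print(" ".join(words))
```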

The Final Result

Conclusion

Setting up card recognition and deploying image processing techniques are two crucial steps to retrieving accurate data from ID cards and other certificates using an OCR solution. Take a look at the demo video and the code base I've released on GitHub to help guide you in creating a solution similar to the one we've built.

The techniques described in the Image Processing section can be skipped if you use the Google Vision API. It's much more accurate than Tesseract and works very well even with data that isn't prepared beforehand. There is also a product on the market, BlinkID, designed for reading data from ID cards, passports, etc. If you have questions or would like to discuss this solution further, don't hesitate to reach out to me or leave a comment.

The photos are from Google; if you are the author and want them removed, please email us.