Everyday AI For Engineers: Deep Dive into Optical Character Recognition
If you’ve been following us on social media, you’re already aware that Subco Engineering has been diving into the exciting world of AI, discovering how these cutting-edge tools can enhance the daily work of Mechanical Engineers.
At first glance, “AI” might seem a bit mysterious—like a “black box” where we feed in data, and the magic happens behind the scenes. But for engineers accustomed to dealing with complex, integrated systems, it’s reassuring to break down this technology into manageable, well-ordered steps and (relatively) simple algorithms.
In this blog, we’ll take you on a deeper dive into the technology behind our first feature, “Harnessing AI for Laborious Tasks”, and our plans for applying this technology to the world of Mechanical Engineering.
Let’s dive into our first feature…
🔶 Feature 01: Harnessing AI for laborious tasks!
This simple example combines ChatGPT’s Optical Character Recognition feature with Python’s pandas library to organise the data into a table, meaning you can take a long list of handwritten information and turn it into an Excel table within seconds.
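For those who prefer to see the idea in code, here is a minimal sketch of the same workflow using the open-source pytesseract OCR engine in place of ChatGPT’s built-in OCR (pytesseract, Pillow, pandas and openpyxl are assumed to be installed, and the file names are purely illustrative):

```python
import pandas as pd
import pytesseract
from PIL import Image

# Run OCR over a photo of the handwritten list
raw_text = pytesseract.image_to_string(Image.open("handwritten_list.jpg"))

# Split each line into fields and load them into a DataFrame
rows = [line.split() for line in raw_text.splitlines() if line.strip()]
df = pd.DataFrame(rows)

# Export the result to an Excel workbook
df.to_excel("extracted_table.xlsx", index=False, header=False)
```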
The magic behind Feature 01…
This feature used Optical Character Recognition (OCR), a technology that allows software to convert different types of documents, such as scanned paper documents, PDFs, or images, into editable and searchable data. OCR can be broken down into four steps:
Step 1 – Pre-processing the Image
Grayscale Conversion: The image is converted to grayscale, as coloured or noisy images may confuse the OCR process.
Noise Removal: Unwanted noise or irregularities (such as dust or faint marks) are filtered out.
Binarization: This process converts the grayscale image into a binary image (black-and-white), where text appears as dark pixels, and the background is white.
Skew Correction: If the document is slightly tilted, the text alignment is corrected so that OCR can recognise the characters correctly.
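As a worked sketch of the four steps above, here is how the same pipeline might look in Python with OpenCV (cv2 and numpy are assumed to be installed, the file names are illustrative, and this is not the exact processing any particular OCR tool performs behind the scenes):

```python
import cv2
import numpy as np

img = cv2.imread("scan.png")

# 1. Grayscale conversion
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Noise removal: a median filter smooths out dust and faint marks
denoised = cv2.medianBlur(gray, 3)

# 3. Binarization: Otsu's method (covered below) picks the threshold for us
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 4. Skew correction: estimate the tilt of the dark (text) pixels and rotate
#    the page to straighten it (note: the angle convention returned by
#    minAreaRect differs between OpenCV versions, so treat this as a sketch)
coords = np.column_stack(np.where(binary < 255)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle
h, w = binary.shape
rotation = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
deskewed = cv2.warpAffine(binary, rotation, (w, h),
                          flags=cv2.INTER_NEAREST, borderValue=255)
cv2.imwrite("preprocessed.png", deskewed)
```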
Step 2 – Segmentation
What happens: Once the image is cleaned up, the next step is to segment the text. Segmentation breaks the image into smaller, recognisable parts: text lines, words and characters.
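A simple way to sketch line segmentation in OpenCV is to dilate the binarised image sideways so characters on the same line merge into one blob, then take each blob’s bounding box as a text line (an illustrative sketch, not the exact method any particular OCR engine uses):

```python
import cv2

# Binarised page from the pre-processing step (illustrative file name)
binary = cv2.imread("preprocessed.png", cv2.IMREAD_GRAYSCALE)
inverted = cv2.bitwise_not(binary)          # text becomes white on black

# A wide, short kernel joins characters and words on the same line
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 3))
joined = cv2.dilate(inverted, kernel, iterations=1)

# Each contour's bounding box is (approximately) one line of text
contours, _ = cv2.findContours(joined, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
line_boxes = sorted((cv2.boundingRect(c) for c in contours),
                    key=lambda box: box[1])   # top-to-bottom order
for x, y, w, h in line_boxes:
    print(f"text line at x={x}, y={y}, size {w}x{h}")
```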
Step 3 – Character Recognition (The core of OCR)
What happens: Now, the OCR algorithm attempts to recognise each individual character. This is the heart of OCR and involves two key techniques:
Feature Extraction: Instead of matching whole characters, the algorithm breaks down each character into individual features, such as curves, angles, and lines, and uses these features to recognise the character. This technique can be more flexible in handling variations in fonts and handwriting.
Pattern Recognition: This method compares the shapes of characters in the image to pre-defined patterns stored in the OCR system’s database. If a match is found, the character is identified.
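The pattern recognition technique can be illustrated with OpenCV’s template matching: slide a stored character shape over the page and flag wherever the correlation score is high (a toy sketch with illustrative file names; real OCR engines use far more robust matching):

```python
import cv2
import numpy as np

page = cv2.imread("preprocessed.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template_letter_A.png", cv2.IMREAD_GRAYSCALE)

# Normalised cross-correlation: a score of 1.0 means a perfect match
scores = cv2.matchTemplate(page, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(scores >= 0.8)            # keep confident matches only
for x, y in zip(xs, ys):
    print(f"possible 'A' at x={x}, y={y}, score={scores[y, x]:.2f}")
```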

OCR’s Algorithms
So far we have identified what Optical Character Recognition is and what it does, but to really understand AI we have to dig a level deeper and ask how it does this. Here is where we learn of the range of algorithms used by AI to return successful results…
Making things Binary: Otsu’s Binarization Algorithm
- What it does: This algorithm helps convert a grayscale image into a binary image by choosing an optimal threshold value that separates the text (foreground) from the background. The algorithm works by minimising the variance within each class (text and background) and is commonly used in the pre-processing stage of OCR.
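In OpenCV, Otsu’s method is a single call: passing the THRESH_OTSU flag tells the library to search for the threshold that minimises the within-class variance, and the chosen value is returned alongside the binary image (a minimal sketch with an illustrative file name):

```python
import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
threshold, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(f"Otsu chose a threshold of {threshold}")
cv2.imwrite("binary_scan.png", binary)
```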
Grouping words, lines and paragraphs: Connected Component Analysis (CCA)
- What it does: This technique is used for segmenting the image into smaller parts. It scans the binary image to identify connected groups of pixels, which are often characters, words, or even entire paragraphs. These components are then passed on for recognition.
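OpenCV exposes this directly through connectedComponentsWithStats, which labels every group of touching pixels and returns each group’s bounding box and pixel count (a minimal sketch; the input file name is illustrative):

```python
import cv2

binary = cv2.imread("preprocessed.png", cv2.IMREAD_GRAYSCALE)
inverted = cv2.bitwise_not(binary)              # components must be white

num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(inverted)
for i in range(1, num_labels):                  # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 20:                               # ignore specks of noise
        print(f"component {i}: box {w}x{h} at ({x}, {y}), {area} pixels")
```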
The magic: Convolutional Neural Networks (CNNs)
What they do: CNNs are the backbone of modern OCR systems. They are designed to mimic how humans recognise objects by processing an image through several layers, each identifying different features (edges, curves, etc.). CNNs are particularly effective in recognising characters because they can learn from large datasets and handle variations in fonts, handwriting, and image quality.
The best way to think about a CNN is as a series of filters that look for “patterns” in the binary image. Each filter applied to the image exaggerates the parts of the image that fit the filter’s pattern. For example, the handwritten “7” below has two filters applied to it: the first exaggerates the pixels arranged as horizontal lines, and the second exaggerates the pixels with a vertical line pattern. Combining the results of just these two filters produces a result pretty close to a 7.
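Those two filters can be written down directly as small convolution kernels. The sketch below applies a horizontal-line detector and a vertical-line detector to a character image with OpenCV’s filter2D (the file names are illustrative); a trained CNN learns thousands of kernels like these from data rather than having them hand-written:

```python
import cv2
import numpy as np

digit = cv2.imread("handwritten_7.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# A kernel that responds strongly to horizontal strokes, and its transpose
# for vertical strokes
horizontal_kernel = np.array([[-1, -1, -1],
                              [ 2,  2,  2],
                              [-1, -1, -1]], dtype=np.float32)
vertical_kernel = horizontal_kernel.T

# Each filtered image is bright where the character matches that pattern
horizontal_response = cv2.filter2D(digit, -1, horizontal_kernel)
vertical_response = cv2.filter2D(digit, -1, vertical_kernel)

combined = cv2.add(horizontal_response, vertical_response)
combined = cv2.normalize(combined, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("filter_response_7.png", combined)
```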

In modern AI image recognition software, the CNNs can have many thousands of filter layers, with each layer getting more complex. Filters exaggerating circles and squares quickly become filters exaggerating trees, which quickly become filters exaggerating facial features… even the facial features of cats.
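To make the idea concrete, here is a toy character-recognition CNN in Keras, along the lines of the standard textbook digit-recognition examples (TensorFlow is assumed to be installed; this is an illustrative architecture, not the network behind any particular OCR product):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # one grayscale character
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 simple learned filters
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),  # 64 more complex filters
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # one score per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```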

Step 4 – Adding Some Intelligence
An AI platform uses a combination of these algorithms to generate a first pass at character recognition, then cross-checks the results against known words from a dictionary database covering many languages. It returns a “confidence score”, effectively marking its own attempt. Any low-scoring sections are re-run using a different combination of algorithms, and this process is repeated until the score improves.
Once the characters score highly and are recognised as words, they are checked against the AI’s knowledge of English syntax and context, with large language models further refining the conversion into readable text.
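The confidence-score idea is easy to see with pytesseract, which returns a 0–100 confidence against every recognised word; anything scoring low can be flagged for a second pass or a dictionary check (a minimal sketch with an illustrative file name):

```python
import pytesseract
from PIL import Image

data = pytesseract.image_to_data(Image.open("scan.png"),
                                 output_type=pytesseract.Output.DICT)

# Every recognised word comes with a confidence; -1 marks non-word elements
for word, conf in zip(data["text"], data["conf"]):
    score = float(conf)
    if word.strip() and 0 <= score < 60:
        print(f"Low confidence ({score:.0f}%): '{word}' - worth a second pass "
              f"or a dictionary check")
```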
So that is Optical Character Recognition (OCR) broken down into a four-step process, supported by a range of algorithms at AI’s fingertips.
Engineering Applications of OCR
Understanding how AI interprets the data we give it enables us to tailor the inputs to maximise the success of working with AI tools. So how do we apply this to engineering?
One of the key ways we can see AI and OCR being paired up in engineering is the checking of engineering drawings with an AI tool. At Subco Engineering we have started to develop a “Check Bot”: a custom GPT that is trained on some high-level rules for engineering drawings. For example:
- Comparing the drawing Bill of Material (BOM) against a predefined BOM format and returning any errors.
- Checking the revision table format and values.
- Checking if all items on a drawing have Balloon IDs that align with BOM values.
These checks do not require an engineering degree to complete; they are simple formatting checks, where we compare one set of results against a pre-defined value. The idea of the Subco “Check Bot” is to complete these high-level checks, freeing up more time for engineers to conduct more involved checks (tolerancing, interfaces, fabrication considerations).
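Here is a minimal sketch of the first check in plain pandas: the BOM extracted from the drawing is compared against a predefined format and any mismatches are reported (the column names, file names and rules are illustrative, not the actual Check Bot implementation):

```python
import pandas as pd

# Predefined BOM format the drawing is checked against (illustrative)
REQUIRED_COLUMNS = ["ITEM", "QTY", "PART NUMBER", "DESCRIPTION",
                    "MATERIAL", "MASS"]

extracted_bom = pd.read_excel("extracted_bom.xlsx")   # output of the OCR step
errors = []

# Check 1: the extracted BOM must contain every column in the template
missing = [col for col in REQUIRED_COLUMNS if col not in extracted_bom.columns]
if missing:
    errors.append(f"Missing BOM columns: {missing}")

# Check 2: ITEM numbers should run 1, 2, 3, ... with no gaps or duplicates
if "ITEM" in extracted_bom.columns:
    expected = list(range(1, len(extracted_bom) + 1))
    if sorted(extracted_bom["ITEM"].tolist()) != expected:
        errors.append("ITEM numbers are not sequential")

print("\n".join(errors) if errors else "BOM format checks passed")
```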
So how does Subco’s “Check Bot” perform?


Currently, not very well.
- It spotted the “Total Mass = N/A”, where it should have shown the assembly mass.
- It spotted that some balloon items were missing.
So why can AI distinguish a Golden Retriever from an Alsatian, but can’t see anything on our drawing?
The reason is that the OCR sweeps through the drawing, recognises characters and numbers, but completely overlooks the lines and illustrations. The OCR just returns a string of unformatted text, completely separated from its relationship with the drawing. Without the OCR feeding the AI any information on the actual lines of the drawing, the AI can only review and comment on the extracted text.
When submitting a technical drawing, the raw information communicated to AI is shown below – just characters and numbers. AI then reviews, formats and categorises this information, and if not controlled, it will make assumptions to fill in any missing data it needs.
BEAM FLIPPER MECHANICAL ASSEMBLY SECTION B-B SCALE 1 : 5 SECTION D-D SCALE 1 : 5 SECTION E-E SCALE 1 : 5 SECTION A-A SCALE 1 : 5 SECTION C-C SCALE 1 : 5 PARTS LIST MASS MATERIAL DESCRIPTION PART NUMBER QTY ITEM 95.00 kg SEE DRAWING HYDRAULIC CYLINDER 140 BORE x 70 ROD. 700 STROKE P1106-ASSY-001 2 1 1078.30 kg SEE DRAWING FAB MAIN FRAME PRIMARY P1106-FAB-001 1 2 335.46 kg SEE DRAWING FAB ROTATING ARM PRIMARY P1106-FAB-002 2 3 3.63 kg SS 316L PIN 60x165 P1106-PRT-010 2 4 4.02 kg SS 316L PIN 60x175 P1106-PRT-011 2 5 0.26 kg SS 316L LP70 P1106-PRT-012 2 6 0.14 kg SS 316L LP50 P1106-PRT-013 2 7 7.18 kg SS 316L PIN 70x240 P1106-PRT-015 2 8 0.18 kg Nylon 6/6 NYLON WASHER 160 dia. P1106-PRT-017 1 9 0.03 kg Nylon 6/6 NYLON WASHER 100 dia. P1106-PRT-018 12 10 5.43 kg Nylon 6 NYLON PAD P1106-PRT-019 4 11 0.27 kg AS SUPPLIED BEARING BUSH 7060DU 2 12 0.15 kg AS SUPPLIED BEARING BUSH 6040DU 2 13 2.47 kg AS SUPPLIED TIPPER PAD 180L x 75W x 34H - POLYMAX 8 14 0.00 kg Stainless Steel WASHER - ISO 7089- FORM A - M8 32 15 0.00 kg Stainless Steel WASHER - ISO 7089- FORM A - M10 40 16 0.01 kg A2 HEX HEAD NUT - ISO 4032 - M8 16 17 0.01 kg A2 HEX HEAD NUT - ISO 4032 - M10 32 18 0.02 kg A2-70 HEX HEAD BOLT - ISO 4018 - M8 x 30 16 19 0.03 kg A2-70 HEX HEAD BOLT - ISO 4018 - M10 x 30 4 20 0.04 kg A2-70 HEX HEAD BOLT - ISO 4018 - M10 x 40 4 21 0.04 kg A2 HEX HEAD BOLT - ISO 4018 - M10 x 45 3 22 0.04 kg Stainless Steel SPLIT PIN - ISO 1234 - 8 x 80 2 23 B B D D E E A A C C 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 A A B B C C D D E E F F G G H H DRAWING TITLE: DRAWING No: www.subco-engineering.com Sheet 1 of 1 Scale 1 : 20 Dimensions mm A1 3rd ANGLE MECHANICAL ASSEMBLY P1106-ASSY-002 REVISION TABLE REV DESCRIPTION DATE AUTHOR CHECKED APPROVED 00 FOR MANUFACTURE 25/07/2024 JD RW MR TOTAL MASS = N/A ( 1 5 7 6 ) ( 3 1 7 0 ) 1 0 3 0 ( 6 9 1 ) (3877) 3 2 22 16 18 2116 5 3 2 20 16 9 8 ( 4 0 ) 6 10 (20) ( 2 3 5 0 ) ( 1 2 2 2 ) C L O S E D C E N T R E S - (1 9 1 4) O P E N C E N T R E S - 23 (1905) (3170) (2860) NOTES: 1. FABRICATION GUARDING HIDDEN FOR CLARITY. 2. FASTENERS INSTALLED AS PER A RECOGNISED INDUSTRY STANDARD INSTALLATION TORQUE AND LUBRICATION METHOD. DRILL TO SUIT THROUGH STEEL 3 1 4 7 11 12 13 14 19 17 15
There are currently no “filters” in the CNNs looking for the “pattern” of bolts, or the “pattern” of a hydraulic cylinder; no filter looking for pin holes, pad eyes and so on.
So for now, AI checking of engineering drawings is largely limited to a spell check of text and perhaps some high level formatting checks.
But here at Subco Engineering, we don’t just stop at the first hurdle. We are currently investigating developing a custom CNN with predefined engineering “filters” that will open the door to communicating with the full power of AI – if only we can tell it what it’s looking at!
Subco Engineering
