tesseract-ocr | package contains the Tesseract Open Source OCR Engine | Computer Vision library

by jimregan | C++ | Version: Current | License: Non-SPDX

kandi X-RAY | tesseract-ocr Summary

tesseract-ocr is a C++ library typically used in Artificial Intelligence and Computer Vision applications. It has no reported bugs or vulnerabilities, and it has low support. However, tesseract-ocr has a Non-SPDX license. You can download it from GitHub.

This package contains the Tesseract Open Source OCR Engine. Originally developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado, all the code in this distribution is now licensed under the Apache License.

Support

tesseract-ocr has a low active ecosystem.

It has 12 star(s) with 2 fork(s). There are 2 watchers for this library.

It had no major release in the last 6 months.

tesseract-ocr has no issues reported. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of tesseract-ocr is current.

Quality

tesseract-ocr has no bugs reported.

Security

tesseract-ocr has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

tesseract-ocr has a Non-SPDX License.

Non-SPDX licenses can be open source licenses that are not SPDX-compliant, or they can be non-open-source licenses. You need to review them closely before use.

Reuse

tesseract-ocr releases are not available. You will need to build from source code and install.

tesseract-ocr Key Features

No Key Features are available at this moment for tesseract-ocr.

tesseract-ocr Examples and Code Snippets

No Code Snippets are available at this moment for tesseract-ocr.

Community Discussions

Trending Discussions on tesseract-ocr

How to improve Hindi text extraction?

How to remove text from the sketched image

Could not find a package configuration file provided by "Leptonica"

Error when running python script in node js with python-shell npm

How to OCR a text with white colour characters on a blue background from a cropped image?

Detecting white text on a bright background with tesseract

Open CV OCR improve data extraction from color image with background

Performing OCR of Seven Segment Display images

unable to get text from the image

watchdog.observers.Observer works in Windows, works in docker on Linux, does not work in docker on Windows

QUESTION

How to improve Hindi text extraction?

Asked 2021-Jun-11 at 20:13

I am trying to extract Hindi text from a PDF. I tried all the methods to extract text from the PDF, but none of them worked. There are explanations of why it doesn't work, but no answers as such. So I decided to convert the PDF to an image and then use pytesseract to extract the text. I have downloaded the Hindi trained data; however, that also gives highly inaccurate text.

That's the actual Hindi text from the PDF (download link):

That's my code so far:

...

ANSWER

Answered 2021-Jun-08 at 14:46

It seems the module pdfplumber does the work:
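The answer's original snippet is not reproduced on this page. As a minimal sketch, assuming pdfplumber is installed and the PDF carries an embedded text layer, extraction might look like the following. The function names are hypothetical and chosen only for illustration.

```python
def join_pages(page_texts):
    """Join per-page strings with blank lines, skipping empty or missing pages."""
    cleaned = (t.strip() for t in page_texts if t)
    return "\n\n".join(t for t in cleaned if t)


def extract_pdf_text(pdf_path):
    """Read the embedded text layer of every page; no OCR is involved."""
    import pdfplumber  # third-party package, imported lazily

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None or "" for a page without text
        return join_pages(page.extract_text() for page in pdf.pages)
```

Calling extract_pdf_text("hindi.pdf") (a hypothetical file name) would return the text of all pages as one string. Because this reads the text layer directly, it sidesteps OCR errors, but it only helps when the PDF is not a scanned image.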

Source https://stackoverflow.com/questions/67816185

QUESTION

How to remove text from the sketched image

Asked 2021-Jun-10 at 04:07

I have some sketched images that contain text captions. I am trying to remove those captions.

I am using this code:

...

ANSWER

Answered 2021-Jun-09 at 20:15

The cv2 pre-processing is unnecessary here; tesseract is able to find the text on its own. See the example below, commented inline:
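The answer's own example is not reproduced on this page. A minimal sketch of the idea, assuming pytesseract and opencv-python are installed and the caption sits on a white background, could use the word boxes that pytesseract.image_to_data reports and paint over them. The function names, the min_conf cutoff and the white fill are illustrative assumptions, not the answerer's code.

```python
def text_boxes(data, min_conf=0.0):
    """Pick (x, y, w, h) boxes of recognized words from image_to_data output."""
    boxes = []
    for i, word in enumerate(data["text"]):
        # Non-word rows have empty text; conf may arrive as str or number
        if word.strip() and float(data["conf"][i]) >= min_conf:
            boxes.append((data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i]))
    return boxes


def erase_text(image_path, out_path, min_conf=0.0):
    """Paint every detected word box white and save the result."""
    import cv2          # third-party, imported lazily
    import pytesseract  # third-party, imported lazily
    from pytesseract import Output

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    data = pytesseract.image_to_data(img, output_type=Output.DICT)
    for x, y, w, h in text_boxes(data, min_conf):
        # thickness=-1 fills the rectangle; white assumes a white background
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), thickness=-1)
    cv2.imwrite(out_path, img)
```

If the sketch background is not white, filling with a sampled background color or inpainting the boxes would be a more suitable choice.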

Source https://stackoverflow.com/questions/67910691

QUESTION

Could not find a package configuration file provided by "Leptonica"

Asked 2021-Jun-07 at 18:55

I am trying to generate a visual studio 2019 C++ project from the tesseract 4.1.1 source code. Ultimately, I want to include a tesseract C++ project in my custom solution that consumes OCR results.

When I follow these steps:

Download and extract tesseract code https://github.com/tesseract-ocr/tesseract/archive/refs/tags/4.1.1.zip to "C:\tesseract" directory.
Execute the following commands in a Developer Command Prompt for VS 2019:

C:\Windows\System32>cd "C:\tesseract"
C:\tesseract>mkdir build
C:\tesseract>cd build
C:\tesseract\build>cmake ..

I receive this error:

...

ANSWER

Answered 2021-Jun-05 at 07:13

There are several tutorials on how to build tesseract on Windows with cmake and VS, e.g. https://bucket401.blogspot.com/2021/03/building-tesserocr-on-ms-windows-64bit.html (you can ignore the end of the tutorial, which covers the Python module), minimalist tesseract, or with clang.

Source https://stackoverflow.com/questions/67839925

QUESTION

Error when running python script in node js with python-shell npm

Asked 2021-May-16 at 17:24

I am developing a web application that has image processing functions. I used opencv-python and integrated the Python script into Node.js using the python-shell package.

index.js:

...

ANSWER

Answered 2021-May-16 at 17:24

I solved the error by passing the full path of the image to imread() in the Python script.
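One way to obtain such a full path without hard-coding it is to anchor the file name at the script's own folder. This is a sketch under the assumption that the image lives next to the Python script; the helper name path_next_to is hypothetical.

```python
from pathlib import Path


def path_next_to(script_file, relative_name):
    """Return an absolute path for relative_name, anchored at the script's folder."""
    script_dir = Path(script_file).resolve().parent
    return str(script_dir / relative_name)
```

Inside the script, cv2.imread(path_next_to(__file__, "input.jpg")) would then no longer depend on the working directory of the Node.js process that launches it. The file name input.jpg is only a placeholder.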

Source https://stackoverflow.com/questions/67147130

QUESTION

How to OCR a text with white colour characters on a blue background from a cropped image?

Asked 2021-May-06 at 10:37

First, I want to crop an image using a mouse event, and then print the text inside the cropped image. I tried OCR scripts, but none of them work for the image attached below. I think the reason is that the text has white characters on a blue background.

Can you help me with doing this?

Full image:

Cropped image:

An example of what I tried:

...

ANSWER

Answered 2021-May-06 at 10:37

[EDIT]

For anyone wondering, the image in the question was updated after posting my answer. That was the original image:

Thus, the below output in my original answer.

That's the newly posted image:

The specific Turkish characters, especially in the last word, are still not properly detected (since I still can't use lang='tur' right now), but at least the Ö and Ü can be detected using lang='deu', which I have installed:
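The fallback described here, using another installed language pack when the preferred one is missing, can be sketched as follows. This assumes pytesseract is installed; the function names and the preference order are illustrative assumptions.

```python
def pick_language(preferred, installed):
    """Return the first preferred language code that is installed, else None."""
    available = set(installed)
    for code in preferred:
        if code in available:
            return code
    return None


def ocr_with_fallback(image, preferred=("tur", "deu", "eng")):
    """Run OCR with the best available language pack from the preferred list."""
    import pytesseract  # third-party, imported lazily

    # get_languages() lists the traineddata packs Tesseract can find
    lang = pick_language(preferred, pytesseract.get_languages())
    if lang is None:
        raise RuntimeError("none of the preferred language packs is installed")
    return pytesseract.image_to_string(image, lang=lang)
```

A substitute pack such as deu only covers the characters the two alphabets share, so Turkish-specific letters may still be misread, as the answer notes.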

Source https://stackoverflow.com/questions/67410136

QUESTION

Detecting white text on a bright background with tesseract

Asked 2021-May-05 at 01:11

I'm having issues reading white text on a bright background. Tesseract finds the text itself, but it cannot really translate it correctly.

The image:

The result I keep getting is LanEerus which is not that far off, to be honest.

What I'm wondering is what image pre-processing could fix this? I'm using photoshop to manually pre-process it before I try to do it with code, to find what should work first.

I've tried making it a bitmap, but that makes the borders of the text pretty bad, resulting in tesseract just translating it to random characters.

Inverting colors and/or grayscaling doesn't seem to do the trick, either.

Anyone have any ideas? I know it's a pretty bad background for the text for this case. Trust me, I wish that the background was different!

My code for the tests:

...

ANSWER

Answered 2021-May-05 at 01:11

Here's one possible solution. This is in Python, but it should be clear enough for a Java port. We will apply a method called gain division. The idea is that you try to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image. This will get rid of most of the background color variation. We can use a morphological chain to clean the result a little bit. Let's see the code:
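The answer's code is not reproduced on this page. To show the arithmetic only, here is a dependency-free sketch on a grayscale image stored as nested lists. It assumes the background model is the local maximum in a small square window, which is one possible choice and not necessarily the answerer's; real code would more likely use OpenCV morphology and NumPy division. The function names and the radius parameter are hypothetical.

```python
def local_max(gray, radius=1):
    """Background model: brightest value in a (2*radius+1)-wide square window."""
    height, width = len(gray), len(gray[0])
    model = []
    for y in range(height):
        rows = range(max(0, y - radius), min(height, y + radius + 1))
        row_out = []
        for x in range(width):
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            row_out.append(max(gray[j][i] for j in rows for i in cols))
        model.append(row_out)
    return model


def gain_division(gray, radius=1):
    """Divide every pixel by its background estimate and rescale to 0..255."""
    background = local_max(gray, radius)
    return [
        # Guard against division by zero where the background model is black
        [0 if b == 0 else min(255, round(255 * p / b)) for p, b in zip(row, bg_row)]
        for row, bg_row in zip(gray, background)
    ]
```

With this model, a stroke at 40 % of its surroundings maps to the same output value whether those surroundings have brightness 100 or 200, which is what removes the background variation. The sketch assumes the strokes are darker than their neighborhood; for text that is brighter than the background, the image would need to be inverted first or a local minimum used instead.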

Source https://stackoverflow.com/questions/67386714

QUESTION

Open CV OCR improve data extraction from color image with background

Asked 2021-Apr-28 at 10:22

I am trying to extract some info from mobile screenshots. My code is able to retrieve some of the info, but not all of it. I read the image, converted it to grey, removed the parts that are not required, and applied a Gaussian threshold. But not all of the text is being read.

...

ANSWER

Answered 2021-Apr-28 at 10:22

Have a look at the page segmentation modes of pytesseract, cf. this Q&A. For example, using config='-psm 12' will already give you all desired texts. Nevertheless, those graphs are also somehow interpreted as texts.

That's why I would preprocess the image to get single boxes (actual texts, the graphs, the information from the top, etc.), and filter to only store those boxes with the content of interest. That could be done by using

the y coordinate of the bounding rectangle (not in the upper 5 % of the image, that's the mobile phone status bar),
the width w of the bounding rectangle (not wider than 50 % of the image's width, these are the horizontal lines),
the x coordinate of the bounding rectangle (not in middle third of the image, these are the graphs).

What's left is to run pytesseract on each cropped image with config='-psm 6' for example (assume a single uniform block of text), and clean the texts from any line breaks.

That'd be my code:
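The answer's code is not reproduced on this page. The three geometric rules listed above can be sketched as a small predicate, followed by per-box OCR. This assumes the boxes were already found elsewhere (for example from contours), that the image is a NumPy array, and that pytesseract is installed; the function names are hypothetical. The sketch spells the option as --psm, the form documented by current Tesseract releases.

```python
def keep_box(x, y, w, h, img_w, img_h):
    """Apply the three geometric rules to one bounding rectangle."""
    if y < 0.05 * img_h:                 # upper 5 %: phone status bar
        return False
    if w > 0.5 * img_w:                  # wider than half the image: horizontal line
        return False
    if img_w / 3 <= x <= 2 * img_w / 3:  # starts in the middle third: graph
        return False
    return True


def ocr_boxes(image, boxes):
    """OCR each kept box of a NumPy image array as one uniform block of text."""
    import pytesseract  # third-party, imported lazily

    img_h, img_w = image.shape[:2]
    texts = []
    for x, y, w, h in boxes:
        if keep_box(x, y, w, h, img_w, img_h):
            crop = image[y:y + h, x:x + w]
            text = pytesseract.image_to_string(crop, config="--psm 6")
            texts.append(" ".join(text.split()))  # drop line breaks
    return texts
```

The thresholds are specific to the screenshots in the question and would need adjusting for other layouts.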

Source https://stackoverflow.com/questions/67187438

QUESTION

Performing OCR of Seven Segment Display images

Asked 2021-Apr-19 at 06:01

I'm working on performing OCR of energy meter displays: example 1 example 2 example 3

I tried to use tesseract-ocr with the letsgodigital trained data. But the performance is very poor.

I'm fairly new to the topic and this is what I've done:

...

ANSWER

Answered 2021-Apr-19 at 06:01

Notice how your power meters either use blue or green LEDs to light up the display; I suggest you use this color display to your advantage. What I'd do is select only one RGB channel based on the LED color. Then I can threshold it based on some algorithm or assumption. After that, you can do the downstream steps of cropping / resizing / transformation / OCR etc.

For example, using your example image 1, look at its histogram here. Notice how there is a small peak of green to the right of the 150 mark.

I take advantage of this, and set anything below 150 to zero. My assumption being that the green peak is the bright green LED in the image.
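The channel selection and cutoff can be sketched without any imaging library on an image stored as rows of (R, G, B) tuples. The function name and the tuple layout are assumptions made for illustration; with an OpenCV array the green plane would normally be taken by slicing, for example img[:, :, 1].

```python
def isolate_bright_channel(pixels, channel=1, cutoff=150):
    """Keep one channel of each (R, G, B) pixel and zero values below cutoff."""
    return [
        [px[channel] if px[channel] >= cutoff else 0 for px in row]
        for row in pixels
    ]
```

The default cutoff of 150 comes from the histogram of example image 1 described above; a blue display would use channel=2 and most likely a different cutoff read from its own histogram.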

Source https://stackoverflow.com/questions/67146380

QUESTION

unable to get text from the image

Asked 2021-Apr-18 at 06:54

I'm learning AI/ML and trying to get text from this sample form.

...

ANSWER

Answered 2021-Apr-18 at 06:54

This link provided the answer. The fix is to remove the noise from the background of the image.

Source https://stackoverflow.com/questions/67106600

QUESTION

watchdog.observers.Observer works in Windows, works in docker on Linux, does not work in docker on Windows

Asked 2021-Apr-10 at 01:11

I have an interesting problem that is driving me nuts. I have a python program that is using watchdog.observers.Observer. This program (aka watcher) watches a folder and responds when files appear in it. I have another program (aka parser) which periodically populates the watched folder with files.

When the watcher program runs in Windows and the parser runs in a docker container on Windows, there is happiness.
When the watcher program runs in a docker container on a Linux box and the parser runs in another docker container on the Linux box, there is happiness.
When the watcher program runs in a docker container on Windows and the parser runs in another docker container on Windows, happiness is not achieved. The parser populates the folder with files, but the watcher never observes them.

Here's my watcher code:

...

ANSWER

Answered 2021-Apr-10 at 01:11

The underlying API that watchdog uses to monitor Linux filesystem events is called inotify. The Docker for Windows WSL 2 backend documentation notes:

Linux containers only receive file change events (“inotify events”) if the original files are stored in the Linux filesystem.

The directory you're mounting, c:\My_MR, resides on the Windows file system and thus inotify inside the watcher container doesn't work.

Instead, you can run docker from inside your WSL 2 default distribution with a Linux filesystem path, e.g., ~/my_mr:

Source https://stackoverflow.com/questions/66909254

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install tesseract-ocr

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.

Find more information at: