tesseract-ocr.github.io | Tesseract documentation | Computer Vision library

by tesseract-ocr | Language: Ruby | Version: Current | License: Apache-2.0

kandi X-RAY | tesseract-ocr.github.io Summary

tesseract-ocr.github.io is a Ruby library typically used in Artificial Intelligence and Computer Vision applications. It has no reported bugs or vulnerabilities, a permissive license, and low support. You can download it from GitHub.

Tesseract documentation

Support

tesseract-ocr.github.io has a low active ecosystem.

It has 74 star(s) with 59 fork(s). There are 21 watchers for this library.

It had no major release in the last 6 months.

tesseract-ocr.github.io has no issues reported. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of tesseract-ocr.github.io is current.

Quality

tesseract-ocr.github.io has 0 bugs and 0 code smells.

Security

tesseract-ocr.github.io has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

tesseract-ocr.github.io code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

tesseract-ocr.github.io is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the fewest restrictions, and you can use them in most projects.

Reuse

tesseract-ocr.github.io releases are not available. You will need to build from source code and install.

tesseract-ocr.github.io Key Features

No Key Features are available at this moment for tesseract-ocr.github.io.

tesseract-ocr.github.io Examples and Code Snippets

No Code Snippets are available at this moment for tesseract-ocr.github.io.

Community Discussions

Trending Discussions on tesseract-ocr.github.io

Packaging Tesseract 5 with VCPKG

Detecting white text on a bright background with tesseract

Using Tesseract to read dates from a small images

How to generate lstmf from .box and .tif files in tesseract 5 alpha lstm training

how to convert C++ tesseract-ocr code to Python?

QUESTION

Packaging Tesseract 5 with VCPKG

Asked 2022-Feb-15 at 06:39

I'm following the instructions at https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows and when I run: vcpkg install tesseract:x86-windows-static It is pulling down tesseract 4. I tried using -head and it still pulls down 4. Any idea how I can build a self-contained executable for tesseract 5.x?

...

ANSWER

Answered 2022-Feb-15 at 06:39

At the moment vcpkg supports version 4.1.1: https://vcpkg.info/port/tesseract

There is a request for an update: https://github.com/microsoft/vcpkg/issues/16019 from Feb 3, 2021, which Microsoft ignores ;-)

You can (manually) upgrade the tesseract version in vcpkg. See the tesseract forum discussion: https://groups.google.com/g/tesseract-ocr/c/2xAJaGRqymw?pli=1

Source https://stackoverflow.com/questions/71105979

QUESTION

Detecting white text on a bright background with tesseract

Asked 2021-May-05 at 01:11

I'm having issues reading white text on a bright background, it finds the text itself but it cannot really translate it correctly.

The image:

The result I keep getting is LanEerus which is not that far off, to be honest.

What I'm wondering is what image pre-processing could fix this? I'm using photoshop to manually pre-process it before I try to do it with code, to find what should work first.

I've tried making it a bitmap, but that makes the borders of the text pretty bad, resulting in tesseract just translating it to random characters.

Inverting colors and/or grayscaling doesn't seem to do the trick, either.

Anyone have any ideas? I know it's a pretty bad background for the text for this case. Trust me, I wish that the background was different!

My code for the tests:

...

ANSWER

Answered 2021-May-05 at 01:11

Here's one possible solution. This is in Python, but it should be clear enough for a Java port. We will apply a method called gain division. The idea is that you try to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image. This will get rid of most of the background color variation. We can use a morphological chain to clean the result a little bit. Let's see the code:
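A minimal sketch of the gain-division step, not the answerer's original code. It assumes a grayscale image held as nested lists of 0-255 values and uses a local maximum over a small square window as the background model; the function names and the test image are hypothetical.

```python
# Sketch of gain division on a grayscale image stored as nested lists (0-255).
# Assumption: the brightest value in a small window approximates the background.

def local_max(img, radius):
    """Brightest value in the (2*radius+1)-wide square around each pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def gain_division(img, radius=2):
    """Divide each pixel by its background estimate and rescale to 0-255."""
    background = local_max(img, radius)
    return [
        [0 if b == 0 else min(255, round(255 * p / b)) for p, b in zip(row, bg_row)]
        for row, bg_row in zip(img, background)
    ]


# Hypothetical input: a left-to-right brightness ramp with one dark "ink" pixel.
image = [[100 + 10 * x for x in range(10)] for _ in range(5)]
image[2][5] = 20
flattened = gain_division(image, radius=1)
```

After the division the brightness ramp is close to uniform while the dark pixel stays dark, which is the "relatively constant gain" the answer describes. With OpenCV the same two steps would more likely be written with cv2.dilate and cv2.divide; the window size here is an arbitrary choice and would need tuning per image.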

Source https://stackoverflow.com/questions/67386714

QUESTION

Using Tesseract to read dates from a small images

Asked 2021-Mar-29 at 15:53

I have a rather small set of images which contain dates. The size might be a problem, but I'd say that the quality is OK. I have followed the guidelines to provide the clearest image I can to the engine. After resizing, applying filters, and lots of trial and error, I came up with an image that is almost properly read. I put an example below:

Now, this is read as “9 MAR 2021\n\x0c. Not bad, but the first 2 is read as ". At this point I think I'm misusing part of the power of Tesseract. After all, I know what it should expect, i.e. something as "%d %b %Y".

Is there a way to tell Tesseract that it should try to find the best match given this strong constraint? Providing this metadata to the engine should heavily facilitate the task. I have been reading the documentation, but I can't find the way to do this.

I'm using pytesseract on Tesseract 4.1 with Python 3.9.

...

ANSWER

Answered 2021-Mar-29 at 15:53

You need to know the following:

Now if we center the image (by adding borders):

We up-sample the image without losing any pixel.

Second, we need to make the characters in the image bold to make the OCR result accurate.

Now OCR:
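The steps in this answer can be sketched roughly as follows. This is an illustration on nested lists rather than the answerer's code: the helper names are hypothetical, the OCR call itself is left out, and the final check against "%d %b %Y" is an added assumption (with English month names) about how the format constraint from the question could be applied after OCR.

```python
from datetime import datetime


def add_border(img, size, value=255):
    """Centre the content by padding a grayscale image with a constant border."""
    width = len(img[0]) + 2 * size
    top = [[value] * width for _ in range(size)]
    bottom = [[value] * width for _ in range(size)]
    middle = [[value] * size + list(row) + [value] * size for row in img]
    return top + middle + bottom


def upsample(img, factor):
    """Nearest-neighbour enlargement: each pixel becomes a factor x factor block."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def thicken(img, radius=1):
    """Make dark strokes bolder by taking the local minimum around each pixel."""
    h, w = len(img), len(img[0])
    return [
        [
            min(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def parse_date(ocr_text, fmt="%d %b %Y"):
    """Return a date if the OCR text matches the expected format, else None."""
    try:
        return datetime.strptime(ocr_text.strip(), fmt).date()
    except ValueError:
        return None
```

A result that fails parse_date, such as the misread string from the question, can then be rejected and retried with different preprocessing instead of being accepted silently.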

Source https://stackoverflow.com/questions/66856172

QUESTION

How to generate lstmf from .box and .tif files in tesseract 5 alpha lstm training

Asked 2020-Mar-08 at 10:32

I am using the current alpha version 5 of tesseract. Currently, I am trying to train using images without font files. I managed to generate box files from the image using the following command.

...

ANSWER

Answered 2020-Mar-08 at 10:32

Found it,

Source https://stackoverflow.com/questions/60524751

QUESTION

how to convert C++ tesseract-ocr code to Python?

Asked 2020-Feb-12 at 07:06

I want to convert the C++ version Result iterator example in tesseract-ocr doc to Python.

...

ANSWER

Answered 2020-Feb-11 at 15:21

I think the problem is that api->Recognize() expects a pointer as its first argument. They mistakenly put a 0 in their example, but it should be nullptr. 0 and nullptr both have the same value, but on 64-bit systems they don't have the same size (usually; I assume on some weird non-x86 systems this may not be true either).

Their example still works with a C++ compiler because the compiler is aware that the function expects a pointer (64 bits) and fixes it silently.

In your example, it seems you haven't specified the exact prototype of TessBaseAPIRecognize() to ctypes. So ctypes can't know a pointer (64 bits) is expected by this function. Instead it assumes that this function expects an integer (32 bits) --> it crashes.

My suggestions:

Use ctypes.c_void_p(None) instead of 0
If you intend to use that in production, specify to ctypes all the function prototypes
Be careful with the examples you look at: Those examples use Tesseract base API (C++ API) whereas if you want to use libtesseract with Python + ctypes, you have to use Tesseract C API. Those 2 APIs are very similar but may not be identical.
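The first two suggestions can be sketched as below. This is an illustration under assumptions rather than a tested binding: the helper name is hypothetical, and the prototype shown is the C API declaration int TessBaseAPIRecognize(TessBaseAPI*, ETEXT_DESC*), with both pointers treated as opaque void pointers.

```python
import ctypes


def declare_recognize_prototype(lib):
    """Tell ctypes that TessBaseAPIRecognize takes two pointers and returns int."""
    lib.TessBaseAPIRecognize.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
    lib.TessBaseAPIRecognize.restype = ctypes.c_int
    return lib


# An explicit NULL pointer to pass as the monitor argument instead of a bare 0.
NULL_MONITOR = ctypes.c_void_p(None)

# Usage, assuming libtesseract is installed and `handle` was returned by
# TessBaseAPICreate() (whose restype should also be set to ctypes.c_void_p,
# otherwise the handle itself may be truncated to an int):
#
#   import ctypes.util
#   lib = ctypes.CDLL(ctypes.util.find_library("tesseract"))
#   declare_recognize_prototype(lib)
#   status = lib.TessBaseAPIRecognize(handle, NULL_MONITOR)
```

Declaring argtypes once means ctypes converts None or 0 to a full-width NULL pointer by itself, so later calls no longer depend on remembering to wrap the argument.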

If you need further help, you can have a look at how things are done in PyOCR. If you decide to use PyOCR in your project, just beware that the license of PyOCR is GPLv3+, which implies some restrictions.

Source https://stackoverflow.com/questions/60166781

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install tesseract-ocr.github.io

You can download it from GitHub.
On a UNIX-like operating system, using your system’s package manager is easiest. However, the packaged Ruby version may not be the newest one. There is also an installer for Windows. Version managers help you switch between multiple Ruby versions on your system, and installers can be used to install one or more specific Ruby versions. Please refer to ruby-lang.org for more information.