ocr | layer perceptron neural network | Machine Learning library

by mateogianolio JavaScript Version: Current License: MIT

X-Ray Key Features Code Snippets Community Discussions(10)Vulnerabilities Install Support

kandi X-RAY | ocr Summary

ocr is a JavaScript library typically used in Artificial Intelligence, Machine Learning, Deep Learning applications. ocr has no bugs, it has no vulnerabilities, it has a Permissive License and it has medium support. You can download it from GitHub.

Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR). The training set is automatically generated using a heavily modified version of the captcha-generator node-captcha. Support for the MNIST handwritten digit database has been added recently (see performance section). The network takes a one-dimensional binary array (default 20 * 20 = 400-bit) as input and outputs an 10-bit array of probabilities, which can be converted into a character code. Initial performance measurements show promising success rates.

Support

Quality

Security

License

Reuse

Support

ocr has a medium active ecosystem.

It has 1123 star(s) with 113 fork(s). There are 59 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues and 6 have been closed. On average issues are closed in 60 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of ocr is current.

Quality

ocr has 0 bugs and 0 code smells.

Security

ocr has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

ocr code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

ocr is licensed under the MIT License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

ocr releases are not available. You will need to build from source code and install.

Installation instructions are not available. Examples and code snippets are available.

Top functions reviewed by kandi - BETA

kandi has reviewed ocr and discovered the below as its top functions. This is intended to give you an instant insight into ocr implemented functionality, and help decide if they suit your requirements.

Generate PNG image

Get all kandi verified functions for this library.

ocr Key Features

No Key Features are available at this moment for ocr.

ocr Examples and Code Snippets

No Code Snippets are available at this moment for ocr.

Community Discussions

Trending Discussions on ocr

Is Shannon-Fano coding ambiguous?

pytesseract improving OCR accuracy for blurred numbers on an image

Microsoft Computer Vision OCR Read API charged as S3 transaction instead of S2

Unable to authenticate service account - Google Cloud

After updating Gradle to 7.0.2, Element type “manifest” must be followed by either attribute specifications, “>” or “/>” error

InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [4], [batch]: [5] [Op:IteratorGetNext]

Mutate 2 vectors into a single vector in R

Parsing dates from OCRed files using dateparser library

OpenCV process all text to be black on white (segmentation)

Finding the right word and row in the Financial Statement text file

QUESTION

Is Shannon-Fano coding ambiguous?

Asked 2022-Mar-08 at 19:38

In a nutshell:

Is the Shannon-Fano coding as described in Fano's paper The Transmission of Information (1952) really ambiguous?

In Detail:

3 papers
Claude E. Shannon published his famous paper A Mathematical Theory of Communication in July 1948. In this paper he invented the term bit as we know it today and he also defined what we call Shannon entropy today. And he also proposed an entropy based data compression algorithm in this paper. But Shannon's algorithm was so weak, that under certain circumstances the "compressed" messages could be even longer than in fix length coding. A few month later (March 1949) Robert M. Fano published an improved version of Shannons algorithm in the paper The Transmission of Information. 3 years after Fano (in September 1952) his student David A. Huffman published an even better version in his paper A Method for the Construction of Minimum-Redundancy Codes. Hoffman Coding is more efficient than its two predecessors and it is still used today. But my question is about the algorithm published by Fano which usually is called Shannon-Fano-Coding.

The algorithm
This description is based on the description from Wikipedia. Sorry, I did not fully read Fano's paper. I only browsed through it. It is 37 pages long and I really tried hard to find a passage where he talks about the topic of my question, but I could not find it. So, here is how Shannon-Fano encoding works:

Count how often each character appears in the message.
Sort all characters by frequency, characters with highest frequency on top of the list
Divide the list into two parts, such that the sums of frequencies in both parts are as equal as possible. Add the bit 0 to one part and the bit 1 to the other part.
Repeat step 3 on each part that contains 2 or more characters until all parts consist of only 1 character.
Concatenate all bits from all rounds. This is the Shannon-Fano-code of that character.

An example
Let's execute this on a really tiny example (I think it's the smallest message where the problem appears). Here is the message to encode:

...

ANSWER

Answered 2022-Mar-08 at 19:00

To directly answer your question, without further elaboration about how to break ties, two different implementations of Shannon-Fano could produce different codes of different lengths for the same inputs.

As @MattTimmermans noted in the comments, Shannon-Fano does not always produce optimal prefix-free codings the way that, say, Huffman coding does. It might therefore be helpful to think of it less as an algorithm and more of a heuristic - something that likely will produce a good code but isn't guaranteed to give an optimal solution. Many heuristics suffer from similar issues, where minor tweaks in the input or how ties are broken could result in different results. A good example of this is the greedy coloring algorithm for finding vertex colorings of graphs. The linked Wikipedia article includes an example in which changing the order in which nodes are visited by the same basic algorithm yields wildly different results.

Even algorithms that produce optimal results, however, can sometimes produce different optimal results based on tiebreaks. Take Huffman coding, for example, which works by repeatedly finding the two lowest-weight trees assembled so far and merging them together. In the event that there are three or more trees at some intermediary step that are all tied for the same weight, different implementations of Huffman coding could produce different prefix-free codes based on which two they join together. The resulting trees would all be equally "good," though, in that they'd all produce outputs of the same length. (That's largely because, unlike Shannon-Fano, Huffman coding is guaranteed to produce an optimal encoding.)

That being said, it's easy to adjust Shannon-Fano so that it always produces a consistent result. For example, you could say "in the event of a tie, choose the partition that puts fewer items into the top group," at which point you would always consistently produce the same coding. It wouldn't necessarily be an optimal encoding, but, then again, since Shannon-Fano was never guaranteed to do so, this is probably not a major concern.

If, on the other hand, you're interested in the question of "when Shannon-Fano has to break a tie, how do I decide how to break the tie to produce the optimal solution?," then I'm not sure of a way to do this other than recursively trying both options and seeing which one is better, which in the worst case leads to exponentially-slow runtimes. But perhaps someone else here can find a way to do that>

Source https://stackoverflow.com/questions/71399572

QUESTION

pytesseract improving OCR accuracy for blurred numbers on an image

Asked 2022-Mar-02 at 22:12

Example of numbers

I am using the standard pytesseract img to text. I have tried with digits only option 90% of the time it is perfect but above is a example where it goes horribly wrong! This example produced no characters at all

As you can see there are now letters so language option is of no use, I did try adding some text in the grabbed image but it still goes wrong.

I increased the contrast using CV2 the text has been blurred upstream of my capture

Any ideas on increasing accuracy?

After many tests using the suggestions below. I found the sharpness filter gave unreliable results. another tool you can use is contrast=cv2.convertScaleAbs(img2,alpha=2.5,beta=-200) I used this as my text in black and white ended up light gray text on a gray background with convertScaleAbs I was able to increase the contrast to get almost a black and white image

Basic steps for OCR

Convert to monochrome
Crop image to your target text
Filter image to get black and white
perform OCR

...

ANSWER

Answered 2022-Feb-28 at 05:40

Here's a simple approach using OpenCV and Pytesseract OCR. To perform OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. To do this, we can convert to grayscale, then apply a sharpening kernel using cv2.filter2D() to enhance the blurred sections. A general sharpening kernel looks like this:

Source https://stackoverflow.com/questions/71289347

QUESTION

Microsoft Computer Vision OCR Read API charged as S3 transaction instead of S2

Asked 2022-Jan-12 at 14:19

I am using Microsoft Computer Vision API for OCR processing and I noticed that they are getting charged as S3 transactions instead of S2 in my bill.

I'm using the .NET SDK and the API I am using is this one. https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.cognitiveservices.vision.computervision.computervisionclientextensions.readasync?view=azure-dotnet

I have also confirmed that the actual REST API the SDK calls is the following POST /vision/v3.2/read/analyze https://centraluseuap.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/5d986960601faab4bf452005

According to documentation, that should be the OCR Read API, am I correct? https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/vision-api-how-to-topics/call-read-api

I am puzzled as to why my calls are getting charged as S3 instead of S2. This is important for me because S3 is 50% more expensive than S2. Using the Pricing Calculator, 1000 S2 transactions is $1, whereas 1000 S3 transactions is $1.5. https://azure.microsoft.com/en-us/pricing/calculator/?service=cognitive-services

What's the difference between OCR and "Describe and Recognize Text" anyways? OCR (Optical Character Recognition) by definition must recognize text. I am calling the Read API without any of the optional parameters so I did not ask for "Describe" hence the call should be S2 feature rather than S3 feature I think.

I already posted this question at Microsoft Q&A but I thought SO might get more traffic hence help me get an answer faster. https://docs.microsoft.com/en-us/answers/questions/689767/computer-vision-api-charged-as-s3-transaction-inst.html

...

ANSWER

Answered 2022-Jan-12 at 14:19

To help you understand, you need a bit of history of those services. Computer Vision API (and all "calling" SDKs, whether C#/.Net, Java, Python etc using these APIs) have moved frequently and it is sometimes hard to understand which SDK calls which version of the APIs.

API operations history

Regarding optical character reading operations, there have been several operations:

Computer Vision 1.0

See definition here was containing:

OCR operation, a synchronous operation to recognize printed text
Recognize Handwritten Text operation, an asynchronous operation for handwritten text (with "Get Handwritten Text Operation Result" operation to collect the result once completed)

Computer Vision 2.0

See definition here. OCR was still there, but "Recognize Handwritten Text" was changed. So there were:

OCR operation, a synchronous operation to recognize printed text
Recognize Text operation, asynchronous (+ Get Recognize Text Operation Result to collect the result), accepting both printed or handwritten text (see mode input parameter)
Batch Read File operation, asynchronous (+ "Get Read Operation Result" to collect the result), which was also processing PDF files whereas the other one were only accepting images. It was intended "for text-heavy documents"

Computer Vision 2.1 was similar in terms of operations.

Computer Vision 3.0

See definition here. Main changes: Recognize Text and Batch Read File were "unified" into a Read operation, with models improvements. No more need to specify handwritten / printed for example (see link).

Source https://stackoverflow.com/questions/70657936

QUESTION

Unable to authenticate service account - Google Cloud

Asked 2022-Jan-11 at 10:55

I'll premise that I've already googled and read the documentation before writing, I've noticed that it's a popular discussion here on StackOverflow as well, but none of the answers already given have helped me.

I created a Google Cloud account to use the API: Google Vision.

To do this I followed the steps of creating the project, adding the above API and finally creating a service account with a key.

I downloaded the key and put it in a folder in the java project on the PC.

Then, since it is a maven project I added the dependencies to the pom as described in the tutorials.

At this point I inserted the suggested piece of code to start using the API.

Everything seemed to be OK, everything was read, the various libraries/interfaces were imported.

But an error came up as soon as I tried to run the program:

The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials.

I must admit I didn't know what 'Google Compute Engine' was, but since there was an alternative and I had some credentials, I wanted to try and follow that.

So I follow the instructions:

After creating your service account, you need to download the service account key to your machine(s) where your application runs. You can either use the GOOGLE_APPLICATION_CREDENTIALS environment variable or write code to pass the service account key to the client library.

OK, I tried the first way, to pass the credentials via environment variable:

With powershell -> no response

...

ANSWER

Answered 2022-Jan-10 at 17:56

Your approach is correct.

To authenticate code, you should use a Service Account.

Google provides a useful mechanism called Application Default Credentials (ADCs). See finding credentials automatically. When you use ADCs, Google's SDKs use a predefined mechanism to try to authenticate as the Service Account:

Checking GOOGLE_APPLICATION_CREDENTIALS in your environment. As you've tried;
When running on a GCP service (e.g. Compute Engine) by looking for the service's (Service Account) credentials. With Compute Engine, this is done by checking the so-called Metadata service.

For #1, you can either use GOOGLE_APPLICATION_CREDENTIALS in the process' environment or you can manually load the file as you appear to be trying in your code.

That all said:

I don't see where GoogleCredentials is being imported by your code?
Did you grant the Service Account a suitable role (permissions) so that it can access any other GCP services that it needs?

You should be able to use this List objects example.

The link above, finding credentials automatically, show show to create a Service Account, assign it a role and export it.

You will want to perhaps start (for development!) with roles/storage.objectAdmin (see IAM roles for Cloud Storage) and refine before deployment.

Source https://stackoverflow.com/questions/70656567

QUESTION

After updating Gradle to 7.0.2, Element type “manifest” must be followed by either attribute specifications, “>” or “/>” error

Asked 2021-Dec-29 at 11:19

So today I updated Android Studio to:

...

ANSWER

Answered 2021-Jul-30 at 07:00

Encountered the same problem. Update Huawei services. Please take care. Remember to keep your dependencies on the most up-to-date version. This problem is happening on Merged-Manifest.

Source https://stackoverflow.com/questions/68575710

QUESTION

InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [4], [batch]: [5] [Op:IteratorGetNext]

Asked 2021-Nov-24 at 13:26

Task: Keras captcha ocr model training.

Problem: I am trying to print CAPTCHAS from my validation set, but doing so is causing the following error

...

ANSWER

Answered 2021-Nov-24 at 13:26

Here is a complete running example based on your dataset running in Google Colab:

Source https://stackoverflow.com/questions/70091975

QUESTION

Mutate 2 vectors into a single vector in R

Asked 2021-Nov-22 at 01:23

I have a variable named Tactic (migratory tactics for a predatory species of fish) that contains three levels i.e "Migr", "OcRes", "EstRes" see data below:

...

ANSWER

Answered 2021-Nov-22 at 00:59

Do you mean this? I'm not sure that your sample data do not including many variables.

Source https://stackoverflow.com/questions/70059841

QUESTION

Parsing dates from OCRed files using dateparser library

Asked 2021-Nov-07 at 17:54

I want to extract dates from OCR images using the dateparser lib.

...

ANSWER

Answered 2021-Nov-07 at 17:42

The problem is that your date is a match data object. Also, I am not sure dateparser.parse does what you need. I'd recommend datefinder package to extract dates from text.

This is the regex I'd use:

Source https://stackoverflow.com/questions/69696980

QUESTION

OpenCV process all text to be black on white (segmentation)

Asked 2021-Nov-03 at 19:34

Is it possible to somehow make it so that all text in a document is black on white after thresholding. I've been looking online alot but I haven't been able to come to a solution. My current thresholded image is: https://i.ibb.co/Rpqcp7v/thresh.jpg

The document needs to be read by an OCR and for that I need to have the areas that are currently white on black, to be inverted. How would I go about doing this? my current code:

...

ANSWER

Answered 2021-Nov-03 at 19:34

Use a median filter to estimate the dominant color (background).

Then subtract the image from that... you'll get white text on black background. I'm using the absolute difference. Invert for black on white.

Source https://stackoverflow.com/questions/69826447

QUESTION

Finding the right word and row in the Financial Statement text file

Asked 2021-Nov-02 at 07:51

I've used tesseract OCR in Python to convert the Financial statement pdfs to text files, while converting the long whitespaces into ";". So the text file looks pretty nice and the tables are looking good.

Using an example found here https://cdn.corporatefinanceinstitute.com/assets/AMZN-Cash-Flow.png

...

ANSWER

Answered 2021-Oct-29 at 22:03

Use words to look for with whitespace characters, regular expression can be create in the function with .* joiners to allow matching anything between those words.

Source https://stackoverflow.com/questions/69766452

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install ocr

You can download it from GitHub.

Support

For any new features, suggestions and bugs create an issue on GitHub. If you have any questions check and ask questions on community page Stack Overflow .

Find more information at: