ocropus | OSX-buildable fork

by jkrall | C++ | Version: Current | License: No License


kandi X-RAY | ocropus Summary

ocropus is a C++ library typically used in macOS applications. It has no reported bugs or vulnerabilities, and it has low support activity. You can download it from GitHub.

OSX-buildable fork of ocropus 0.4

Support

ocropus has a low-activity ecosystem.

It has 16 stars, 10 forks, and 3 watchers.

It has had no major release in the last 6 months.

There are 2 open issues and none have been closed. There is 1 open pull request and none have been closed.

It has a neutral sentiment in the developer community.

The latest version of ocropus is labeled "current".

Quality

ocropus has no bugs reported.

Security

ocropus has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

ocropus does not declare a standard license.

Check the repository for any license declaration and review its terms closely.

Without a license, all rights are reserved by default, so you cannot use the library in your applications without the author's permission.

Reuse

No ocropus releases are available; you will need to build and install it from source.

ocropus Key Features

No key features are currently listed for ocropus.

ocropus Examples and Code Snippets

No code snippets are currently available for ocropus.

Community Discussions

Trending Discussions on ocropus

Cannot pip install with Python 2.7 virtual environment

How to find paragraph bounding box coordinates in a scanned document?

QUESTION

Cannot pip install with Python 2.7 virtual environment

Asked 2020-May-20 at 05:06

I am trying to use an OCR project hosted on GitHub. I am using a Python 3 virtual environment on Windows. I successfully installed the packages in requirements.txt using Python 3.6.7, but when I run python setup.py install I get the following error:

...

ANSWER

Answered 2020-May-19 at 18:21

Read your error again, and you will see this on its second line:

Source https://stackoverflow.com/questions/61894286

QUESTION

How to find paragraph bounding box coordinates in a scanned document?

Asked 2019-Jun-21 at 23:56

I'd like to get the coordinates of all areas containing any text in scans of documents like the one shown below (in reduced quality; the original files are of high resolution):

I'm looking for something similar to these (GIMP'ed-up!) bounding boxes. It's important to me that the paragraphs be recognized as such. If the two big blocks (top box on left page, center block on right page) would get two bounding boxes each, though, that would be fine:

The way of obtaining these bounding box coordinates could be through some kind of API (scripted languages preferred over compiled ones) or through a command line command, I don't care. What's important is that I get the coordinates themselves, not just a modified version of the image where they're visible. The reason for that is that I need to calculate the area size of each one of them and then cut out a piece at the center of the largest.
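Once bounding boxes are available, the remaining arithmetic described above (the area of each box, then a crop centered in the largest one) is short. The following is a minimal sketch, assuming boxes are `(x, y, width, height)` tuples in pixels with a top-left origin. The helper names, the box values, and the 200×200 crop size are made-up examples, not output from any of the tools mentioned.

```python
from typing import List, Tuple

# Hypothetical box format: (x, y, width, height) in pixels, top-left origin.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a box in pixels."""
    _, _, w, h = box
    return w * h


def center_crop(box: Box, crop_w: int, crop_h: int) -> Box:
    """A crop_w x crop_h region centered in `box`, shrunk to fit if the box is smaller."""
    x, y, w, h = box
    cw, ch = min(crop_w, w), min(crop_h, h)
    return (x + (w - cw) // 2, y + (h - ch) // 2, cw, ch)


# Made-up example boxes standing in for detected paragraphs.
boxes: List[Box] = [(40, 60, 300, 120), (400, 80, 250, 500), (40, 220, 300, 90)]
areas = [box_area(b) for b in boxes]
largest = max(boxes, key=box_area)
print(areas)                           # [36000, 125000, 27000]
print(largest)                         # (400, 80, 250, 500)
print(center_crop(largest, 200, 200))  # (425, 230, 200, 200)
```

If the scan is loaded with Pillow, a crop `(x, y, w, h)` corresponds to `img.crop((x, y, x + w, y + h))`, since `Image.crop` takes a (left, upper, right, lower) box.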

What I've already tried, so far without success:

ImageMagick - it's just not meant for such a task
OpenCV - either the learning curve is too steep or my Google-fu is too weak
Tesseract - from what I've been able to garner, it's the one-off OCR software that, for historical reasons, doesn't do Page Layout Analysis before attempting character shape recognition
OCRopus/OCRopy - should be able to do it, but I'm not finding out how to tell it I'm interested in paragraphs as opposed to words or characters
Kraken ibn OCRopus - a fork of OCRopus with some rough edges, still fighting with it
Using statistics, specifically, a clustering algorithm (OPTICS seems to be the one most appropriate for this task) after binarization of the image - both my maths and coding skills are insufficient for it

I've seen images around the internet of document scans being segmented into parts containing text, photos, and other elements, so this problem seems to have already been solved academically. How do I get to the goodies, though?

...

ANSWER

Answered 2019-Jun-21 at 23:56

In ImageMagick, you can threshold the image to suppress noise, then blur it and threshold again so that large black regions become connected. Then use -connected-components to filter out small regions, especially white ones, and find the bounding boxes of the remaining black regions. (Unix bash syntax)
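The same threshold, blur, threshold, connected-components idea can also be sketched in a scripted language. The following is a minimal pure-Python illustration, not ImageMagick's implementation. It assumes the scan is already loaded as rows of 8-bit grayscale values (0 = black) and uses a simple box blur. The function names, the parameter defaults (`level`, `radius`, `fill`, `min_area`), and the toy image are hypothetical. Unlike the answer, it does not remove small white regions, and a real high-resolution scan would need a vectorized library rather than these nested loops.

```python
from collections import deque
from typing import List, Tuple

Gray = List[List[int]]           # rows of 8-bit grayscale values, 0 = black, 255 = white
Mask = List[List[int]]           # rows of 0/1 values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def threshold(img: Gray, level: int) -> Mask:
    """Mark pixels darker than `level` as ink (1); everything else is 0."""
    return [[1 if v < level else 0 for v in row] for row in img]


def box_blur(mask: Mask, radius: int) -> List[List[float]]:
    """Mean of `mask` over a (2*radius+1)-square window, clipped at the borders."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(mask[yy][xx] for yy in ys for xx in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def region_boxes(mask: Mask, min_area: int) -> List[Box]:
    """Bounding boxes of 4-connected regions of 1s with at least `min_area` pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            area, x0, y0, x1, y1 = 0, sx, sy, sx, sy
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                area += 1
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


def text_blocks(img: Gray, level: int = 128, radius: int = 2,
                fill: float = 0.1, min_area: int = 20) -> List[Box]:
    """Threshold, blur, threshold again, then box the large connected regions."""
    ink = threshold(img, level)
    smeared = box_blur(ink, radius)
    merged = [[1 if v > fill else 0 for v in row] for row in smeared]
    return region_boxes(merged, min_area)


# Toy 40x30 "scan": two dotted paragraphs plus a 2x2 speck of noise.
W, H = 40, 30
img = [[255] * W for _ in range(H)]
for y in (3, 5, 7):
    for x in range(3, 16, 2):
        img[y][x] = 0
for y in (18, 20, 22):
    for x in range(22, 35, 2):
        img[y][x] = 0
for y, x in ((3, 35), (3, 36), (4, 35), (4, 36)):
    img[y][x] = 0

print(text_blocks(img))  # [(1, 1, 17, 9), (20, 16, 17, 9)]
```

On the toy image, each dotted paragraph becomes one region. Each region is padded by about `radius` pixels because the blur spreads ink outward. The speck survives the blur only as a small blob, which `min_area` then drops. A larger `radius` merges words and lines that sit farther apart.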

Source https://stackoverflow.com/questions/52705219

Community Discussions and Code Snippets include content from the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install ocropus

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.

Find more information at: