tesserocr | A Python wrapper for the tesseract-ocr API | Computer Vision library

by sirfz | Python | Version: 2.7.0 | License: MIT

kandi X-RAY | tesserocr Summary

tesserocr is a Python library typically used in Artificial Intelligence and Computer Vision applications. It has no reported bugs or vulnerabilities, a build file is available, it has a permissive license, and it has medium support. You can install it with 'pip install tesserocr' or download it from GitHub or PyPI.

A Python wrapper for the tesseract-ocr API

            Support

              tesserocr has a medium active ecosystem.
              It has 1786 stars and 246 forks. There are 55 watchers for this library.
              There were 2 major releases in the last 12 months.
              There are 74 open issues and 179 have been closed. On average, issues are closed in 287 days. There are 4 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of tesserocr is 2.7.0

            Quality

              tesserocr has 0 bugs and 0 code smells.

            Security

              tesserocr has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              tesserocr code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              tesserocr is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              tesserocr releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              tesserocr saves you 176 person hours of effort in developing the same functionality from scratch.
              It has 436 lines of code, 37 functions and 3 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed tesserocr and identified the functions below as its top functions. The list is intended to give you a quick view of the functionality tesserocr implements and to help you decide whether it suits your requirements.
            • Return an Extension instance
            • Returns tesseract version number
            • Run tesseract
            • Convert a version string to an integer
            • Get build arguments
            • Return a list of paths that match the pattern
            • Return the major version
            • Find version string
            • Read file contents
            • Convert a version number to an integer

            tesserocr Examples and Code Snippets

            FabBits, Requirements
            Python | Lines of Code: 12 | License: No License
            pip3 install scipy
            pip3 install opencv-python
            pip3 install moviepy
            pip3 install pyqt5
            pip3 install Pillow
            pip3 install tesserocr
            
            conda install -c conda-forge scipy
            conda install -c conda-forge opencv
            conda install -c conda-forge moviepy
            conda instal  
            TiayanchaAuto
            Python | Lines of Code: 9 | License: No License
                list_match=['announcementcourt','lawsuit','court','zhixing','dishonest','courtRegister' ,\
                            'abnormal','punish','equity','equityPledgeRatio','equityPledgeDetail','judicialSale',\
                            'publicnoticeItem','environmentalPen  
            Install, Install Frappe application
            Python | Lines of Code: 2 | License: Non-SPDX (NOASSERTION)
            bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
            bench install-app erpnext_ocr
              

            Community Discussions

            QUESTION

            Build tesseract as DLL Dynamic link library
            Asked 2022-Feb-03 at 11:26

            I'm using this .NET wrapper, https://github.com/charlesw/tesseract, and I wanted to update the included tesseract and leptonica DLLs. After a long Google search, I was still not able to generate them from the original tesseract and leptonica GitHub repositories.

            I already asked on the charlesw repository but did not get any reply (https://github.com/charlesw/tesseract/issues/486).

            Any help on how to build the DLLs is much appreciated.

            Thanks!

            https://github.com/tesseract-ocr/tesseract https://github.com/danbloomberg/leptonica

            Answer (thank you, user898678, for the link): Using the bucket401 blog post tutorial, I extracted the parts required to generate:

            • leptonica-X.XX.X.dll
            • tesseract.exe
            • tesseractXX.dll

            and created this buildTesseractLeptonica.bat:

            ...

            ANSWER

            Answered 2022-Feb-03 at 10:08

            QUESTION

            Could not find a package configuration file provided by "Leptonica"
            Asked 2021-Jun-07 at 18:55

            I am trying to generate a Visual Studio 2019 C++ project from the tesseract 4.1.1 source code. Ultimately, I want to include a tesseract C++ project in my custom solution that consumes OCR results.

            When I follow these steps:

            1. Download and extract tesseract code https://github.com/tesseract-ocr/tesseract/archive/refs/tags/4.1.1.zip to "C:\tesseract" directory.
            2. Execute the following commands in a Developer Command Prompt for VS 2019:

            C:\Windows\System32>cd "C:\tesseract"
            C:\tesseract>mkdir build
            C:\tesseract>cd build
            C:\tesseract\build>cmake ..

            I receive this error:

            ...

            ANSWER

            Answered 2021-Jun-05 at 07:13

            There are several tutorials on how to build tesseract on Windows with CMake and Visual Studio, e.g. https://bucket401.blogspot.com/2021/03/building-tesserocr-on-ms-windows-64bit.html (you can ignore the end of the tutorial, which covers the Python module), minimalist tesseract, or with clang.

            Source https://stackoverflow.com/questions/67839925

            QUESTION

            unable to get text from the image
            Asked 2021-Apr-18 at 06:54

            I'm learning AI/ML and trying to get text from this sample form.

            ...

            ANSWER

            Answered 2021-Apr-18 at 06:54

            This link gave me the answer: remove the noise from the background of the image.

            Source https://stackoverflow.com/questions/67106600

            QUESTION

            Tesserocr installation on Mac M1
            Asked 2021-Mar-23 at 11:07

            I have tried a lot, but I still don't know why I am unable to install tesserocr and leptonica on a Mac M1. The error is below. Thanks for your help.

            ...

            ANSWER

            Answered 2021-Mar-23 at 11:07

            It looks like the leptonica headers are not on gcc's include path. Check where they are installed and then extend $C_INCLUDE_PATH accordingly.
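One way to check where the headers live is sketched below. It looks for `leptonica/allheaders.h` under a few candidate include directories and prints a value that could be used for `C_INCLUDE_PATH`. The candidate list and the helper names `find_include_dirs` and `extended_c_include_path` are assumptions made for illustration; the right directories depend on how leptonica was installed.

```python
import os

# Hypothetical candidate directories; adjust to wherever leptonica is installed.
CANDIDATE_INCLUDE_DIRS = [
    "/opt/homebrew/include",  # default Homebrew prefix on Apple Silicon
    "/usr/local/include",     # default Homebrew prefix on Intel Macs
    "/usr/include",
]


def find_include_dirs(candidates, header=os.path.join("leptonica", "allheaders.h")):
    """Return the candidate directories that actually contain the header."""
    return [d for d in candidates if os.path.isfile(os.path.join(d, header))]


def extended_c_include_path(found, current=""):
    """Prepend the found directories to an existing C_INCLUDE_PATH value."""
    kept = [p for p in current.split(os.pathsep) if p and p not in found]
    return os.pathsep.join(list(found) + kept)


if __name__ == "__main__":
    found = find_include_dirs(CANDIDATE_INCLUDE_DIRS)
    # Print the value to export before retrying the install.
    print(extended_c_include_path(found, os.environ.get("C_INCLUDE_PATH", "")))
```

If the script prints nothing, none of the candidate directories contains the header, and the list needs to be extended with the actual install location.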

            Source https://stackoverflow.com/questions/66576661

            QUESTION

            OpenCV tesserocr watermark detection
            Asked 2020-Dec-02 at 10:08

            I have about 12000 image links in my SQL table. The goal is to detect which of those images contain watermarked text and which don't. All text and borders are like this.

            I've tried with OpenCV and tesserocr

            ...

            ANSWER

            Answered 2020-Dec-02 at 10:08

            tesserocr isn't detecting any text because the text height (or text size) is too small. By cropping the text region and using that image, pytesseract could extract the text. Using contours and dilation to detect the text area didn't work either, again because of the small text size. To detect the text region, I used the EAST model to extract all regions using this solution and combined all the regions. Passing the combined region image to tesseract returns the text. To run this script, you need to download the model, which can be found here, and install the required dependencies.
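The "combine all the regions" step can be pictured as taking the smallest rectangle that encloses every detected box. The sketch below is not the answer's script; it only illustrates that step under the assumption that each region is a `(left, top, right, bottom)` tuple in pixels. The helper names `combine_boxes` and `pad_box` are hypothetical.

```python
def combine_boxes(boxes):
    """Return the smallest (left, top, right, bottom) box enclosing all boxes."""
    if not boxes:
        raise ValueError("no text regions to combine")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def pad_box(box, margin, width, height):
    """Grow a box by `margin` pixels on each side, clamped to the image bounds."""
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )
```

The resulting tuple has the same ordering that Pillow's `Image.crop` expects, so the cropped image could then be handed to the OCR engine.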
            Python Script:

            Source https://stackoverflow.com/questions/65088322

            QUESTION

            Pyinstaller - how to bundle imagemagick , Tesserocr
            Asked 2020-Aug-03 at 05:58

            I have made a Python tool (using PyQt) to work with scanned PDFs, which uses tesserocr and the ImageMagick binding Wand. I installed both the tesserocr and ImageMagick executables on my system, and the tool works fine there. Now I want to package this tool as a single executable to share with people, so that they do not need to install ImageMagick and tesserocr separately.

            I have been researching this problem for days now, but could not find a concrete answer.

            I tried a couple of hints: creating a SPEC file with the dependent binaries, and updating the environment variable for ImageMagick with os.environ['MAGICK_HOME'] = './'

            But I am still not able to build a single exe.

            Binaries path:

            ...

            ANSWER

            Answered 2020-Aug-03 at 05:58

            I have resolved this issue. To get the Python tool running on the local system, I took the following steps:

            1. Set os.environ['MAGICK_HOME'] = './'
            2. Set the hidden imports for the tesserocr Python package dependencies
            3. Upgrade setuptools to > 45.0.0.0 (pip install --upgrade setuptools)
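Step 1 can be sketched as follows. The answer sets `MAGICK_HOME` to `'./'`; the variant below is an assumption that resolves it to an absolute directory instead, using the `sys.frozen` and `sys._MEIPASS` attributes that PyInstaller sets in a bundled application. The helper names `bundle_dir` and `configure_magick_home` are hypothetical.

```python
import os
import sys


def bundle_dir():
    """Return the directory that holds the bundled files.

    In a PyInstaller bundle, sys.frozen is set and sys._MEIPASS points at
    the unpacked bundle; otherwise fall back to the current directory.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return sys._MEIPASS
    return os.path.abspath(".")


def configure_magick_home():
    """Point MAGICK_HOME at the bundle directory and return the value used."""
    os.environ["MAGICK_HOME"] = bundle_dir()
    return os.environ["MAGICK_HOME"]
```

The variable has to be set before Wand is imported, so a call to `configure_magick_home()` would go at the very top of the entry script.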

            Source https://stackoverflow.com/questions/63060273

            QUESTION

            detect and split image for OCR
            Asked 2020-Mar-14 at 23:36

            I am trying to OCR standard forms (they are scanned both front and back)

            I only want to OCR the second image on the scan (the one with the textual information). Is there a way to detect and split them, and only process the right one? Sorry if I'm missing something essential, I'm just starting off.

            ...

            ANSWER

            Answered 2020-Mar-14 at 23:36

            Here is a working example for the presented images:

            Source https://stackoverflow.com/questions/60631919

            QUESTION

            how to convert C++ tesseract-ocr code to Python?
            Asked 2020-Feb-12 at 07:06

            I want to convert the C++ result iterator example from the tesseract-ocr docs to Python.

            ...

            ANSWER

            Answered 2020-Feb-11 at 15:21

            I think the problem is that api->Recognize() expects a pointer as its first argument. The example mistakenly passes 0 where it should pass nullptr. 0 and nullptr have the same value, but on 64-bit systems they usually do not have the same size (I assume this may not hold on some unusual non-x86 systems either).

            Their example still works with a C++ compiler because the compiler knows that the function expects a pointer (64 bits) and fixes it silently.

            In your example, it seems you haven't specified the exact prototype of TessBaseAPIRecognize() to ctypes, so ctypes can't know that this function expects a pointer (64 bits). Instead, it assumes that the function expects an integer (32 bits), and it crashes.

            My suggestions:

            1. Use ctypes.c_void_p(None) instead of 0
            2. If you intend to use that in production, specify to ctypes all the function prototypes
            3. Be careful with the examples you look at: Those examples use Tesseract base API (C++ API) whereas if you want to use libtesseract with Python + ctypes, you have to use Tesseract C API. Those 2 APIs are very similar but may not be identical.
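Suggestions 1 and 2 could look roughly like the sketch below. It declares prototypes for two functions of the Tesseract C API, `TessBaseAPICreate` and `TessBaseAPIRecognize`, so that ctypes passes and returns full-width pointers. The helper names `declare_prototypes` and `load_libtesseract` are hypothetical, and locating the library through `ctypes.util.find_library("tesseract")` is an assumption that may not work on every platform.

```python
import ctypes
import ctypes.util


def declare_prototypes(lib):
    """Declare argument and return types so ctypes does not default to int."""
    # TessBaseAPI* TessBaseAPICreate(void);
    lib.TessBaseAPICreate.argtypes = []
    lib.TessBaseAPICreate.restype = ctypes.c_void_p
    # int TessBaseAPIRecognize(TessBaseAPI* handle, ETEXT_DESC* monitor);
    lib.TessBaseAPIRecognize.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
    lib.TessBaseAPIRecognize.restype = ctypes.c_int
    return lib


def load_libtesseract():
    """Load libtesseract if it can be found, with prototypes declared."""
    path = ctypes.util.find_library("tesseract")  # assumption: library is discoverable
    if path is None:
        return None
    return declare_prototypes(ctypes.CDLL(path))


# A null pointer for the optional monitor argument, instead of a bare 0.
NULL_MONITOR = ctypes.c_void_p(None)
```

With the prototypes in place, a call such as `lib.TessBaseAPIRecognize(handle, NULL_MONITOR)` passes a pointer-sized null rather than a 32-bit integer. A production binding would declare every function it calls in the same way.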

            If you need further help, you can have a look at how things are done in PyOCR. If you decide to use PyOCR in your project, just beware that the license of PyOCR is GPLv3+, which implies some restrictions.

            Source https://stackoverflow.com/questions/60166781

            QUESTION

            Running setup.py install for tesserocr ... error
            Asked 2020-Jan-03 at 12:25

            Whenever I run the command pip3 install tesserocr,

            it gives this error:

            Collecting tesserocr

            ...

            ANSWER

            Answered 2020-Jan-03 at 12:25

            For me, these commands later resolved the error:

            Source https://stackoverflow.com/questions/59578278

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install tesserocr

            You can install tesserocr with 'pip install tesserocr' or download it from GitHub or PyPI.
            You can use tesserocr like any standard Python library. Make sure you have a development environment that includes a Python distribution with header files, a compiler, pip, and git. Make sure that pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changing the system installation.
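As a minimal sketch of what using the library might look like once it is installed, the helper below runs OCR on an image file with `tesserocr.file_to_text`. The wrapper name `ocr_file`, the default language `"eng"`, and the example file name are assumptions for illustration. The function returns `None` when tesserocr is not installed rather than raising.

```python
import importlib.util


def ocr_file(image_path, lang="eng"):
    """Return the text recognized in an image file, or None if tesserocr is missing."""
    if importlib.util.find_spec("tesserocr") is None:
        return None
    import tesserocr  # imported lazily so the module loads without tesserocr

    return tesserocr.file_to_text(image_path, lang=lang)


if __name__ == "__main__":
    import sys

    # Usage: python ocr_file.py sample.png  (hypothetical file names)
    if len(sys.argv) > 1:
        print(ocr_file(sys.argv[1]))
```

The language data for the chosen `lang` must be available to the underlying tesseract installation for recognition to succeed.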

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for answers or ask on the Stack Overflow community page.

            Install
          • PyPI

            pip install tesserocr

          • CLONE
          • HTTPS

            https://github.com/sirfz/tesserocr.git

          • CLI

            gh repo clone sirfz/tesserocr

          • sshUrl

            git@github.com:sirfz/tesserocr.git
