textricator | extract text from documents and generate structured data | Regex library

 by measuresforjustice | Kotlin | Version: 10.1.70 | License: AGPL-3.0

kandi X-RAY | textricator Summary

textricator is a Kotlin library typically used in utilities and regex applications. kandi reports no bugs and no vulnerabilities for it; it has a strong copyleft license and low support activity. You can download it from GitHub.

Textricator is a tool to extract text from documents and generate structured data.

            Support

              textricator has a low-activity ecosystem.
              It has 279 stars and 33 forks. There are 28 watchers for this library.
              It had no major release in the last 12 months.
              There are 8 open issues and 19 closed issues; on average, issues are closed in 103 days. There are 2 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of textricator is 10.1.70.

            Quality

              textricator has 0 bugs and 32 code smells.

            Security

              textricator has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              textricator code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

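            License

              textricator is licensed under the AGPL-3.0 License, which is a strong copyleft license.
              Strong copyleft licenses require derivative works to be distributed under the same license; AGPL-3.0 additionally requires that users who interact with a modified version over a network be offered its source code. You can use such code in open source projects released under compatible terms.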

            Reuse

              textricator has no GitHub releases; prebuilt builds are published to Maven Central (see Install textricator), or you can build it from source.
              Installation instructions, examples and code snippets are available.
              It has 3935 lines of code, 262 functions and 63 files.
              It has low code complexity, which makes the code easier to maintain.


            Community Discussions

            QUESTION

            How to set the FSM configuration for Textricator PDF OCR reader?
            Asked 2021-May-17 at 22:43

            I'm trying to use the PDF document parser called Textricator. It can parse a PDF using three different PDF libraries (itext5, itext7, pdfbox), and it offers three parsing methods: text, table and form. Text is for raw text extraction, table is for reading structured table data, and form is for parsing less structured forms using a finite state machine (FSM).

            However, I cannot get the form parser to work. Perhaps I simply don't understand how to organize the many configuration states. The documentation lacks a simple form example, and someone recently posted an attempt to read a very basic table with the form method but could not get it working. I also gave it a shot, without success.

            Q: Can someone help me configure the state machine in the YML file?
            (The configuration is meant to parse the demo file from one of the repository's GitHub issues, which was shown in a screenshot in the original question.)

            The YML configuration file.

            ...

            ANSWER

            Answered 2021-May-17 at 18:42

            Textricator is something of a hidden gem for PDF parsing, in my opinion, so I'm happy to see someone using it. I posted a config that works with the sample document to the GitHub issue:

            Source https://stackoverflow.com/questions/67258726

            Community Discussions and Code Snippets include content from the Stack Exchange Network.
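The question describes Textricator's form parser as a finite state machine (FSM) that walks through the text of a PDF. The Kotlin sketch below is a conceptual illustration only, not Textricator's API or its YML configuration format: `Segment`, `Transition`, `State`, `FormFsm` and `exampleRecords` are hypothetical names, and it assumes each text segment carries only its text and font size. Each state may name the record field its text belongs to, a transition fires when its condition matches the next segment, and entering a designated state starts a new record.

```kotlin
// Hypothetical sketch, NOT Textricator's API: a tiny finite state machine
// that groups text segments (as a PDF parser might emit them) into records.

// Assumed model of one piece of text extracted from a page.
data class Segment(val text: String, val fontSize: Double)

// A transition moves to nextState when its condition matches the next segment.
data class Transition(val condition: (Segment) -> Boolean, val nextState: String)

// A state optionally names the record field that collects its text.
data class State(val field: String?, val transitions: List<Transition>)

class FormFsm(
    private val states: Map<String, State>,
    private val initialState: String,
    // Entering this state closes the current record and starts a new one.
    private val recordStartState: String,
) {
    fun parse(segments: List<Segment>): List<Map<String, String>> {
        val records = mutableListOf<Map<String, String>>()
        var current = mutableMapOf<String, String>()
        var state = initialState
        for (segment in segments) {
            // Take the first transition whose condition matches; skip the segment if none does.
            val next = states.getValue(state).transitions
                .firstOrNull { it.condition(segment) }
                ?.nextState ?: continue
            if (next == recordStartState && current.isNotEmpty()) {
                records += current
                current = mutableMapOf()
            }
            state = next
            // Append the segment's text to the field owned by the new state, if any.
            states.getValue(state).field?.let { field ->
                current[field] = (current[field]?.let { "$it " } ?: "") + segment.text
            }
        }
        if (current.isNotEmpty()) records += current
        return records
    }
}

// Example with an assumed layout: large text is a name, small text is a job title.
fun exampleRecords(): List<Map<String, String>> {
    val isName: (Segment) -> Boolean = { it.fontSize >= 10.0 }
    val isTitle: (Segment) -> Boolean = { it.fontSize < 10.0 }
    val states = mapOf(
        "start" to State(null, listOf(Transition(isName, "name"))),
        "name" to State("name", listOf(Transition(isTitle, "title"))),
        "title" to State("title", listOf(Transition(isName, "name"))),
    )
    val fsm = FormFsm(states, initialState = "start", recordStartState = "name")
    return fsm.parse(
        listOf(
            Segment("Jane Doe", 12.0),
            Segment("Teacher", 8.0),
            Segment("John Roe", 12.0),
            Segment("Principal", 8.0),
        )
    )
}

fun main() {
    // prints {name=Jane Doe, title=Teacher} then {name=John Roe, title=Principal}
    exampleRecords().forEach(::println)
}
```

Keeping the conditions as plain predicates separates "what does this text look like" from "where are we in the form", which is the same split the question is wrestling with when it tries to organize the configuration states.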

            Vulnerabilities

            No vulnerabilities reported

            Install textricator

            Install Java (version 8+). Windows and macOS: download it from https://java.com and install it. Linux: use your package manager.
            Download the latest build of Textricator from https://repo1.maven.org/maven2/io/mfj/textricator/ (click the directory for the latest version and download textricator-VERSION-bin.tgz, or textricator-VERSION-bin.zip on Windows).
            Extract it.
            Run a shell. Windows: run Windows PowerShell (it should be in the Start menu). macOS: run Terminal (type "terminal" in Spotlight). The following examples start with ./textricator; on Windows, use .\textricator.bat instead.
            Show help:
                ./textricator --help
            Download the example files to the textricator directory:
                https://github.com/measuresforjustice/textricator/blob/main/src/test/resources/io/mfj/textricator/examples/school-employee-list.pdf
                https://github.com/measuresforjustice/textricator/blob/main/src/test/resources/io/mfj/textricator/examples/school-employee-list.yml
            Extract raw text from a PDF to standard out:
                ./textricator text --input-format=pdf.pdfbox school-employee-list.pdf
            Parse a PDF to CSV:
                ./textricator form --config=school-employee-list.yml school-employee-list.pdf school-employee-list.csv
            This uses the configuration file school-employee-list.yml to parse school-employee-list.pdf. To parse your own PDF form, you will need to write your own configuration file; see the Form section of the Textricator documentation for details. If your PDF has a tabular layout, see the Table section.
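The command-line steps above can also be scripted from Kotlin, the language Textricator is written in. The sketch below is a minimal, hypothetical wrapper around `java.lang.ProcessBuilder`: it uses only the `text` and `form` commands and flags shown above, and the helper names (`launcher`, `textCommand`, `formCommand`, `runTextricator`) are illustrative, not part of Textricator. It assumes the program runs from the extracted Textricator directory that also holds the downloaded example files.

```kotlin
import java.io.File

// Hypothetical helpers (not part of Textricator) that drive the command-line
// tool using only the commands shown in the install steps.

// Command prefix that starts Textricator from its extracted install directory.
// On Windows the launcher is textricator.bat, started through "cmd /c".
fun launcher(installDir: File, osName: String = System.getProperty("os.name")): List<String> =
    if (osName.startsWith("Windows")) {
        listOf("cmd", "/c", File(installDir, "textricator.bat").absolutePath)
    } else {
        listOf(File(installDir, "textricator").absolutePath)
    }

// Equivalent of: ./textricator text --input-format=pdf.pdfbox INPUT
fun textCommand(prefix: List<String>, input: String, inputFormat: String = "pdf.pdfbox"): List<String> =
    prefix + listOf("text", "--input-format=$inputFormat", input)

// Equivalent of: ./textricator form --config=CONFIG INPUT OUTPUT
fun formCommand(prefix: List<String>, config: String, input: String, output: String): List<String> =
    prefix + listOf("form", "--config=$config", input, output)

// Runs a command in workDir and returns its combined stdout and stderr.
// Throws IllegalStateException if the exit code is non-zero.
fun runTextricator(command: List<String>, workDir: File): String {
    val process = ProcessBuilder(command)
        .directory(workDir)
        .redirectErrorStream(true)
        .start()
    // Read output before waiting so a full pipe buffer cannot block the child process.
    val output = process.inputStream.bufferedReader().readText()
    val exitCode = process.waitFor()
    check(exitCode == 0) { "exit code $exitCode: $output" }
    return output
}

fun main() {
    // Assumption: run from the extracted textricator directory holding the example files.
    val dir = File(".").absoluteFile
    val exe = launcher(dir)
    println(runTextricator(textCommand(exe, "school-employee-list.pdf"), dir))
    runTextricator(
        formCommand(exe, "school-employee-list.yml", "school-employee-list.pdf", "school-employee-list.csv"),
        dir,
    )
    // Preview the first few lines of the generated CSV.
    File(dir, "school-employee-list.csv").readLines().take(5).forEach(::println)
}
```

Building each argument list separately from running it keeps command construction easy to check without a Textricator install, and passing arguments as a list avoids shell quoting problems with file names that contain spaces.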

            Support

            For new features, suggestions and bugs, create an issue on GitHub. If you have questions, check for existing answers or ask on Stack Overflow.

            CLONE
          • HTTPS

            https://github.com/measuresforjustice/textricator.git

          • CLI

            gh repo clone measuresforjustice/textricator

          • SSH

            git@github.com:measuresforjustice/textricator.git


            Consider Popular Regex Libraries

            z

            by rupa

            JSVerbalExpressions

            by VerbalExpressions

            regexr

            by gskinner

            path-to-regexp

            by pillarjs

            Try Top Libraries by measuresforjustice

            textricator-gui

            by measuresforjustice (Kotlin)

            expr

            by measuresforjustice (Kotlin)

            UCR-Project

            by measuresforjustice (HTML)