excalibur | A web interface to extract tabular data from PDFs | Document Editor library

by camelot-dev | HTML | Version: v0.4.3 | License: MIT


kandi X-RAY | excalibur Summary

excalibur is an HTML library typically used in Editor and Document Editor applications. According to kandi's analysis, excalibur has no reported bugs or vulnerabilities, it has a permissive license, and it has medium support. You can download it from GitHub.

Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot. Note: Excalibur only works with text-based PDFs, not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based.")

Support

excalibur has a moderately active ecosystem.

It has 1284 stars and 197 forks. There are 39 watchers for this library.

It had no major release in the last 12 months.

There are 79 open issues and 39 closed issues. On average, issues are closed in 55 days. There are 24 open pull requests and 0 closed pull requests.

It has a neutral sentiment in the developer community.

The latest version of excalibur is v0.4.3.

Quality

excalibur has 0 bugs and 0 code smells.

Security

excalibur has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

excalibur code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

excalibur is licensed under the MIT License, which is a permissive license.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

excalibur releases are available to install and integrate.

Installation instructions, examples and code snippets are available.

It has 3172 lines of code, 65 functions and 62 files.

It has medium code complexity. Code complexity directly impacts maintainability of the code.

excalibur Key Features

No Key Features are available at this moment for excalibur.

excalibur Examples and Code Snippets

No Code Snippets are available at this moment for excalibur.

Community Discussions

Trending Discussions on excalibur

Deserializing JSON to a Dictionary with Item being abstract

Bash script how read data from file or maybe is better way?

Extract fixed size and position table from pdf files in Python

Recommendation System by using Euclidean Distance (TypeError: unsupported operand type(s) for -: 'str' and 'str')

Should I use queue system to handle PDF text recognition in multitenant system?

Why does gcc's implementation of openMP fail to parallelise a recursive function inside another recursive function

Correct way of subscribing to Phoenix PubSub with Genserver

I did get object Property on UI

Website is not scrolling and cuts off text depending on aspect ratio

QUESTION

Deserializing JSON to a Dictionary with Item being abstract

Asked 2022-Feb-18 at 19:47

I'm looking to deserialize a JSON string to a Dictionary whose Item type is an abstract class. I serialize many types of items: some are Weapons, some are Armour, some are Consumables, etc.

Error: Newtonsoft.Json.JsonSerializationException: 'Could not create an instance of type Item. Type is an interface or abstract class and cannot be instantiated.'

EDIT: I'm using Newtonsoft.Json for serializing / deserializing

Deserialization code:

...

ANSWER

Answered 2022-Feb-18 at 19:47

You can use a custom converter to deserialize into different types in the same hierarchy. I also highly recommend using properties instead of fields. A small reproducer could look like this:
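The C# reproducer is not preserved on this page. The idea behind such a converter is to store a type discriminator with each item and use it to pick the concrete subclass when reading. Here is a minimal sketch of that idea in Python. It is only an illustration, not Newtonsoft.Json's API. The `type` field name and the `Weapon`/`Armour` classes are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


# Hypothetical item hierarchy mirroring the question.
@dataclass
class Item:
    name: str


@dataclass
class Weapon(Item):
    damage: int


@dataclass
class Armour(Item):
    defense: int


# Maps the discriminator stored in the JSON to a concrete class.
ITEM_TYPES = {"Weapon": Weapon, "Armour": Armour}


def item_to_dict(item: Item) -> dict:
    # Store the concrete type name alongside the fields so it can be restored later.
    return {"type": type(item).__name__, **vars(item)}


def item_from_dict(data: dict) -> Item:
    fields = dict(data)
    type_name = fields.pop("type")
    cls = ITEM_TYPES.get(type_name)
    if cls is None:
        raise ValueError(f"unknown item type: {type_name!r}")
    return cls(**fields)


def dumps_inventory(inventory: dict[str, Item]) -> str:
    return json.dumps({key: item_to_dict(item) for key, item in inventory.items()})


def loads_inventory(text: str) -> dict[str, Item]:
    return {key: item_from_dict(value) for key, value in json.loads(text).items()}
```

The abstract base is never instantiated directly. That is exactly what the discriminator buys you, and it is why the deserializer no longer needs to construct `Item` itself.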

Source https://stackoverflow.com/questions/71178309

QUESTION

Bash script how read data from file or maybe is better way?

Asked 2021-May-06 at 12:53

I have no knowledge of writing scripts, but with help from the internet I made something like this:

...

ANSWER

Answered 2021-May-05 at 14:54

After reading your post twice, I think I understood you. The napiprojekt:id can be extracted with grep -o 'napiprojekt:.*', and its output can be inserted into the napi.sh download command line using backticks (command substitution).
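For comparison, here is the same extraction step in Python as a small sketch. Like `grep -o 'napiprojekt:.*'`, the regex matches from `napiprojekt:` to the end of each line, because `.` does not match newlines by default. Passing the result on to napi.sh is left out, because the script's arguments are not shown here.

```python
from __future__ import annotations

import re

# Same pattern as grep -o 'napiprojekt:.*': "napiprojekt:" plus the rest of the line.
NAPI_PATTERN = re.compile(r"napiprojekt:.*")


def extract_napi_ids(text: str) -> list[str]:
    """Return every 'napiprojekt:...' match, one per line, as grep -o would print them."""
    return NAPI_PATTERN.findall(text)
```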

Variant with a log file:

Source https://stackoverflow.com/questions/67400012

QUESTION

Extract fixed size and position table from pdf files in Python

Asked 2021-Apr-13 at 12:38

Say I have many PDF files similar to the one here:

I would like to extract the following table and save it as an Excel file:

I'm able to extract the table and save it as an Excel file manually with the excalibur package.

After installing Excalibur with pip3, I initialize the metadata database using:

$ excalibur initdb

And then start the webserver using:

$ excalibur webserver

Then go to http://localhost:5000 and start extracting tabular data from PDFs.

I wonder whether it's possible to do this automatically with a Python script for multiple PDF files, using packages such as excalibur-py, camelot, pdfminer, etc., since the size and position of the table are fixed for the same city's reports.

You may download other report files from this link.

Many thanks in advance.

...

ANSWER

Answered 2021-Apr-13 at 12:38

Using Camelot, you can build a pipeline like this:
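The answer's code is not preserved on this page. Below is a minimal sketch of such a pipeline. It assumes camelot-py is installed, along with an Excel writer such as openpyxl for `to_excel`. It also assumes the table sits at the same coordinates on page 1 of every report. The `TABLE_AREA` values are hypothetical and must be measured on a real report. The `read_pdf` parameter exists only so the function can be exercised with a stub reader.

```python
from pathlib import Path

# Hypothetical area in PDF points, "x1,y1,x2,y2": (x1, y1) is the top-left corner
# and (x2, y2) the bottom-right one, in PDF space (origin at the bottom-left).
TABLE_AREA = "50,700,550,400"


def extract_fixed_table(pdf_path, output_dir, page="1", read_pdf=None):
    """Extract the table inside TABLE_AREA on one page and save it as an .xlsx file."""
    if read_pdf is None:
        # Imported lazily so a stub reader can be injected without Camelot installed.
        import camelot
        read_pdf = camelot.read_pdf
    tables = read_pdf(str(pdf_path), pages=page, flavor="stream",
                      table_areas=[TABLE_AREA])
    if len(tables) == 0:
        raise ValueError(f"no table found in {pdf_path}")
    out_path = Path(output_dir) / (Path(pdf_path).stem + ".xlsx")
    tables[0].to_excel(str(out_path))
    return out_path


def extract_all(pdf_dir, output_dir):
    """Run the fixed-area extraction over every PDF in a directory."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return [extract_fixed_table(p, output_dir)
            for p in sorted(Path(pdf_dir).glob("*.pdf"))]
```

Using `flavor="stream"` with explicit `table_areas` skips Camelot's table detection. That suits reports whose layout never changes.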

Source https://stackoverflow.com/questions/67068198

QUESTION

Recommendation System by using Euclidean Distance (TypeError: unsupported operand type(s) for -: 'str' and 'str')

Asked 2021-Jan-03 at 19:48

I have a problem implementing a recommendation system using Euclidean distance.

What I want to do is list similar games based on search criteria (game title and genre).

Here is my project link: Link

After calling the function, it throws the error shown below. How can I fix it?

Here is the error:

...

ANSWER

Answered 2021-Jan-03 at 16:00

The issue is that you are using Euclidean distance to compare strings. Subtracting one str from another is not defined in Python, hence the TypeError. Consider using Levenshtein distance, or a similar metric designed for strings. NLTK has a function called edit_distance that can do this, or you can implement it yourself.
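If you prefer to implement it yourself, here is a minimal sketch of the classic dynamic-programming Levenshtein distance in plain Python. By default, NLTK's `edit_distance` computes the same metric. The `closest_titles` helper and its lowercase normalisation are illustrative assumptions, not the asker's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        previous = current
    return previous[-1]


def closest_titles(query, titles, k=3):
    """Rank titles by edit distance to the query (smaller means more similar)."""
    return sorted(titles, key=lambda t: levenshtein(query.lower(), t.lower()))[:k]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.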

Source https://stackoverflow.com/questions/65551325

QUESTION

Should I use queue system to handle PDF text recognition in multitenant system?

Asked 2020-Oct-19 at 05:20

I am building a system that lets our clients transform PDF bank statements (from many different banks) into a better CSV form (better because it can be imported into an accounting application). It will find tables on the PDF pages and convert them into CSV files.

I am going to use:

A simple static webpage with an HTML form to upload PDFs and choose which bank to process. It will also display the job status and allow downloading the result of the transformation (CSV files). It should operate without user authentication.
A backend running on NodeJS (more on that later)
Excalibur
Puppeteer (to operate Excalibur)

The backend has to take responsibility for:

Receiving the request from the UI (PDF payload)
Generating a new job id, then:
1. sending it back to the UI
2. providing an HTTP resource for the UI to ask for the job status
Making a new Puppeteer instance and passing it the received PDF and job id
Waiting for Puppeteer to finish and receiving the archive file (Excalibur puts every page of the table in a separate CSV file)
Unpacking the archived CSV files
Normalizing them with transformers (written with https://www.npmjs.com/package/mississippi)
Sending the response to the UI (client)
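The "unpack archived CSV files" step can be sketched as follows. The backend in the question is NodeJS, so this Python version is only an illustration. It assumes the archive is a zip file, that its `.csv` entries sort by name into page order, and that they are UTF-8. None of this is confirmed about Excalibur's actual export.

```python
from __future__ import annotations

import csv
import io
import zipfile


def merge_csv_archive(archive_bytes: bytes) -> list[list[str]]:
    """Read every .csv entry of a zip archive, in name order, and concatenate their rows."""
    rows: list[list[str]] = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV page
            # Decode the binary zip entry as text; newline="" is what the csv module expects.
            with io.TextIOWrapper(archive.open(name), encoding="utf-8", newline="") as text:
                rows.extend(csv.reader(text))
    return rows
```

Plain name sorting puts `page-10.csv` before `page-2.csv`. With more than nine pages, sort on the parsed page number instead.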

Problems that will occur:

Multi-tenancy: multiple users will access the system at once (I am used to PHP, which runs in the context of a single user session, and I know that NodeJS resides in memory; I am going to resolve this with the 'continuation-local-storage' package)
Communication FE<->BE: processing big PDF files takes a lot of time, which makes giving feedback to the user a challenge. That's why I need some sort of job id to identify clients.
Disabling the Excalibur database: my solution does not need to save any state.

As you can see, there are quite a lot of things to do. I do not want to discuss the design decisions (e.g. why Puppeteer and not direct access to the Excalibur API). This is a first, crude version, and I have plenty of ideas to improve this system later.

My question is: should I use a message queue system to simplify this system (make it more readable), or not? How could this system benefit from using a queue such as AMQP, Azure Queues, or simply MongoDB as a queue? What could a simple design (block diagram) of such a system look like when using a message queue? I have no previous experience with message queues and have never used them, but I feel a message queue could help me design a better structure for this system.

...

ANSWER

Answered 2020-Oct-18 at 16:59

In general, queuing is not used to simplify a system. The simplest approach is to do the translation when the message is received and immediately respond with the result. The primary function of a queue is to add a layer of isolation between the data consumer and the data producer which supports a dynamic ordered backlog of messages to work on. Using a queue can be useful in situations where:

Incoming messages do not need to be processed real-time.
Message production rates may temporarily exceed consumption rates.
Message consumers do not depend on message producers.
Processing order of messages is important.

Given that translating PDF files to CSV is a relatively expensive operation that doesn't need to complete immediately, writing incoming requests to a queue and responding with a job ID is a reasonable approach.
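Here is a minimal in-process sketch of that pattern. The producer writes the request to a queue and immediately returns a job ID. A separate consumer works through the backlog in arrival order. Python's standard `queue.Queue` stands in for AMQP, Azure Queues or MongoDB, and `convert_pdf_to_csv` is a hypothetical placeholder for the expensive Excalibur step. A real multi-process deployment would need an external broker and persistent job storage.

```python
import queue
import threading
import uuid

jobs = {}                   # job id -> {"status": ..., "result": ...}
work_queue = queue.Queue()  # holds (job_id, pdf_bytes) tuples in arrival order


def submit(pdf_bytes: bytes) -> str:
    """Producer: record the job, enqueue the upload and return its id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, pdf_bytes))
    return job_id


def convert_pdf_to_csv(pdf_bytes: bytes) -> str:
    # Hypothetical placeholder for the slow conversion (e.g. driving Excalibur).
    return f"{len(pdf_bytes)} bytes converted"


def worker() -> None:
    """Consumer: take jobs off the queue one at a time and record their outcome."""
    while True:
        job_id, pdf_bytes = work_queue.get()
        jobs[job_id]["status"] = "running"
        try:
            jobs[job_id]["result"] = convert_pdf_to_csv(pdf_bytes)
            jobs[job_id]["status"] = "done"
        except Exception as exc:
            jobs[job_id]["status"] = f"failed: {exc}"
        finally:
            work_queue.task_done()


def status(job_id: str) -> str:
    """What the UI would poll via the job-status HTTP resource."""
    return jobs[job_id]["status"]


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The HTTP handler only calls `submit` and `status`, so slow conversions never block a request. Adding consumers means starting more workers, or with a broker, more worker processes.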

Source https://stackoverflow.com/questions/64392557

QUESTION

Why does gcc's implementation of openMP fail to parallelise a recursive function inside another recursive function

Asked 2020-May-06 at 22:17

I am trying to parallelise these recursive functions with OpenMP tasks. When I compile with gcc, it runs on only 1 thread; when I compile with clang, it runs on multiple threads.

The second function calls the first one, which doesn't generate new tasks, to avoid wasting time.

gcc does work when there is only one function that calls itself.

Why is this?

Am I doing something wrong in the code?

Then why does it work with clang?

I am using gcc 9.3 on Windows with MSYS2. The code was compiled with -O3 -fopenmp.

...

ANSWER

Answered 2020-May-06 at 22:17

Your program doesn't run in parallel because there is simply nothing to run in parallel. Upon first entry in mario, current_node is 9 and vec is all 8s, so this loop in the first and only task never executes:

Source https://stackoverflow.com/questions/61620390

QUESTION

Correct way of subscribing to Phoenix PubSub with Genserver

Asked 2020-May-06 at 04:32

I've been trying to start a GenServer process that subscribes to PubSub in the Phoenix framework. These are my files and errors:

config.ex:

...

ANSWER

Answered 2020-May-06 at 04:32

According to the documentation, Phoenix.PubSub.subscribe/3 has the following spec:

Source https://stackoverflow.com/questions/61624376

QUESTION

I did get object Property on UI

Asked 2020-May-03 at 20:46

This is my component file. When I display a property of the event object, nothing shows up in the UI. Could somebody take the time to help resolve the issue? I am new to Angular, and the project is in production.

...

ANSWER

Answered 2020-May-03 at 20:36

In your EVENTS.find callback, you should return true when the item matches. The callback body contains two statements, so the condition is not returned implicitly and the callback returns no value.
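The same pitfall exists in Python, where a function without a `return` yields `None`. This small analogue mirrors what happens with a braced JavaScript callback that computes the condition but never returns it. The `Event` class, the sample data and the `find` helper are hypothetical stand-ins, with `find` modelled on `Array.prototype.find`.

```python
from dataclasses import dataclass


@dataclass
class Event:
    id: int
    name: str


EVENTS = [Event(1, "Meetup"), Event(2, "Workshop")]


def find(items, predicate):
    """Return the first item for which predicate(item) is truthy, else None."""
    return next((item for item in items if predicate(item)), None)


def broken_predicate(event):
    matches = event.id == 2  # the condition is computed but never returned
    # falling off the end returns None, which is falsy, so nothing ever matches


def fixed_predicate(event):
    return event.id == 2  # returning the boolean lets find() see the match
```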

Source https://stackoverflow.com/questions/61581170

QUESTION

Website is not scrolling and cuts off text depending on aspect ratio

Asked 2020-Apr-10 at 15:58

I'm having issues where my website will cut off all information at the bottom of the screen and not scroll when at smaller aspect ratios. It will show the image no matter what but will cut the text below it. I have tried using overflow-y and overflow but neither allow scrolling. I'm not sure if it is due to elements being fixed or not but having the elements fixed is the only way I've been able to get them to look right.

Here is the HTML:

...

ANSWER

Answered 2020-Apr-08 at 14:46

The … doesn't need to be position:fixed; if you want it to scroll with the viewport. Try position:relative; instead.

Add a z-index: 10 to your … and … elements so that, when you scroll, the text stays below the header.

Source https://stackoverflow.com/questions/61091724

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.