freki | Analyze XML extracted from PDFs | Document Editor library
kandi X-RAY | freki Summary
kandi X-RAY | freki Summary
Freki is a package that takes the markup-language formatted output of a PDF-to-text extraction tool (either PDFLib TET or PDFMiner), and detects text blocks (e.g., paragraphs, headers, figures, etc.). The blocks are assigned attributes (identifiers, bounding boxes, etc.) for later analysis. This was developed for the detection of interlinear glossed text (IGT), but it could serve other purposes, as well. Freki also includes a method to convert plain text documents to the Freki format for the purposes of IGT detection and language identification.
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
- Run analyzer
- Processes a freki document
- Return the document id given a path
- Calculate the average character width of a block
- Analyze the content of the page
- Convert a zone to a Block
- Make a bitmap
- Computes the parameters for each bitmap
- Reads the XML page
- Merge two ranges
- Append a new item
- Return dict of spans
- Iterate over all lines
- List of fonts
- Read a file from a string
- List of bounding boxes
- List of fonts in the block
- List of font fonts
- List of lines
freki Key Features
freki Examples and Code Snippets
Community Discussions
Trending Discussions on freki
QUESTION
Below is the code I have now. It pulls the Job-Base-Cost just fine, however I cannot get it to pull the ID and or Name of the item. Can you help?
Link to the sites XML pull.
...ANSWER
Answered 2018-Dec-29 at 13:47This is a sample of one line of the OP's XML file
109555912.69
The OP wants to use the IMPORTXML
function to report the ID and Name as well as the Job Cost from the XML data. Presently, the OP's formula is:
=importxml("link","//job-base-cost")
There are two options:
1 - One long column
=importxml("link","//@id | //@name | //job-base-cost")
Note //@id
and //@name
in the xpath query: //
indicate nodes in the document (at any level, not just the root level) and @
indicate attributes. The pipe |
operator indicates AND. So the plain english query is to display the id, name and job-base-cost.
2 - Three columns (table format)
={IMPORTXML("link","//@name"),IMPORTXML("link","//job-base-cost"),IMPORTXML("link","//@id")}
This creates a series that will display the fields in each of three columns.
Note: there is an arrayformula that uses a single importXML function described in How do I return multiple columns of data using ImportXML in Google Spreadsheets?. Readers may want to look at whether that option can be implemented.
My thanks to @Tanaike for his comment which spurred me to look at how xpath works.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install freki
You can use freki like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page