Pdf2Dom | PDF parser that converts documents to an HTML DOM

by radkovo | Java | Version: 2.0.3 | License: LGPL-3.0

kandi X-RAY | Pdf2Dom Summary

Pdf2Dom is a Java library typically used in utility applications. It has no reported vulnerabilities, a build file is available, it has a weak copyleft license, and it has low support. However, Pdf2Dom has 2 bugs. You can download it from GitHub or Maven.

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. The inline CSS definitions contained in the resulting document make the HTML page as similar as possible to the PDF input. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications. Pdf2Dom is based on the Apache PDFBox library. See the project page for more information and downloads.
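
Because the result is a standard DOM tree, serializing it to HTML needs nothing beyond the JDK. The sketch below is illustrative only: `DomToHtml`, `sampleDocument`, and `toHtml` are hypothetical names, and the hand-built document merely stands in for a tree that Pdf2Dom would produce from a real PDF.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Pdf2Dom.
final class DomToHtml {

    // Builds a tiny stand-in DOM (assumption: a real tree would come from the PDF parser).
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            Element style = doc.createElement("style");
            style.setTextContent(".p{position:absolute;}");
            Element body = doc.createElement("body");
            Element div = doc.createElement("div");
            div.setAttribute("class", "p");
            div.setTextContent("Hello PDF");
            doc.appendChild(html);
            html.appendChild(head);
            head.appendChild(style);
            html.appendChild(body);
            body.appendChild(div);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes any W3C DOM document to a string using the HTML output method.
    static String toHtml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `DomToHtml.toHtml(DomToHtml.sampleDocument())` returns the markup as a single string. Depending on the JDK's serializer, the HTML output method may add a content-type `META` element to `head`.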

Support

Pdf2Dom has a low active ecosystem.

It has 127 stars and 64 forks. There are 22 watchers for this library.

It had no major release in the last 12 months.

There are 12 open issues and 16 have been closed. On average issues are closed in 186 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of Pdf2Dom is 2.0.3

Quality

Pdf2Dom has 2 bugs (0 blocker, 0 critical, 0 major, 2 minor) and 126 code smells.

Security

Pdf2Dom has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

Pdf2Dom code analysis shows 0 unresolved vulnerabilities.

There are 4 security hotspots that need review.

License

Pdf2Dom is licensed under the LGPL-3.0 License. This license is Weak Copyleft.

Weak Copyleft licenses have some restrictions, but you can use them in commercial projects.

Reuse

Pdf2Dom releases are not available. You will need to build from source code and install.

Deployable package is available in Maven.

Build file is available. You can build the component from source.

Pdf2Dom saves you 1312 person hours of effort in developing the same functionality from scratch.

It has 2945 lines of code, 261 functions and 25 files.

It has high code complexity. Code complexity directly impacts maintainability of the code.

Top functions reviewed by kandi - BETA

kandi has reviewed Pdf2Dom and identified the functions below as its top functions. The list is intended to give you a quick insight into the functionality Pdf2Dom implements and to help you decide whether it suits your requirements.

Process a given operator
Rotate image
Create the transformation for the current page
Process an image operation
Renders a path
Creates a rectangle that represents a rectangle drawn at the page
Create a horizontal line element
Get the bounds of a path
Processes a page
Finish the entire box
Adds the entry for the specified font
Process a font resource
Process text position
Updates the text style based on a text position
Compares this style with another object
Updates the style for renderer
Prints the output to HTML
Transforms a PDF document into a DOM tree
Parses the command line options
Creates a resource handler based on the given value
Generates a HTML resource
Base64 encodes a byte array
Process HTML resource
Return the next unused file name
Creates a hashCode of this class
Transform a length

Community Discussions

Trending Discussions on Pdf2Dom

QUESTION

How to change final HTML output using pdf2dom with Java?

Asked 2021-May-25 at 08:36

I want to convert a PDF document to an HTML file and have my HTML output as close as possible to the original PDF. To do so, I am using Pdf2Dom. However, for business reasons I need to move the style element from the head section to the body section. The naive solution I tried is to get the text content of the style element and write it at the end of my document like so:

...

ANSWER

Answered 2021-May-25 at 08:36

I solved the issue by using Jsoup, which is an HTML parser. I first parse the PDF file, then convert the result to an input stream that I pass to the Jsoup parser, and then apply my modifications there.

Source https://stackoverflow.com/questions/67267181
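
The code from the answer is not reproduced here. As a rough, JDK-only alternative to the Jsoup approach, the sketch below moves every `style` element from `head` to the end of `body` using the standard W3C DOM API. `StyleMover`, `parse`, and `moveStylesToBody` are hypothetical names, and the input is assumed to be well-formed XHTML without namespaces or a DOCTYPE.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Pdf2Dom or Jsoup.
final class StyleMover {

    // Parses a well-formed XHTML string into a DOM document.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Moves all <style> elements found under <head> to the end of <body>.
    static void moveStylesToBody(Document doc) {
        Node head = doc.getElementsByTagName("head").item(0);
        Node body = doc.getElementsByTagName("body").item(0);
        if (head == null || body == null) {
            return; // nothing to do without both sections
        }
        NodeList styles = ((Element) head).getElementsByTagName("style");
        // The NodeList is live, so copy it before modifying the tree.
        List<Node> toMove = new ArrayList<>();
        for (int i = 0; i < styles.getLength(); i++) {
            toMove.add(styles.item(i));
        }
        for (Node style : toMove) {
            body.appendChild(style); // appendChild detaches the node from its old parent
        }
    }
}
```

Operating on the DOM directly avoids a serialize-and-reparse round trip, which fits cases where the tree is already in memory. Jsoup remains the more forgiving choice when the input is not well-formed.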

Vulnerabilities

No vulnerabilities reported

Install Pdf2Dom

You can download Pdf2Dom from GitHub or Maven.
You can use Pdf2Dom like any standard Java library. Include the JAR files in your classpath. You can also use any IDE to run and debug the Pdf2Dom component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.
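
With Maven, a dependency declaration along these lines should pull in the library. The version comes from this page; the `groupId` and `artifactId` are the coordinates the project is believed to publish under and should be checked against Maven Central before use.

```xml
<!-- Assumed coordinates; confirm on Maven Central. -->
<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>2.0.3</version>
</dependency>
```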

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.
