pdf2svg | simple PDF to SVG converter | Document Editor library

by dawbarton | Shell | Version: v0.2.3 | License: GPL-2.0


kandi X-RAY | pdf2svg Summary

pdf2svg is a Shell library typically used in Editor and Document Editor applications. pdf2svg has no reported bugs or vulnerabilities, has a Strong Copyleft license, and has low support. You can download it from GitHub.

A simple PDF to SVG converter using the Poppler and Cairo libraries. For Windows binaries, see … For Linux binaries, see your package manager (e.g., "yum install pdf2svg" or "apt-get install pdf2svg").
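A minimal Python sketch of calling the converter from a script. It assumes the command-line form `pdf2svg <in.pdf> <out.svg> [<page>|all]`, where `all` converts every page and the output name then contains a `%d` placeholder for the page number; the helper names `build_pdf2svg_args` and `convert` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Union


def build_pdf2svg_args(pdf_path: str, svg_path: str,
                       page: Optional[Union[int, str]] = None) -> List[str]:
    """Build the argument list for one pdf2svg call (hypothetical helper)."""
    args = ["pdf2svg", pdf_path, svg_path]
    if page is not None:
        if page == "all" and "%d" not in svg_path:
            # Assumption: with "all", the output name is a printf-style pattern.
            raise ValueError("use a %d placeholder in svg_path when page='all'")
        args.append(str(page))
    return args


def convert(pdf_path: str, svg_path: str,
            page: Optional[Union[int, str]] = None) -> None:
    """Run pdf2svg, failing early with a clear error if it is not on PATH."""
    if shutil.which("pdf2svg") is None:
        raise FileNotFoundError("pdf2svg not found on PATH")
    subprocess.run(build_pdf2svg_args(pdf_path, svg_path, page), check=True)
```

For example, `convert("slides.pdf", "slide-%d.svg", "all")` would request one SVG per page, and `convert("fig.pdf", "fig.svg", 2)` only page 2.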

Support

pdf2svg has low community activity.

It has 483 stars, 78 forks, and 20 watchers.

It had no major release in the last 12 months.

There are 17 open issues and 14 closed issues. On average, issues are closed in 101 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of pdf2svg is v0.2.3.

Quality

pdf2svg has no bugs reported.

Security

pdf2svg has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

pdf2svg is licensed under the GPL-2.0 License, which is a Strong Copyleft license.

Strong Copyleft licenses enforce sharing: software that incorporates the code and is distributed must be released under the same license, which makes them suitable for open-source projects.

Reuse

pdf2svg releases are available to install and integrate.

Installation instructions are not available. Examples and code snippets are available.

Community Discussions

Trending Discussions on pdf2svg

Fontforge: How to "remove overlaps" making work

Detect words and graphs in image and slice image into 1 image per word or graph

Parse PDF file and output single character locations

Extract Font Name from PDF

Setting PGFPlots in Julia and Windows

Rendering heavy SVG files (improving performance)

How can I capture CLI tool file output to R object or stdout?

Missing classes in PDFBox2.0

QUESTION

Fontforge: How to "remove overlaps" making work

Asked 2020-Jun-26 at 07:49

I create an arrow in TikZ and convert it to SVG (pdf2svg file.pdf file.svg).
Then I import my output.svg into FontForge and choose Element --> Overlap --> remove there.
Finally I put the character \symbol{65} into a TeX document, but the overlaps still show up as white space inside the sign.

Could somebody help me to create the sign correctly?

...

ANSWER

Answered 2020-Jun-26 at 07:49

If the number of arrows is not too large, you can post-process the SVG file with Inkscape:

1. Select all and ungroup a couple of times.
2. Use the Stroke to Path option.
3. Finally, union all the shapes.

This will result in an svg file like this:

Source https://stackoverflow.com/questions/62586681

QUESTION

Detect words and graphs in image and slice image into 1 image per word or graph

Asked 2020-Jan-05 at 15:28

I'm building a web app to help students with learning Maths.

The app needs to display Maths content that comes from LaTeX files. These LaTeX files render (beautifully) to PDF, which I can convert cleanly to SVG thanks to pdf2svg.

The (svg or png or whatever image format) image looks something like this:

...

ANSWER

Answered 2017-Aug-19 at 16:35

The image is top quality, perfectly clean, not skewed, well separated characters. A dream !

First perform binarization and blob detection (standard in OpenCV).
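In OpenCV this step is usually `cv2.threshold` followed by `cv2.connectedComponentsWithStats`. To keep the idea dependency-free, here is a pure-Python sketch of the same thing: a fixed threshold (an assumption standing in for Otsu or adaptive thresholding) and 8-connected component labelling that returns one bounding box per blob.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[bool]]:
    """Dark pixels (ink) become True; the fixed threshold is an assumption."""
    return [[v < threshold for v in row] for row in gray]


def find_blobs(ink: List[List[bool]]) -> List[Box]:
    """Label 8-connected ink regions and return one bounding box per blob."""
    h, w = len(ink), len(ink[0]) if ink else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and ink[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```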

Then cluster the characters by grouping those with an overlap in the ordinates (i.e. facing each other in a row). This will naturally isolate the individual lines.
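One way to sketch the row clustering: sort the blob boxes by their top edge and merge them while their vertical ranges overlap, which is ordinary interval merging. The box layout `(x_min, y_min, x_max, y_max)` with y growing downward is an assumption; small marks such as the dot of an "i" may end up in a row of their own if they do not overlap the letters vertically.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), y grows downward


def group_rows(boxes: List[Box]) -> List[List[Box]]:
    """Cluster boxes whose vertical extents overlap (chain-wise) into rows."""
    rows: List[List[Box]] = []
    row_bottom = -1
    for box in sorted(boxes, key=lambda b: b[1]):
        y_min, y_max = box[1], box[3]
        if rows and y_min <= row_bottom:
            rows[-1].append(box)  # overlaps the current row's vertical range
            row_bottom = max(row_bottom, y_max)
        else:
            rows.append([box])  # starts a new row below the previous one
            row_bottom = y_max
    return rows
```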

Now in every row, sort the blobs left-to-right and cluster by proximity to isolate the words. This will be a delicate step, because the spacing of characters within a word is close to the spacing between distinct words. Don't expect perfect results. This should work better than a projection.
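A sketch of the word step for a single row: sort the boxes left to right and start a new word whenever the horizontal gap to the previous box exceeds a threshold. `max_gap` is a hypothetical tuning parameter; as noted above, choosing it is the delicate part.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def split_words(row: List[Box], max_gap: int) -> List[List[Box]]:
    """Group a row's boxes into words using a horizontal-gap threshold."""
    words: List[List[Box]] = []
    right_edge: Optional[int] = None
    for box in sorted(row, key=lambda b: b[0]):
        x_min, x_max = box[0], box[2]
        if right_edge is not None and x_min - right_edge <= max_gap:
            words[-1].append(box)  # close enough: same word
            right_edge = max(right_edge, x_max)
        else:
            words.append([box])  # gap too wide: a new word begins
            right_edge = x_max
    return words
```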

The situation is worse with italics, as the horizontal spacing is even narrower. You may also have to look at the "slanted distance", i.e. find the lines tangent to the characters in the direction of the italics. This can be achieved by applying a reverse shear transform.
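A sketch of the reverse shear, assuming image coordinates with y growing downward and text leaning right by a known angle: mapping each ink pixel `(x, y)` to `(x + y·tan θ, y)` makes slanted strokes vertical, after which horizontal gaps can be measured as usual. The slant angle is an assumption that would have to be estimated for the font.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def unshear(points: Iterable[Point], slant_deg: float) -> List[Point]:
    """Map (x, y) to (x + y * tan(slant), y); with y growing downward this
    straightens text that leans to the right by slant_deg."""
    k = math.tan(math.radians(slant_deg))
    return [(x + y * k, y) for x, y in points]


def x_extent(points: Iterable[Point]) -> Tuple[float, float]:
    """Horizontal extent (min x, max x) of a set of ink pixels."""
    xs = [x for x, _ in points]
    return min(xs), max(xs)
```

Applying `x_extent(unshear(pixels, angle))` to each glyph's pixels gives extents whose gaps follow the italic direction rather than the vertical.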

Thanks to the grid, the graphs will appear as big blobs.
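A sketch of how those big blobs could be separated from the glyphs before the row and word steps: compare each box's height with the median height, on the assumption that most blobs are characters. The `factor` of 4 is an arbitrary illustrative value.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def separate_graphs(boxes: List[Box], factor: float = 4.0) -> Tuple[List[Box], List[Box]]:
    """Split boxes into (glyphs, graphs): graphs are much taller than a typical blob."""
    if not boxes:
        return [], []
    heights = [b[3] - b[1] + 1 for b in boxes]
    typical = median(heights)  # robust to the few large graph blobs
    glyphs = [b for b, h in zip(boxes, heights) if h <= factor * typical]
    graphs = [b for b, h in zip(boxes, heights) if h > factor * typical]
    return glyphs, graphs
```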

Source https://stackoverflow.com/questions/45772483

QUESTION

Parse PDF file and output single character locations

Asked 2019-May-17 at 13:29

I'm trying to extract text information from a (digital) PDF by identifying content and location of each character and each word. For words, pdftotext --bbox from xpdf / poppler works quite well, but I cannot find an easy way to extract character location.

What I've tried

The solution I currently have is to convert the pdf to svg (via pdf2svg), and then parse the resulting svg to extract single character (= glyph) locations. In a third step, the resulting boxes are compared, each character is assigned to a word and hopefully the numbers match.
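A sketch of the SVG-parsing step described above. It assumes Cairo-style output in which each glyph is defined once (with an id starting with `glyph`) and placed with `<use xlink:href="#glyph…" x="…" y="…"/>`; the exact id format varies between Cairo versions, and transforms on enclosing groups are ignored here. The `x`/`y` values are the glyph origin, so a full bounding box would still need the extent of the referenced glyph path.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SVG_NS = "{http://www.w3.org/2000/svg}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def glyph_uses(svg_text: str) -> List[Tuple[str, float, float]]:
    """Return (glyph id, x, y) for every <use> that references a glyph definition."""
    root = ET.fromstring(svg_text)
    uses = []
    for el in root.iter(SVG_NS + "use"):
        href = el.get(XLINK_HREF) or el.get("href") or ""
        if not href.startswith("#glyph"):
            continue  # skip <use> elements that are not glyphs
        uses.append((href[1:], float(el.get("x", "0")), float(el.get("y", "0"))))
    return uses
```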

Problems

While the above works for most "basic" fonts, there are two (main) situations where this approach fails:

In script fonts (or some extreme italic fonts), bounding boxes are way larger than their content; as a result, words overlap significantly, and it can well happen that a character is entirely contained in two words. In this case, the mapping fails, because once I translate to svg I have no information on what character is contained in which glyph.
In many fonts multiple characters can be ligated, giving rise to a single glyph. In this case, the count of character boxes does not match the number of characters in the word, and matching each letter to a box is again problematic.

The second point (which is the main one for me) has a partial workaround by identifying the common ligatures and (if the counts don't match) splitting the corresponding bounding boxes into multiple pieces; but that cannot always work, because for example "ffi" is sometimes ligated to a single glyph, sometimes in two glyphs "ff" + "i", and sometimes in two glyphs "f" + "fi", depending on the font.
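The partial workaround mentioned above (splitting a ligature's bounding box into pieces) can be sketched as an equal-width split. Equal widths are only an approximation, and, as noted, knowing how many characters a glyph covers is the part that cannot always be inferred.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def split_box(box: Box, n: int) -> List[Box]:
    """Split one glyph box into n equal-width boxes, e.g. an "ffi" ligature into 3."""
    if n < 1:
        raise ValueError("n must be at least 1")
    x0, y0, x1, y1 = box
    step = (x1 - x0) / n
    return [(x0 + i * step, y0, x0 + (i + 1) * step, y1) for i in range(n)]
```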

What I would hope

It is my understanding that PDFs actually contain glyph information, not words. If so, all the programs that extract text from PDF (like pdftotext) must first extract and locate the various characters, and then maybe group them into words/lines; so I am a bit surprised that I could not find options to output the location of each single character. Converting to SVG essentially gives me that, but in that conversion all information about the content (i.e. the mapping glyph-to-character, or glyph-to-characters if there was a ligature) is lost, because there is no font anymore. And redoing the effort of matching each glyph to a character by looking at the font again feels like rewriting a PDF parser...

I would therefore be very grateful for any idea of how to solve this. The top answer here suggests that this might be doable with TET, but it's a paid option, and replacing my whole infrastructure to handle just one limit case seems like overkill...

...

ANSWER

Answered 2019-May-17 at 13:29

A PDF file doesn't necessarily specify the position of each character explicitly. Typically, it breaks a text into runs of characters (all using the same font, anything up to a line, I think) and then for each run, specifies the position of the bounding box that should contain the glyphs for those characters. So the exact position of each glyph will depend on metrics (mostly glyph-widths) of the font used to render it.
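To illustrate how glyph positions follow from font metrics: in the PDF text model, the horizontal advance after a glyph is tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, where w0 is the glyph width (in thousandths of text space for simple fonts), Tfs the font size, Tc the character spacing, Tw the word spacing (applied to the space character), and Th the horizontal scaling. The sketch below applies this to one run; it ignores TJ kerning adjustments, the text matrix, and the CTM, and the width table is made up for the example.

```python
from typing import Dict, List


def glyph_offsets(text: str, widths: Dict[str, float], font_size: float,
                  char_spacing: float = 0.0, word_spacing: float = 0.0,
                  h_scale: float = 1.0) -> List[float]:
    """Return the text-space x offset at which each character's glyph starts."""
    offsets: List[float] = []
    x = 0.0
    for ch in text:
        offsets.append(x)
        w0 = widths[ch] / 1000.0  # width per unit of font size
        tw = word_spacing if ch == " " else 0.0  # word spacing only after spaces
        x += (w0 * font_size + char_spacing + tw) * h_scale
    return offsets
```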

The Python package pdfminer has a script, pdf2txt.py. Try invoking it with -t xml. The docs just say "XML format. Provides the most information." But my notes indicate that it will apply the font metrics and give you a <text> element for every single glyph, with font and bounding-box info.

There are various versions in various places (e.g. PyPI and github). If you need Python 3 support, look for pdfminer.six.
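A sketch of reading that XML output with the standard library, assuming per-glyph elements shaped like `<text font="…" bbox="x0,y0,x1,y1" size="…">c</text>` and spacing elements `<text> </text>` that carry no bbox. The sample in the test is hand-written to mimic that shape, not real pdfminer output.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CharBox = Tuple[str, str, float, float, float, float]  # (text, font, x0, y0, x1, y1)


def chars_from_pdfminer_xml(xml_text: str) -> List[CharBox]:
    """Collect (text, font, x0, y0, x1, y1) for every <text> element that has a bbox."""
    root = ET.fromstring(xml_text)
    chars: List[CharBox] = []
    for el in root.iter("text"):
        bbox = el.get("bbox")
        if bbox is None:
            continue  # spacing/newline elements carry no geometry
        x0, y0, x1, y1 = (float(v) for v in bbox.split(","))
        chars.append((el.text or "", el.get("font", ""), x0, y0, x1, y1))
    return chars
```

The XML itself would come from something like `pdf2txt.py -t xml input.pdf > output.xml`.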

Source https://stackoverflow.com/questions/56172546

QUESTION

Extract Font Name from PDF

Asked 2019-Feb-25 at 02:16

I am using pdf.js to extract text from the PDF, but the font name appears as g_d0_f6 etc. I need the font name to select the appropriate table for converting to Unicode. Here is the code, derived from the pdf2svg.js sample:

...

ANSWER

Answered 2019-Feb-24 at 15:33

I think you were on the right track: page.commonObjs is where the actual font name is found. However, page.commonObjs only gets populated when the page's text/operators are accessed, so you'll find it empty if you look before that happens.

Source https://stackoverflow.com/questions/54630691

QUESTION

Setting PGFPlots in Julia and Windows

Asked 2018-May-24 at 06:11

I would be ever so grateful if anyone could provide step-by-step help to install PGFPlots on Windows for use in Julia.

http://nbviewer.jupyter.org/github/sisl/PGFPlots.jl/blob/master/doc/PGFPlots.ipynb#PGFPlots

The website recommends:

Pdf2svg. This is required by TikzPictures. On Ubuntu, you can get this by running sudo apt-get install pdf2svg and on RHEL/Fedora by running sudo dnf install pdf2svg. On Windows, you can download the binaries from http://www.cityinthesky.co.uk/opensource/pdf2svg/. Be sure to add pdf2svg to your path (and restart).

However, as I am not a computer expert, I do not understand this part:

To cross compile for Windows under Linux, simply install the relevant cross-compiler packages (for Fedora this is mingw32-cairo and mingw32-poppler and their dependencies) and then replace “./configure” in the compilation instructions above with “mingw32-configure”.

So if anyone can point me in the right direction, that would be great.

...

ANSWER

Answered 2018-May-23 at 10:57

The second part you quote is about compiling Windows binaries for pdf2svg under Linux, which, I believe, is not what you are looking for.

Simply download the GitHub repository they referenced, which already includes compiled binaries for Windows. Extract either dist-32bits or dist-64bits, whichever suits your Windows installation (i.e. 32-bit or 64-bit), to somewhere on your computer. Then add the directory that contains pdf2svg.exe to your PATH system variable.

You will also need to install pgfplots, which is a LaTeX package. So you need a LaTeX distribution, as well. You can simply install MiKTeX and then install pgfplots via its package manager/console.

That's it.

Source https://stackoverflow.com/questions/50477281

QUESTION

Rendering heavy SVG files (improving performance)

Asked 2017-Oct-06 at 16:33

In our company we want to come up with a universal solution for creating interactive presentations as mobile applications. The first idea was to create a PDF file and use it on the phone, but it didn't work out: it was too slow. Another idea was to convert the PDF into SVGs and use them as scenes (slides), and that's what I am working on right now. I should also mention that the PDF contains internal link annotations to navigate between pages.

For the PDF to SVG conversion I use the pdf2svg CLI tool. I also wrote a PHP CLI application that parses all the links from the PDF along with their positions. For the proof of concept I use ReactJS to test this idea on the web first (I have never worked with React Native before).

Now the problem: the PDF contains a lot of high-resolution images and a lot of pages, so some of the SVG files are very large (up to 11MB) and all the SVGs together total ~70MB. When rendering these large SVG files there is a delay (~1-10 seconds). Compared to the PDF file that's not a huge win, so I need to optimise loading time.

What I have tried so far:

1. With the PHP CLI utility mentioned earlier, I put some data about the links inside the SVG files (…). Then I rendered the SVG, keeping the page number in state with …, and on each render created onClick event listeners for the tags inside the SVG for navigation. It was a first try, and I wasn't satisfied with the performance.
2. I tried react-svg-loader to inject the SVGs as components. It didn't work out; performance was even worse (converting 70MB of SVGs to JSX components doesn't sound good anyway). I also tried to build the project for production, but it took so long that I couldn't wait for it to finish. So, not an option.
3. Instead of SVG, I tried PNG images at a lower resolution (about 800 KB each) with the links placed as div elements on top of each image. Performance was really good, but I lost quality. So, not an option.
4. Same as 3, but with SVG and …. I think it is slightly better, but still not a win.

Do you have any suggestions on how I could improve performance while still using SVG? Should it perform better or worse in React Native?

...

ANSWER

Answered 2017-Oct-06 at 16:33

I think the main part of your problem is this: pdf2svg embeds all raster images as base64-encoded ASCII strings inside the SVG. Converting and rendering these seems to take significantly more time than loading and rendering an image that is referenced and stored in external PNG or JPEG files.

Unfortunately, I do not know of a CLI tool that can split the embedded raster images out into separate files when importing a PDF. But the GUI SVG editor Inkscape can: open a PDF file with Inkscape, and a dialog pops up that asks not only which page to select but also shows an option "Embed all images". If you deselect this checkbox, the images are stored as separate files in the directory the PDF is loaded from and are only referenced in the form
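As a scripted alternative, here is a minimal Python sketch (not an existing tool) that post-processes an SVG: it finds `href="data:image/png;base64,…"` or `image/jpeg` data URIs, decodes them, and rewrites each reference to a relative file name. It assumes double-quoted attributes and PNG/JPEG payloads; `externalize_images` and the `<stem>-img<N>` naming are hypothetical.

```python
import base64
import re
from typing import Dict, Tuple

# Matches href="data:image/png;base64,..." (also inside xlink:href="...").
DATA_URI = re.compile(r'(href=")data:image/(png|jpeg);base64,([A-Za-z0-9+/=\s]+)(")')


def externalize_images(svg_text: str, stem: str) -> Tuple[str, Dict[str, bytes]]:
    """Replace embedded base64 PNG/JPEG data URIs with relative file names.

    Returns the rewritten SVG text and a {file name: decoded bytes} mapping
    that the caller is expected to write next to the SVG file.
    """
    images: Dict[str, bytes] = {}

    def replace(match: "re.Match[str]") -> str:
        ext = "jpg" if match.group(2) == "jpeg" else "png"
        name = f"{stem}-img{len(images) + 1}.{ext}"
        # Data URIs may be wrapped across lines, so drop whitespace before decoding.
        images[name] = base64.b64decode("".join(match.group(3).split()))
        return f"{match.group(1)}{name}{match.group(4)}"

    return DATA_URI.sub(replace, svg_text), images
```

The caller would then write each returned byte string into the SVG's directory and save the rewritten text, so the browser can fetch the images as ordinary files.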

Source https://stackoverflow.com/questions/46602558

QUESTION

How can I capture CLI tool file output to R object or stdout?

Asked 2017-Jul-26 at 10:00

I'm calling a bunch of command-line interface (CLI) tools (such as texi2pdf or pdf2svg) from an R script, and I'd like to capture the output file of these tools directly as an R object, without touching the file system.

This is the opposite concern of the more frequent "how-do-I-redirect-stdout-to-file"-question. (Perhaps that implies that I'm "using it wrong").

Example:

Say I have a simple LaTeX file, reprex.tex, that I'd like to compile:

...

ANSWER

Answered 2017-Jul-26 at 10:00

Put simply: in the general case you can’t. But sometimes these tools allow you to specify an output file, in which case you can (on some systems, but note that this is not portable) specify /dev/stdout as the output file.
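The same trick sketched in Python rather than R: pass `/dev/stdout` where the tool expects an output file and read the bytes from the pipe. The `{out}` placeholder convention is hypothetical, and this only works for tools that write their output sequentially (a tool that seeks in its output file, or that also prints logs to stdout, will not work this way). Whether pdf2svg itself behaves well here is untested.

```python
import subprocess
import sys
from typing import List


def run_capturing_output_file(cmd: List[str]) -> bytes:
    """Run a tool whose output-file argument is the placeholder "{out}",
    substituting /dev/stdout so the file contents arrive on the pipe.
    Not portable: relies on /dev/stdout (Linux, macOS)."""
    if sys.platform.startswith("win"):
        raise OSError("/dev/stdout is not available on Windows")
    args = [a.replace("{out}", "/dev/stdout") for a in cmd]
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, `run_capturing_output_file(["pdf2svg", "figure.pdf", "{out}"])` would attempt to return the SVG as bytes without creating a file.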

According to the texi2pdf manpage, the following should therefore work:

Source https://stackoverflow.com/questions/45322657

QUESTION

Missing classes in PDFBox2.0

Asked 2017-Jul-01 at 13:38

I am migrating an application (PDF2SVG at http://github.com/contentmine/pdf2svg) from PDFBox-1.8.8 to PDFBox-2.0.6. In the POM I have

...

ANSWER

Answered 2017-Jul-01 at 13:38

There should be a shortcut in your IDE to fix these; in NetBeans, it's CTRL-SHIFT-I. Anyway, here are the classes:

Source https://stackoverflow.com/questions/44859593

Community Discussions and Code Snippets contain content from the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install pdf2svg

You can download it from GitHub.

Support

For new features, suggestions, or bugs, create an issue on GitHub. If you have questions, check for answers and ask on the Stack Overflow community page.
