pdf2svg | A PDF to SVG converter written using PDF.js | Document Editor library
kandi X-RAY | pdf2svg Summary
A PDF to SVG converter written using PDF.js.
Community Discussions
Trending Discussions on pdf2svg
QUESTION
I create an arrow in TikZ and convert it to svg (pdf2svg file.pdf file.svg). Then I import my output.svg into FontForge and choose Element --> Overlap --> remove. Finally I put the character \symbol{65} into a tex-document, but I still have these 'overlaps' in the form of white space inside the sign.
Could somebody help me to create the sign correctly?
...ANSWER
Answered 2020-Jun-26 at 07:49
If the number of arrows is not too big, you can post-process the svg file with Inkscape:
- select all and ungroup a couple of times
- now use the Stroke to Path option
This will result in an svg file like this:
QUESTION
I'm building a web app to help students with learning Maths.
The app needs to display Maths content that comes from LaTeX files. These LaTeX files render (beautifully) to pdf that I can convert cleanly to svg thanks to pdf2svg.
The (svg or png or whatever image format) image looks something like this:
...ANSWER
Answered 2017-Aug-19 at 16:35
The image is top quality, perfectly clean, not skewed, with well separated characters. A dream!
First perform binarization and blob detection (standard in OpenCV).
Then cluster the characters by grouping those with an overlap in the ordinates (i.e. facing each other in a row). This will naturally isolate the individual lines.
Now in every row, sort the blobs left-to-right and cluster by proximity to isolate the words. This will be a delicate step, because the spacing of characters within a word is close to the spacing between distinct words. Don't expect perfect results. This should work better than a projection.
The situation is worse with italics as the horizontal spacing is even narrower. You may have to also look at the "slanted distance", i.e. find the lines that tangent the characters in the direction of the italics. This can be achieved by applying a reverse shear transform.
Thanks to the grid, the graphs will appear as big blobs.
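The row and word clustering described above can be sketched in plain Python once the blob bounding boxes are available. This is only an illustration under stated assumptions: boxes are (x, y, width, height) tuples of the kind a blob detector returns, the function names group_rows and split_words are made up for this example, and max_gap is a threshold you would have to tune per font.

```python
from typing import List, Tuple

# Assumed box layout: (x, y, width, height), with y growing downwards.
Box = Tuple[int, int, int, int]


def group_rows(boxes: List[Box]) -> List[List[Box]]:
    """Group boxes whose vertical extents overlap into rows, top to bottom."""
    rows = []  # each entry is [top, bottom, members]
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        for row in rows:
            if y < row[1] and y + h > row[0]:  # ordinate ranges overlap
                row[0] = min(row[0], y)
                row[1] = max(row[1], y + h)
                row[2].append(box)
                break
        else:
            rows.append([y, y + h, [box]])
    # Inside each row, order the blobs left to right.
    return [sorted(row[2], key=lambda b: b[0]) for row in rows]


def split_words(row: List[Box], max_gap: int) -> List[List[Box]]:
    """Split a left-to-right row into words wherever the horizontal gap exceeds max_gap."""
    if not row:
        return []
    words = [[row[0]]]
    for prev, cur in zip(row, row[1:]):
        gap = cur[0] - (prev[0] + prev[2])
        if gap > max_gap:
            words.append([cur])
        else:
            words[-1].append(cur)
    return words
```

A fixed max_gap is the weak point, as the answer warns: the spacing inside a word can be close to the spacing between words, so expect to adjust it or derive it from the distribution of gaps in each row.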
QUESTION
I'm trying to extract text information from a (digital) PDF by identifying content and location of each character and each word. For words, pdftotext --bbox from xpdf / poppler works quite well, but I cannot find an easy way to extract character location.
What I've tried
The solution I currently have is to convert the pdf to svg (via pdf2svg), and then parse the resulting svg to extract single character (= glyph) locations. In a third step, the resulting boxes are compared, each character is assigned to a word and hopefully the numbers match.
Problems
While the above works for most "basic" fonts, there are two (main) situations where this approach fails:
- In script fonts (or some extreme italic fonts), bounding boxes are way larger than their content; as a result, words overlap significantly, and it can well happen that a character is entirely contained in two words. In this case, the mapping fails, because once I translate to svg I have no information on what character is contained in which glyph.
- In many fonts multiple characters can be ligated, giving rise to a single glyph. In this case, the count of character boxes does not match the number of characters in the word, and matching each letter to a box is again problematic.
The second point (which is the main one for me) has a partial workaround by identifying the common ligatures and (if the counts don't match) splitting the corresponding bounding boxes into multiple pieces; but that cannot always work, because for example "ffi" is sometimes ligated to a single glyph, sometimes in two glyphs "ff" + "i", and sometimes in two glyphs "f" + "fi", depending on the font.
What I would hope
It is my understanding that pdf files actually contain glyph information, and not words. If so, all the programs that extract text from pdf (like pdftotext) must first extract and locate the various characters, and then maybe group them into words/lines; so I am a bit surprised that I could not find options to output the location of each single character. Converting to svg essentially gives me that, but in that conversion all information about the content (i.e. the mapping glyph-to-character, or glyph-to-characters, if there was a ligature) is lost, because there is no font anymore. And redoing the effort of matching each glyph to a character by looking at the font again feels like rewriting a pdf parser...
I would therefore be very grateful for any idea of how to solve this. The top answer here suggests that this might be doable with TET, but it's a paid option, and replacing my whole infrastructure to handle just one edge case seems like overkill...
...ANSWER
Answered 2019-May-17 at 13:29
A PDF file doesn't necessarily specify the position of each character explicitly. Typically, it breaks a text into runs of characters (all using the same font, anything up to a line, I think) and then for each run, specifies the position of the bounding box that should contain the glyphs for those characters. So the exact position of each glyph will depend on metrics (mostly glyph-widths) of the font used to render it.
The Python package pdfminer has a script pdf2txt.py. Try invoking it with -t xml. The docs just say "XML format. Provides the most information." But my notes indicate that it will apply the font-metrics and give you a <text> element for every single glyph, with font and bounding-box info.
There are various versions in various places (e.g. PyPI and GitHub). If you need Python 3 support, look for pdfminer.six.
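As a rough illustration of how that XML could be consumed, the sketch below reads per-glyph boxes with the standard library only. The SAMPLE string is hand-written for this example and only assumes the layout described above (one text element per glyph carrying font and bbox attributes); real pdf2txt.py output should be inspected before relying on it, and glyph_boxes is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real tool output; the attribute layout is an assumption.
SAMPLE = """<pages>
<page id="1" bbox="0.000,0.000,612.000,792.000" rotate="0">
<textbox id="0" bbox="72.000,700.000,90.000,712.000">
<textline bbox="72.000,700.000,90.000,712.000">
<text font="ABCDEF+Times-Roman" bbox="72.000,700.000,80.000,712.000" size="12.000">H</text>
<text font="ABCDEF+Times-Roman" bbox="80.000,700.000,90.000,712.000" size="12.000">i</text>
<text>
</text>
</textline>
</textbox>
</page>
</pages>"""


def glyph_boxes(xml_text):
    """Return (text, font, (x0, y0, x1, y1)) for every text element that has a bbox."""
    glyphs = []
    for node in ET.fromstring(xml_text).iter("text"):
        bbox = node.get("bbox")
        if bbox is None:
            continue  # skip elements without position info, e.g. line breaks
        x0, y0, x1, y1 = (float(value) for value in bbox.split(","))
        glyphs.append((node.text, node.get("font"), (x0, y0, x1, y1)))
    return glyphs


if __name__ == "__main__":
    for text, font, box in glyph_boxes(SAMPLE):
        print(repr(text), font, box)
```

Because each entry keeps the character content next to its box, this avoids the glyph-to-character guessing that the svg route requires.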
QUESTION
I am using pdf.js to extract text from the pdf but the font name appears as g_d0_f6 etc. I need the font name to use the appropriate table for converting to Unicode. Here is the code derived from the pdf2svg.js sample:
ANSWER
Answered 2019-Feb-24 at 15:33
I think you were on the right track: page.commonObjs is where the actual font name is found. However, page.commonObjs only gets populated when the page's text/operators are accessed, so you'll find it empty if you look before that happens.
QUESTION
I will be ever so grateful if anyone can provide step by step help to install PGFPlots on Windows for usage in Julia.
http://nbviewer.jupyter.org/github/sisl/PGFPlots.jl/blob/master/doc/PGFPlots.ipynb#PGFPlots
The website recommends:
Pdf2svg. This is required by TikzPictures. On Ubuntu, you can get this by running sudo apt-get install pdf2svg and on RHEL/Fedora by running sudo dnf install pdf2svg. On Windows, you can download the binaries from http://www.cityinthesky.co.uk/opensource/pdf2svg/. Be sure to add pdf2svg to your path (and restart).
However, not being a computer expert, I do not understand this part:
To cross compile for Windows under Linux, simply install the relevant cross-compiler packages (for Fedora this is mingw32-cairo and mingw32-poppler and their dependencies) and then replace “./configure” in the compilation instructions above with “mingw32-configure”.
So if anyone can point me in the right direction, that would be great.
...ANSWER
Answered 2018-May-23 at 10:57
The second part you quote is about compiling Windows binaries for pdf2svg under Linux, which, I believe, is not what you are looking for.
Simply download this GitHub repository they referenced, which already includes compiled binaries for Windows. Extract either dist-32bits or dist-64bits, whichever is suitable for your Windows installation (i.e. 32-bit or 64-bit), to somewhere on your computer. Then add the directory that contains pdf2svg.exe to your PATH system variable.
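To confirm that the PATH change took effect, a short check like the one below can help. It is only a sketch in Python (any language with a PATH lookup would do), and find_tool is a made-up helper name; shutil.which is the standard-library call that performs the lookup.

```python
import shutil


def find_tool(name="pdf2svg"):
    """Return the full path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path:
        print("pdf2svg found at", path)
    else:
        print("pdf2svg is not on PATH; add its folder and open a new terminal")
```

Run it from a freshly opened terminal, since terminals that were already open usually keep the old PATH.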
You will also need to install pgfplots, which is a LaTeX package. So you need a LaTeX distribution, as well. You can simply install MiKTeX and then install pgfplots via its package manager/console.
That's it.
QUESTION
In our company we want to come up with a universal solution for creating interactive presentations as mobile applications. The first idea was to create a PDF file and use it on a mobile phone; it didn't work out because it was too slow. Another idea was to convert the PDF into SVGs and use them as scenes (slides), and that's what I am working on right now. I forgot to mention that the PDF contains internal link annotations to navigate between pages.
So, for PDF to SVG conversion I use the pdf2svg cli tool. I also wrote a PHP cli application to parse all the links from the PDF with their positions. As a proof of concept I use ReactJS to test this idea on the web first (I have never worked with React Native before).
Now the problem: the PDF contains a lot of high resolution images and a lot of pages, so some of the SVG files are very large (up to 11MB) and the size of all SVGs is ~70MB. When rendering these large SVG files, there is a delay (~1-10 seconds); compared to the PDF file that's not a huge win, so I need to optimise loading time.
What I have tried so far:
1. With the PHP CLI utility I mentioned earlier, I put some data about links inside the SVG files. Then I rendered the SVG, keeping the page number inside state, and on each render created onClick event listeners for tags inside the SVG for navigation. Well, it was a first try and I wasn't satisfied with the performance.
2. I tried to use react-svg-loader to inject SVGs as components. It didn't work out, performance was even worse (well, converting 70MB of SVGs to JSX components doesn't sound good). By the way, I tried to build the project for production, and it took so long I just couldn't wait. So, not an option.
3. Instead of SVG, I tried to use PNG images with smaller resolution (each PNG was about 800kb) and put links as div elements on top of an image. Performance was really good, but I lost quality. So not an option.
4. Same as 3, but with SVG and . I think it is slightly better, but still not a win.
Do you have any suggestions how could I improve performance by still using SVG? Should it perform better or worse in React Native?
...ANSWER
Answered 2017-Oct-06 at 16:33
I think the main part of your problem is this: pdf2svg embeds all raster images as base64-encoded ASCII strings inside the SVG. Converting and rendering these seems to take significantly more time than loading and rendering an image that is referenced and stored in external PNG or JPEG files.
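One way to check how much of a given SVG is embedded raster data is to look for base64 data URIs and measure their decoded size. The following is a minimal sketch, assuming the images appear as href="data:image/...;base64,..." attributes; embedded_images is a made-up helper name, and the regular expression is a simplification rather than a full SVG parser.

```python
import base64
import re

# Matches both href="data:..." and xlink:href="data:..." attributes.
DATA_URI = re.compile(r'href="data:(image/[\w.+-]+);base64,([^"]*)"')


def embedded_images(svg_text):
    """Return (mime_type, decoded_size_in_bytes) for each base64 image in the SVG."""
    found = []
    for mime, payload in DATA_URI.findall(svg_text):
        found.append((mime, len(base64.b64decode(payload))))
    return found


if __name__ == "__main__":
    payload = base64.b64encode(b"\x89PNG1234").decode("ascii")
    svg = '<svg><image xlink:href="data:image/png;base64,%s"/></svg>' % payload
    print(embedded_images(svg))
```

Comparing the summed sizes with the size of the whole file gives a quick indication of whether the embedded images dominate.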
Unfortunately I do not know a CLI tool that can, on importing a PDF, split out the embedded raster images into extra files. But the GUI SVG editor Inkscape can: open a PDF file with Inkscape, and a dialog pops up that not only asks which page to select, but also shows an option "Embed all images". If you deselect this checkbox, the images will be stored as separate files in the directory the PDF is loaded from and only referenced in the form
QUESTION
I'm calling a bunch of command-line interface (CLI) tools (such as texi2pdf or pdf2svg) from an R script, and I'd like to capture the output file of these tools directly as an R object, without touching the file system.
This is the opposite concern of the more frequent "how-do-I-redirect-stdout-to-file" question. (Perhaps that implies that I'm "using it wrong".)
Example:
Say, I have a simple latex reprex.tex file that I'd like to compile:
ANSWER
Answered 2017-Jul-26 at 10:00
Put simply: in the general case you can’t. But sometimes these tools allow you to specify an output file, in which case you can (on some systems, but note that this is not portable) specify /dev/stdout as the output file.
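The /dev/stdout trick can be sketched independently of any particular tool. The example below uses Python rather than R and a stand-in "tool" (a one-line script that writes to whatever output path it is given); FAKE_TOOL and capture_output_file are made up for this illustration, and the approach only works on systems that provide /dev/stdout.

```python
import os
import subprocess
import sys

# Stand-in for a CLI tool that only writes to a named output file.
# It is hypothetical and is NOT texi2pdf or pdf2svg.
FAKE_TOOL = "import sys; open(sys.argv[1], 'wb').write(b'fake output')"


def capture_output_file(command):
    """Run `command` with /dev/stdout appended as the output path; return the bytes."""
    if not os.path.exists("/dev/stdout"):
        raise RuntimeError("/dev/stdout is not available on this system")
    result = subprocess.run(command + ["/dev/stdout"], stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    data = capture_output_file([sys.executable, "-c", FAKE_TOOL])
    print(len(data), "bytes captured without creating a file")
```

Note that any progress messages a real tool prints to standard output would be mixed into the captured bytes, which is one more reason this does not work in the general case.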
According to the texi2pdf manpage, the following should therefore work:
QUESTION
I am migrating an application (PDF2SVG at http://github.com/contentmine/pdf2svg) from PDFBox-1.8.8 to PDFBox-2.0.6. In the POM I have
ANSWER
Answered 2017-Jul-01 at 13:38
There should be a shortcut in your IDE to fix these... in NetBeans, it's CTRL-SHIFT-i. Anyway, here are the classes:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported