xpdf | xpdf with local changes | Document Editor library

by tmyroadctfig | C++ | Version: Current | License: GPL-2.0

kandi X-RAY | xpdf Summary

xpdf is a C++ library typically used in Editor and Document Editor applications. xpdf has no reported bugs and no reported vulnerabilities, it carries a Strong Copyleft license, and it has low support. You can download it from GitHub.

Xpdf is an open source viewer for Portable Document Format (PDF) files. (These are also sometimes called Acrobat files, from the name of Adobe’s PDF software.) The Xpdf project also includes a PDF text extractor, a PDF-to-PostScript converter, and various other utilities. Xpdf runs under the X Window System on UNIX, VMS, and OS/2. The non-X components (pdftops, pdftotext, etc.) also run on Windows and Mac OS X systems and should run on pretty much any system with a decent C++ compiler. Xpdf will run on 32-bit and 64-bit machines.

Support

xpdf has a low active ecosystem.

It has 18 star(s) with 19 fork(s). There are 3 watchers for this library.

It had no major release in the last 6 months.

xpdf has no issues reported. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of xpdf is current.

Quality

xpdf has 0 bugs and 0 code smells.

Security

xpdf has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

xpdf code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

xpdf is licensed under the GPL-2.0 License. This license is Strong Copyleft.

Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

Reuse

xpdf releases are not available. You will need to build from source code and install.

Installation instructions are not available. Examples and code snippets are available.

xpdf Key Features

No Key Features are available at this moment for xpdf.

xpdf Examples and Code Snippets

No Code Snippets are available at this moment for xpdf.

Community Discussions

Trending Discussions on xpdf

ModuleNotFoundError: No module named 'milvus'

Pandoc conversion to pdf not working on heroku

shrinking a Docker Image - from debian to scratch - how to migrate?

How to evaluate results of pipes in a bash script

How to list PDF page sizes in the command line

How extract text from this compressed PDF/A?

Powershell won't output "£" in email html body

PDFsharp - overlay page from other PDF

Unable to import pdftotext after installing with conda and poppler, Windows 10

QUESTION

ModuleNotFoundError: No module named 'milvus'

Asked 2022-Feb-15 at 19:23

Goal: to run this Auto Labelling Notebook on AWS SageMaker Jupyter Labs.

Kernels tried: conda_pytorch_p36, conda_python3, conda_amazonei_mxnet_p27.

...

ANSWER

Answered 2022-Feb-03 at 09:29

I would recommend downgrading milvus to a version from before the 2.0 release, which came out just a week ago. Here is a discussion on that topic: https://github.com/deepset-ai/haystack/issues/2081

Source https://stackoverflow.com/questions/70954157

QUESTION

Pandoc conversion to pdf not working on heroku

Asked 2021-Jan-25 at 19:29

I have a ruby on rails app that uses pandoc-ruby to convert markdown files into pdf. The pandoc-ruby requires pandoc installation. To successfully convert to pdf, pdflatex needs to be present as well. Locally (tested on Mac and Ubuntu 18.04) everything is working if pandoc, texlive-latex-recommended and texlive-fonts-recommended packages are installed. Things get a little bit tricky when deploying to heroku. To install all the packages on heroku I've used the Aptfile approach and I have not been able to solve this.

Approach 1: Aptfile

I've specified this Aptfile:

...

ANSWER

Answered 2021-Jan-25 at 19:29

After quite a bit of trial and error, I have found a solution that works.

As @mb21 mentioned, a Docker image would probably be the best option long term. Docker images are supported on Heroku. However, I wanted to avoid dockerizing the whole application to solve this issue.

I found a TeX Live buildpack for Heroku that supports adding custom TeX Live packages (one example of such a buildpack). With it in place, the conversion failed with ! LaTeX Error: File 'xcolor.sty' not found.

I used tlmgr to get some info on the missing file. Running tlmgr search --global --file xcolor.sty does the trick and reveals that there is a package called xcolor. After installing that we come to the next error, and the next, and the next. In the end I ended up installing 2 collections that are small enough for Heroku (mind the 500MB slug size limit) and contain everything pandoc needs for a successful conversion. Those 2 are collection-fontsrecommended and collection-latexrecommended.

Adding a texlive.packages file to the root of the application does the trick. It is recognized by the buildpack and it installs all the specified packages for you using tlmgr.

Source https://stackoverflow.com/questions/65853208

QUESTION

shrinking a Docker Image - from debian to scratch - how to migrate?

Asked 2021-Jan-21 at 06:56

I am trying to build a minimalistic Docker image for one of my applications.

In my "usual" builds I do not rely on 3rd party applications. This time I need to include a precompiled executable (xpdf) in the build. My Go applications are prebuilt in a builder Docker image and then copied over (no dependencies).

My current Dockerfile looks like this (working, the application launches):

...

ANSWER

Answered 2021-Jan-19 at 09:19

Solution Step 1 - create libsource

Create a Docker image from which you can grab all required libraries.

Source https://stackoverflow.com/questions/65777031

QUESTION

How to evaluate results of pipes in a bash script

Asked 2021-Jan-02 at 12:47

I need help for the following problem: I'd like to kill all instances of a program, let's say xpdf. At the prompt the following works as intended:

...

ANSWER

Answered 2021-Jan-02 at 12:47

This answer is for the case where killall or pkill (suggested in this answer) is not enough for you: for example, if you really want to print "xpdf läuft nicht" (German for "xpdf is not running") when there is no PID to kill, or if you want to apply kill -SIGTERM explicitly so that you are sure which signal you send to your PIDs.

You could use a bash loop instead of xargs and sed. It's pretty simple to iterate over CSV/column outputs:
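The answer's own bash loop is not reproduced on this page. As a rough illustration of the same idea in Python rather than bash, the sketch below collects PIDs with pgrep and sends SIGTERM to each one, printing the "not running" message when nothing matches. The function names find_pids and terminate_all are hypothetical, and the use of pgrep -x for exact-name matching is an assumption about the target system, not part of the original answer.

```python
import os
import signal
import subprocess


def find_pids(name):
    """Return the PIDs of processes named exactly `name` (assumes pgrep is installed)."""
    result = subprocess.run(
        ["pgrep", "-x", name], capture_output=True, text=True, check=False
    )
    # pgrep prints one PID per line and prints nothing when there is no match.
    return [int(line) for line in result.stdout.split()]


def terminate_all(pids, name="xpdf", kill=os.kill):
    """Send SIGTERM to every PID and return one message line per action taken."""
    if not pids:
        return [f"{name} läuft nicht"]
    messages = []
    for pid in pids:
        kill(pid, signal.SIGTERM)  # explicit signal, as in `kill -SIGTERM`
        messages.append(f"sent SIGTERM to {pid}")
    return messages


# Example use on a system with pgrep:
#   print("\n".join(terminate_all(find_pids("xpdf"))))
```

Passing kill as a parameter keeps the loop testable without terminating real processes.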

Source https://stackoverflow.com/questions/65507178

QUESTION

How to list PDF page sizes in the command line

Asked 2020-Jun-23 at 16:08

In the Ghostscript documentation I did not find any arguments to query the paper sizes of a PDF document.

I read about a pdf_info.ps file in the lib subdirectory.

I tried this code:

...

ANSWER

Answered 2020-Jun-23 at 16:08

Recent versions of Ghostscript run in SAFER mode by default, which prevents PostScript programs (like pdf_info.ps) from accessing files in the file system.

In general Ghostscript will try and infer from the command line when files should be permitted (such as the input filename, in the case above pdf_info.ps) but it can't know that -sFile= should be permitted, because that part of the command simply ends up in the PostScript interpreter.

So to use pdf_info.ps you will either have to set -dNOSAFER or add --permit-file-read= to your command line. -dNOSAFER turns off all protection, so you may not want to do that; --permit-file-read allows the PostScript program to read the specified directory only. I'd recommend the latter.
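As a minimal sketch of the recommended option, the Python helper below assembles a Ghostscript argument list that passes the PDF through -sFile= and grants read access with --permit-file-read=. The helper name build_pdf_info_command is hypothetical. The location of pdf_info.ps and the choice to permit only the PDF itself (rather than its directory) are assumptions; the question's original command line is not shown on this page.

```python
import os


def build_pdf_info_command(pdf_path, gs="gs", script="pdf_info.ps"):
    """Build an argument list that lets pdf_info.ps read exactly one PDF file."""
    pdf_path = os.path.abspath(pdf_path)
    return [
        gs,
        "-q",                                # suppress the startup banner
        "-dNODISPLAY",                       # no output device is needed
        f"--permit-file-read={pdf_path}",    # keep SAFER on, allow this one path
        f"-sFile={pdf_path}",                # the file pdf_info.ps will inspect
        script,                              # assumed to be findable by Ghostscript
    ]
```

The resulting list can be handed to subprocess.run once it behaves as expected from the command line, in keeping with the advice to experiment with the plain Ghostscript executable first.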

I'd also suggest you experiment from the command line using the usual Ghostscript executable and only move to your application when you have it correct.

If you are planning to distribute this application, please have a look at the license file.

Source https://stackoverflow.com/questions/62524446

QUESTION

How extract text from this compressed PDF/A?

Asked 2020-May-22 at 15:57

For machine learning purposes (scikit-learn), I need to extract the raw text from lots of PDF files. First off, I was using xpdf pdftotext to do this task:

...

ANSWER

Answered 2020-May-18 at 17:50

There are two fairly simple techniques you can use.

1) Google's "Tesseract" open source OCR (optical character recognition). You could apply this evenly to all PDFs, though converting all that data into pixels and then working magic upon them is going to be more computationally expensive. Which is more important, engineer time or CPU time? There's a pytesseract module. Note that this tool works on image formats, so you'd have to use something like Ghostscript (another open source project) to convert all of a PDF's pages to images, then run [py]tesseract on those images.

2) pyPDF can get each page and programmatically extract any text draw operations in the order they were drawn onto the page. This may be nothing like the logical reading order of the page... While a PDF could draw all the 'a's and then all the 'b's (and so forth), it's actually more efficient to draw everything in "font a" , then everything in "font b". It's important to note that "font b" might just be the italic version of "font a". This produces a shorter/more efficient stream of drawing commands, though probably not by such an amount as to be a good business decision to do so.

The kicker here is that a random pile of PDF files might require you to do some OCR. A poorly assembled PDF (one with a font subset that has no "to unicode" data) can't be properly mined for text even though it has nothing but text drawing operations. "Draw glyphs one through five from font C" doesn't mean much if you don't know that those first five glyphs are "g-l-y-p-h", because that's the order they were used in.
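The glyph-numbering problem can be shown with a toy model in Python. This is only an illustration: real PDF fonts and their mapping tables are far more involved, and the names decode_glyphs and font_c_to_unicode are hypothetical.

```python
def decode_glyphs(glyph_ids, to_unicode=None):
    """Map glyph IDs to text; without a mapping, emit U+FFFD placeholders."""
    if to_unicode is None:
        return "\ufffd" * len(glyph_ids)
    return "".join(to_unicode.get(g, "\ufffd") for g in glyph_ids)


# Hypothetical subset font: glyphs are numbered in the order they were first used.
font_c_to_unicode = {1: "g", 2: "l", 3: "y", 4: "p", 5: "h"}

drawn = [1, 2, 3, 4, 5]  # "draw glyphs one through five from font C"
print(decode_glyphs(drawn, font_c_to_unicode))  # glyph
print(decode_glyphs(drawn))                     # five replacement characters
```

With the mapping, the drawing commands yield text; without it, the same commands carry no recoverable characters, which is when OCR becomes the fallback.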

On the other hand, if you've got home-grown PDFs or all your pdfs are from some known source (Word's pdf converter for example), you'll know what to expect in advance.

Note that the only thing mentioned above that I've actually used is Ghostscript. I remember it having a solid command line interface we used to generate images for some online PDF viewer Many Years Ago.

Source https://stackoverflow.com/questions/61839856

QUESTION

Powershell won't output "£" in email html body

Asked 2020-May-08 at 13:05

I have the following code, which counts the number of PDFs in specific folders, and counts the number of sheets in those specific PDFs, and sends an email with this data.

I've anonymised part of the script.

...

ANSWER

Answered 2020-May-08 at 12:18

It's an HTML encoding issue. I think you need to either use the following code.
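The answer's own code is not shown on this page. As a general illustration of this class of problem, and in Python rather than PowerShell, one common workaround is to replace non-ASCII characters such as "£" with numeric character references before building the HTML body. The function name to_ascii_html is hypothetical, and this is not claimed to be the fix the answer proposed.

```python
def to_ascii_html(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


body = to_ascii_html("<p>Total cost: £12.50</p>")
print(body)  # <p>Total cost: &#163;12.50</p>
```

The result is pure ASCII, so it survives a mail pipeline that mishandles the message's character encoding.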

Source https://stackoverflow.com/questions/61678329

QUESTION

PDFsharp - overlay page from other PDF

Asked 2020-Mar-17 at 12:32

I'm generating PDF files using PDFsharp, and I need to overlay the PDF I'm generating with a specific page from another PDF.

I've created this method:

...

ANSWER

Answered 2020-Mar-17 at 12:32

You can append the page number to the name of the PDF file, separated with a hash sign ("#").

To get page 7 of "sample.pdf", use the filename "sample.pdf#6" (zero-based page numbers).
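The off-by-one between the human page number and the zero-based suffix is easy to get wrong, so here is a small sketch of the arithmetic. It is written in Python purely to show the string construction; the helper name page_reference is hypothetical and is not part of PDFsharp.

```python
def page_reference(pdf_name, page_number):
    """Return 'name#index' for a 1-based page number, using a zero-based index."""
    if page_number < 1:
        raise ValueError("page_number is 1-based and must be at least 1")
    return f"{pdf_name}#{page_number - 1}"


print(page_reference("sample.pdf", 7))  # sample.pdf#6
```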

Source https://stackoverflow.com/questions/60720344

QUESTION

Unable to import pdftotext after installing with conda and poppler, Windows 10

Asked 2020-Feb-11 at 09:20

I'm trying to use pdftotext, but it won't import.

I'm running Windows 10 (64 bit) on a Lenovo IdeaPad S340, a work laptop.

Following the directions here and here (which were super helpful), I:

Installed Microsoft Visual C++ Build Tools.
Installed Anaconda.
Got the latest version of Anaconda and updated it, using separate Anaconda3 commands for each of these steps. I don't recall the commands, and haven't found them again.
Updated Microsoft Visual 14.
Used conda to install poppler via Anaconda3 command: conda install -c conda-forge poppler
Used pip to install pdftotext via Anaconda3 command: pip install pdftotext

After that:

This happens in the Python 3.8 (32 bit) command prompt:

...

ANSWER

Answered 2020-Feb-11 at 09:20

Okay, I figured it out! If you install pdftotext using Anaconda and conda, then importing it seems to only work when you run it in the Python interpreter from within the Anaconda3 shell.

So, I had to switch to the Python interpreter mode in the Anaconda3 PowerShell first: python

Then, I could import pdftotext with no error: import pdftotext

It looked like this:

Source https://stackoverflow.com/questions/59959978
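For the pdftotext import problem above, a quick diagnostic is to ask the running interpreter where it lives and whether it can see the module. The sketch below uses only the standard library; the helper name module_available is hypothetical. Running it in both the plain Python prompt and the Anaconda3 shell would be expected to show different interpreter paths, which matches the behaviour the answer describes.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


# Which Python is this, and can it see the package installed via conda/pip?
print("interpreter:", sys.executable)
print("pdftotext importable:", module_available("pdftotext"))
```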

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install xpdf

You can download it from GitHub.

Support

If you find a bug in Xpdf, i.e., if it prints an error message, crashes, or incorrectly displays a document, and you don’t see that bug listed here, please send me email, with a pointer (URL, ftp site, etc.) to the PDF file.

Find more information at: