pdfrw | pdfrw is a pure Python library that reads and writes PDFs | Document Editor library

by pmaupin Python Version: 0.4 License: Non-SPDX

X-Ray Key Features Code Snippets(10)Community Discussions(10)Vulnerabilities Install Support

kandi X-RAY | pdfrw Summary

pdfrw is a Python library typically used in Editor, Document Editor applications. pdfrw has no bugs, it has no vulnerabilities, it has build file available and it has high support. However pdfrw has a Non-SPDX License. You can install using 'pip install pdfrw' or download it from GitHub, PyPI.

pdfrw is a pure Python library that reads and writes PDFs

Support

Quality

Security

License

Reuse

Support

pdfrw has a highly active ecosystem.

It has 1710 star(s) with 270 fork(s). There are 64 watchers for this library.

It had no major release in the last 12 months.

There are 100 open issues and 63 have been closed. On average issues are closed in 165 days. There are 19 open pull requests and 0 closed requests.

It has a positive sentiment in the developer community.

The latest version of pdfrw is 0.4

Quality

pdfrw has 0 bugs and 0 code smells.

Security

pdfrw has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

pdfrw code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

pdfrw has a Non-SPDX License.

Non-SPDX licenses can be open source with a non SPDX compliant license, or non open source licenses, and you need to review them closely before use.

Reuse

pdfrw releases are available to install and integrate.

Deployable package is available in PyPI.

Build file is available. You can build the component from source.

Top functions reviewed by kandi - BETA

kandi has reviewed pdfrw and discovered the below as its top functions. This is intended to give you an instant insight into pdfrw implemented functionality, and help decide if they suit your requirements.

Format a file .
Performs a flate filter .
Gets the tokk tokens from the specified input stream .
Loads an indirect object .
Render the xobox .
Builds a copy of the contents from the contents .
Decodes a string using UTF8 encoding .
Iterates over all objects in the given source .
Identifies the PDF document encoding .
Parses the given PDF .

Get all kandi verified functions for this library.

pdfrw Key Features

No Key Features are available at this moment for pdfrw.

pdfrw Examples and Code Snippets

Amir Accounting Software,Requirements

Python

Lines of Code : 5

License : Strong Copyleft (GPL-3.0)

Copy

git clone https://github.com/Jooyeshgar/amir.git
cd amir
sudo apt install python3-pip
pip3 install -r requirements.txt
sudo apt install python3-setuptools python3-gi gettext python3-passlib python3-cairocffi python3-cairosvg python3-pdfrw

default

Python

Lines of Code : 3

License : No License

Copy

python2 tpdfrw.py xxx.pdf

qpdf --qdf --object-streams=disable xxx.pdf xxx-decoded.pdf

python2 tpdfrw.py xxx-decoded.pdf

Other (or if you prefer setting up python yourself)

Python

Lines of Code : 3

License : Strong Copyleft (GPL-3.0)

Copy

virtualenv venv -p python3.6
. ./venv/bin/activate
pip install -r requirements.txt

Writing a Python pdfrw PdfReader object to an array of bytes / filestream

Python

Lines of Code : 10

License : Strong Copyleft (CC BY-SA 4.0)

Copy

from io import BytesIO

new_bytes_object = BytesIO()

pdfrw.PdfWriter.write(new_bytes_object, filled_pdf)
# I'm not sure about the syntax, I haven't used this lib before

with open("output.tx

How to find the Font Size of every paragraph of PDF file using python code?

Python

Lines of Code : 16

License : Strong Copyleft (CC BY-SA 4.0)

Copy

from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar,LTLine,LAParams
import os
path=r'/path/to/pdf'

Extract_Data=[]

for page_layout in extract_pages(path):
    for element in page_layout:

How to handle importing a package import if it doesn't exist

Python

Lines of Code : 10

License : Strong Copyleft (CC BY-SA 4.0)

Copy

import subprocess

def install(package):
    subprocess.call(['pip', 'install', package])

try:
    import fitz  # requires fitz, PyMuPDF
except:
    install('fitz')

How can I edit text in pdf that is encoded in hexadecimal format?

Python

Lines of Code : 60

License : Strong Copyleft (CC BY-SA 4.0)

Copy

import sys
for i, n in enumerate(range(32, 128)):
    sys.stdout.write(f"{hex(n - ord('R') + 0x34).ljust(4)}: '{chr(n)}' ")
    if (i + 1) % 8 == 0:
        sys.stdout.write('\n')

0x2 : ' ' 0x3 : '!' 0x4 : '"' 0x5

How can I edit text in pdf that is encoded in hexadecimal format?

Python

Lines of Code : 6

License : Strong Copyleft (CC BY-SA 4.0)

Copy

target.pages[0].Resources.Font=font_pdf.pages[0].Resources.Font
target.pages[0].Contents.stream.replace(
    "BT\n/F8 40 Tf\n1 0 0 -1 569 376 Tm\n<0034> Tj\n26 0 Td <0028> Tj\nET", 
    f"BT\n/F0 11 Tf\n1 0 0 -1 500 500 Tm\n(\x

Is there a faster way to merge two files rather than page by page?

Python

Lines of Code : 12

License : Strong Copyleft (CC BY-SA 4.0)

Copy

from pdfrw import PdfReader, PdfWriter, PageMerge

p1 = pdfrw.PdfReader("file1")
p2 = pdfrw.PdfReader("file2")

for page in range(len(p1.pages)):
    merger = PageMerge(p1.pages[page])
    merger.add(p2.pages[page]).render()

writer = PdfW

Merging PDFs while retaining custom page numbers (aka pagelabels) and bookmarks

Python

Lines of Code : 94

License : Strong Copyleft (CC BY-SA 4.0)

Copy

from PyPDF4 import PdfFileWriter, PdfFileMerger, PdfFileReader 

# To manipulate the PDF dictionary
import PyPDF4.pdf as PDF

import logging

def add_nums(num_entry, page_offset, nums_array):
    for num in num_entry['/Nums']:
        if i

Community Discussions

Trending Discussions on pdfrw

How to convert PDF web links to file open actions with python pdfrw library

Writing a Python pdfrw PdfReader object to an array of bytes / filestream

How to find the Font Size of every paragraph of PDF file using python code?

Weaseyprint, Cairo, Dajngo on Pythonanywhere 25MAY21 can not pass a warning

How to handle importing a package import if it doesn't exist

How can I edit text in pdf that is encoded in hexadecimal format?

Python library "PDFrw" writes to annotations that remains invisible until clicking the field

Merging PDFs while retaining custom page numbers (aka pagelabels) and bookmarks

Why is PDF form information stored on both 'Root.AcroForm.Fields' & 'Root.Pages.Kids[0].Annots'

PyQt5 - Passing user input from QLineEdit to update a dictionary in another file

QUESTION

How to convert PDF web links to file open actions with python pdfrw library

Asked 2022-Jan-11 at 20:24

I'm using pdfkit to convert html to pdf which works great, but the external links in the pdf are web links.

The pdf viewer that we are using does not recognize the pdf web links, but file open actions do work.

I've been trying to change the pdf link annotation from a web link to a file open action with the pdfrw library.

I tried to edit the pdf annotation with the following code, but it's not working.

...

ANSWER

Answered 2022-Jan-11 at 20:24

So after a similar battle today...

you can't define S='/Launch' as string like that. You have to use:

Source https://stackoverflow.com/questions/70569025

QUESTION

Writing a Python pdfrw PdfReader object to an array of bytes / filestream

Asked 2021-Nov-22 at 14:07

I'm currently working on a simple proof of concept for a pdf-editor application. The example is supposed to be a simplified python script showcasing how we could use the pdfrw library to edit PDF files with forms in them.

So, here's the issue. I'm not interested in writing the edited PDF to a file. The idea is that file opening and closing is going to most likely be handled by external code and so I want all the edits in my files to be done in memory. I don't want to write the edited filestream to a local file.

Let me specify what I mean by this. I currently have a piece of code like this:

...

ANSWER

Answered 2021-Nov-22 at 14:07

To save your altered PDF to memory in an object that can be passed around (instead of writing to a file), simply create an empty instance of io.BytesIO:

Source https://stackoverflow.com/questions/68985391

QUESTION

How to find the Font Size of every paragraph of PDF file using python code?

Asked 2021-Jun-24 at 06:43

Right now i am Working on a project in which i have to find the font size of every paragraph in that PDF file. i have tried various python libraries like fitz, PyPDF2, pdfrw, pdfminer, pdfreader. all the libraries fetch the text data but i don't know how to fetch the font size of the paragraphs. thanks in advance..your help is appreciated.

i have tried this but failed to get font size.

...

ANSWER

Answered 2021-Jun-24 at 06:43

I got the solution from pdfminer. The python code for the same is given below.

Source https://stackoverflow.com/questions/68097779

QUESTION

Weaseyprint, Cairo, Dajngo on Pythonanywhere 25MAY21 can not pass a warning

Asked 2021-Jun-01 at 22:02

Sorry I know there seems to be a lot about this topic. But I do not see a real resolution?

I am trying to place a Django ecommerce pizza shop for learning Django on the website. Locally this works great no issues. I matched my environment locally to that on the ENV for the server. I got this issue resolved locally when I updated Cairo on my computer. So the emulated server works great.

Python 3.8.0 Server Pythonanywhere

Here is the error and follow on info.

Error from error log on ther server. 2021-05-28 16:13:41,156: /home/williamc1jones/.virtualenvs/myvirtualenv/lib/python3.8/site-packages/weasyprint/document.py:35: UserWarning: There are known rendering problems and missing features with cairo < 1.15.4. WeasyPrint may work with older versions, but please read the note about the needed cairo version on the "Install" page of the documentation before reporting bugs. http://weasyprint.readthedocs.io/en/latest/install.html

views.py file in order app

...

ANSWER

Answered 2021-Jun-01 at 22:01

Yes I wanted to thank everyone for their help. While I have a time lime for my project I will dit the post to see my work around as well. Thanks.

Source https://stackoverflow.com/questions/67744167

QUESTION

How to handle importing a package import if it doesn't exist

Asked 2021-Apr-30 at 14:50

import os
import re

import fitz  # requires fitz, PyMuPDF
import pdfrw
import subprocess
import os.path
import sys
from PIL import Image

...

ANSWER

Answered 2021-Apr-30 at 14:50

Using try-catch to handle missing package

Ex:

Source https://stackoverflow.com/questions/67335355

QUESTION

How can I edit text in pdf that is encoded in hexadecimal format?

Asked 2020-Sep-26 at 13:06

I'm trying to find and replace certain text with specific value in PDF. I am using python library pdfrw, since my preferred environment is python. Following is example content in first page of the document.

...

ANSWER

Answered 2020-Sep-26 at 10:52

In general I think pdf text can be compressed/encoded by different algorithms hence pdfrw doesn't decode text by itself. So you can't know what is the correct way in general, 'cause it is different for each case. I've tried simple pdf from here and it contains just plain text inside.

Probably you didn't figure out what is the correct correspondence between characters and hex codes is due to the fact that it may be a compressed stream - it means each code depends on the position of character in whole stream plus on the value of all previous characters. E.g. text may be zlib compressed.

Also pdf text is a sequence of commands for positioning/formatting/outputing text, so in general you have to be able to decode/encode all these commands to be able to process really any text. Your format may contain symbol table where all used symbols are mapped to hex value. To figure out correct mapping all symbols should be present in example text.

For your case you might probably use next table, for conversion, I use the fact that letter R has hex value 0x34:

Try it online!

Source https://stackoverflow.com/questions/64075158

QUESTION

Python library "PDFrw" writes to annotations that remains invisible until clicking the field

Asked 2020-May-28 at 07:58

I am following the instructions in this article for writing information to annotations in a PDF document.

The script in the aforementioned article does work. However, after the script is executed and the output file is opened, the fields remain invisible. When clicking on a annotation, the text added from the script appears. But subsequently when clicking elsewhere in the document, the text from the script disappears.

Is there some sort of flag that needs to be triggered, to inform the PDF reader that the fields have been filled?

EDIT:

The script given in the article is probably not really correct.

When reading the first annotation of the unedited PDF, I get the following:

...

ANSWER

Answered 2020-Feb-08 at 04:20

You need to set the /NeedAppearances tag to True.

Check this out- https://github.com/pmaupin/pdfrw/issues/84#issuecomment-463493521

Source https://stackoverflow.com/questions/59730098

QUESTION

Merging PDFs while retaining custom page numbers (aka pagelabels) and bookmarks

Asked 2020-May-22 at 19:08

I'm trying to automate merging several PDF files and have two requirements: a) existing bookmarks AND b) pagelabels (custom page numbering) need to be retained.

Retaining bookmarks when merging happens by default with PyPDF2 and pdftk, but not with pdfrw. Pagelabels are consistently not retained in PyPDF2, pdftk or pdfrw.

I am guessing, after having searched a lot, that there is no straightforward approach to doing what I want. If I'm wrong then I hope someone can point to this easy solution. But, if there is no easy solution, any tips on how to get this going in python will be much appreciated!

Some example code:

1) With PyPDF2

...

ANSWER

Answered 2020-May-22 at 11:45

You need to iterate through the existing PageLabels and add them to the merged output, taking care to add an offset to the page index entry, based on the number of pages already added.

This solution also requires PyPDF4, since PyPDF2 produces a weird error (see bottom).

Source https://stackoverflow.com/questions/61740267

QUESTION

Why is PDF form information stored on both 'Root.AcroForm.Fields' & 'Root.Pages.Kids[0].Annots'

Asked 2020-May-16 at 09:18

If I update the value of a form in either of these locations, both are affected. Why are they stored twice?

When updating these forms, is one preferred to be used over the other one? (I'm using Python library pdfrw)

...

ANSWER

Answered 2020-May-16 at 09:18

The AcroForm dictionary references all abstract form fields (directly or indirectly) to allow immediate access to all fields of a document.

Each abstract form field may have any number of widget annotations (except signature fields with at most one annotation).

Widget annotations are for displaying the form field contents. Thus, they must be attached to the page they respectively are displayed upon. So they are referenced from the Annots of the respective page.

If a form field has no widget annotation, you cannot find it from any page.

If a form field has exactly one widget annotation, you can usually find it from exactly one page, the page that annotation is on. In this case the form field object and the widget annotation object may be merged into a single object.

If a form field has more widget annotations, you can usually find it on one or more pages, depending on whether all those annotations are on the same or one different pages.

Thus,

Why are they stored twice?

They are not stored twice, each form field is stored only once, in one PDF object. But that form field object can usually be reached from multiple locations in the object model, from the global AcroForm object and from the Annots of each page that form field has a widget on.

Source https://stackoverflow.com/questions/61832674

QUESTION

PyQt5 - Passing user input from QLineEdit to update a dictionary in another file

Asked 2020-Mar-26 at 14:10

In form.py I have a Form class, which allows the user input various data. The idea is to use this data to update the values in a dictionary in document.py, which is then used to populate a pdf file. The "write custom pdf" method which creates said pdf is invoked via a button in my main logic file. However, the method get_input_1 below (imported from form to doc) is not able to update the dictionary at all. I have tried various options, including the solution described here, but none of them seem to work. Any help would be highly appreciated!

gui.py

...

ANSWER

Answered 2020-Mar-26 at 13:53

It doesn't work because the dictionary is only updated in the __init__.
The get_input_1() function is not some sort of "dynamic" thing, it just immediately returns the current value of the text field, which is empty when the form is created.

To update the dictionary, call the function after the dialog's exec(), then process the pdf.

Source https://stackoverflow.com/questions/60868520

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install pdfrw

You can install using 'pip install pdfrw' or download it from GitHub, PyPI.
You can use pdfrw like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

Support

For any new features, suggestions and bugs create an issue on GitHub. If you have any questions check and ask questions on community page Stack Overflow .

Find more information at: