pdfgrep | search text in PDF files | Regex library

by pdfgrep C++ Version: v2.1.2 License: GNU GPLv2

kandi X-RAY | pdfgrep Summary

pdfgrep is a C++ tool typically used in Utilities and Regex applications. It has no reported bugs or vulnerabilities, a Strong Copyleft license, and low support. You can download it from GitLab.

pdfgrep is a tool to search text in PDF files. It works similarly to grep.

Support

pdfgrep has a low active ecosystem.

It has 143 star(s) with 17 fork(s). There are no watchers for this library.

It had no major release in the last 12 months.

There are 20 open issues and 0 have been closed. On average issues are closed in 31 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of pdfgrep is v2.1.2

Quality

pdfgrep has no bugs reported.

Security

pdfgrep has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

pdfgrep is licensed under the GNU GPLv2 License. This license is Strong Copyleft.

Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

Reuse

pdfgrep releases are available to install and integrate.

Installation instructions, examples and code snippets are available.

Community Discussions

Trending Discussions on pdfgrep

How can I pdfgrep a pdf so that only bold matches are shown?

How to determine whether tininess is detected before rounding or after rounding or indeterminable?

grep : multiline + positive lookahead

how can I match column fields and group their values together?

pdfgrep pattern to include/exclude linebreak

grep/pdfgrep perl regex to check multiple lines

Shell Script to check content of PDF Files

passing a string with spaces to an -exec sh within a grep bash function

QUESTION

How can I pdfgrep a pdf so that only bold matches are shown?

Asked 2022-Feb-01 at 01:30

I am trying to list all occurrences of a bold string in a pdf with its page number. But I don't want to list those occurrences where it is not bold.

So far I have:

...

ANSWER

Answered 2022-Feb-01 at 01:30

If you are lucky, in some rare cases you may be able to tell that page x uses fonts such as CID Bold and Normal; for example, they could be different fonts or different thicknesses. Let's take one example. It is contrived, though not that uncommon, and it illustrates several points.

So there are command-line tools that dig into the fonts and text and provide fine detail.

Source https://stackoverflow.com/questions/70932077

QUESTION

How to determine whether tininess is detected before rounding or after rounding or indeterminable?

Asked 2021-Apr-22 at 00:20

IEEE 754-2008:

7.5 Underflow

The underflow exception shall be signaled when a tiny non-zero result is detected. For binary formats, this shall be either:

a) after rounding — when a non-zero result computed as though the exponent range were unbounded would lie strictly between ±b^emin, or

b) before rounding — when a non-zero result computed as though both the exponent range and the precision were unbounded would lie strictly between ±b^emin.

The implementer shall choose how tininess is detected, but shall detect tininess in the same way for all operations in radix two, including conversion operations under a binary rounding attribute.

However, both C11 and C17..C2x (working draft — February 5, 2020, n2479.pdf) say nothing about tininess:

...

ANSWER

Answered 2021-Apr-22 at 00:20

The following program may determine whether tininess is reported before or after rounding.

Source https://stackoverflow.com/questions/67177926

QUESTION

grep : multiline + positive lookahead

Asked 2021-Apr-08 at 08:56

I have the following lines:

...

ANSWER

Answered 2021-Apr-08 at 08:56

You may use this GNU grep regex with (?s), i.e. single-line mode:

Source https://stackoverflow.com/questions/66998466

QUESTION

how can I match column fields and group their values together?

Asked 2021-Mar-28 at 14:35

I am sorting some files that I created using pdfgrep to list the page numbers of certain PDFs that I have. It produced the following output:

...

ANSWER

Answered 2021-Mar-28 at 14:35

With your shown samples, please try the following. Written and tested in GNU awk.

Source https://stackoverflow.com/questions/66841703
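
The grouping step discussed above can also be sketched in Python instead of awk. The question's sample output is elided, so the input format here is an assumption: each line is taken to be a `name:page` pair, and `group_pages` is a hypothetical helper name, not part of pdfgrep.

```python
from collections import defaultdict


def group_pages(lines):
    """Group page numbers by the field before the first colon.

    Assumes each non-empty line looks like 'name:page' (hypothetical format).
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, page = line.partition(":")
        groups[name].append(int(page))
    # Sort and de-duplicate the pages collected for each name.
    return {name: sorted(set(pages)) for name, pages in groups.items()}


sample = ["a.pdf:3", "b.pdf:7", "a.pdf:1", "a.pdf:3"]
grouped = group_pages(sample)
for name, pages in grouped.items():
    print(name, ",".join(map(str, pages)))
```

A dictionary keyed by the first field keeps the grouping to a single pass over the input; real output with more fields would need a different split.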

QUESTION

pdfgrep pattern to include/exclude linebreak

Asked 2020-Jul-08 at 22:50

pdfgrep works like grep except that it acts on pages instead of lines. How can I craft a regular expression with a newline character?

I want to look for a, followed by any number of characters except linebreaks, followed by b, but pdfgrep 'a[^\n]*b' doesn't work, whereas pdfgrep 'a.*b' returns results that span multiple lines. (I've examined the output with xxd to confirm that these newlines are indeed \x0A.)

...

ANSWER

Answered 2020-Jul-08 at 22:49

By default, pdfgrep uses a POSIX compliant regex flavor where . matches any char including line break chars.

Fortunately, pdfgrep also supports the PCRE regex flavor via the -P flag. In the PCRE flavor, . matches any char except line break chars.

Thus, you can use

Source https://stackoverflow.com/questions/62804727
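
The difference in how `.` treats line breaks can be illustrated with a small Python sketch. This is not pdfgrep itself: Python's `re` module is used as a stand-in because, like PCRE, its `.` does not match `\n` by default, while `re.DOTALL` imitates the behaviour the answer attributes to pdfgrep's default flavor. The sample text is made up.

```python
import re

# Stand-in for the text of one PDF page: three lines separated by \n.
page = "a first line\nsecond line b\na and b together"

# Like PCRE, Python's '.' does not match '\n' by default,
# so a match cannot extend past the end of a line.
single_line = re.findall(r"a.*b", page)

# With DOTALL, '.' also matches '\n', so the match spans lines.
across_lines = re.findall(r"a.*b", page, flags=re.DOTALL)

# An explicit negated class excludes line breaks regardless of the flag.
explicit = re.findall(r"a[^\n]*b", page, flags=re.DOTALL)

print(single_line)
print(across_lines)
print(explicit)
```

Only the `DOTALL` variant returns a match containing a newline; the other two stay within the last line.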

QUESTION

grep/pdfgrep perl regex to check multiple lines

Asked 2020-Apr-01 at 10:22

I want to check whether several different words exist in my text. These words can appear anywhere in the complete text, but I can't find a regex for grep/pdfgrep with Perl regex syntax.

...

ANSWER

Answered 2020-Apr-01 at 10:10

If you have pdftotext installed, you can use other methods than grep to get a regular expression acting across multiple lines. Try:

Source https://stackoverflow.com/questions/60967464
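
One way to check that several words all occur somewhere in a multi-line text is a chain of lookaheads. The sketch below does this in Python on text assumed to be already extracted from the PDF (for example by pdftotext); it is not the answer's elided command, and `contains_all` and the sample text are hypothetical.

```python
import re


def contains_all(text, words):
    """Return True if every word occurs as a substring of text, in any order."""
    # One lookahead per word; (?s) lets '.' cross line breaks.
    lookaheads = "".join(f"(?=.*{re.escape(w)})" for w in words)
    return re.match(f"(?s){lookaheads}", text) is not None


# Stand-in for text already extracted from a PDF.
text = "Invoice 42\nCustomer: Alice\nTotal: 10 EUR\n"

print(contains_all(text, ["Alice", "Total"]))
print(contains_all(text, ["Alice", "Refund"]))
```

Because each lookahead starts from the beginning of the text, the order in which the words appear does not matter.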

QUESTION

Shell Script to check content of PDF Files

Asked 2020-Mar-28 at 18:54

Is there a way to check the content of PDF Files and output a specified string?

With this shell script I get all files in a loop:

...

ANSWER

Answered 2020-Mar-28 at 18:32

Updated Answer

OK, I think you are looking for "My specified string NNN" in any PDF, so you need a PCRE pattern with pdfgrep -Po like this:

Source https://stackoverflow.com/questions/60904251

QUESTION

passing a string with spaces to an -exec sh within a grep bash function

Asked 2020-Feb-10 at 04:36

I want to recursively search for strings in PDF files with pdftotext (not pdfgrep), using a bash function to which I pass my string of choice. The string must be able to contain special characters, at a minimum spaces. As a bare command line, this works perfectly in a bash shell and demonstrates what I want to do.

...

ANSWER

Answered 2020-Feb-10 at 04:36

The '$1' part should be changed to "'"$1"'" (", ', "$1", ', "), if your search string is double quotes friendly.

See the following simplified example:

Source https://stackoverflow.com/questions/60142727
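
The underlying quoting problem can be illustrated outside bash with Python's standard `shlex` module. This swaps in a different technique from the answer's: instead of closing and reopening quotes by hand, `shlex.quote` produces a string that a POSIX shell reads as a single argument. The `grep` command line is only a made-up example and is never executed.

```python
import shlex

search = "two words"

# Naive interpolation: a shell would split the search string into two words.
naive = "grep -i " + search + " -"
# shlex.quote wraps the string so a POSIX shell sees it as one argument.
quoted = "grep -i " + shlex.quote(search) + " -"

# shlex.split mimics how a POSIX shell would tokenize each command line.
print(shlex.split(naive))
print(shlex.split(quoted))

# An embedded single quote is handled by closing the quote, adding a
# double-quoted ', and reopening the quote.
print(shlex.quote("it's"))
```

With the quoted form, the search string survives tokenization as one argument even though it contains a space.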

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install pdfgrep

Tarballs for releases are available at https://pdfgrep.org/download.html. The development version is available as a git repository at https://gitlab.com/pdfgrep/pdfgrep.

Support

General questions, suggestions, bug reports, patches or anything else can be sent to the mailing list. You can also use the issue tracker for bug reports or create a merge request on GitLab, if you prefer that over mailing lists.