grammar | Implementation of generative semantic grammar | Code Quality library

by asaparov | C++ | Version: Current | License: Apache-2.0


kandi X-RAY | grammar Summary

grammar is a C++ library typically used in Code Quality applications. It has no reported bugs or vulnerabilities, a permissive license, and low support. You can download it from GitHub.

TODO: Document this code. The repository is located at This code implements the generative model of grammar as described in this paper. Given a logical form, the grammar generates a derivation tree top-down, by selecting production rules probabilistically conditioned on the logical form. The leaves of the derivation tree form the tokens of the output utterance.

Support

grammar has a low active ecosystem.

It has 10 star(s) with 2 fork(s). There are 3 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues and 1 closed issue. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of grammar is current.

Quality

grammar has no bugs reported.

Security

grammar has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

grammar is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

grammar releases are not available. You will need to build from source code and install.


grammar Key Features

No Key Features are available at this moment for grammar.

grammar Examples and Code Snippets

No Code Snippets are available at this moment for grammar.

Community Discussions

Trending Discussions on grammar

General approach to parsing text with special characters from PDF using Tesseract?

Why is my reddit comment bot commenting in all caps?

ANTLR parser to throw exception for "true and or false" statement

After Upgrading spring-data-jdbc from 1.1.12.RELEASE to 2.0.6.RELEASE LocalDateTime parameters in Repository methods fail

Is it possible to get an XML copy of the default search grammar?

How to parse rtsp url with boost qi?

How do I allow my code that evaluates string equations to work with PI and E

how can I make an href to an id appear like a whole different page (when clicked)?

Are scannerless parser grammars still supported in ANTLR4?

All viable prefixes of a Context Free Grammar

QUESTION

General approach to parsing text with special characters from PDF using Tesseract?

Asked 2021-Jun-15 at 20:17

I would like to extract the definitions from the book The Navajo Language: A Grammar and Colloquial Dictionary by Young and Morgan. They look like this (very blurry):

I tried running it through the Google Cloud Vision API and got decent results, but it doesn't know what to do with these "special" letters with accent marks on them, or the curls and lines on or through them. And because of the blurriness (there are no alternative sources of the PDF), it gets a lot of them wrong. So I'm thinking of doing it from scratch in Tesseract. Note that the term is bold and the definition is not bold.

How can I use Node.js and Tesseract to get basically an array of JSON objects sort of like this:

...

ANSWER

Answered 2021-Jun-15 at 20:17

Tesseract takes a lang variable that you can expand to include different languages if they're installed. I've used the UB Mannheim (https://github.com/UB-Mannheim/tesseract/wiki) installation which includes a ton of languages supported.

To get better and more accurate results, the best thing to do is to process the image before handing it to Tesseract. Set a white/black threshold so that you have black text on white background with no shading. I'm not sure how to do this in Node, but I've done it with Python's OpenCV library.
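A minimal sketch of the thresholding step described above, written in plain Python so that it runs without extra packages. The function name `binarize`, the default cutoff of 128, and the nested-list image representation are assumptions made for illustration. On a real image, OpenCV's `cv2.threshold` performs the same kind of operation.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255).

    `gray` is a list of rows, each row a list of integer pixel values.
    The cutoff of 128 is an arbitrary starting point, not a recommended value.
    """
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


# Hypothetical 2x3 grayscale patch: dark text pixels next to a shaded background.
patch = [
    [12, 140, 200],
    [90, 127, 255],
]
clean = binarize(patch)
```

In practice the cutoff usually needs tuning per scan, since a blurry page tends to have text and background values that sit close together.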

If that doesn't get you decent results out of the box, then you'll want to train your own font, yes. This blog post walks through the process in great detail: https://towardsdatascience.com/simple-ocr-with-tesseract-a4341e4564b6. It revolves around using jTessBoxEditor to hand-label the objects detected in the images you're using.

Edit: In brief, the process to train your own:

1. Install jTessBoxEditor (https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/). It requires a Java Runtime to be installed as well.
2. Collect your training images. They need to be .tiff files. I got fairly accurate results without a whole lot of images, as long as they had a good sample of all the characters I wanted to detect. Maybe 30 to 40 images. It's tedious, so you don't want to do TOO many, but you need enough to get a good sampling.
3. Use jTessBoxEditor to merge all the images into a single .tiff.
4. Create a training label file (.box). This is done with Tesseract itself: tesseract your_language.font.exp0.tif your_language.font.exp0 makebox
5. Now you can open the box file in jTessBoxEditor and you'll see how and where it detected the characters: the bounding boxes and what character it saw. The tedious part: hand-fix all the bounding boxes and characters to accurately represent what is in the images. Not joking, it's tedious. Slap some TV episodes up and just churn through it.
6. Train the Tesseract model itself:

Save a file named font_properties whose content is: font 0 0 0 0 0
Run the following commands:

tesseract num.font.exp0.tif font_name.font.exp0 nobatch box.train

unicharset_extractor font_name.font.exp0.box

shapeclustering -F font_properties -U unicharset -O font_name.unicharset font_name.font.exp0.tr

mftraining -F font_properties -U unicharset -O font_name.unicharset font_name.font.exp0.tr

cntraining font_name.font.exp0.tr

Close to the end, you should see some output that looks like this:

Master shape_table:Number of shapes = 10 max unichars = 1 number with multiple unichars = 0

That number of shapes should roughly be the number of characters present in all the image files you've provided.

If it went well, you should have 4 files created: inttemp normproto pffmtable shapetable. Rename them all with the prefix of your_language from before. So e.g. your_language.inttemp etc.
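The renaming step can be scripted. The following is a sketch only: the helper name `add_language_prefix` and its arguments are hypothetical, and it assumes the four training outputs sit in a single directory.

```python
from pathlib import Path

# The four files the answer says training should produce.
TRAINING_OUTPUTS = ("inttemp", "normproto", "pffmtable", "shapetable")


def add_language_prefix(directory, language):
    """Rename e.g. 'inttemp' to '<language>.inttemp' inside `directory`.

    Returns the new paths. Raises FileNotFoundError if an expected file is
    missing, which is a hint that the training commands did not all succeed.
    """
    renamed = []
    for name in TRAINING_OUTPUTS:
        source = Path(directory) / name
        target = source.with_name(f"{language}.{name}")
        source.rename(target)
        renamed.append(target)
    return renamed
```

Calling `add_language_prefix(".", "your_language")` in the training directory would leave files such as your_language.inttemp, ready for the combine step.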

Then run:

combine_tessdata your_language

The file your_language.traineddata is the model. Copy it into your Tesseract data folder. On Windows, it will be something like C:\Program Files (x86)\tesseract\4.0\tessdata, and on Linux it's probably something like /usr/share/tesseract/4.0/tessdata.

Then when you run Tesseract, you'll pass the lang=your_language. I found best results when I still passed an existing language as well, so like for my stuff it was still English I was grabbing, just funny fonts. So I still wanted the English as well, so I'd pass: lang=your_language+eng.
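On the command line, Tesseract selects languages with its `-l` option and joins several languages with `+`. A small sketch of building such an invocation follows. The helper name `tesseract_command` and the file names are hypothetical, and the command is only constructed here, not executed.

```python
def tesseract_command(image, output_base, languages):
    """Build the argument list for a Tesseract run with one or more languages."""
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


# Custom model first, with English as a fallback, as the answer suggests.
command = tesseract_command("page.tif", "page_text", ["your_language", "eng"])
```

The resulting list can be passed to `subprocess.run` once Tesseract and the trained data file are installed.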

Source https://stackoverflow.com/questions/67991718

QUESTION

Why is my reddit comment bot commenting in all caps?

Asked 2021-Jun-11 at 23:38

This bot reads a text file to post a random response when the keyword is entered. However, it's sending in all caps when the txt file is written in proper grammar.

Sorry if I'm completely ignorant. I'm in the early stages of learning and this is kind of my building block. This code isn't mine, but I'm modifying it for my use.

...

ANSWER

Answered 2021-Jun-11 at 23:36

self.quotes = [q.upper() for q in f.read().split('\n') if q]

As you can see here, q, which is a line from your text file, gets converted to uppercase inside the list comprehension. upper is a method of string that converts all lowercase characters in a string to uppercase.

Remove the call to upper and you should be fine.
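For comparison, a sketch of the same list comprehension with the call removed. The wrapper function `load_quotes` and the sample text are made up for illustration. The bot's real code reads from its own file object.

```python
import io


def load_quotes(f):
    """Return the non-empty lines of a file-like object, with case preserved."""
    return [q for q in f.read().split('\n') if q]


# An in-memory stand-in for the bot's text file.
sample = io.StringIO("Hello there.\n\nGeneral Kenobi!\n")
quotes = load_quotes(sample)
```

The `if q` filter still drops blank lines, so only the casing behavior changes.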

Source https://stackoverflow.com/questions/67944413

QUESTION

ANTLR parser to throw exception for "true and or false" statement

Asked 2021-Jun-11 at 14:58

I'm using ANTLR 4 and have a fairly complex grammar. I'm trying to simplify here...

Given an expression like true and or false, I want a parsing error, since the operands defined expect expressions on either side, and this input has the form expr operand operand expr.

My reduced grammar is:

...

ANSWER

Answered 2021-Jun-10 at 20:13

You should get a parsing error if you force the parser to consume all tokens by "anchoring" a rule with the built-in EOF.
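The effect of anchoring can be illustrated outside ANTLR. The sketch below is a hand-written matcher, not ANTLR's parsing algorithm, and every name in it is hypothetical. It assumes a simplified rule of the form atom (operator atom)* and, like an unanchored rule, stops at the longest prefix it can match. The `anchored` flag plays the role of a trailing EOF.

```python
ATOMS = {"true", "false"}
OPERATORS = {"and", "or"}


def match_expr(tokens):
    """Return how many leading tokens match: atom (operator atom)*."""
    if not tokens or tokens[0] not in ATOMS:
        raise SyntaxError("expected 'true' or 'false' at token 0")
    pos = 1
    # Only consume an operator when an atom follows it.
    while (pos + 1 < len(tokens)
           and tokens[pos] in OPERATORS
           and tokens[pos + 1] in ATOMS):
        pos += 2
    return pos


def parse(text, anchored):
    """Match an expression; when anchored, require that all input is consumed."""
    tokens = text.split()
    consumed = match_expr(tokens)
    if anchored and consumed != len(tokens):
        raise SyntaxError(
            f"unexpected token {tokens[consumed]!r} at position {consumed}")
    return consumed
```

With `anchored=False`, the input "true and or false" is accepted after consuming only the first token. With `anchored=True`, the leftover tokens raise a `SyntaxError`.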

Source https://stackoverflow.com/questions/67908640

QUESTION

After Upgrading spring-data-jdbc from 1.1.12.RELEASE to 2.0.6.RELEASE LocalDateTime parameters in Repository methods fail

Asked 2021-Jun-10 at 20:29

I am trying to upgrade from Spring Boot 2.2.x to 2.3, and I have encountered an issue with the upgrade of spring-data-jdbc. In 1.1.x, one could write the following query and it would work as expected:

...

ANSWER

Answered 2021-Jun-10 at 20:29

It will be fixed with the upcoming Spring-data-jdbc 2.3.x. Relevant issue 974 has been closed.

Source https://stackoverflow.com/questions/67503013

QUESTION

Is it possible to get an XML copy of the default search grammar?

Asked 2021-Jun-10 at 18:04

I'm doing a translated version of a MarkLogic search application and I want to translate the search grammar (AND, OR, etc.). I'm currently just using the default operators and I realize they are documented, but is there any way to get this in XML or JSON format?

The endpoint that retrieves the default search options returns only a few elements, and the grammar is not one of them:

...

ANSWER

Answered 2021-Jun-10 at 18:04

In the MarkLogic installed files, look in MarkLogic/Modules/MarkLogic/appservices/search/search-impl.xqy. There you will find the default grammar in the $default-options:

Source https://stackoverflow.com/questions/67924960

QUESTION

How to parse rtsp url with boost qi?

Asked 2021-Jun-08 at 06:04

I'm trying to parse RTSP-url like this: ...

ANSWER

Answered 2021-Jun-07 at 22:01

The relatively obvious workaround would be to URL-escape the @:
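As an illustration of the escaping itself (not of the Boost.Qi parser from the question), here is a sketch using Python's standard library. The helper name `rtsp_url` and the credentials, host, and path are invented for the example. It assumes the problematic @ sits in the password.

```python
from urllib.parse import quote, unquote, urlsplit


def rtsp_url(user, password, host, port, path):
    """Build an RTSP URL, percent-encoding reserved characters in the credentials."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"rtsp://{credentials}@{host}:{port}{path}"


url = rtsp_url("admin", "p@ss", "example.com", 554, "/stream")
parts = urlsplit(url)

# The escaped '@' no longer competes with the separator before the host.
original_password = unquote(parts.password)
```

Once the @ inside the password is written as %40, only one literal @ remains in the URL, so a parser can treat it unambiguously as the end of the user information.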


Source https://stackoverflow.com/questions/67873608

QUESTION

How do I allow my code that evaluates string equations to work with PI and E

Asked 2021-Jun-08 at 03:32

I took this code from a previously answered question and I am trying to expand it so that it works with E and PI when the String contains "E" and "PI". I don't really understand how the code works because I am pretty new to Java and the explanation on the original comment was not great (I have since lost the link to the comment unfortunately).

...

ANSWER

Answered 2021-Jun-08 at 03:32

The key to the solution is to modify the grammar to support named constants. It is therefore a requirement (as your examples suggest) that named constants be in capital letters, A - Z, to distinguish them from functions.

(The specified grammar is incomplete in that it does not specify the syntax of functions, but the code suggests they are lower-case names from a subset of trig and log functions.)

So, grammar is updated as:
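The updated grammar is not reproduced here. As a rough illustration of the idea, the following Python sketch (the question's code is Java) evaluates arithmetic with upper-case named constants. The grammar in the docstring, the `CONSTANTS` table, and all function and class names are assumptions of this sketch, and functions are left out to keep it short.

```python
import math
import re

CONSTANTS = {"PI": math.pi, "E": math.e}  # hypothetical constant table


def tokenize(text):
    """Split into numbers, upper-case names, and single-character symbols."""
    return re.findall(r"\d+\.?\d*|[A-Z]+|\S", text)


class Parser:
    """expression = term { (+|-) term }
    term       = factor { (*|/) factor }
    factor     = (+|-) factor | number | CONSTANT | '(' expression ')'
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        value = self.expression()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return value

    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "-":
            return -self.factor()
        if token == "+":
            return self.factor()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token[0].isdigit():
            return float(token)
        if token in CONSTANTS:  # new branch: upper-case named constant
            return CONSTANTS[token]
        raise ValueError(f"unexpected token {token!r}")


def evaluate(text):
    return Parser(text).parse()
```

The only change relative to a plain arithmetic evaluator is the extra branch in `factor` that looks the name up in `CONSTANTS`. For example, `evaluate("2 * PI")` yields twice `math.pi`, while an unknown name such as TAU raises a `ValueError`.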

Source https://stackoverflow.com/questions/67880217

QUESTION

how can I make an href to an id appear like a whole different page (when clicked)?

Asked 2021-Jun-07 at 12:45

I know that one type of link you can have is to a place within the page, using .

I recently came across a carrd website, where clicking any of the links appears to take you to a new page, but I believe it's actually linking to an id (after clicking a link, you can see it's /#id rather than /someaddress). But it appears like a separate page, because you can't just scroll back up like usual when you use a link to an id.

I'm wondering how this works; I haven't been able to find anything on Google.

(edit: grammar)

...

ANSWER

Answered 2021-Jun-07 at 02:24

They're listening for the navigation to the bookmark in JS. From there, they dynamically set the display value on the other section tags on the page. In other words, this is done with JS and isn't a normal function of the bookmarks feature. If you want to do this, you'd need to make a JS script that hides the rest of the page when you jump to a bookmark.

Source https://stackoverflow.com/questions/67865262

QUESTION

Are scannerless parser grammars still supported in ANTLR4?

Asked 2021-Jun-07 at 00:17

I have a scannerless parser grammar utilizing the CharsAsTokens faux lexer which generates a usable Java Parser class for ANTLR4 versions through 4.6. But when updating to ANTLR 4.7.2 through 4.9.3-SNAPSHOT, the tool generates code producing dozens of compilation errors from the same grammar file, as detailed below.

My question here is simply: Are scannerless parser grammars no longer supported, or must their character-based terminals be specified differently in 4.7 and beyond?

Update:

Unfortunately, I cannot post my complete grammar here as it is derived from FOUO security marking guidance, access to which is restricted by the U.S. government (I am a DoD/IC contractor).

The incompatible upgrade issue however is entirely reproducible with the CSQL.g4 scannerless parser grammar example referred to by Ter in Section 5.6 of The Definitive ANTLR 4 Reference.

As does my grammar, the CSQL example uses CharsAsTokens.java for its tokenizer, and CharVocab.tokens as its token vocabulary.

Note that every token name is specified by its ASCII character-literal equivalent, as in:

...

ANSWER

Answered 2021-Jun-07 at 00:17

Try defining a GrammarLexer.g4 file instead of the GrammarLexer.tokens file. (You'd still use the options: { tokenVocab = GrammarLexer; } like you do if you create the GrammarLexer.tokens file.) It could be as simple as:

Source https://stackoverflow.com/questions/67830364

QUESTION

All viable prefixes of a Context Free Grammar

Asked 2021-Jun-06 at 22:14

I am stuck on a problem from the famous Dragon Book on compiler design. How do I find all the viable prefixes of the following grammar:

...

ANSWER

Answered 2021-Jun-06 at 22:14

0ⁿ1ⁿ is not a regular language; regexen don't have variables like n and they cannot enforce an equal number of repetitions of two distinct subsequences. Nonetheless, for any context-free grammar, the set of viable prefixes is a regular language. (A proof of this fact, in some form, appears at the beginning of Part II of Donald Knuth's seminal 1965 paper, On the Translation of Languages from Left to Right, which demonstrated both a test for the LR(k) property and an algorithm for parsing LR(k) grammars in linear time.)

OK, to the actual question. A viable prefix for a grammar is (by definition) the prefix of a sentential form which can appear on the stack during a parse using that grammar. It's called "viable" (which means "still alive" or "could continue growing") precisely because it must be the prefix of some right sentential form whose suffix contains no non-terminal symbol. In other words, there exists a sequence of terminals which can be appended to the viable prefix in order to produce a right-sentential form; the viable prefix can grow.

Knuth shows how to create a DFA which produces all viable prefixes, but it's easier to see this DFA if we already have the LR(k) parser produced by an LR(k) algorithm. That parser is a finite-state machine whose alphabet is the set of terminal and non-terminal symbols of a grammar. To get the viable-prefix grammar, we use exactly the same state machine, but we remove the stack (so that it becomes just a state machine) and the reduce actions, leaving only the shift and goto actions as transitions. All states in the viable-prefix machine are accepting states, since any prefix of a viable prefix is itself a viable prefix.
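The construction just described can be sketched for the two productions discussed in this answer, S -> 0S1 and S -> 01. The code below builds LR(0) item sets (k = 0 suffices for this grammar) and walks only the shift and goto transitions, treating every state as accepting. The augmented start production S' -> S is an assumption of this sketch, so the empty prefix and S on its own also appear in the output. All function names are illustrative.

```python
from collections import deque

# (left-hand side, right-hand side) pairs; index 0 is the augmented start.
PRODUCTIONS = [("S'", ("S",)), ("S", ("0", "S", "1")), ("S", ("0", "1"))]
NONTERMINALS = {"S'", "S"}


def closure(items):
    """Add an item (production index, dot position) for each non-terminal after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = PRODUCTIONS[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(PRODUCTIONS):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` wherever possible; None if no item allows it."""
    moved = {(p, d + 1) for p, d in state
             if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol}
    return closure(moved) if moved else None


def viable_prefixes(max_len):
    """List every string of at most max_len symbols accepted by the automaton."""
    found = []
    queue = deque([("", closure({(0, 0)}))])
    while queue:
        prefix, state = queue.popleft()
        found.append(prefix)  # all states are accepting
        if len(prefix) == max_len:
            continue
        for symbol in ("0", "1", "S"):
            target = goto(state, symbol)
            if target is not None:
                queue.append((prefix + symbol, target))
    return found
```

Calling `viable_prefixes(3)` enumerates the short viable prefixes by breadth-first search. Strings such as 10 or 1S never appear because no transition on 1 leaves the start state.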

A key feature of this new automaton is that it cannot extend a prefix with a reduce action (since we removed all the reduce actions). A prefix with a reduce action is a prefix which ends in a handle -- recall that a handle is the right-hand side of some production -- so another definition of a viable prefix is that it is a right-sentential form (that is, a possible step in a derivation) which does not extend beyond the right-most handle.

The grammar you are working with has only two productions, so there are only two handles, 01 and 0S1. Note that 10 and 1S cannot be subsequences of any right-sentential form, nor can a right-sentential form contain more than one S. Any right-sentential form must either be a sentence 0ⁿ1ⁿ or a sentential form 0ⁿS1ⁿ where n>0. But every handle ends at the first 1 of a sentential form, and so a viable prefix must end at or before the first 1. This produces precisely the four possibilities you list, which we can condense to the regular expression 0*0S?1?.

Chopping off the suffix removed the second n from the formula, so there is no longer a requirement of concordance and the language is regular.

Note:

Questions like this and their answers are begging to be rendered using MathJax. StackOverflow, unfortunately, does not provide this extension, which is apparently considered unnecessary for programming. However, there is a site in the StackExchange constellation dedicated to computing science questions, http://cs.stackexchange.com, and another one dedicated to mathematical questions, http://math.stackexchange.com. Formal language theory is part of both computing science and mathematics. Both of those sites permit MathJax, and questions on those sites will not be closed because they are not programming questions. I suggest you take this information into account for questions like this one.

Source https://stackoverflow.com/questions/67862072

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install grammar

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
