grammar | Implementation of generative semantic grammar | Code Quality library
kandi X-RAY | grammar Summary
TODO: Document this code. The repository is located at This code implements the generative model of grammar as described in this paper. Given a logical form, the grammar generates a derivation tree top-down, by selecting production rules probabilistically conditioned on the logical form. The leaves of the derivation tree form the tokens of the output utterance.
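As a rough illustration of the top-down process described above (not the repository's actual code), the following Python sketch expands nonterminals by sampling production rules whose weights depend on a logical form. The toy rules, the shape of the logical form, and the weighting function are all hypothetical.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["sleeps"]],
}


def rule_weight(rhs, logical_form):
    """Hypothetical conditioning: favour rules whose symbols the logical form mentions."""
    mentioned = sum(1 for symbol in rhs if symbol in logical_form["words"])
    return 1.0 + 10.0 * mentioned


def generate(symbol, logical_form, rng):
    """Expand `symbol` top-down and return the leaves (the output tokens)."""
    if symbol not in RULES:  # terminal: a leaf of the derivation tree
        return [symbol]
    options = RULES[symbol]
    weights = [rule_weight(rhs, logical_form) for rhs in options]
    rhs = rng.choices(options, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(generate(child, logical_form, rng))
    return tokens


logical_form = {"words": {"cat", "sees", "dog"}}  # assumed representation
utterance = generate("S", logical_form, random.Random(0))
```

The leaves collected in `utterance` play the role of the output tokens; a real model would learn the rule probabilities rather than use a hand-written weight.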
Community Discussions
Trending Discussions on grammar
QUESTION
I would like to extract the definitions from the book The Navajo Language: A Grammar and Colloquial Dictionary by Young and Morgan. They look like this (very blurry):
I tried running it through the Google Cloud Vision API and got decent results, but it doesn't know what to do with these "special" letters with accent marks on them, or the curls and lines on or through them. And because of the blurriness (there are no alternative sources of the PDF), it gets a lot of them wrong. So I'm thinking of doing it from scratch in Tesseract. Note that the term is bold and the definition is not.
How can I use Node.js and Tesseract to get basically an array of JSON objects sort of like this:
...ANSWER
Answered 2021-Jun-15 at 20:17
Tesseract takes a lang variable that you can expand to include different languages if they're installed. I've used the UB Mannheim (https://github.com/UB-Mannheim/tesseract/wiki) installation, which includes a ton of supported languages.
To get better and more accurate results, the best thing to do is to process the image before handing it to Tesseract. Set a white/black threshold so that you have black text on white background with no shading. I'm not sure how to do this in Node, but I've done it with Python's OpenCV library.
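A minimal sketch of the thresholding idea, written in plain Python so it runs without any imaging library. The pixel values and the cut-off of 128 are illustrative assumptions; in practice a library routine such as OpenCV's cv2.threshold does the same job on a real image.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


# A tiny made-up "scan": light background with some darker, shaded text pixels.
page = [
    [250, 240, 90],
    [200, 60, 30],
]
clean = binarize(page)
```

After this step every pixel is either black or white, which removes the shading that tends to confuse the OCR engine.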
If that font doesn't get you decent results with the out-of-the-box language data, then you'll want to train your own, yes. This blog post walks through the process in great detail: https://towardsdatascience.com/simple-ocr-with-tesseract-a4341e4564b6. It revolves around using jTessBoxEditor to hand-label the objects detected in the images you're using.
Edit: In brief, the process to train your own:
- Install jTessBoxEditor (https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/). Requires Java Runtime installed as well.
- Collect your training images. They need to be .tiff files. I got fairly accurate results without a whole lot of images, as long as they had a good sample of all the characters I wanted to detect. Maybe 30 to 40 images. It's tedious, so you don't want to do TOO many, but you need enough to get a good sampling.
- Use jTessBoxEditor to merge all the images into a single .tiff
- Create a training label file (.box). This is done with Tesseract itself.
tesseract your_language.font.exp0.tif your_language.font.exp0 makebox
- Now you can open the box file in jTessBoxEditor and you'll see how/where it detected the characters. Bounding boxes and what character it saw. The tedious part: Hand fix all the bounding boxes and characters to accurately represent what is in the images. Not joking, it's tedious. Slap some tv episodes up and just churn through it.
- Train the Tesseract model itself
- Save a file named font_properties whose content is: font 0 0 0 0 0
- Run the following commands:
tesseract num.font.exp0.tif font_name.font.exp0 nobatch box.train
unicharset_extractor font_name.font.exp0.box
shapeclustering -F font_properties -U unicharset -O font_name.unicharset font_name.font.exp0.tr
mftraining -F font_properties -U unicharset -O font_name.unicharset font_name.font.exp0.tr
cntraining font_name.font.exp0.tr
Close to the end of the output, you should see something that looks like this:
Master shape_table:Number of shapes = 10 max unichars = 1 number with multiple unichars = 0
That number of shapes should roughly be the number of characters present in all the image files you've provided.
If it went well, you should have 4 files created: inttemp, normproto, pffmtable and shapetable. Rename them all with the prefix of your_language from before, e.g. your_language.inttemp, etc.
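The renaming step can be scripted. The following is a sketch only: it assumes the four output files sit in one folder, and it demonstrates the rename on empty stand-in files in a temporary directory rather than on real training output.

```python
import tempfile
from pathlib import Path

# The four files named in the answer above.
TRAINING_OUTPUTS = ["inttemp", "normproto", "pffmtable", "shapetable"]


def add_language_prefix(folder, language):
    """Rename each training output to '<language>.<name>' and return the new names."""
    renamed = []
    for name in TRAINING_OUTPUTS:
        source = Path(folder) / name
        target = Path(folder) / f"{language}.{name}"
        source.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration on empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in TRAINING_OUTPUTS:
        (Path(folder) / name).touch()
    result = add_language_prefix(folder, "your_language")
```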
Then run:
combine_tessdata your_language
The file your_language.traineddata is the model. Copy it into your Tesseract data folder. On Windows, it'll be something like C:\Program Files (x86)\tesseract\4.0\tessdata, and on Linux it's probably something like /usr/share/tesseract/4.0/tessdata.
Then, when you run Tesseract, you'll pass lang=your_language. I found the best results when I still passed an existing language as well. For my stuff it was still English I was grabbing, just in funny fonts, so I'd pass lang=your_language+eng.
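As a small sketch of how the combined language value is formed, the helper below joins language codes with + and builds an argument list for the tesseract command line, which takes the languages through its -l option. The file names page.png and page are placeholders, and the helper names are made up for this example.

```python
def language_argument(*languages):
    """Join language codes the way Tesseract expects, e.g. 'your_language+eng'."""
    return "+".join(languages)


def tesseract_command(image, output_base, *languages):
    """Build an argument list suitable for subprocess.run (nothing is executed here)."""
    return ["tesseract", image, output_base, "-l", language_argument(*languages)]


command = tesseract_command("page.png", "page", "your_language", "eng")
```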
QUESTION
This bot reads a text file to post a random response when the keyword is entered. However, it's sending in all caps even though the txt file is written with proper grammar.
Sorry if I'm completely ignorant. I'm in the early stages of learning and this is kind of my building block. This code isn't mine, but I'm modifying it for my use.
...ANSWER
Answered 2021-Jun-11 at 23:36
self.quotes = [q.upper() for q in f.read().split('\n') if q]
As you can see here, q, which is a line from your text file, gets converted to uppercase inside the list comprehension. upper is a method of string that converts all lowercase characters in a string to uppercase.
Remove the call to upper and you should be fine.
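A minimal sketch of the corrected loading step. It reads from an in-memory file so it runs on its own; the function name load_quotes and the sample text are assumptions, not part of the original bot.

```python
import io


def load_quotes(handle):
    """Return the non-empty lines of the file, keeping their original casing."""
    return [q for q in handle.read().split('\n') if q]


# Stand-in for the bot's text file.
sample = io.StringIO("Hello there.\n\nGood morning, everyone.\n")
quotes = load_quotes(sample)
```

The only change from the line quoted above is that q is kept as-is instead of q.upper().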
QUESTION
I'm using ANTLR 4 and have a fairly complex grammar. I'm trying to simplify here...
Given an expression like: true and or false
I want a parsing error since the operands defined expect expressions on either side and this has an expr operand operand expr
My reduced grammar is:
...ANSWER
Answered 2021-Jun-10 at 20:13
You should get a parsing error if you force the parser to consume all tokens by "anchoring" a rule with the built-in EOF token.
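The effect of anchoring can be illustrated outside ANTLR. The sketch below is a hand-written Python recognizer for a hypothetical reduced grammar (literals true and false joined by and / or); it is not the asker's grammar and not ANTLR-generated code. Without the end-of-input check, the parser is satisfied by the valid prefix "true" and never reports the stray operator.

```python
LITERALS = {"true", "false"}
OPERATORS = {"and", "or"}


def parse_expr(tokens):
    """Return how many leading tokens form the longest valid expression (0 if none)."""
    if not tokens or tokens[0] not in LITERALS:
        return 0
    pos = 1
    # Extend only while an operator is followed by another literal.
    while (pos + 1 < len(tokens)
           and tokens[pos] in OPERATORS
           and tokens[pos + 1] in LITERALS):
        pos += 2
    return pos


def parse_unanchored(text):
    """Accept as soon as some prefix is a valid expression."""
    return parse_expr(text.split()) > 0


def parse_anchored(text):
    """Accept only if the expression consumes every token (the EOF check)."""
    tokens = text.split()
    return len(tokens) > 0 and parse_expr(tokens) == len(tokens)


accepted_without_eof = parse_unanchored("true and or false")
accepted_with_eof = parse_anchored("true and or false")
```

Here the unanchored check accepts the input while the anchored one rejects it, which mirrors what requiring EOF at the end of the start rule achieves.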
QUESTION
I am trying to upgrade from Spring Boot 2.2.x to 2.3 I have encountered an issue with the upgrade of spring-data-jdbc. In 1.1.x one could write the following query and it would work as expected
...ANSWER
Answered 2021-Jun-10 at 20:29
It will be fixed with the upcoming Spring-data-jdbc 2.3.x. Relevant issue 974 has been closed.
QUESTION
I'm doing a translated version of a MarkLogic search application and I want to translate the search grammar (AND, OR, etc.). I'm currently just using the default operators and I realize they are documented, but is there any way to get this in XML or JSON format?
The endpoint that retrieves the default search options returns only a few elements, and the grammar is not one of them:
ANSWER
Answered 2021-Jun-10 at 18:04
Looking in the MarkLogic installed files, in MarkLogic/Modules/MarkLogic/appservices/search/search-impl.xqy you will find the default grammar in the $default-options:
QUESTION
I'm trying to parse RTSP-url like this: ...
ANSWER
Answered 2021-Jun-07 at 22:01
The relatively obvious workaround would be to URL-escape the @:
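A sketch of the escaping step in Python, assuming the @ sits in the password part of the URL. The user name, password, host, port and path below are made-up placeholder values, since the original URL is not shown.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical credentials and address, for illustration only.
user, password, host = "admin", "p@ss", "192.0.2.10"

# Percent-encode the credentials so a literal '@' cannot be mistaken
# for the separator between credentials and host.
url = f"rtsp://{quote(user, safe='')}:{quote(password, safe='')}@{host}:554/stream"

parts = urlsplit(url)
```

With the @ encoded as %40, the only remaining literal @ is the one that separates the credentials from the host, so the URL splits unambiguously.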
QUESTION
I took this code from a previously answered question and I am trying to expand it so that it works with E and PI when the String contains "E" and "PI". I don't really understand how the code works because I am pretty new to Java and the explanation on the original comment was not great (I have since lost the link to the comment unfortunately).
...ANSWER
Answered 2021-Jun-08 at 03:32
The key to the solution is to modify the grammar to support your named constants. It is therefore a requirement (as your examples suggest) that named constants be in capital letters, A - Z (to distinguish them from functions).
(The specified grammar is incomplete in that it does not specify the syntax of functions but the code suggests it is lower-case characters in a subset of trig and log functions.)
So, grammar is updated as:
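The updated grammar itself is not reproduced in this excerpt. Purely as an illustration of the idea (upper-case names are constants, lower-case names are functions), here is a small recursive-descent evaluator in Python. It is not the answerer's Java code; the grammar in the comments, the class name and the set of supported functions are all assumptions.

```python
import math
import re

CONSTANTS = {"E": math.e, "PI": math.pi}
FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "log": math.log}
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Z]+|[a-z]+|[-+*/()])")


def tokenize(text):
    """Split the input into numbers, names, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    # Assumed grammar:
    #   expression = term { ("+" | "-") term }
    #   term       = factor { ("*" | "/") factor }
    #   factor     = number | CONSTANT | function "(" expression ")"
    #              | "(" expression ")" | "-" factor
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token == "-":
            return -self.factor()
        if token == "(":
            value = self.expression()
            self.take(")")
            return value
        if token.isupper():  # named constant: capital letters only
            if token not in CONSTANTS:
                raise ValueError(f"unknown constant {token!r}")
            return CONSTANTS[token]
        if token.islower():  # function name: lower-case letters only
            if token not in FUNCTIONS:
                raise ValueError(f"unknown function {token!r}")
            self.take("(")
            argument = self.expression()
            self.take(")")
            return FUNCTIONS[token](argument)
        return float(token)  # raises ValueError for anything that is not a number


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing input")
    return value
```

Because constants and functions are told apart by letter case alone, adding a new constant only means adding an entry to CONSTANTS.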
QUESTION
I know that one type of link you can have is to a place within the page, using an id.
I recently came across a carrd website, where clicking any of the links appears to take you to a new page, but I believe it's actually linking to an id (after clicking a link, you can see it's /#id rather than /someaddress). But it appears like a separate page, because you can't just scroll back up like usual when you use a link to an id.
I'm wondering how this works; I haven't been able to find anything on Google.
(edit: grammar)
...ANSWER
Answered 2021-Jun-07 at 02:24
They're listening for the navigation to the bookmark in JS. From there, they dynamically set the display value on the other section tags on the page. In other words, this is done with JS and isn't a normal function of the bookmarks feature. If you want to do this, you'd need to write a JS script that hides the rest of the page when you jump to a bookmark.
QUESTION
I have a scannerless parser grammar utilizing the CharsAsTokens faux lexer which generates a usable Java Parser class for ANTLR4 versions through 4.6. But when updating to ANTLR 4.7.2 through 4.9.3-SNAPSHOT, the tool generates code producing dozens of compilation errors from the same grammar file, as detailed below.
My question here is simply: Are scannerless parser grammars no longer supported, or must their character-based terminals be specified differently in 4.7 and beyond?
Update:
Unfortunately, I cannot post my complete grammar here as it is derived from FOUO security marking guidance, access to which is restricted by the U.S. government (I am a DoD/IC contractor).
The incompatible upgrade issue however is entirely reproducible with the CSQL.g4 scannerless parser grammar example referred to by Ter in Section 5.6 of The Definitive ANTLR 4 Reference.
As does my grammar, the CSQL example uses CharsAsTokens.java for its tokenizer, and CharVocab.tokens as its token vocabulary.
Note that every token name is specified by its ASCII character-literal equivalent, as in:
...ANSWER
Answered 2021-Jun-07 at 00:17
Try defining a GrammarLexer.g4 file instead of the GrammarLexer.tokens file. (You'd still use the options: { tokenVocab = GrammarLexer; } like you do if you create the GrammarLexer.tokens file.) It could be as simple as:
QUESTION
I am stuck on a problem from the famous Dragon Book on compiler design. How do I find all the viable prefixes of the following grammar:
...ANSWER
Answered 2021-Jun-06 at 22:14
0^n1^n is not a regular language; regexen don't have variables like n and they cannot enforce an equal number of repetitions of two distinct subsequences. Nonetheless, for any context-free grammar, the set of viable prefixes is a regular language. (A proof of this fact, in some form, appears at the beginning of Part II of Donald Knuth's seminal 1965 paper, On the Translation of Languages from Left to Right, which demonstrated both a test for the LR(k) property and an algorithm for parsing LR(k) grammars in linear time.)
OK, to the actual question. A viable prefix for a grammar is (by definition) the prefix of a sentential form which can appear on the stack during a parse using that grammar. It's called "viable" (which means "still alive" or "could continue growing") precisely because it must be the prefix of some right sentential form whose suffix contains no non-terminal symbol. In other words, there exists a sequence of terminals which can be appended to the viable prefix in order to produce a right-sentential form; the viable prefix can grow.
Knuth shows how to create a DFA which produces all viable prefixes, but it's easier to see this DFA if we already have the LR(k) parser produced by an LR(k) algorithm. That parser is a finite-state machine whose alphabet is the set of terminal and non-terminal symbols of a grammar. To get the viable-prefix grammar, we use exactly the same state machine, but we remove the stack (so that it becomes just a state machine) and the reduce actions, leaving only the shift and goto actions as transitions. All states in the viable-prefix machine are accepting states, since any prefix of a viable prefix is itself a viable prefix.
A key feature of this new automaton is that it cannot extend a prefix with a reduce action (since we removed all the reduce actions). A prefix with a reduce action is a prefix which ends in a handle -- recall that a handle is the right-hand side of some production -- so another definition of a viable prefix is that it is a right-sentential form (that is, a possible step in a derivation) which does not extend beyond the right-most handle.
The grammar you are working with has only two productions, so there are only two handles, 01 and 0S1. Note that 10 and 1S cannot be subsequences of any right-sentential form, nor can a right-sentential form contain more than one S. Any right-sentential form must either be a sentence 0^n1^n or a sentential form 0^nS1^n where n>0. But every handle ends at the first 1 of a sentential form, and so a viable prefix must end at or before the first 1. This produces precisely the four possibilities you list (0^n, 0^n1, 0^nS and 0^nS1), which we can condense to the regular expression 0*0S?1?.
Chopping off the suffix removed the second n from the formula, so there is no longer a requirement of concordance and the language is regular.
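The argument can be checked by brute force for small n. The sketch below assumes the grammar S -> 0S1 | 01 (the two handles named above), enumerates the non-empty prefixes of each right-sentential form that do not extend past the handle, and compares them with the strings matched by a pattern for "one or more 0s, then an optional S, then an optional 1". The function names are made up for this example.

```python
import re
from itertools import product

# One or more 0s, optionally followed by S, optionally followed by 1.
VIABLE = re.compile(r"0+S?1?")


def viable_prefixes(max_n):
    """Non-empty prefixes of right-sentential forms that end at or before the handle."""
    prefixes = set()
    for n in range(1, max_n + 1):
        forms = (
            ("0" * n + "S" + "1" * n, "0S1"),  # sentential form and its handle
            ("0" * n + "1" * n, "01"),         # sentence and its handle
        )
        for form, handle in forms:
            end = form.index(handle) + len(handle)
            for length in range(1, end + 1):
                prefixes.add(form[:length])
    return prefixes


def regex_language(max_len):
    """All strings over {0, 1, S} up to max_len that the pattern matches in full."""
    matched = set()
    for length in range(1, max_len + 1):
        for symbols in product("01S", repeat=length):
            candidate = "".join(symbols)
            if VIABLE.fullmatch(candidate):
                matched.add(candidate)
    return matched


limit = 5
from_grammar = {p for p in viable_prefixes(limit) if len(p) <= limit}
agree = from_grammar == regex_language(limit)
```

The empty prefix is left out of the comparison, as it is in the discussion above.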
Questions like this and their answers are begging to be rendered using MathJax. StackOverflow, unfortunately, does not provide this extension, which is apparently considered unnecessary for programming. However, there is a site in the StackExchange constellation dedicated to computing science questions, http://cs.stackexchange.com, and another one dedicated to mathematical questions, http://math.stackexchange.com. Formal language theory is part of both computing science and mathematics. Both of those sites permit MathJax, and questions on those sites will not be closed because they are not programming questions. I suggest you take this information into account for questions like this one.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network