ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
Parse a string using ANTLR4
// I've placed your grammar in a file called T.g4 (hence the name `TLexer`)
String source = "(CHGA/B234A/B231";
TLexer lexer = new TLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill();
for (Token t : stream.getTokens()) {
    System.out.printf("%-20s `%s`%n",
            TLexer.VOCABULARY.getSymbolicName(t.getType()),
            t.getText().replace("\n", "\\n"));
}
PARENTHESIS `(`
LETTER4 `CHGA`
SLASH `/`
LETTER4 `B`
DIGIT3 `234`
LETTER4 `A`
SLASH `/`
LETTER4 `B`
DIGIT3 `231`
EOF `<EOF>`
tipo3 : designador idmensaje? idmensaje?;
designador : PARENTHESIS CHG;
idmensaje : letter4 SLASH letter4 DIGIT3;
letter4 : LETTER LETTER? LETTER? LETTER?
| CHG
;
CHG : 'CHG' ;
LETTER : [a-zA-Z] ;
SLASH : '/';
PARENTHESIS : '(';
DIGIT3 : DIGIT DIGIT DIGIT;
fragment DIGIT : [0-9];
Antlr4 saying my rule is mutually left recursive with itself
expr returns [AstExpr ast]
: ...
| ( e1=expr O_OKLEPAJ e2=expr O_ZAKLEPAJ {$ast = new AstArrExpr(loc($e1.ast, $O_ZAKLEPAJ), $e1.ast, $e2.ast);} )
;
expr returns [AstExpr ast]
: ...
| e1=expr O_OKLEPAJ e2=expr O_ZAKLEPAJ {$ast = new AstArrExpr(loc($e1.ast, $O_ZAKLEPAJ), $e1.ast, $e2.ast);}
;
How to solve ANTLR error "Attribute references not allowed in lexer actions"
INT DOT [a-z]+ {System.out.println($INT.text);}
some_var_name=INT DOT [a-z]+ {System.out.println($some_var_name.text);}
LINE
: INT DOT [a-z]+ {System.out.println(getText());}
;
Spring Boot Logging to a File
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.formatter = org.springframework.boot.logging.java.SimpleFormatter
Is there a way to extract tokens in order with ANTLR?
const string query = "1 + 2";
var inputStream = new AntlrInputStream(query);
var lexer = new StatsQueryLexer(inputStream);
var tokenStream = new CommonTokenStream(lexer);
tokenStream.Fill();
var parser = new StatsQueryParser(tokenStream)
{
    BuildParseTree = true
};
Console.WriteLine($"Parse tree: {parser.root().ToStringTree(parser)}");
Console.WriteLine("\nTokens:");
foreach (var token in tokenStream.GetTokens())
{
    Console.WriteLine($" {StatsQueryLexer.DefaultVocabulary.GetSymbolicName(token.Type), -15} '{token.Text}'");
}
Parse tree: (root (expression (expression 1) + (expression 2)) <EOF>)
Tokens:
NUMBER '1'
OPERATOR '+'
NUMBER '2'
EOF '<EOF>'
ANTLR4: How to hide a specific character?
grammar Rewrite;
everything: .* EOF;
Int: [0-9_]+ { Text = Text.Replace("_",""); };
WS: [ \n\r\t]+ -> skip;
ANTLR: how to debug a misidentified token
parse
: command* EOF
;
command
: (ifStatement | variable)+
;
ifStatement
: IF ANSWERED '(' variable ')' command* END IF
;
variable
: TEXT
;
IF : 'IF';
END : 'END';
ANSWERED : 'ANSWERED';
TEXT : [a-zA-Z0-9]+;
SPACES : [ \t\r\n]+ -> skip;
Match at least one element in sequence of optional elements (ANTLR)
expr
: a b? c?
| b c?
| c
;
ANTLR4 - How to match something until two characters match?
TEXT
: (
~[<[$=/'_^~]
| '<' ~'<'
| '=' ~'='
| '/' ~'/'
| '\'' ~'\''
| '_' ~'_'
| '~' ~'~'
| '^' ~'^'
)+
;
[@0,0:1='^^',<'^^'>,1:0]
[@1,2:14='subscript=^^\n',<TEXT>,1:2]
[@2,15:14='<EOF>',<EOF>,2:0]
grammar MyParser;
parse: block EOF;
block: statement*;
statement
: STRIKETHROUGH statement STRIKETHROUGH # Strikethrough
| EMPHASIS statement EMPHASIS # Emphasis
| STRONG statement STRONG # Strong
| UNDERLINE statement UNDERLINE # Underline
| SUPERSCRIPT statement SUPERSCRIPT # SuperScript
| SUBSCRIPT statement SUBSCRIPT # Subscript
| plaintext # unstyledStatement
;
plaintext: TEXT+;
STRIKETHROUGH: '==';
EMPHASIS: '//';
STRONG: '\'\'';
UNDERLINE: '__';
SUPERSCRIPT: '~~';
SUBSCRIPT: '^^';
TEXT: .;
grammar MyParser;
parse: block EOF;
block: statement*;
statement
: STRIKETHROUGH statement STRIKETHROUGH # Strikethrough
| EMPHASIS statement EMPHASIS # Emphasis
| STRONG statement STRONG # Strong
| UNDERLINE statement UNDERLINE # Underline
| SUPERSCRIPT statement SUPERSCRIPT # SuperScript
| SUBSCRIPT statement SUBSCRIPT # Subscript
| (U_TEXT | TEXT)+ # unstyledStatement
;
STRIKETHROUGH: '==';
EMPHASIS: '//';
STRONG: '\'\'';
UNDERLINE: '__';
SUPERSCRIPT: '~~';
SUBSCRIPT: '^^';
U_TEXT: ~[=/'_~^]+;
TEXT: .;
Is there a way to know which alternative rule ANTLR parser is currently in?
parent
: child_a
| child_b
| child_c
| child_d
| child_e
;
child_d
: ... // in a listener, you can now get index 3 from the parent context
;
parent
: child_a
| child_b
| child_c
| child_d // get index 3 here
| child_e
;
QUESTION
Parse a string using ANTLR4
Asked 2022-Mar-24 at 15:56
Example: (CHGA/B234A/B231
String:
a) Designator: 3 LETTERS
b) Message number (OPTIONAL): 1 to 4 LETTERS, followed by A SLASH (/) followed by 1 to 4 LETTERS, followed by 3 NUMBERS indicating the serial number.
c) Reference data (OPTIONAL): 1 to 4 LETTERS, followed by A SLASH (/) followed by 1 to 4 LETTERS, followed by 3 NUMBERS indicating the serial number.
Result:
CHG
A/B234
A/B231
In grammar file:
/*
* Parser Rules
*/
tipo3: designador idmensaje? idmensaje?;
designador: PARENTHESIS CHG;
idmensaje: LETTER4 SLASH LETTER4 DIGIT3;
/*
* Lexer Rules
*/
CHG : 'CHG' ;
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
SLASH : '/' ;
PARENTHESIS : '(' ;
DIGIT3 : DIGIT DIGIT DIGIT ;
LETTER4 : LETTER LETTER? LETTER? LETTER? ;
But when testing the tipo3 rule, it's giving me the following message:
line 1:1 missing 'CHG' at 'CHGA'
How can I parse that string in ANTLR4?
ANSWER
Answered 2022-Mar-24 at 15:56
When you're confused about why a certain parser rule is not being matched, always start with the lexer. Dump the tokens your lexer produces to stdout. Here's how you can do that:
// I've placed your grammar in a file called T.g4 (hence the name `TLexer`)
String source = "(CHGA/B234A/B231";
TLexer lexer = new TLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill();
for (Token t : stream.getTokens()) {
    System.out.printf("%-20s `%s`%n",
            TLexer.VOCABULARY.getSymbolicName(t.getType()),
            t.getText().replace("\n", "\\n"));
}
If you run the Java code above, this will be printed:
PARENTHESIS `(`
LETTER4 `CHGA`
SLASH `/`
LETTER4 `B`
DIGIT3 `234`
LETTER4 `A`
SLASH `/`
LETTER4 `B`
DIGIT3 `231`
EOF `<EOF>`
As you can see, CHGA becomes a single LETTER4 token, not a CHG token followed by a LETTER4 token. Try changing LETTER4 into LETTER4 : LETTER; and re-test. Now you'll get the expected result.
In your current grammar, CHGA will always become a single LETTER4. This is just how ANTLR works (the lexer tries to consume as many characters as possible for a single rule). You cannot change this.
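To make the longest-match behaviour concrete without needing the ANTLR runtime, here is a minimal sketch of a toy tokenizer in plain Java. It is not ANTLR's implementation: the class MaximalMunchDemo, its tokenize and rules helpers, and the use of java.util.regex patterns in place of lexer rules are all assumptions made for illustration. At each position it tries every rule, keeps the longest match, and lets the rule listed first win a tie, which is how ANTLR resolves ties between lexer rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy longest-match tokenizer (hypothetical helper, not part of ANTLR). */
final class MaximalMunchDemo {

    /**
     * At each position, try every rule and keep the longest match.
     * Only a strictly longer match replaces the current best,
     * so on a tie the rule listed first wins.
     */
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() only matches at the start of the region.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    /** Rules in grammar order; letterRule is a regex standing in for LETTER4. */
    static Map<String, Pattern> rules(String letterRule) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("CHG", Pattern.compile("CHG"));
        rules.put("SLASH", Pattern.compile("/"));
        rules.put("PARENTHESIS", Pattern.compile("\\("));
        rules.put("DIGIT3", Pattern.compile("[0-9]{3}"));
        rules.put("LETTER4", Pattern.compile(letterRule));
        return rules;
    }

    public static void main(String[] args) {
        String source = "(CHGA/B234A/B231";
        // Original rule: LETTER4 : LETTER LETTER? LETTER? LETTER? ;
        System.out.println(tokenize(source, rules("[a-zA-Z]{1,4}")));
        // Suggested change: LETTER4 : LETTER ;
        System.out.println(tokenize(source, rules("[a-zA-Z]")));
    }
}
```

With the original four-letter rule, the second token comes out as LETTER4:CHGA, because four characters beat the three matched by CHG. With the single-letter rule, CHG wins at that position and the A that follows becomes its own LETTER4 token.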
What you could do is move the construction of the multi-letter rule to the parser instead of the lexer:
tipo3 : designador idmensaje? idmensaje?;
designador : PARENTHESIS CHG;
idmensaje : letter4 SLASH letter4 DIGIT3;
letter4 : LETTER LETTER? LETTER? LETTER?
| CHG
;
CHG : 'CHG' ;
LETTER : [a-zA-Z] ;
SLASH : '/';
PARENTHESIS : '(';
DIGIT3 : DIGIT DIGIT DIGIT;
fragment DIGIT : [0-9];
Community Discussions, Code Snippets contain sources that include Stack Exchange Network