This tutorial uses VisualLangLab to explore the scenarios and examples in chapter 3 (A Quick Tour for the Impatient) of the book The Definitive ANTLR Reference. The grammars used can be found online at this link: http://pragprog.com/titles/tpantlr/source_code
For Ver-10.10 or higher only! If you are using an older version, follow this tutorial instead. Beginning with Ver-10.01, the title bar of the About VisualLangLab dialog box displays the version number. The latest jar file can be downloaded here: VLL4J.jar.
This tutorial is not a feature comparison of the two tools. It merely uses a familiar scenario (for many parsing-tool users) to illustrate the use of VisualLangLab. The suitability of VisualLangLab for any particular application must be evaluated from the documentation and examples at the VisualLangLab website.
The following list describes some essential jargon.
The only thing you need to follow this tutorial is the VisualLangLab JAR file VLL4J.jar. VisualLangLab is started by double-clicking VLL4J.jar. No other software or tool is needed.
This part of the tutorial describes VisualLangLab equivalents of section 3.1 (titled Recognizing Language Syntax) of The Definitive ANTLR Reference. The description below shows how to create the parser from scratch, but you can also inspect the complete parser by just invoking "Help" -> "Sample grammars" -> "TDAR-Expr" from the main menu as shown on the right side of Figure-1 below.

Figure-1. Renaming "Main" to "prog"
VisualLangLab creates a parser-rule (or grammar-tree) called Main at startup.
You can create the grammar-rule prog by just renaming Main.
Click the toolbar's Rename rule button and enter prog into the dialog presented (lower left of Figure-1).
The rule's name will be changed in the toolbar's dropdown list as well as
at the rule-tree's root icon (see Figure-2 below).
The parser-rule prog is defined as follows.
prog: stat+ ;
So we must create the grammar-rule stat before we can complete prog.
Click the New rule button,
and enter stat into the dialog presented (left of Figure-2).
After the new rule has been created, click the back button
or select prog from the toolbar's drop-down list
(see right side of Figure-2) to return to prog.

Figure-2. Creating rule 'stat'
Right-click the root-node of prog, and select Reference
(left of Figure-3 below). Select stat from the dropdown list of the
dialog presented, and click the OK button. The rule should now look
like that on the right side of Figure-3. Right-click the newly added
reference node, and select "Multiplicity" -> "+ (1 or more)" from the context menu.
A "+" sign next to the reference icon indicates that the multiplicity has been changed
(red circled at bottom right of Figure-3).

Figure-3. Populating rule 'prog'
The grammar-rule stat was already created above, and we populate the rule here using its complete definition shown below.
stat: expr NEWLINE
| ID '=' expr NEWLINE
| NEWLINE
;
At the topmost level, the rule is a choice of three structures (or sub-rules).
Right-click the root-node, and select "Add" -> "Choice"
from the context menu (as at the top left of Figure-4 below). This should add
a choice icon to the root node (bottom left of Figure-4).
Now right-click the newly created choice node, and select "Add" -> "Sequence"
from the context menu (as at the top right of Figure-4). This should add
a sequence icon to the choice node. Repeat this action once more, so you have two
sequence icons attached to the choice node (bottom right of Figure-4).

Figure-4. Populating rule 'stat'
The little red "x" marks at the top-right of some icons are error indicators. The tool-tip text (displayed when the mouse hovers over an icon) tells you what the problem is.
Before you can proceed further, you need to create the entities (tokens and rules) that
stat uses. First create another parser-rule named expr following the
instructions used to create stat described around Figure-2 above.
Use the toolbar's dropdown list or the Back button as described before to display
the stat parser-rule again (as in bottom right of Figure-4 above).
Next, create the tokens needed. Click the New regex toolbar button,
enter "NEWLINE \\r?\\n" into the dialog presented
(right of Figure-5 below), and click the OK button.
The text entered into the dialog contains the token's name and regular-expression
pattern separated by one or more spaces. The name is used to refer to the token in
parser-rules, while the pattern describes the input text that it may match.
The pattern provided follows the same rules as Java Patterns,
and must be written as if in a String in a Java program (hence the double backslash).
Now, create two more regex tokens (named INT and ID)
by following the same procedure, but entering the
following texts into the dialog: "INT \\d+" and
"ID [a-zA-Z]+". These are not shown in Figure-5.

Figure-5. Creating tokens for 'stat'
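Because the dialog follows Java string-literal conventions, the three regex patterns above can be tried out in plain Java before you type them in. The sketch below is only an illustration using java.util.regex; the class name TokenPatterns is made up for this example and is not part of VisualLangLab.

```java
import java.util.regex.Pattern;

class TokenPatterns {
    // Patterns written as Java string literals, hence the doubled backslashes.
    static final Pattern NEWLINE = Pattern.compile("\\r?\\n");
    static final Pattern INT = Pattern.compile("\\d+");
    static final Pattern ID = Pattern.compile("[a-zA-Z]+");

    public static void main(String[] args) {
        System.out.println(NEWLINE.matcher("\r\n").matches()); // true: Windows line ending
        System.out.println(NEWLINE.matcher("\n").matches());   // true: Unix line ending
        System.out.println(INT.matcher("42").matches());       // true
        System.out.println(ID.matcher("abc").matches());       // true
        System.out.println(ID.matcher("abc1").matches());      // false: digits are not part of ID
    }
}
```

Note that ID as defined here accepts letters only, so a name such as x1 would not be a single ID token.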
Then create a literal token: click the New literal toolbar button,
enter "EQUALS =" into the dialog (left of Figure-5 above),
and click the OK button. The pattern used for a literal token is just a literal,
not a regular-expression.
Important: The original ANTLR grammar does not have a token definition for "=" because it uses "=" directly as an inline literal. VisualLangLab does not allow such inline anonymous token definitions, so all tokens must be named and defined.
You can now complete the stat parser-rule. Right-click the first sequence icon, and select "Add" -> "Reference" from the context menu. Select rule expr in the dialog presented, and click the OK button. Right-click the same sequence icon again and select "Add" -> "Token" from the context menu. Select token NEWLINE in the dialog presented, and click the OK button. Your parser-rule should now look like the one on the right side of Figure-6 below.

Figure-6. Completing (1) rule 'stat'
Now select the second sequence icon, and add the following entities to it: The token ID, the token EQUALS, the reference expr, and the token NEWLINE. Now right-click the choice icon (just below the root), and add the token NEWLINE. Your parser-rule should now look like the one on the left side of Figure-7 below. This parser-rule is structurally similar to the original ANTLR parser-rule, although its behavior may differ somewhat in certain situations. Alternative sub-rules in VisualLangLab should ideally be arranged with the most specific ones near the top. So click the lower (second) sequence icon and drag it upwards above the upper (first) sequence icon. Your parser-rule should now look like the one on the right side of Figure-7. This is the complete stat rule.

Figure-7. Completing (2) rule 'stat'
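Why put the most specific alternative first? Here is a minimal sketch, assuming that a choice tries its alternatives from top to bottom and commits to the first one that matches (the usual behavior of parser-combinator choice; this tutorial does not spell out VisualLangLab's exact semantics). The class OrderedChoice and its firstMatch helper are hypothetical names used only for this illustration.

```java
class OrderedChoice {
    /** Returns the first alternative that is a prefix of the input, or null if none matches. */
    static String firstMatch(String input, String... alternatives) {
        for (String alt : alternatives) {
            if (input.startsWith(alt)) {
                return alt; // commit to the first success; later alternatives are never tried
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // General alternative first: "=" wins and the second "=" is left unconsumed.
        System.out.println(firstMatch("==", "=", "=="));  // =
        // Specific alternative first: the whole input is consumed.
        System.out.println(firstMatch("==", "==", "="));  // ==
    }
}
```

With the more general alternative on top, the more specific one is never reached, which is the situation the reordering in Figure-7 is meant to avoid.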
The rule expr has already been created above, so you only have to populate the logic. The rule's EBNF specification is shown below.
expr: multExpr (('+' |'-' ) multExpr)* ;
Most things you need to know about parser-rule editing have been covered above, so we won't go into the details except to note a few prerequisites you need to complete before proceeding: create the parser-rule multExpr, and the literal tokens PLUS (with pattern "+") and MINUS (with pattern "-"). The original ANTLR grammar does not have these token definitions because it uses them as anonymous inline tokens. When you've finished adding all elements to the parser-rule, it should look like the one on the left of Figure-8 below.

Figure-8. Completing rule 'expr'
The one last thing you need to do is set the multiplicity of the inner sequence icon to "*" (zero or more). Right-click the inner sequence icon, and select "Multiplicity" -> "*(0 or more)" from the context menu (as shown on the left). The multiplicity of the element changes from "1" to "*" (near red arrow). The finished parser-rule is on the right of Figure-8.
The rule multExpr has already been created above, so you only have to populate the logic. The rule's EBNF specification is shown below.
multExpr: atom ('*' atom)* ;
It should be obvious that the parser-rule atom, and a literal token MULT (with pattern "*") must be created before proceeding. The finished parser-rule should look like the one in Figure-9 below.

Figure-9. Completing rule 'multExpr'
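The difference between literal and regex tokens matters for tokens such as MULT and PLUS, because their patterns are regular-expression metacharacters. The sketch below uses only java.util.regex to show that "*" is not even a legal regular expression unless escaped, while a literal comparison needs no escaping. The class and method names are invented for this example and say nothing about how VisualLangLab matches literals internally.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralVersusRegex {
    /** Returns true if the text compiles as a Java regular expression. */
    static boolean isValidRegex(String text) {
        try {
            Pattern.compile(text);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Literal match: plain character comparison, no regex interpretation. */
    static boolean matchesLiteral(String literal, String input, int pos) {
        return input.startsWith(literal, pos);
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("*"));              // false: dangling metacharacter
        System.out.println(isValidRegex("\\*"));            // true: escaped form
        System.out.println(matchesLiteral("*", "3*4", 1));  // true: no escaping needed
    }
}
```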
The rule atom has already been created above, so you only have to populate the logic. The rule's EBNF specification is shown below.
atom: INT
| ID
| '(' expr ')'
;
There are a few prerequisites this time too: the regex token INT (with pattern "\\d+", already created above), and the literal tokens LPAREN (with pattern "(") and RPAREN (with pattern ")"). The finished parser-rule should look like the one in Figure-10 below.

Figure-10. Completing rule 'atom'
In VisualLangLab whitespace is specified
as a global parameter that you can inspect and modify as shown in Figure-11 below.
Selecting "Globals" -> "Whitespace" brings up a dialog with the default whitespace
regular-expression ("\\s+"). You should change the value to
"[ \\t]+" (without the quote marks) and click the OK button.

Figure-11. Specifying whitespace
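The tutorial does not say why the default must be changed; a plausible reason (an assumption on our part) is that "\\s+" also matches line endings, which this grammar needs to see as NEWLINE tokens. The small check below, using only java.util.regex and a made-up class name, shows the difference between the two patterns.

```java
import java.util.regex.Pattern;

class WhitespaceChoice {
    static final Pattern DEFAULT_WS = Pattern.compile("\\s+");     // default whitespace setting
    static final Pattern TUTORIAL_WS = Pattern.compile("[ \\t]+"); // value used in this tutorial

    public static void main(String[] args) {
        System.out.println(DEFAULT_WS.matcher("\n").matches());    // true: a newline counts as whitespace
        System.out.println(TUTORIAL_WS.matcher("\n").matches());   // false: a newline is left for NEWLINE
        System.out.println(TUTORIAL_WS.matcher(" \t").matches());  // true: spaces and tabs are still skipped
    }
}
```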
VisualLangLab does not generate any code, and you don't have to compile any code to run a parser produced by it. VisualLangLab uses a Java version of Scala parser combinators to turn the tree-models of the parser-rules directly into a parser at run-time without generating or compiling code. The intuitiveness, speed, and hassle-free nature of this approach are a huge advantage, particularly when performing ad hoc testing.
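To give a feel for what building a parser at run-time from combinators can look like, here is a toy sketch in plain Java. It is not VisualLangLab's implementation and uses none of its API; every name in it (Parser, regex, literal, seq, alt, rep, ref) is invented for this illustration. It only recognizes input (it builds no AST) and assembles the five rules of this tutorial from ordinary method calls.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinatorSketch {
    /** Maps (input, position) to the position after the match, or -1 on failure. */
    interface Parser { int parse(String in, int pos); }

    static final Pattern WS = Pattern.compile("[ \\t]+"); // whitespace skipped before each token

    static int skipWs(String in, int pos) {
        Matcher m = WS.matcher(in).region(pos, in.length());
        return m.lookingAt() ? m.end() : pos;
    }
    static Parser regex(String re) {
        Pattern p = Pattern.compile(re);
        return (in, pos) -> {
            Matcher m = p.matcher(in).region(skipWs(in, pos), in.length());
            return m.lookingAt() ? m.end() : -1;
        };
    }
    static Parser literal(String lit) {
        return (in, pos) -> {
            int start = skipWs(in, pos);
            return in.startsWith(lit, start) ? start + lit.length() : -1;
        };
    }
    static Parser seq(Parser... ps) {           // all parts, one after another
        return (in, pos) -> {
            for (Parser p : ps) {
                pos = p.parse(in, pos);
                if (pos < 0) return -1;
            }
            return pos;
        };
    }
    static Parser alt(Parser... ps) {           // first alternative that succeeds
        return (in, pos) -> {
            for (Parser p : ps) {
                int end = p.parse(in, pos);
                if (end >= 0) return end;
            }
            return -1;
        };
    }
    static Parser rep(Parser p, int min) {      // multiplicity "*" (min 0) or "+" (min 1)
        return (in, pos) -> {
            int count = 0;
            while (true) {
                int end = p.parse(in, pos);
                if (end < 0 || end == pos) break;
                pos = end;
                count++;
            }
            return count >= min ? pos : -1;
        };
    }
    static Parser ref(Supplier<Parser> s) {     // late-bound reference, allows recursive rules
        return (in, pos) -> s.get().parse(in, pos);
    }

    static final Parser NEWLINE = regex("\\r?\\n");
    static final Parser INT = regex("\\d+");
    static final Parser ID = regex("[a-zA-Z]+");
    static Parser expr; // assigned in the static block below
    static final Parser atom = alt(INT, ID, seq(literal("("), ref(() -> expr), literal(")")));
    static final Parser multExpr = seq(atom, rep(seq(literal("*"), atom), 0));
    static { expr = seq(multExpr, rep(seq(alt(literal("+"), literal("-")), multExpr), 0)); }
    static final Parser stat = alt(seq(ID, literal("="), expr, NEWLINE), seq(expr, NEWLINE), NEWLINE);
    static final Parser prog = rep(stat, 1);

    /** True if the whole input is recognized by prog. */
    static boolean accepts(String input) { return prog.parse(input, 0) == input.length(); }

    public static void main(String[] args) {
        System.out.println(accepts("(3+4)*5\n")); // true
        System.out.println(accepts("(3+4)*5"));   // false: the trailing newline is missing
        System.out.println(accepts("3+\n"));      // false: operand missing after "+"
    }
}
```

Nothing here is generated or compiled per grammar: the rules are ordinary objects wired together when the class loads, which is the general idea behind the combinator approach.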
A VisualLangLab grammar can be saved to a file using the "File" -> "Save" menu or corresponding toolbar buttons. The saved grammar is an XML representation of the information entered into the GUI, but has no generated information or artifacts not provided by the user. A saved grammar file can be opened using the "File" -> "Open" menu (or corresponding toolbar buttons) for review, further editing, testing, etc.
A saved grammar can also be loaded and turned into a parser by a client program via the VisualLangLab API. Client programs can be written in any language (existing or yet-to-be-invented) that runs on the JVM.
VisualLangLab brings new meaning to the terms Ad Hoc Testing and Test-Driven Development. No other parser-development tool simplifies testing so much. So, without further ado let's just perform the first test in the book:
Enter "(3+4)*5" (without quotes) into the text area under
Parser Test Input. Remember to add a newline at the end.
Figure-12 below highlights the elements referred to above, and shows the state of the GUI after the test.

Figure-12. Performing test 1
A VisualLangLab parser without actions spews out the AST produced by the top-level parser-rule, and that is the text under Parser Test Input near the bottom of the GUI. You may assume that this output is correct as any parsing error would produce red text (as in Figure-13 below). Details about VisualLangLab's AST structure can be found in AST And Action Code.
If you see error messages or find other problems, verify the following:
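- The test input "(3+4)*5" has a newline character at its end (see bottom left of Figure-12 above)
- The whitespace pattern is "[ \\t]+" (without the quote marks)
- The tokens are defined as in the Token Definitions table below

Token Definitions

| Name | Type | Pattern |
|---|---|---|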
| EQUALS | Literal | = |
| ID | Regex | [a-zA-Z]+ |
| INT | Regex | \\d+ |
| LPAREN | Literal | ( |
| MINUS | Literal | - |
| MULT | Literal | * |
| NEWLINE | Regex | \\r?\\n |
| PLUS | Literal | + |
| RPAREN | Literal | ) |
The results of a few more tests are shown in Figure-13 below. In all cases, ensure that the test-input string is terminated with a newline character (the text cursor must be on the next line as in Figure-12 above). VisualLangLab's error messages are in the nature of a stack-trace, showing (from top to bottom) the nesting of rules at the time of the failure. VisualLangLab does not implement automatic error recovery, but supports mechanisms that can be used to issue application-specific error messages, and recover from errors in predictable ways.


Figure-13. Testing error cases
This part of the tutorial describes VisualLangLab equivalents of section 3.2 (titled Using Syntax to Drive Action Execution) of The Definitive ANTLR Reference. The description below shows how to add action code to the parser created above, but you can also inspect the end result by just invoking "Help" -> "Sample grammars" -> "TDAR-Expr-Actions" from the main menu. The tutorial explains the use of action code within VisualLangLab, but you can learn more details at Action Code Design.
Action code in VisualLangLab is written not as a code snippet but as a complete JavaScript function. Here are the main concepts:
The structure of the AST for any parser-rule node is displayed in the panel next to the rule-tree (under Parse Tree (AST) Structure). The following examples should clarify these ideas.
The ASTs passed to JavaScript action-code functions contain native JVM types (arrays, Lists, etc.). Users must be aware that all primitive types placed in arrays or Lists will have been auto-boxed (on the Java side) before being passed to JavaScript. The action-code functions must therefore explicitly unbox primitive types contained in arrays and Lists within ASTs.
Figure-14 below shows the rule-tree and AST for the parser-rule prog as well as certain details of the action code assignment process. The action is associated with the reference node, so that node must be selected to display the action-code function. To assign the action-code to the rule-tree created previously, proceed as follows:
If there are any syntax errors in the action-code text, a dialog box indicating the error is displayed on clicking the Save button.

Figure-14. Action for rule 'prog'
Here is the action code for 'prog' that you can copy and paste into VisualLangLab's Action Code text area.
|
In effect, this is a package of two code snippets ...
memory = {};
... which is executed before parsing begins, and ...
return "Ok";
... which is executed after parsing ends. The first snippet (memory = {};) is therefore
functionally equivalent to the line HashMap memory = new HashMap(); in the original ANTLR parser.
The second snippet (return "Ok";)
subverts VisualLangLab's default action of passing the AST up the call chain. This causes VisualLangLab to
print just the string Ok instead of the entire AST after parsing is complete.
Figure-15 below shows the rule-tree and AST for parser-rule stat as well as the JavaScript function to be used as its action code. The action-code is associated with the choice node just below the root node, so that node must be selected to display the action-code function.

Figure-15. Action for rule 'stat'
The AST in Figure-15 uses certain features that are explained below.
The function in Figure-15 actually replaces three snippets of action code (the last one being a no-op). Separate functions could have been written to handle each case too. The option chosen here has the advantage of keeping all code for each parser-rule in one function.
The text of the action-code for this parser-rule is given below.
|
If you're still wondering why the intValue() and doubleValue() calls
are required, here's the secret: Since the argument to the function is (in this case)
an array of Java Objects, all primitive Java types are autoboxed. So a Java
int and double become a java.lang.Integer and
java.lang.Double respectively, and JavaScript has to extract the original
values using intValue() and doubleValue() explicitly as it
does not know about auto unboxing.
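The boxing step is easy to observe from the Java side. The following sketch (plain Java, with a made-up class name and helper methods) stores an int and a double in an Object array, as happens to primitive values inside an AST, and then extracts them again with the same intValue() and doubleValue() calls that the action code uses.

```java
class BoxingDemo {
    /** Extracts the primitive int from a boxed java.lang.Integer. */
    static int unboxInt(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    /** Extracts the primitive double from a boxed java.lang.Double. */
    static double unboxDouble(Object boxed) {
        return ((Double) boxed).doubleValue();
    }

    public static void main(String[] args) {
        Object[] ast = { 3, 4.5 }; // the int and the double are auto-boxed on assignment
        System.out.println(ast[0].getClass().getName()); // java.lang.Integer
        System.out.println(ast[1].getClass().getName()); // java.lang.Double
        System.out.println(unboxInt(ast[0]) + unboxDouble(ast[1])); // 7.5
    }
}
```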
Figure-16 below shows the rule-tree and AST for parser-rule atom as well as the JavaScript function to be used as its action code. The action-code is associated with the choice node just below the root node, so that node must be selected to display the action-code function.

Figure-16. Action for rule 'atom'
The function in Figure-16 replaces three snippets of action code. Again, separate functions could have been written and associated with each of the three sub-nodes of the choice icon, but using one function has the advantage of keeping everything in one place.
The text of the action-code function for the parser-rule is shown below.
|
Figure-17 below shows the rule-tree and AST for parser-rule multExpr as well as the function to be used as its action code. The action-code is associated with the sequence node just below the root node, so that node must be selected to display the action-code function.

Figure-17. Action for rule 'multExpr'
The text of the action-code function for the parser-rule is shown below. Observe that the keyword function has been abbreviated to just f. Any prefix of the word function may be used as an abbreviation here.
|
Figure-18 below shows the rule-tree and AST for parser-rule expr as well as the function to be used as its action code. The action-code is associated with the sequence node just below the root node, so that node must be selected to display the action-code function.

Figure-18. Action for rule 'expr'
The text of the action-code function for the parser-rule is shown below.
|
As before, you can test the augmented parser right away (see Figure-19 below). Proceed as follows.

Figure-19. Testing the action code
A grammar with embedded actions can be saved ("File" -> "Save") and opened again ("File" -> "Open") with VisualLangLab's main menu operations just like any other grammar. The code is saved as text (source code) within the XML file.
This part of the tutorial describes VisualLangLab equivalents of section 3.3 (titled Evaluating Expressions Using an AST Intermediate Form) of The Definitive ANTLR Reference.
VisualLangLab differs from most other parser-generators in that it always produces an AST. A well-defined convention determines the structure of the AST, and the user does not have to do anything more. The argument passed to an action code function (as in the examples above) is in fact an AST for the associated grammar-tree node produced using the same structuring convention.
An AST-based approach is designed for use in a client program that launches the parser, obtains the AST, and then interprets the AST or processes it into another form. VisualLangLab supports such operation via the VisualLangLab API that enables a client program to regenerate a parser from a saved grammar.
As described above, a VisualLangLab parser always produces an AST, so you do not have to do anything special for this. But the default AST often contains many tokens that the application does not need, and dropping them makes the AST compact, less sensitive to small grammar changes, and more usable. The following description uses a grammar obtained by dropping all disposable AST parts from the grammar described in Recognizing Language Syntax above.
The stat parser-rule shown in Figure-20 below is a good example of a rule with parts that may be dropped. To drop a part of a rule from the AST, right-click the node and select Drop from the context menu as on the left side of Figure-20. A dropped node is shown with a slanted line drawn across its icon and with the drop attribute added. The right side of Figure-20 shows the stat rule with several tokens dropped.

Figure-20. Dropping part of an AST
To inspect a version of this grammar with all disposable parts dropped, select "Help" -> "Sample grammars" -> "TDAR-Expr-AST" from the main menu.
This section shows you how to embed and use a VisualLangLab parser in an application program. To compile and run the code below proceed as follows.
The text parsed and interpreted by the program is in the first line of
main(). You can change the contents of that line to check out
the parser fully. The red and blue parts of the code below are dependent on the
VisualLangLab API. The blue parts follow
naming and usage conventions based on Scala's
parser combinators.
There is one function dedicated to handling each rule (e.g. function evalAtom for rule atom,
function evalMultExpr for rule multExpr, etc.). So comparing the code of each
such function with the AST structure of the corresponding rule is easy. It is possible to
write the application program in any JVM language using principles
explained in Using the API.
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import net.java.vll.vll4j.api.Vll4j;
public class TDARExprAST {
    static float evalAtom(Object ast) {
        Object pair[] = (Object[])ast;
        switch ((Integer)pair[0]) {
            case 0:
                return Float.parseFloat((String)pair[1]);
            case 1:
                if (memory.containsKey(pair[1])) {
                    return memory.get(pair[1]);
                }
                else {
                    System.out.printf("Undefined variable: %s%n", pair[1]);
                    return 0;
                }
            case 2: return evalExpr(pair[1]);
        }
        return 0; /* never used, keeps compiler happy */
    }
    static float evalMultExpr(Object ast) {
        Object arr[] = (Object[])ast;
        Float res = evalAtom(arr[0]);
        List<Object[]> lst = (List<Object[]>)arr[1];
        for (Object pair[]: lst) {
            res *= evalAtom(pair[1]);
        }
        return res;
    }
    static float evalExpr(Object ast) {
        Object arr[] = (Object[])ast;
        Float res = evalMultExpr(arr[0]);
        List<Object[]> lst = (List<Object[]>)arr[1];
        for (Object pair[]: lst) {
            Object discr[] = (Object[])pair[0];
            switch ((Integer)discr[0]) {
                case 0: res += evalMultExpr(pair[1]); break;
                case 1: res -= evalMultExpr(pair[1]); break;
            }
        }
        return res;
    }
    static void evalStat(Object ast) {
        Object pair[] = (Object[])ast;
        switch ((Integer)pair[0]) {
            case 0:
                Object arr[] = (Object[])pair[1];
                String id = (String)arr[0];
                Float res = evalExpr(arr[1]);
                memory.put(id, res);
                break;
            case 1:
                System.out.println(evalExpr(pair[1]));
                break;
            case 2: /*do nothing*/
                break;
        }
    }
    static void evalProg(Object ast) {
        List listOfStat = (List)ast;
        for (Object stat: listOfStat) {
            evalStat(stat);
        }
    }
    static Map<String, Float> memory = new HashMap<String, Float>();
    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException {
        String input = "a=3\nb=4\n2+a*b\n";
        Vll4j vll = Vll4j.fromFile(new File("TDAR-Expr-AST.vll"));
        Vll4j.Parser exprParser = vll.getParserFor("Prog");
        Vll4j.ParseResult parseResult = vll.parseAll(exprParser, input);
        if (parseResult.successful()) {
            Object ast = parseResult.get();
            evalProg(ast);
        } else {
            System.out.println(parseResult);
        }
    }
}
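If you want to experiment with the evaluator logic without the VisualLangLab jar, you can feed it a hand-built AST. The sketch below is an assumption-laden illustration: the AST shapes are inferred purely from the casts in the program above (a choice as a pair of alternative index and value, a sequence as an Object array, a repetition as a List), not taken from VisualLangLab's documentation. The class HandBuiltAst and the helper intAtom are hypothetical, and only the INT alternative of atom is handled.

```java
import java.util.Collections;
import java.util.List;

class HandBuiltAst {
    /** Mirrors evalAtom above, restricted to the INT alternative (choice index 0). */
    static float evalAtom(Object ast) {
        Object[] pair = (Object[]) ast;
        if ((Integer) pair[0] == 0) {
            return Float.parseFloat((String) pair[1]);
        }
        throw new IllegalArgumentException("only INT atoms are handled in this sketch");
    }

    /** Mirrors evalMultExpr above: first atom, then a list of (operator, atom) pairs. */
    static float evalMultExpr(Object ast) {
        Object[] arr = (Object[]) ast;
        float res = evalAtom(arr[0]);
        @SuppressWarnings("unchecked")
        List<Object[]> rest = (List<Object[]>) arr[1];
        for (Object[] pair : rest) {
            res *= evalAtom(pair[1]);
        }
        return res;
    }

    /** Builds the assumed AST of an atom that matched an INT token. */
    static Object intAtom(String digits) {
        return new Object[] { 0, digits };
    }

    public static void main(String[] args) {
        // Assumed AST for the input "3*4".
        List<Object[]> rest = Collections.singletonList(new Object[] { "*", intAtom("4") });
        Object ast = new Object[] { intAtom("3"), rest };
        System.out.println(evalMultExpr(ast)); // 12.0
    }
}
```

Comparing the shapes built here with the Parse Tree (AST) Structure panel for the real grammar is a quick way to check whether these assumptions hold for your version.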
The content of this section is a parser described on the ANTLR site wiki. The book refers to this parser at the very end of section 3.3 (see footnote on page 84). It expands on the grammar discussed above, and adds the following features:
The following discussion shows how to create an interpreter using two different approaches: processing the AST with an action attached to the top-level rule, and in an application program via the VisualLangLab API.
The grammar for this application (including the action code) is bundled with VisualLangLab. Select "Help" -> "Sample grammars" -> "TDAR-Simple-Tree-Based-Interpreter" from the main menu. The action code is associated with the reference node (labeled stat) of the top-level rule (Prog), and is displayed when the reference node is selected (as in Figure-21 below).
Note: The relative positions of some of the alternative branches in some of the rules have been altered to arrange them in order of specificity. The interpreter also uses JavaScript's native numeric type for computations instead of BigDecimal (as in the original example code). The rules are otherwise unchanged.

Figure-21. Interpreter in action code
Figure-21 also shows the execution of a few lines of code that include a definition and use of the factorial function.
Coming soon!