
VisualLangLab - Grammar without Tears

by Sanjay Dasgupta

VisualLangLab is "Project of the week" on Java Posse for 24th Sept!

This is a periodically revised version of an article published on java.net on 14th September 2011. These revisions (the latest on 09-MAR-2012, for Version 10.25) keep the contents compatible with the current version of VisualLangLab. A history of the revisions can be found under Revision History at the end of this article.

In the world of computing a grammar is a somewhat different thing from the object implied in the phrase grammar without tears. But in terms of the misery caused to those who have to deal with them, the two grammars appear to be closely related. This article describes a no tears approach to parser development using VisualLangLab, a free open-source parser-generator. VisualLangLab has an IDE that represents grammar rules (or productions) as intuitive trees, like those in Figure-1 below, without code or scripts of any kind.

Figure 1. VisualLangLab's grammar-trees

The grammar-trees are also executable, and can be run directly at the click of a button. This encourages the use of tight iterative-incremental development cycles, and greatly improves the pace of development. These features also make it an effective prototyping environment and a training tool.

Parsing techniques and parser-generator tools are a great addition to any developer's arsenal, and VisualLangLab provides a convenient, gentle introduction to those topics. A later article will describe the use of VisualLangLab to produce a domain-specific language (DSL) for testing Java-Swing programs.

Can I see the Generated Code?

As a now-famous panda discovered, powerful recipes sometimes have no secret ingredient. And there is no generated code.

VisualLangLab uses parser combinator functions to turn grammar-trees (or XML from a saved grammar-file) directly into a parser at run-time without producing or compiling source-code. But users of the GUI and the API do not have to know anything about combinators to use these capabilities.
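This article does not show VisualLangLab's own combinator code. The minimal Java sketch below illustrates the general idea instead. Each parser is an ordinary object, and larger parsers are built by combining smaller ones at run time, so no source code needs to be generated. The names `Parser`, `Result`, `literal`, `choice`, and `sequence` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** A minimal parser-combinator sketch; the names here are hypothetical, not VisualLangLab's API. */
public class CombinatorSketch {

    /** Outcome of a successful parse: the value produced and the position just after it. */
    static final class Result {
        final Object value;
        final int next;
        Result(Object value, int next) { this.value = value; this.next = next; }
    }

    /** A parser is a function from (input, position) to a Result, or null on failure. */
    interface Parser {
        Result parse(String input, int pos);
    }

    /** Matches a fixed piece of text. */
    static Parser literal(String text) {
        return (in, pos) -> in.startsWith(text, pos) ? new Result(text, pos + text.length()) : null;
    }

    /** Tries each alternative in order and returns the first success. */
    static Parser choice(Parser... alternatives) {
        return (in, pos) -> {
            for (Parser p : alternatives) {
                Result r = p.parse(in, pos);
                if (r != null) return r;
            }
            return null;
        };
    }

    /** Runs the parts one after another, collecting their values into a list. */
    static Parser sequence(Parser... parts) {
        return (in, pos) -> {
            List<Object> values = new ArrayList<>();
            int cur = pos;
            for (Parser p : parts) {
                Result r = p.parse(in, cur);
                if (r == null) return null;
                values.add(r.value);
                cur = r.next;
            }
            return new Result(values, cur);
        };
    }

    public static void main(String[] args) {
        // The parser is assembled from objects at run time; nothing is generated or compiled.
        Parser greeting = sequence(choice(literal("hello"), literal("hi")), literal(" world"));
        Result r = greeting.parse("hi world", 0);
        System.out.println(r.value); // [hi,  world]
    }
}
```

A grammar-tree maps onto this style quite directly: each node becomes a combinator call, and its children become the arguments.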

Download and Run VisualLangLab

The VisualLangLab web-site has a JAR file VLL4J.jar that includes everything you need. The only other prerequisite is a 1.6+ JRE. To start the VisualLangLab GUI, just double-click the file VLL4J.jar. Users on Linux, Mac OS, or other UNIX-like systems may first have to enable execution (chmod +x VLL4J.jar).

The GUI

When started, VisualLangLab displays the GUI shown in Figure-2 below. The article explains the menus and buttons as needed, but a full description can also be found online at The GUI and in the download zip. All toolbar buttons have tool-tip texts that explain their use.

Figure 2. The VisualLangLab GUI

The graphical and text panels are used as described below.

  • A is used for the grammar-tree as described in Managing Rules below
  • B displays the AST structure of the selected grammar-tree node
  • C is where the selected node's action code is displayed and edited. If this appears to break the no code, no script promise, rest assured that action-code is always optional
  • D and E are used for testing the parser as described below

The following sections are a tutorial introduction that lead you through the steps of creating a simple parser.

Managing Tokens

There are two kinds of token, literal and regex, which the following discussion and examples will help you tell apart. We create two literals and one regex, which are used in a rule later.

Literal Token Creation

To create a literal token select Tokens -> New literal from the main menu as in Figure-3 below. Enter the literal's name (PLUS), a space, and the pattern (+) into the dialog box and click the OK button. A token's name is used to refer to it from rules, while the pattern describes its contents. All instances of a particular literal token (during the parser's run) have the same fixed content.

Now create another literal token named MINUS with a - pattern (as in the second dialog box in Figure-3).

Figure 3. Creating a literal token

Regex Token Creation

Figure-4 below shows how to create a regex token. Select Tokens -> New regex from the main menu, enter the token's name (NUMBER), a space, and the pattern (\\d+) into the dialog box, and click OK. You probably recognize the pattern as a Java regular expression that matches a sequence of one or more digits (an unsigned integer).

Figure 4. Creating a regex token

Note that the pattern part in the dialogs above (for literal as well as regex tokens) must be written exactly as it would appear inside a String literal in a Java program, without the surrounding quote marks.
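The escaping rule can be checked with `java.util.regex`. The sketch below assumes two things: that a regex token is matched with Java regex semantics, as the article implies, and that a literal token is matched as plain text. The class and method names are hypothetical. The dialog text \\d+ is exactly the text of the Java string literal "\\d+", so the regex engine sees \d+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of how token patterns read as Java string contents. */
public class TokenPatterns {
    // Regex token NUMBER: the dialog text \\d+ is the Java string "\\d+", i.e. the regex \d+.
    static final Pattern NUMBER = Pattern.compile("\\d+");

    // Literal token PLUS: matched as plain text, so quote it to keep '+' from acting as a quantifier.
    static final Pattern PLUS = Pattern.compile(Pattern.quote("+"));

    /** Returns the text the token matches starting exactly at pos, or null if it does not match there. */
    static String matchAt(Pattern token, String input, int pos) {
        Matcher m = token.matcher(input);
        m.region(pos, input.length());
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("\\d+".length());               // 3: backslash, 'd', '+'
        System.out.println(matchAt(NUMBER, "42+7", 0));    // 42
        System.out.println(matchAt(PLUS, "42+7", 2));      // +
        System.out.println(matchAt(NUMBER, "42+7", 2));    // null
    }
}
```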

There is not a great deal more to tokens, but if you would like to read the fine print, check out the last part of Editing the Grammar Tree.

Miscellaneous Token Operations

The main menu and toolbar also support several other operations. You can find which rules use any particular token (Tokens -> Find token), edit tokens (Tokens -> Edit token), and delete unused tokens (Tokens -> Delete token).

Token Libraries

Tokens tend to be reused within application domains, so VisualLangLab allows you to create and use token libraries. These operations are invoked from the main menu by selecting Tokens -> Import tokens and Tokens -> Export tokens, or by using corresponding toolbar buttons.

Whitespace and Comments

You can specify the character patterns that separate adjacent tokens by invoking Globals -> Whitespace from the main menu, and entering a regular expression into the popped up dialog box. The default whitespace specification is "\\s+".

You can also provide a regular expression for recognizing comments in the input text. Select Globals -> Comment from the main menu, and enter a regular expression into the dialog box. There is no default value for this parameter.
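Conceptually, the whitespace and comment patterns are skipped before each token is matched, much as Scala's RegexParsers skips its whiteSpace pattern. The sketch below is a hypothetical tokenizer, not VisualLangLab code. It uses the default whitespace pattern "\\s+" and an assumed comment pattern "//[^\\n]*", since VisualLangLab has no default comment pattern. It recognizes the NUMBER, PLUS, and MINUS tokens created above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer that skips whitespace and comments between tokens. */
public class SkipSketch {
    static final Pattern WHITESPACE = Pattern.compile("\\s+");     // the documented default
    static final Pattern COMMENT = Pattern.compile("//[^\\n]*");  // assumed; there is no default
    static final Pattern TOKEN = Pattern.compile("\\d+|\\+|-");   // NUMBER, PLUS, MINUS

    /** Advances pos past any mix of whitespace and comments (both patterns match non-empty text). */
    static int skip(String in, int pos) {
        Matcher ws = WHITESPACE.matcher(in);
        Matcher cm = COMMENT.matcher(in);
        while (true) {
            ws.region(pos, in.length());
            if (ws.lookingAt()) { pos = ws.end(); continue; }
            cm.region(pos, in.length());
            if (cm.lookingAt()) { pos = cm.end(); continue; }
            return pos;
        }
    }

    /** Splits the input into tokens, skipping whitespace and comments before each one. */
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        Matcher t = TOKEN.matcher(in);
        int pos = skip(in, 0);
        while (pos < in.length()) {
            t.region(pos, in.length());
            if (!t.lookingAt()) throw new IllegalArgumentException("Unexpected input at " + pos);
            tokens.add(t.group());
            pos = skip(in, t.end());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3 + 5 // sum\n - 2")); // [3, +, 5, -, 2]
    }
}
```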

Managing Rules

VisualLangLab represents rules as grammar-trees with distinctive icons (as in Figure-1 above) and a context-sensitive popup-menu. This graphical depiction makes grammars comprehensible to a wider range of users. The icons and textual annotations used in the grammar-trees are described below.

Node Icons

The table below lists the icons from which grammar-trees are constructed.

Non-terminals
  • Root - used for the root node of every grammar tree
  • Choice - used as the parent of a group of alternative items (any one of which occurs in the input)
  • Sequence - used as the parent of a sequence of items which occur in the order specified
  • RepSep - parent of a sequence of similar items that also uses a specified separator
  • Reference - invokes another named parser
  • Token wildcard - a pseudo-token that matches any other defined token, and is useful for implementing syntax-error handling strategies
  • Semantic predicate - succeeds or fails depending on the run-time value of an expression
Terminals
  • Literal - matches a specified literal token
  • Regexp - matches a specified regex token
Icon overlays
  • Commit - displayed on top of a node that has the commit annotation
  • Error - indicates an error in the associated node or rule
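A RepSep node corresponds to what combinator libraries usually call repsep. The following sketch works on already-tokenized input and shows one plausible reading of its semantics, modeled on Scala's repsep: zero or more items, each item after the first preceded by a separator, with separators left out of the result. This article does not say whether VisualLangLab's RepSep allows zero items or keeps separators in the AST, so treat both details as assumptions. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of RepSep semantics over a token list, modeled on Scala's repsep. */
public class RepSepSketch {

    /** Items collected and the index of the first token not consumed. */
    static final class Result {
        final List<String> items;
        final int end;
        Result(List<String> items, int end) { this.items = items; this.end = end; }
    }

    /** Zero or more items; every item after the first must be preceded by a separator. */
    static Result repSep(List<String> tokens, int pos, Predicate<String> item, Predicate<String> separator) {
        List<String> items = new ArrayList<>();
        if (pos >= tokens.size() || !item.test(tokens.get(pos))) {
            return new Result(items, pos);               // zero items is a valid match
        }
        items.add(tokens.get(pos));
        int end = pos + 1;
        // Consume "separator item" pairs; a separator with no item after it is left unconsumed.
        while (end + 1 < tokens.size() && separator.test(tokens.get(end)) && item.test(tokens.get(end + 1))) {
            items.add(tokens.get(end + 1));              // separators are not kept in the result
            end += 2;
        }
        return new Result(items, end);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("1", ",", "2", ",", "3", ";");
        Result r = repSep(tokens, 0, t -> t.matches("\\d+"), ","::equals);
        System.out.println(r.items + " end=" + r.end);   // [1, 2, 3] end=5
    }
}
```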

Node Annotations

Each grammar-tree node has characteristics (such as multiplicity) that are represented as the node's annotations, and are displayed as text beside each node's icon. You can change a node's annotations by right-clicking the node and choosing the required settings from the context-menu as in Figure-5 below.

Figure 5. Setting node annotations

The first annotation is a 1-character flag that indicates the node's multiplicity -- the number of times the corresponding entity may occur in the parser's input. You can see examples of its use everywhere in the built-in Sample Grammars. Multiplicity has one of the following values:

  • 1 - exactly one occurrence
  • ? - 0 or 1 occurrence
  • * - 0 or more occurrences
  • + - 1 or more occurrences
  • 0 - the associated entity must not occur in the input (but see note below)
  • = - the associated entity must occur in the input (but see note below)

Note: The last two values ("0" and "=") are used to implement syntactic predicates and have no influence on the information gathered by the parser (into the AST or parse-tree). Their names, not and guard, are inspired by the functions of the same names in the Scala parser combinator library.
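In combinator terms, the multiplicity flags map naturally onto the Scala functions opt, rep, rep1, not, and guard. The sketch below re-creates them as hypothetical Java recognizers over a list of tokens. Each recognizer returns the position after a match, or -1 on failure. Note that not and guard return the starting position. They look ahead without consuming input, which is consistent with their adding nothing to the AST. This is an illustration of the concepts, not VisualLangLab's code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical recognizers mirroring the multiplicity flags 1, ?, *, +, 0 and =. */
public class MultiplicitySketch {
    /** Returns the position after a match, or -1 on failure. */
    interface Rec { int at(List<String> toks, int pos); }

    static Rec token(String t) {                        // 1 : exactly one occurrence
        return (toks, pos) -> pos < toks.size() && toks.get(pos).equals(t) ? pos + 1 : -1;
    }
    static Rec opt(Rec p) {                             // ? : zero or one
        return (toks, pos) -> { int r = p.at(toks, pos); return r >= 0 ? r : pos; };
    }
    static Rec rep(Rec p) {                             // * : zero or more
        return (toks, pos) -> {
            int cur = pos;
            int r;
            // stop when p fails or makes no progress (avoids looping forever)
            while ((r = p.at(toks, cur)) > cur) cur = r;
            return cur;
        };
    }
    static Rec rep1(Rec p) {                            // + : one or more
        return (toks, pos) -> { int r = p.at(toks, pos); return r < 0 ? -1 : rep(p).at(toks, r); };
    }
    static Rec not(Rec p) {                             // 0 : succeed only if p fails; consume nothing
        return (toks, pos) -> p.at(toks, pos) < 0 ? pos : -1;
    }
    static Rec guard(Rec p) {                           // = : succeed only if p succeeds; consume nothing
        return (toks, pos) -> p.at(toks, pos) >= 0 ? pos : -1;
    }
    static Rec seq(Rec... ps) {                         // items in order, as under a Sequence node
        return (toks, pos) -> {
            int cur = pos;
            for (Rec p : ps) { cur = p.at(toks, cur); if (cur < 0) return -1; }
            return cur;
        };
    }

    public static void main(String[] args) {
        // "+ a" followed by "0 b": one or more a's, then the next token must not be b.
        Rec rule = seq(rep1(token("a")), not(token("b")));
        System.out.println(rule.at(Arrays.asList("a", "a", "c"), 0)); // 2
        System.out.println(rule.at(Arrays.asList("a", "a", "b"), 0)); // -1
    }
}
```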

The second annotation is the name of the entity. The value displayed depends on the type of the node, as described below. (Nodes of the remaining types do not have a name.)

  • Root - the name of the parser-rule itself
  • Literal - the name of the literal token
  • Regexp - the name of the regular-expression token
  • Reference - the name of the referenced parser-rule

All other annotations, described below, are optional. If any of the optional annotations are present, they are enclosed within square brackets.

  • commit - backtracking to optional parser clauses (at an upper level) will be prevented if this node is successfully parsed
  • description - an optional user-assigned string (see below) that can be assigned to certain types of node
  • drop - the node will not be entered into the AST. You can see examples of its use in the built-in ArithExpr Sample Grammars
  • message - the node has an associated error-message
  • packrat - the parser-rule is a packrat parser (applicable only to a root-node)
  • trace - the parser's use of the node will be logged at run-time

All node attributes can be changed via the context menu shown in Figure-5 above.

Finally, if the node has a description, it is displayed last, within parentheses.
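The packrat annotation refers to packrat parsing. Its core idea is memoization: the result of applying a rule at a given input position is cached, so backtracking never re-parses the same rule at the same position. The hypothetical sketch below shows only that caching idea. Real packrat implementations, such as Scala's PackratParsers, also handle details like left recursion. Nothing here describes VisualLangLab's internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Sketch of the memoization behind packrat parsing (hypothetical code, not VisualLangLab's). */
public class PackratSketch {

    /** Wraps a recognizer (start position -> end position, or -1) so it runs at most once per position. */
    static IntUnaryOperator memoize(IntUnaryOperator rule) {
        Map<Integer, Integer> memo = new HashMap<>();
        return pos -> {
            Integer cached = memo.get(pos);
            if (cached != null) return cached;   // reuse the earlier result, success or failure
            int result = rule.applyAsInt(pos);
            memo.put(pos, result);
            return result;
        };
    }

    public static void main(String[] args) {
        String input = "aaab";
        int[] calls = {0};   // counts how often the rule body really runs
        // Rule A: one or more 'a' characters.
        IntUnaryOperator ruleA = memoize(pos -> {
            calls[0]++;
            int end = pos;
            while (end < input.length() && input.charAt(end) == 'a') end++;
            return end > pos ? end : -1;
        });
        // Alternatives "A c" and "A b" share the prefix A. The first fails, the parser backtracks
        // to position 0 for the second, and the memo answers A without running it again.
        int afterFirst = ruleA.applyAsInt(0);
        boolean firstOk = afterFirst >= 0 && afterFirst < input.length() && input.charAt(afterFirst) == 'c';
        int afterSecond = ruleA.applyAsInt(0);
        boolean secondOk = afterSecond >= 0 && afterSecond < input.length() && input.charAt(afterSecond) == 'b';
        System.out.println(firstOk + " " + secondOk + " calls=" + calls[0]); // false true calls=1
    }
}
```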

Creating Rules

The grammar-tree popup menu is the tool used for creating and editing grammar-trees, and is described fully in Editing the Grammar Tree. In the following example we get our feet just a little wet by composing a simple rule with the tokens we created above.

First, add a Sequence node to the grammar-tree by right-clicking the root node and selecting Add -> Sequence from the popup menu, as shown on the left side of Figure-6 below. A sequence node is added to the root, as on the right of the figure.

Figure 6. Adding a sequence node

Then perform the following steps:

  • right-click the newly added sequence node and select Add -> Token. This will bring up a dialog containing a list of token names. Select NUMBER and click the dialog's OK button. A regex node is added to the sequence node
  • right-click the sequence node again and select Add -> Choice from the popup menu. This should add a choice node to the sequence node
  • right-click the newly created choice node and select Add -> Token. Select PLUS in the dialog box and click OK. A literal node is added to the choice node. Repeat this action once more, and add the MINUS token to the choice node
  • repeat the first step above to add another NUMBER to the sequence node

You're done! If your parser does not look like the one in Figure-7 below, use Edit from the grammar-tree's context menu to make the required changes.

Figure 7. Your first visual parser

The text displayed in the panel to the right of the grammar-tree is the AST of the selected node, and so depends on which icon you clicked last.
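The rule just built behaves roughly like the hand-written recursive-descent parser below. This is an illustrative equivalent, not code that VisualLangLab generates; it generates none. The `Pair` class records the index of the matching alternative and its value. It is an assumption modeled on the Array(3, Pair(0, +), 5) result shown under Testing your Parser, where PLUS is alternative 0 of the choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-written equivalent of the rule NUMBER (PLUS | MINUS) NUMBER (illustrative only). */
public class FirstParser {
    /** Hypothetical choice result: which alternative matched (index) and what it produced (value). */
    static final class Pair {
        final int index;
        final Object value;
        Pair(int index, Object value) { this.index = index; this.value = value; }
        @Override public String toString() { return "Pair(" + index + ", " + value + ")"; }
    }

    private static final Pattern NUMBER = Pattern.compile("\\d+");
    private final String in;
    private int pos = 0;

    private FirstParser(String in) { this.in = in; }

    private void skipWhitespace() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    /** The NUMBER regex token. */
    private String number() {
        skipWhitespace();
        Matcher m = NUMBER.matcher(in).region(pos, in.length());
        if (!m.lookingAt()) throw new IllegalArgumentException("NUMBER expected at " + pos);
        pos = m.end();
        return m.group();
    }

    /** The Choice node: PLUS is alternative 0, MINUS is alternative 1. */
    private Pair plusOrMinus() {
        skipWhitespace();
        String[] literals = {"+", "-"};
        for (int i = 0; i < literals.length; i++) {
            if (in.startsWith(literals[i], pos)) {
                pos += literals[i].length();
                return new Pair(i, literals[i]);
            }
        }
        throw new IllegalArgumentException("'+' or '-' expected at " + pos);
    }

    /** The Sequence node under the root; the whole input must be consumed. */
    static List<Object> parse(String input) {
        FirstParser p = new FirstParser(input);
        List<Object> ast = new ArrayList<>();
        ast.add(p.number());
        ast.add(p.plusOrMinus());
        ast.add(p.number());
        p.skipWhitespace();
        if (p.pos != input.length()) throw new IllegalArgumentException("Trailing input at " + p.pos);
        return ast;
    }

    public static void main(String[] args) {
        System.out.println(parse("3 + 5")); // [3, Pair(0, +), 5]
    }
}
```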

Miscellaneous Rule Operations

The main menu and toolbar also support several other operations. You can find which other rules refer to a particular rule (Rules -> Find rule), rename rules (Rules -> Rename rule), and delete unused rules (Rules -> Delete rule).

Saving the Grammar

A grammar can be saved to a file by invoking File -> Save from the main menu. Grammars are stored in XML files with a .vll suffix. The contained XML captures the structure of the rules, the token definitions, and other details, but no generated code of any kind. The XML is quite intuitive and you can use XSLT or a similar technology to transform it into another format (a grammar for another tool, or code of some sort, for example) if required.
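The sketch below shows the XSLT route. It uses the standard `javax.xml.transform` API to turn grammar XML into a plain-text token list. The XML is hypothetical: this article does not give the element and attribute names of real .vll files, so a real stylesheet would need to follow the actual file structure.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies an XSLT stylesheet to grammar XML; the XML layout below is hypothetical. */
public class GrammarXslt {
    // Hypothetical grammar XML; real .vll element and attribute names may differ.
    static final String GRAMMAR =
          "<grammar><token kind='literal' name='PLUS' pattern='+'/>"
        + "<token kind='literal' name='MINUS' pattern='-'/></grammar>";

    // Stylesheet that lists each token as "NAME: pattern", one per line.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='grammar/token'>"
        + "<xsl:value-of select='@name'/>: <xsl:value-of select='@pattern'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template></xsl:stylesheet>";

    /** Transforms xml with the given stylesheet and returns the output as a String. */
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(GRAMMAR, XSLT)); // PLUS: +  then  MINUS: -  on separate lines
    }
}
```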

A saved grammar can be read back into the GUI by invoking File -> Open from the main menu. This is useful for review, further editing, or testing. The API can also load a saved grammar, and regenerate the parser for use from a client program.

Testing your Parser

Testing is really simple. Key in the test input under Parser Test Input (as at "A" in Figure-8 below), click the Parse input button (under the red rectangle), and validate the output that appears under Parser Log (at "C" in the figure). You don't have to write any code, use any other tools, or do anything else.

Figure 8. Testing your parser

The figure shows the result of testing the parser with "3 + 5" as the input. The Parser Log area should contain the following text:

    Generating parsers ... (10 ms)
    Parsing ... (3 chars in 0 ms), result follows:
    Array(3, Pair(0, +), 5)

The first two lines contain performance information that can safely be ignored. The last line (underlined in the figure) is the parser's result: an AST with a predefined structure, shown under Parse Tree (AST) Structure. Since the test input entered was "3 + 5", we can see that the result is correct. However, real-life parsers are too complex for manual testing, so VisualLangLab supports several approaches to automated testing that are described online in Testing Parsers.

That brings us to the end of this quick example. If you feel that the result of parsing "3 + 5" should be 8 rather than Array(3, Pair(0, +), 5), check out the section ArithExpr with action-code in Sample Grammars.

The Parse-Tree (or AST)

The terms parse-tree and Abstract Syntax Tree (or just AST) are used interchangeably to mean the structure of information gathered during the parsing process. VisualLangLab displays the AST of the selected grammar-tree node in the text area under Parse Tree (AST) Structure as seen in Figure-7 above. ASTs are constructed from mutually nested instances of standard Java types. Examples and more details can be found online at AST and Action Code or in the downloaded zip.

Action Code

Action-code (or just actions) are JavaScript functions associated with grammar-tree nodes, and are entered into the text area under Action Code ("C" in Figure-2 above). It is never necessary to embed action code in the grammar: you can always move all such code into an application program that invokes the parser via the API and then processes the AST it returns. You can see examples of action-code in the ArithExpr with action-code sample grammar, and more details can be found online at AST and Action Code or in the downloaded zip.
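As an example of keeping logic outside the grammar, the sketch below evaluates the AST of the first parser in application code and produces 8 for "3 + 5". It assumes that token values arrive as Strings. It also assumes that the choice is represented by a pair holding the alternative's index and value. The `Pair` class here is a hypothetical stand-in, and obtaining the AST through the API is not shown.

```java
/** Evaluates an AST shaped like [NUMBER, Pair(index, operator), NUMBER] (illustrative only). */
public class AstEvaluator {
    /** Hypothetical stand-in for the AST's Pair: alternative index plus matched value. */
    static final class Pair {
        final int index;
        final Object value;
        Pair(int index, Object value) { this.index = index; this.value = value; }
    }

    static int evaluate(Object[] ast) {
        int left = Integer.parseInt((String) ast[0]);    // first NUMBER token (assumed to be a String)
        Pair op = (Pair) ast[1];                         // the choice between PLUS and MINUS
        int right = Integer.parseInt((String) ast[2]);   // second NUMBER token
        return op.index == 0 ? left + right : left - right;   // alternative 0 is PLUS, 1 is MINUS
    }

    public static void main(String[] args) {
        Object[] ast = {"3", new Pair(0, "+"), "5"};     // mirrors Array(3, Pair(0, +), 5)
        System.out.println(evaluate(ast)); // 8
    }
}
```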

Using the API

The VisualLangLab API enables applications written in any JVM language to use parsers created with the GUI. The API is very small, and contains the types and functions required to perform the following operations.

  • load a parser from a saved grammar-file
  • parse a string using the parser
  • test the result, and retrieve the AST or error information

More details and examples can be found online at Using the API.

Sample Grammars

To enable users to quickly gain hands-on experience with VisualLangLab grammars, the tool contains some built-in sample grammars. These samples can be reviewed, tested, modified, and saved just like any other grammar created from scratch. To open a sample grammar select Help -> Sample grammars from the main menu, and choose one of the samples shown as in Figure-9 below.

Figure 9. Sample grammars available

More information about these samples can be found online at Sample Grammars or in the downloaded zip file.

Conclusion

This article has introduced parser development with VisualLangLab, a completely visual tool. Its features make it an effective prototyping environment and training tool, and it will hopefully prove a useful addition to any developer's skills.


Revision History
  • 09-MAR-2012 - Updated to be compatible with version 10.25 (the pure Java version).
  • 04-DEC-2011 - Version 7.01 change warnings added under Regex Token Creation and Literal Token Creation. The new (from 7.01) WildCard pseudo-token is described in the table under Node Icons.
  • 25-SEP-2011 - The procedure for running VisualLangLab under Have a Scala Installation has been changed to note that SCALA_HOME must be accessible.
  • 15-SEP-2011 - Changed Figures 2, 4, 5, 7, and 8 to show the new infinite AST depth option.
  • 02-SEP-2011 - (before publication on java.net): Figure-10 under Differences from Scala's Combinators changed.