VisualLangLab - Rapid Prototyping for Scala Parser Combinators

For Ver-10.10 or higher only!
If you are using an older version, follow this tutorial instead. Beginning with Ver-10.01, the title bar of the About VisualLangLab dialog box displays the version number. The latest jar file can be downloaded here: VLL4J.jar.

This tutorial highlights VisualLangLab's features by recreating the Payroll External DSL described on page 240 of Programming Scala. The following figures show how it represents some rules of that parser.

Figure-1. Some VisualLangLab grammar-trees (the amount, days, deductItems, and deductKind grammars)

Earlier versions (before 10) of VisualLangLab were written in Scala, and made use of its parser combinators. Although version 10 is written purely in Java it still uses Java translations of certain elements of the Scala scala.util.parsing.combinator package. A comparison between this approach and hand-coded Scala parsers can be found in Pros and Cons below.

Another tutorial that uses examples from The Definitive ANTLR Reference can be found in A Quick Tour. A simpler tutorial is available at VisualLangLab - Grammar without Tears.

Essential Jargon

  • parser (with lower-case initial p) - in VisualLangLab documentation, a parser is an entire program or function for parsing and processing input in a certain format.
  • parser-rule - the simpler units into which a parser is normally broken down. In a Scala combinator-based parser, each def ... : Parser[...] = ... is a parser-rule. Parser-rule and grammar-tree are generally synonymous terms.
  • grammar-tree - the visual tree used to represent each parser-rule in VisualLangLab. Grammar-tree and parser-rule are generally synonymous terms.
  • Parser (with upper-case initial P) - a Scala type used to implement parser-rules as well as the component nodes of parser-rules.

Version 10 is written in Java, and does not use Scala parser combinators. However, it still uses Java translations of some elements of the scala.util.parsing.combinator package. In that sense, VisualLangLab is still a Scala combinator-based parser.

Download and Run VisualLangLab

To run VisualLangLab, just download VLL4J.jar and double-click it in a file-browser. Linux, Mac OS, and UNIX users will have to enable execution first (chmod +x VLL4J.jar). Documentation for version 10 and higher is currently on the web-site only.

When started, VisualLangLab displays the GUI shown in Figure-2 below. The menus and buttons are explained as needed, but a full description can also be found at The GUI. All toolbar buttons have tool-tip texts that explain their use.

The VisualLangLab GUI
Figure-2. The VisualLangLab GUI

The display areas of the GUI are used as described below.

  • A is used for the grammar-tree as described in Creating the Parser-Rules below
  • B displays the AST structure of the selected grammar-tree node
  • C is where the selected node's action-code (semantic action, or just action) is displayed and edited. If this appears to break the no code, no script promise, rest assured that action-code is always optional
  • D and E are used for testing the parser as described below

The only prerequisite for running VisualLangLab as described above is a 6.0+ JRE. No other tool or software is required.

Creating the Parser-Rules

In the book, the parser-rules are organized in top-down fashion. Here, we start at the bottom with doubleNumber and work our way up, creating enough of the parser-rules to demonstrate most of VisualLangLab's features. The complete parser is also included as a sample parser within the GUI (select "Help" -> "Sample grammars" -> "PSWP-Payroll-Parser-Combinators" from the main menu).

Import Required Tokens

The book's code starts by importing JavaTokenParsers, and you can obtain the same effect in VisualLangLab by importing the corresponding token library. Select Tokens -> Import tokens from the main menu, or click the Import tokens (The import-tokens GUI) button, and choose the file TL-JavaTokenParsers.vll from the grammars directory of the legacy-version zip distribution (you do not need any other files from this zip file). This token-library contains regex tokens with the same names and functionality. However, for reasons explained in JavaTokenParsers below, all the names have an underscore (_) suffix.

def doubleNumber = floatingPointNumber ...

This is a very simple parser-rule, and just matches one token unconditionally. To create the new rule click the New rule button (The New-rule button), and enter doubleNumber into the dialog box presented as in Figure-3 below. Clicking the dialog's OK button creates a new rule with just a Root node (The Root node).

Creating the doubleNumber parser-rule
Figure-3. Creating the doubleNumber parser-rule

Now right-click the Root node, and select Add -> Token from the popup context-menu. Another dialog with the names of available tokens is presented as shown on the left side of Figure-4 below. Select floatingPointNumber_ and click the OK button. Your almost-complete parser-rule should look like the right side of Figure-4 below.

Selecting the decimalNumber token
Figure-4. Selecting the decimalNumber token

Finally, we need to add some action-code. Select (click) the decimalNumber token (The Regex icon), and paste the code given below into the text area under Action Code as in Figure-5 below. Click the Save button (enabled when the content of the action-code area changes). Observe that this adds the action annotation (near red arrow) to the decimalNumber token. From version 10.21, the rule-name (doubleNumber) in the toolbar's dropdown list also has a small green arrow-icon placed near it.

function (a) {
  if (a) {
    return parseFloat(a);
  }
}

Action-code functions are explained further in Action Code below.
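
As a quick check outside the GUI, the action function above can be exercised in any plain Javascript engine. The sketch below binds the same function literal to a variable and calls it once with a matched lexeme and once with null. The name doubleNumberAction is hypothetical and used only for illustration; inside VisualLangLab the argument is supplied by the parser.

```javascript
// Same function literal as above, bound to a hypothetical name for testing.
var doubleNumberAction = function (a) {
  if (a) {
    return parseFloat(a);
  }
};

// A matched lexeme arrives as text and is converted to a number.
var parsed = doubleNumberAction("25");

// With a null argument the guard fails, so nothing is returned.
var skipped = doubleNumberAction(null);

console.log(parsed, skipped);
```

The guard on a is what lets the same function be called safely when there is no AST to process.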

Adding action-code to the doubleNumber rule
Figure-5. Adding action-code to the doubleNumber rule

def toBe = "is" | "are"

This parser-rule uses two custom literal tokens (is and are) which must be defined first. Proceed as follows.

  • click the New literal button (the New-literal icon), enter "IS, is" into the dialog presented, and click the OK button (as in Figure-6 below)
  • repeat the preceding action once more, but enter "ARE, are" instead

The information entered into the dialog when creating a literal contains the literal's name and its pattern, separated by a comma. You can optionally add spaces around the comma for clarity. The name is used to refer to the literal in rules (as seen in Figure-7 below), while the pattern is its literal definition.

Creating the is and are tokens
Figure-6. Creating the is and are tokens

This parser-rule uses a Choice (the Choice icon) as its top-level node. A Choice is VisualLangLab's equivalent of the | combinator. But unlike | it accepts an arbitrary number (>= 2) of child nodes, and returns an AST that remembers which child node matched the input (see example below). To create the parser, perform the following steps.

  • click the New rule button (The New-reference button), enter toBe into the dialog presented, and click its OK button
  • right-click the root node (The Root icon) and select Add -> Choice from the context menu. A Choice node (The Choice icon) is added to the root node (as in the left and middle parts of Figure-7 below)
  • right-click the newly created choice node and select Add -> Token from the context menu. A dialog box with a dropdown list containing known token names is popped up. Select IS from the dropdown list, and click the OK button
  • repeat the preceding action, but select the token ARE instead

Your toBe parser-rule should now look like the one on the right side of Figure-7 below.

The toBe parser
Figure-7. The toBe parser

The grammar-tree's node icons are designed to be intuitive, but you can find a guide to all the icons in Grammar Tree Icons and Annotations.

def percentage = toBe ~> doubleNumber <~ "percent" <~ "of" <~ "gross" ...

This parser-rule uses a Sequence (the Sequence icon) as its top-level node. A Sequence is VisualLangLab's equivalent of the ~, <~, and ~> combinators. But unlike the ~ family, it accepts an arbitrary (non-zero) number of child nodes, and the AST returned is just a Java array, which is much simpler to handle than a nested instance of case class ~. To create this parser, perform the following steps.

  • click the New rule button (The New-rule button), enter percentage into the dialog presented, and click its OK button
  • right-click the root node (The Root icon) and select Add -> Sequence from the context menu. A Sequence node (The Sequence icon) is added to the root node
  • right-click the newly created sequence node and select Add -> Reference from the context menu. A dialog box with a dropdown list containing known parser-rule names is presented. Select toBe from the dropdown list, and click the OK button
  • repeat the preceding action once more, selecting the reference doubleNumber instead
  • right-click the sequence node and select Add -> Token from the context menu. A dialog box with a dropdown list containing known token names is presented. Select PERCENT from the dropdown list, and click the OK button (the literal tokens PERCENT, OF, and GROSS must first be created with the New literal button, in the same way as IS and ARE)
  • repeat the preceding action two more times, and add the tokens OF and GROSS to the sequence node

Examining the arrows (~> and <~) in the definition, we can deduce that only the result of matching doubleNumber is to be retained. All other nodes are to be dropped from the AST. You can drop a node from the sequence's AST by right-clicking its icon and selecting drop from the context menu (as shown on the left side of Figure-8 below). Remember not to drop the doubleNumber reference. Your finished parser-rule should now look like the right side of Figure-8 below.

The percentage parser-rule
Figure-8. The percentage parser-rule

Observe that the icons of dropped nodes are overlaid with a black line from the lower-left to the upper-right. The annotation drop is also added after the name of the node.

Finally, add the action-code: select (click) the Sequence icon (The Sequence icon), and paste the Javascript code given below into the text area under Action Code. Then click the Save button.

function (arg) {
  if (arg !== null) {
    return VLL.grossAmount * (arg / 100);
  }
}

Though different from the original text, this action function is actually functionally equivalent, as explained in Action Code below. Your finished parser-rule should look like the one in Figure-8 above.

def days = "days?".r ...

This parser-rule uses a custom regex token (days?) which must be defined first. Proceed as follows.

  • click the New regex button (the New-regex icon), enter "DAYS, days?" into the dialog presented, and click the OK button (as in Figure-9 below). Here days? is a regular-expression Pattern
  • click the New rule button (The New-rule button), enter days into the dialog presented, and click its OK button
  • right-click the root node and select Add -> Token from the context menu. Select DAYS from the dropdown list, and click the OK button
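
The pattern days? makes the trailing s optional, so a single token covers both the singular and the plural form. The sketch below checks this in plain Javascript, whose regex syntax agrees with the JDK's Pattern for a pattern this simple. The anchors ^ and $ are added here only to force a whole-string match, and the sample strings are arbitrary.

```javascript
// "days?" with anchors added so that the whole string must match.
var daysPattern = /^days?$/;

var samples = ["day", "days", "dayss", "da"];
var results = samples.map(function (s) {
  return daysPattern.test(s);
});

console.log(results);
```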

Creating the DAYS token and the days parser-rule
Figure-9. Creating the DAYS token and the days parser-rule

Finally, add the action-code by selecting the DAYS icon (The Regex icon), and pasting the code given below into the text area under Action Code. Then click the Save button.

function (a) {
  if (a) {
    return 1;
  }
}

Action-code function design and use is explained in Action Code below. Your finished parser-rule should look like the one on the right side of Figure-9 above.

Ad Hoc Testing - A Short Detour

A full section on testing comes later; this section demonstrates the simplicity and power of VisualLangLab's manual testing facilities. It shows how you can validate every small addition or change without learning or using any other tools.

Validating doubleNumber

To manually test doubleNumber proceed as follows. Use the toolbar's combo box to select doubleNumber as in Figure-10 below.

  • enter test input under Parser Test Input (red rectangle at bottom left)
  • click the Parse input button (The Parse-input button)
  • validate the parser's result printed under Parser Log (wide red rectangle at bottom right)

If you do not see any red text (as in Figure-12 below) in the Parser Log area, your parser executed without errors. But that alone is not enough: you should also verify that the result returned (the 25 in Figure-10, for example) is the value expected.

Testing the doubleNumber parser
Figure-10. Testing the doubleNumber parser

The parser's result or AST is on the third line (after the result follows:). The previous two lines of output contain performance information that should be ignored. This test passes as 25 (last line under Parser Log) is the expected result.

Validating toBe

Before you start to enter test data for toBe take a moment to understand the structure of its output. All parser-rules return an abstract syntax tree (or AST) whose structure depends on the arrangement and properties of the grammar-tree's constituent nodes as explained in AST Structure. The text area under Parse Tree (AST) Structure displays the expected AST structure of the selected grammar-tree node. Figure-11 below tells you that the returned result is one of two array objects (depending on what was found in the input).

The AST of the toBe parser-rule
Figure-11. The AST of the toBe parser-rule

Figure-12 below shows the result of exercising toBe with three different inputs: is, are, and other.

The AST of toBe with is
The AST of toBe with are
The AST of toBe with other
Figure-12. Validating toBe with different inputs

This set of tests also passes, as all three test cases produce the expected result.

The Remaining Parser-Rules

The remaining parser-rules, except one that uses a repsep and is described fully below, present no new difficulties and we leave them as an exercise for the reader. You can verify your creation by comparing it with the saved-grammar file payroll-parser-comb.vll in the grammars directory of the zip distribution.

def deductItems = repsep(deductItem, ",")

This parser-rule uses a RepSep (the RepSep icon) as its top-level node. To create this rule proceed as follows.

  • click the New rule button (The New-rule button), enter deductItems into the dialog presented, and click its OK button
  • right-click the root node (The Root node) and select Add -> RepSep from the context menu (as on the left side of Figure-13 below). A RepSep node (The RepSep icon) is added to the root node
  • right-click the newly created RepSep node and select Add -> Reference from the context menu. A dialog with known parser-rule names is presented. Select deductItem from the dropdown, and click the OK button
  • right-click the RepSep node again and select Add -> Token from the context menu. Select COMMA from the dropdown, and click the OK button

Your finished rule should look like the one on the right side of Figure-13 below.

The deductItems parser-rule
Figure-13. The deductItems parser-rule
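
For readers unfamiliar with repsep: it matches zero or more occurrences of an item separated by a separator, and keeps only the item results. The sketch below illustrates that behavior in plain Javascript over an array of pre-split tokens. It is not how VisualLangLab implements RepSep; the names repsep and numberItem, and the {value, next} result shape, are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical item parser: accepts one all-digit token at position pos.
function numberItem(tokens, pos) {
  var t = tokens[pos];
  if (t !== undefined && /^\d+$/.test(t)) {
    return { value: Number(t), next: pos + 1 };
  }
  return null;
}

// Sketch of repsep: item (sep item)*, possibly empty; separators are discarded.
function repsep(item, sepToken) {
  return function (tokens, pos) {
    var values = [];
    var first = item(tokens, pos);
    if (first === null) {
      return { value: values, next: pos };   // zero items is still a success
    }
    values.push(first.value);
    pos = first.next;
    while (tokens[pos] === sepToken) {
      var r = item(tokens, pos + 1);
      if (r === null) {
        break;                               // leave a trailing separator unconsumed
      }
      values.push(r.value);
      pos = r.next;
    }
    return { value: values, next: pos };
  };
}

var numbers = repsep(numberItem, ",");
var three = numbers(["1", ",", "2", ",", "3"], 0);
var none = numbers([], 0);
var trailing = numbers(["1", ","], 0);
console.log(three, none, trailing);
```

In the deductItems rule, deductItem plays the role of the item and the COMMA token plays the role of the separator.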

Action Code

Action-code is written as anonymous function literals in Javascript. You can find detailed guidelines at Action-Code Design.

Action-Code: weekDays

The purpose of weekDays is to check the input, and return the integer 5 if either weeks or week is found, and the integer 1 if days or day is found. As described in Action Code below, action-code functions must handle two cases: a null argument, and a non-null (real AST) argument. weekDays' action function only needs to handle the non-null argument case. (Of all the examples in this article, only the action-code in Wrapper with Actions below needs to handle the null-argument case.)

To understand the logic of the action-code you must know the structure of the AST passed in (as the argument a). The rule's AST structure is depicted in the text area under Parse Tree (AST) Structure. AST Structure describes AST structuring principles in general.

The weekDays action-code function
Figure-14. The weekDays action-code function

The AST structure and action-code of the two subordinate rules used by weekDays (weeks and days) are shown in Figure-15 below.

Action function weeks
Action function days
Figure-15. Action functions of weeks and days rules

Since weekDays' subordinate rules return the required values (5 and 1), its own action-code only needs to pass on the value received. Based on the above details we know that the AST passed in to weekDays is one of these two values: Object[] {0, 5} or Object[] {1, 1} (the second member of each array being the value from the subordinate rule). So weekDays merely needs to return the second member of the array.
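
The actual function appears only in Figure-14, so the following is a sketch reconstructed from the description above rather than a copy of it. It assumes the AST arrives as a two-element array of the form {choice index, value}, and the name weekDaysAction is hypothetical.

```javascript
// Sketch of the weekDays action: pass on the value computed by the matched sub-rule.
var weekDaysAction = function (a) {
  if (a) {
    return a[1];   // a[0] is the index of the alternative that matched
  }
};

var fromWeeks = weekDaysAction([0, 5]);   // the weeks alternative matched
var fromDays = weekDaysAction([1, 1]);    // the days alternative matched
console.log(fromWeeks, fromDays);
```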

Action-Code: percentage

Figure-16 below illustrates the action-code associated with the rule percentage. This function uses a non-local reference VLL.grossAmount. As described in Action-Code Design, the symbol VLL refers to a common global object available to all action-code functions, and should be used as a repository for all parser-specific features (methods and functions) and state (data). Wrapper with Actions below illustrates how VLL is equipped with required features before testing begins.

A Javascript action-code function
Figure-16. A Javascript action-code function

The function does not have any setting-up role, so does not have to handle the null-argument case. The value of arg it receives is the output from the rule doubleNumber (see AST structure). It returns the computed percentage value.
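
To make the arithmetic concrete, the sketch below runs the same function outside VisualLangLab. The VLL object here is a plain stand-in for the global object the tool provides, the gross amount of 5000 is an assumed value chosen for illustration, and the name percentageAction is hypothetical.

```javascript
// Stand-in for the global object that VisualLangLab makes available to action code.
var VLL = { grossAmount: 5000 };   // assumed value, for illustration only

// Same body as the percentage action-code, bound to a hypothetical name.
var percentageAction = function (arg) {
  if (arg !== null) {
    return VLL.grossAmount * (arg / 100);
  }
};

// "is 25. percent of gross": arg is the number produced by doubleNumber.
var federalTax = percentageAction(25);
console.log(federalTax);
```

With these values the call computes 5000 * 0.25.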

Testing

Testing is much simpler in VisualLangLab than in virtually any other tool. Ad Hoc Testing - A Short Detour showed how effective ad hoc manual testing can be in certain situations. VisualLangLab also supports automated testing, and the following sections describe two different approaches for different situations.

Wrapper with Actions

This approach uses an additional parser-rule to wrap the main (or top-level) parser-rule with before and after scripts. The before script sets up initial conditions before the main parser-rule is invoked, and the after script validates the parse-tree returned by the main parser-rule. Figure-17 below shows details of the wrapper rule PaycheckTester used for testing this parser. To display the before and after scripts (actually parts of a single action-code function) you must select (click on) the Reference node that points to the Paycheck parser-rule. From version 10.21 onwards, parser-rules designed for testing using this technique are automatically distinguished with a special icon in the toolbar's dropdown list.

If you have not read Action-Code Design and AST Structure, please do so now!

Wrapper rule for automated testing
Figure-17. Wrapper rule for automated testing

The Javascript function with the before and after scripts is reproduced below for clarity. The setup part of the function (null-argument part) performs two functions:

  • sets up the data-member grossSalary, and creates the method salaryForDays in the global object VLL. (Javascript's dynamic nature allows it to add data and function members to existing objects at run-time)
  • deposits a few lines of test input (from here in the OFPS version of Programming Scala) into the Parser Test Input area. These lines are processed as input when the main parser-rule (Paycheck) runs

The test part of the code checks the structure of the parse-tree returned by the main parser, and prints out an appropriate message.

function (arg) {
  if (arg === null) {
    //****************************************
    //          SETUP actions
    //****************************************
    // global variables and functions ...
    VLL = {};
    VLL.grossSalary = 500.0;
    VLL.salaryForDays = function (days) {
      return VLL.grossSalary * days;
    };
    // Input text for parser ...
    vllParserTestInput.setText(
      "paycheck for employee \"Buck Trends\"\n" +
      "is salary for 2 weeks minus deductions for {\n" +
      "  federal income tax is 25. percent of gross,\n" +
      "  state income tax is 5. percent of gross,\n" +
      "  insurance premiums are 500. in gross currency,\n" +
      "  retirement fund contributions are 10. percent of gross\n" +
      "}"
    );
  } else {
    //****************************************
    //          TEST actions
    //****************************************
    if (arg.length === 3) {
      var error = "";
      var empl = arg[0];
      if (!empl.equals("Buck Trends")) {
        error += "BAD empl (" + empl + "), ";
      }
      var gross = arg[1].doubleValue();
      if (gross !== 5000) {
        error += "BAD gross (" + gross + "), ";
      }
      var deduct = arg[2].doubleValue();
      if (deduct !== 2500) {
        error += "BAD deduct (" + deduct + "), ";
      }
      if (error === "") {
        return "OK";
      } else {
        return error;
      }
    } else {
      return "BAD AST structure";
    }
  }
}

To understand how this works, let's run it a few times with and without changes to the test input as in the table below. The red text is the changed part (in the table). To run the test after making changes in the action-code click the Save button and wait for a pop-up dialog to confirm that the change was accepted, click the pop-up's OK button, and then click the Parse input (The Parse-input button) button.

Sample Inputs for Rule Paycheck
| Input | Result |
| No changes | OK |
| ... employee "Duck Trends" ... | BAD empl, |
| ... salary for 22 weeks minus ... | BAD gross, BAD deduct, |
| ... premiums are 900. in ... | BAD deduct |
Near the top of the script, a value called vllParserTestInput is used. This name is a reference to the GUI's JTextArea from which the parser under test obtains input (and into which test input is normally entered manually). More information about this can be found under Predefined Variables.

Scala Parser Combinator Compatibility

The following discussion is no longer (version 10 onwards) accurate. However, the contents still have some relevance as VisualLangLab still uses a Java translation of some elements of the scala.util.parsing.combinator package.

The class diagram in Figure-18 below depicts VisualLangLab's relationship with Scala's parser combinators. Although it depends on RegexParsers, it changes most of the functionality by overriding many of the important methods.

Relationship With Scala Parser Combinators
Figure-18. Relationship With Scala Parser Combinators

Full details of how VisualLangLab uses Scala's parser combinator library can be seen in Relationship with Scala Parser Combinators.

Builtin Lexical Analyzer

VisualLangLab overrides the literal(String) and regex(Regex) methods of Scala's RegexParsers with versions that work with a built-in lexer. The behavior of these versions is similar to lexers in more sophisticated parser development tools, and is described in Simple Lexing RegexParsers.
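
The Pros and Cons table later in this article summarizes the lexer's policy: literals take priority over regex tokens, and the next lexeme is the longest part of the input that matches any known token. The sketch below illustrates that general technique in plain Javascript. It is not VisualLangLab's implementation; the token tables, the IDENT and NUM tokens, and the function nextToken are all hypothetical.

```javascript
// Hypothetical token tables, for illustration only.
var literals = [{ name: "IS", text: "is" }, { name: "ARE", text: "are" }];
var regexes = [
  { name: "IDENT", pattern: /^[a-z]+/ },
  { name: "NUM", pattern: /^\d+/ }
];

// Return the next token at the start of input, or null if nothing matches.
function nextToken(input) {
  var best = null;
  // Literals are tried first, so they win when the lengths are equal.
  literals.forEach(function (lit) {
    if (input.indexOf(lit.text) === 0 &&
        (best === null || lit.text.length > best.lexeme.length)) {
      best = { name: lit.name, lexeme: lit.text };
    }
  });
  // A regex token replaces the current best only if it is strictly longer.
  regexes.forEach(function (rx) {
    var m = rx.pattern.exec(input);
    if (m !== null && m[0].length > 0 &&
        (best === null || m[0].length > best.lexeme.length)) {
      best = { name: rx.name, lexeme: m[0] };
    }
  });
  return best;
}

var keyword = nextToken("is 25");    // literal and IDENT both match "is"
var longer = nextToken("island");    // IDENT matches more input than the literal
var number = nextToken("25 days");
console.log(keyword, longer, number);
```

The tie-breaking order is the design choice that makes literals behave like reserved keywords.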

JavaTokenParsers

Many example parsers in tutorials and other resources extend JavaTokenParsers, so VisualLangLab provides a token library in its place. Those planning to use this token library should, however, note that the tokens defined in it have names ending in an underscore (_). Ending the name of a token in an underscore causes VisualLangLab's built-in lexer to be bypassed, in favor of the RegexParsers capability.

This change is required because the builtin lexer works best when the regex definitions are distinct, and multiple regex tokens will not match the same lexeme. But the regular expressions used in JavaTokenParsers are such that floatingPointNumber subsumes both decimalNumber and wholeNumber, and decimalNumber itself subsumes wholeNumber.
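
The overlap can be seen by trying a few inputs against the three patterns. The regular expressions below are written as I understand them to be defined in Scala's JavaTokenParsers; treat them as an assumption and check them against the library version you use. Anchors are added to force whole-string matches, and the helper name matchers is hypothetical.

```javascript
// Patterns assumed to mirror JavaTokenParsers, anchored for whole-string matching.
var wholeNumber = /^-?\d+$/;
var decimalNumber = /^(\d+(\.\d*)?|\d*\.\d+)$/;
var floatingPointNumber = /^-?(\d+(\.\d*)?|\d*\.\d+)([eE][+-]?\d+)?[fFdD]?$/;

// List the names of all patterns that accept the given lexeme.
function matchers(s) {
  var names = [];
  if (wholeNumber.test(s)) { names.push("wholeNumber"); }
  if (decimalNumber.test(s)) { names.push("decimalNumber"); }
  if (floatingPointNumber.test(s)) { names.push("floatingPointNumber"); }
  return names;
}

var forInteger = matchers("25");
var forDecimal = matchers("2.5");
var forExponent = matchers("1e3");
console.log(forInteger, forDecimal, forExponent);
```

An input such as 25 is accepted by all three patterns, which is exactly the ambiguity the built-in lexer is not designed to resolve.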

Pros and Cons

The table below lists the pros and cons of using VisualLangLab compared with a hand-written parser based on Scala parser combinators.

| Aspect | Pros | Cons |
| Visual | Completely visual. Code never required, except if semantic predicates are used (but they are rare) | - |
| | All IDE functions provided - file, edit (tokens, rules, globals), test-run, input-output | - |
| Lexical Analyzer | The builtin lexer works like lexical analyzer generators, but does not require any separate steps or tools | - |
| | Token definitions may use the full regular-expression language of the JDK's Pattern | - |
| | Unlike RegexParsers, all literal tokens are treated as reserved keywords that have higher priority than regex tokens. The next lexeme provided by the lexer is always the longest possible part of the input that matches any known token | - |
| | RegexParsers compatible behavior can also be obtained if required | - |
| | - | Performance is probably poorer than RegexParsers, though no measurements have been made |
| AST | Defined and created automatically following a well-defined and documented convention for AST structure. The AST is complete and unambiguous, and captures every detail of how the input matched the grammar | - |
| | The AST is defined using common types from the Scala API, so no specialized knowledge is needed when using the AST from Scala. Programs in other JVM languages can obtain a version of the AST constructed from JVM types only (release 6.01+). | - |
| | All elements in the AST are amenable to Scala pattern matching, so AST processing is simple. Programs in other JVM languages have to deal with a mutually nested structure of arrays and lists. | - |
| Action Code | Completely separated from grammar text. Written as function literals invoked in a well-defined and documented protocol. Action code functions may access their environment via a defined and documented set of predefined global variables. | - |
| | Option of languages: Scala and Javascript | - |
| | Never required to be embedded in parser text | - |
| | - | Lexical context of the parser is not available to action code. However, since this restricts the action-code to its own sand-box it can be seen as an advantage too. |
| RegexParsers Compatibility | Provides token libraries for JavaTokenParsers | - |
| | Defines two layers (SimpleLexingRegexParsers and Aggregates) which add value to RegexParsers's capabilities | - |
| | The usual RegexParsers interface is exposed to API users | - |
| | - | Not directly compatible, and such code cannot be generated |
| API | Parser developed in the VisualLangLab IDE may be saved to a file, and subsequently used from a host program via the API | - |
| | API usable from any JVM language (release 6.01+) | - |
| | Parser does not have to be compiled or linked into host program. The parser can even be modified separately without affecting the host program (provided the AST structure is not changed) | - |
| | API based on the familiar RegexParsers interface for convenience | - |
| Testing | Includes comprehensive builtin support for ad hoc testing. No additional tools, code, or skills are required. | - |
| | Test drivers for black-box testing are easily created and run within the IDE | - |
| | Any (or all) rules can be trace-enabled at the click of a button. Even individual grammar-tree nodes can be traced. | - |
 
 