parser
Lua 5.1 parser

Version: 0.1.2
Generated: November 26, 2007

Description

Together with scanner, this module exports the syntactic rules of Lua 5.1 as a grammar.

Dependencies

The Grammar

The rules variable implements the official Lua 5.1 grammar. It includes all keyword and symbol rules in scanner, as well as the CHUNK rule, which matches a complete Lua source file.

rules is a table with open references, not yet an LPeg pattern; to create a pattern, pass it to lpeg.P. It is kept as a table so that users can modify the grammar to suit their particular needs. grammar provides a small API for this purpose.
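
The idea of a grammar table with open references can be sketched with LPeg alone. The toy grammar below is hypothetical and is not part of this module; it only illustrates why keeping the grammar as a plain table makes it possible to replace a rule before compiling with lpeg.P.

```lua
local lpeg = require "lpeg"
local P, R, V = lpeg.P, lpeg.R, lpeg.V

-- A toy grammar table with open references (lpeg.V), standing in for `rules`.
-- It is a plain Lua table, not yet a pattern.
local toy = {
  "List",                                -- name of the initial rule
  List = V"Item" * (P"," * V"Item")^0,
  Item = R"09"^1,                        -- digits only
}

-- Copy the table and replace one rule; `toy` itself stays unmodified.
local extended = {}
for name, rule in pairs(toy) do extended[name] = rule end
extended.Item = R"09"^1 + R"az"^1        -- also accept lowercase words

-- Only now are the tables compiled into patterns (anchored at end of input).
local strict  = P(toy) * -1
local relaxed = P(extended) * -1

print(strict:match("1,22,333"))   -- a position: the input matches
print(strict:match("1,ab,3"))     -- nil: letters are not allowed
print(relaxed:match("1,ab,3"))    -- a position: the modified rule accepts it
```

The same pattern of use would apply to rules: compile it as is with lpeg.P, or derive a modified copy first.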

The listing below shows the Lua 5.1 grammar in LPeg notation, leaving out the handling of spacing.

The following convention is used for rule names:

  • TOKENRULE: token rules (which represent terminals) are in upper case when applicable (e.g. +, WHILE, NIL, ..., THEN, {, ==).
  • GrammarRule: the main grammar rules (non-terminals). Examples are Chunk, FuncName, BinOp, and TableConstructor.
  • _GrammarRule: subdivisions of the main rules, introduced to ease captures. Examples are _SimpleExp, _PrefixExpParens and _FieldExp.
  • METARULE: grammar rules with a special semantic meaning, to be used for capturing in later modules, like BOF, EOF and EPSILON.

rules = {
   -- See peculiarities below
   IGNORED  = scanner.IGNORED  -- used as spacing, not depicted below
   EPSILON = lpeg.P(true)
   EOF     = scanner.EOF  -- end of file
   BOF     = scanner.BOF  -- beginning of file
   Name    = ID

   -- Default initial rule
   [1]     = CHUNK
   CHUNK   = scanner.BANG^-1 * Block

   Chunk   = (Stat * ';'^-1)^0 * (LastStat * ';'^-1)^-1
   Block   = Chunk

   -- STATEMENTS
   Stat          = Assign + FunctionCall + Do + While + Repeat + If
                 + NumericFor + GenericFor + GlobalFunction + LocalFunction
                 + LocalAssign
   Assign        = VarList * '=' * ExpList
   Do            = 'do' * Block * 'end'
   While         = 'while' * Exp * 'do' * Block * 'end'
   Repeat        = 'repeat' * Block * 'until' * Exp
   If            = 'if' * Exp * 'then' * Block
                     * ('elseif' * Exp * 'then' * Block)^0
                     * (('else' * Block) + EPSILON)
                     * 'end'
   NumericFor    = 'for' * Name * '='
                     * Exp * ',' * Exp * ((',' * Exp) + EPSILON)
                     * 'do' * Block * 'end'
   GenericFor    = 'for' * NameList * 'in' * ExpList * 'do' * Block * 'end'
   GlobalFunction = 'function' * FuncName * FuncBody
   LocalFunction = 'local' * 'function' * Name * FuncBody
   LocalAssign   = 'local' * NameList * ('=' * ExpList)^-1
   LastStat      = 'return' * ExpList^-1
                 + 'break'

   -- LISTS
   VarList  = Var * (',' * Var)^0
   NameList = Name * (',' * Name)^0
   ExpList  = Exp * (',' * Exp)^0

   -- EXPRESSIONS
   Exp          = _SimpleExp * (BinOp * _SimpleExp)^0
   _SimpleExp   = 'nil' + 'false' + 'true' + Number + String + '...' + Function
                + _PrefixExp + TableConstructor + (UnOp * _SimpleExp)
   _PrefixExp   = ( Name                  -- a Var
                  + _PrefixExpParens      -- only an expression
                  ) * (
                      _PrefixExpSquare    -- a Var
                    + _PrefixExpDot       -- a Var
                    + _PrefixExpArgs      -- a FunctionCall
                    + _PrefixExpColon     -- a FunctionCall
                  ) ^ 0

   -- Extra rules for semantic actions:
   _PrefixExpParens = '(' * Exp * ')'
   _PrefixExpSquare = '[' * Exp * ']'
   _PrefixExpDot    = '.' * ID
   _PrefixExpArgs   = Args
   _PrefixExpColon  = ':' * ID * _PrefixExpArgs

   -- These rules use an internal trick to be distinguished from _PrefixExp
   Var              = _PrefixExp
   FunctionCall     = _PrefixExp

   -- FUNCTIONS
   Function     = 'function' * FuncBody
   FuncBody     = '(' * (ParList+EPSILON) * ')' * Block * 'end'
   FuncName     = Name * _PrefixExpDot^0 * ((':' * ID)+EPSILON)
   Args         = '(' * (ExpList+EPSILON) * ')'
                + TableConstructor + String
   ParList      = NameList * (',' * '...')^-1
                + '...'

   -- TABLES
   TableConstructor = '{' * (FieldList+EPSILON) * '}'
   FieldList        = Field * (FieldSep * Field)^0 * FieldSep^-1
   Field            = _FieldSquare + _FieldID + _FieldExp
   FieldSep         = ',' + ';'

   -- Extra rules for semantic actions:
   _FieldSquare     = '[' * Exp * ']' * '=' * Exp
   _FieldID         = ID * '=' * Exp
   _FieldExp        = Exp

   -- OPERATORS
   BinOp    = '+' + '-' + '*' + '/' + '^' + '%' + '..'
            + '<' + '<=' + '>' + '>=' + '==' + '~='
            + 'and' + 'or'
   UnOp     = '-' + 'not' + '#'

   -- ...plus scanner's keywords and symbols
}

The implementation has certain peculiarities that merit clarification:

  • Spacing is matched only between two tokens in a rule, never at the beginning or the end of a rule.
  • EPSILON matches the empty string, so it always succeeds without consuming input. Although rule + EPSILON can be replaced by rule^-1 without any loss of syntactic power, EPSILON was introduced in the parser because of its usefulness as a placeholder for captures.
  • BOF and EOF are rules used to mark the bounds of a parsing match, and are useful for semantic actions.
  • Name versus ID: the official Lua grammar doesn't distinguish between them, as their syntax is exactly the same (Lua identifiers). But semantically Name is a variable identifier, and ID is used with different meanings in _FieldID, FuncName, _PrefixExpColon and _PrefixExpDot.
  • In Lua's original extended BNF grammar, Var and FunctionCall are defined using left recursion, which is unavailable in PEGs. In this implementation, the problem was solved by modifying the PEG rules to eliminate the left recursion, and by setting some markers (with some LPeg chicanery) to ensure the proper pattern is being used.
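
Two of the points above, EPSILON and the removal of left recursion, can be illustrated with LPeg alone. This is a minimal sketch under simplifying assumptions: the Name pattern is reduced to letters, spacing is not handled, and the markers that tell Var apart from FunctionCall are not reproduced. The names step and Prefix are hypothetical and do not exist in this module.

```lua
local lpeg = require "lpeg"
local P, R, V, C, Cc, Ct = lpeg.P, lpeg.R, lpeg.V, lpeg.C, lpeg.Cc, lpeg.Ct

local EPSILON = P(true)           -- matches the empty string, always succeeds
local Name    = R("az", "AZ")^1   -- simplified identifier (letters only)

-- rule + EPSILON and rule^-1 accept exactly the same inputs...
local a = P"x" * (P"y" + EPSILON) * -1
local b = P"x" * P"y"^-1 * -1
print(a:match("x"), a:match("xy"), b:match("x"), b:match("xy"))

-- ...but EPSILON offers a place to attach a capture for the "absent" case.
local step = P"," * C(R"09"^1) + EPSILON * Cc("default")
print(step:match(",5"), step:match(""))

-- Left-recursive EBNF:  var ::= Name | var '.' Name | var '[' Name ']'
-- PEG version: one base followed by zero or more suffixes.
local Prefix = P{
  "Exp",
  Exp    = Ct(C(Name) * V"Suffix"^0) * -1,
  Suffix = P"." * C(Name)
         + P"[" * C(Name) * P"]",
}
print(table.concat(Prefix:match("a.b[c].d"), " "))
```

The last suffix matched determines what the whole prefix expression is (an index makes it a Var, an argument list makes it a FunctionCall), which is why the real grammar needs extra markers on top of this iteration.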


Variables

rules = table 

A table holding the Lua 5.1 grammar. See The Grammar for an extended explanation.


Functions

  • apply (extraRules, captures): uses grammar.apply to return a new grammar, with captures and extra rules. rules stays unmodified.
  • check (input): checks if input is valid Lua source code.


apply (extraRules, captures)
    Uses grammar.apply to return a new grammar, with captures and extra rules. rules stays unmodified.

    Parameters:

    • extraRules: optional, the new and modified rules. See grammar.apply for the accepted format.
    • captures: optional, the desired captures. See grammar.apply for the accepted format.

    Returns:

    • the extended grammar.
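
A possible use of apply is sketched below. Several details are assumptions not confirmed by this page: the module path "leg.parser", the captures format (a table mapping rule names to functions, as expected by grammar.apply), and that passing nil for extraRules is the way to supply captures only. Check grammar.apply before relying on any of them.

```lua
local lpeg = require "lpeg"

-- Assumed module path; adjust it to the local installation.
local ok, parser = pcall(require, "leg.parser")
if not ok then
  print("leg.parser is not available: " .. tostring(parser))
  return
end

local loops = 0

-- Assumed captures format: rule name -> function applied to that rule's match.
local captures = {
  While = function(...)
    loops = loops + 1
    return ...
  end,
}

-- No extra rules, captures only; parser.rules itself is left untouched.
local extended = parser.apply(nil, captures)

-- The extended grammar is compiled like rules; lpeg.P leaves an already
-- compiled pattern unchanged, so this call is safe either way.
local patt = lpeg.P(extended)

patt:match("while x do y() end")
print("while loops seen: " .. loops)
```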


check (input)
    Checks if input is valid Lua source code.

    Parameters:

    • input: a string containing Lua source code.

    Returns:

    • true if input is valid Lua source code; otherwise false and an error message describing the failed match.
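
A minimal sketch of calling check follows. The module path "leg.parser" is an assumption, and the sample inputs are hypothetical; the format of the error message is not specified here, so it is printed as is.

```lua
-- Assumed module path; adjust it to the local installation.
local ok, parser = pcall(require, "leg.parser")
if not ok then
  print("leg.parser is not available: " .. tostring(parser))
  return
end

-- The second sample is missing the variable name after `local`.
local samples = { "local x = 1", "local = 1" }

for _, src in ipairs(samples) do
  local valid, err = parser.check(src)
  if valid then
    print("valid:   " .. src)
  else
    print("invalid: " .. src)
    print(err)   -- error message returned along with false
  end
end
```

Since spacing is matched only between tokens, the samples deliberately carry no leading or trailing whitespace.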