Oblivion Mod:Oblivion Grammar

Grammar Ambiguity

Although the Oblivion scripting language is simple, it contains a large amount of ambiguity that prevents a simple or well-formed grammar from being created. For example, the following statements and expressions are all technically valid (they compile, although the result may not be as intended):

    short 123
    set 123 to 321
    set 123 to 123 * 123.456
    set LocalVar1 to GetAV Luck 10 9 8 * 7
    
         ; 789 is a NPC record
    set LocalVar1 to 789.GetAV Strength + 789
    
    short GetAV
    set GetAV to GetAV Strength + player.GetAV Luck + GetAV
    
         ; 123 is a local variable in script attached to NPC 789
    set 789.123 to 789.123 + 80
    
    set LocalVar1 to 

There are several principal causes for this ambiguity:

  1. Variable names can be composed solely of digits, making them indistinguishable from numbers.
  2. Variable names can conflict with functions or references (not confirmed).
  3. Function calls have no start/stop tokens (like brackets in many languages). This makes it difficult, or impossible, to determine where a function's parameters end in an expression, especially since some function parameters are optional.

This sort of grammar ambiguity will prevent a parser from generating a syntax tree if only a simple lexical analysis step is used. The issue is that the parser does not know whether an identifier such as 123 or GetAV is a number, local, global, reference, or function token.

The following notes will help resolve these ambiguities and allow a simple grammar (LALR(1) or SLR, for example) to be defined:

  • Variables and editor IDs cannot be solely composed of digits.
  • A stronger rule that variables and IDs cannot start with a digit might be used. This makes tokenizing and parsing much simpler but may interfere with existing scripts and plugins.
  • Separating an identifier into a function, ID, or variable may have to be done at an earlier stage. Unfortunately this complicates the grammar and may prevent certain variable names from being used.
  • Separate grammar rules are needed for function calls used in expressions.
  • A function's definition must be known in order to parse its parameters properly when it is used in an expression.
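
The last three notes can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the table FUNCTION_PARAM_COUNTS, the parameter count given for GetAV, and the rule that local variables take priority over function names are all hypothetical and are not taken from the game's compiler.

```python
import re

# Assumed data for illustration only: a real tool would load function
# definitions (including optional parameters) from elsewhere.
FUNCTION_PARAM_COUNTS = {"getav": 1}

NUMBER_RE = re.compile(r"\d+(\.\d+)?")


def classify(word, local_vars):
    """Classify a raw word as 'local', 'function', 'number' or 'identifier'.

    local_vars is a set of lowercase local variable names. Checking it
    first is an assumption made for this sketch.
    """
    key = word.lower()
    if key in local_vars:
        return "local"
    if key in FUNCTION_PARAM_COUNTS:
        return "function"
    if NUMBER_RE.fullmatch(word):
        return "number"
    return "identifier"


def split_call(words, start):
    """Return (name, params, next_index) for the call starting at words[start].

    The end of the parameter list is found only because the parameter
    count of the function is known in advance.
    """
    name = words[start]
    count = FUNCTION_PARAM_COUNTS[name.lower()]
    end = start + 1 + count
    return name, words[start + 1:end], end


if __name__ == "__main__":
    print(classify("123", {"123"}))   # the script declared "short 123"
    print(classify("123", set()))     # no such local, so a plain number
    print(split_call("GetAV Luck + 789".split(), 0))
```

With the symbol information available, the word 123 resolves to a local variable when the script declares one and to a number otherwise, and the expression after GetAV Luck can continue to be parsed from the returned index.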

Character Classes

Character classes are used by the tokenizer to turn a character stream into a token stream.

    CCDigit      = [0..9]
    CCAlpha      = [a..z] [A..Z]
    CCEndLine    = \n NULL
    CCComment    = ;
    CCWhiteSpace = space \t \r
    CCQuote      = "
    CCIDStart    = CCAlpha _
    CCID         = CCIDStart CCDigit
    CCBracket    = ( )
    CCStringChar = !CCEndLine !CCQuote
    CCEqual      = =
    CCDecimal    = .
    CCAddOp      = + -
    CCMultOp     = * / %
    CCRelOp      = = < > !
    CCComma      = ,
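
As a rough sketch, the classes above could be represented as sets of characters in Python. The names CHAR_CLASSES, classes_of and is_string_char are made up for this example, and reading NULL as the "\0" character is an assumption.

```python
import string

# Each character class from the table above as a set of characters.
CHAR_CLASSES = {
    "CCDigit": set(string.digits),
    "CCAlpha": set(string.ascii_letters),
    "CCEndLine": {"\n", "\0"},  # NULL is assumed to mean the "\0" character
    "CCComment": {";"},
    "CCWhiteSpace": {" ", "\t", "\r"},
    "CCQuote": {'"'},
    "CCBracket": {"(", ")"},
    "CCEqual": {"="},
    "CCDecimal": {"."},
    "CCAddOp": {"+", "-"},
    "CCMultOp": {"*", "/", "%"},
    "CCRelOp": {"=", "<", ">", "!"},
    "CCComma": {","},
}
# Composite classes are built from the simpler ones.
CHAR_CLASSES["CCIDStart"] = CHAR_CLASSES["CCAlpha"] | {"_"}
CHAR_CLASSES["CCID"] = CHAR_CLASSES["CCIDStart"] | CHAR_CLASSES["CCDigit"]


def classes_of(ch):
    """Return the sorted names of every class that contains ch."""
    return sorted(name for name, chars in CHAR_CLASSES.items() if ch in chars)


def is_string_char(ch):
    """CCStringChar: any character that is neither an end-of-line nor a quote."""
    return ch not in CHAR_CLASSES["CCEndLine"] and ch not in CHAR_CLASSES["CCQuote"]


if __name__ == "__main__":
    for ch in "=7_":
        print(repr(ch), classes_of(ch))
```

Note that a character can belong to more than one class (for example = is in both CCEqual and CCRelOp), so the tokenizer has to decide which class applies from context.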

Tokens

Tokens are created by the tokenizer using the character class definitions and are passed as input to the parser.

    Unknown      
    LBracket     = (
    RBracket     = )
    AddOp        = + | -
    Sign         = + | -
    MultOp       = * | / | %
    RelOp        = < | > | <= | >= | != | ==
    BoolOp       = "&&" | "||"
    String       = [CCQuote] [CCStringChar]* [CCQuote]
    Identifier   = 
    If           = if
    Elseif       = elseif
    Else         = else
    Endif        = endif
    Set          = set
    To           = to
    Scriptname   = scriptname | scn
    Begin        = begin
    End          = end
    Return       = return
    Comment      = [CCComment] [Chars]* [CCEndLine] { ignore }
    EndLine      = [CCEndLine]
    Integer      = 
    Float        = 
    EndofProgram =
    LocalVarDef  = short | long | float | ref
    Comma        = [CCComma] { ignore }
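
A minimal tokenizer along these lines might look like the following Python sketch. Several details are assumptions, because the table leaves them blank: Identifier is taken to be a CCIDStart character followed by CCID characters (the stricter rule that identifiers cannot start with a digit), Integer is a run of digits, Float is digits around a decimal point, and keywords are matched case-insensitively. Unlike the Comment definition above, the comment pattern here stops before the line end so that the EndLine token is still emitted.

```python
import re

KEYWORDS = {
    "if": "If", "elseif": "Elseif", "else": "Else", "endif": "Endif",
    "set": "Set", "to": "To", "scriptname": "Scriptname", "scn": "Scriptname",
    "begin": "Begin", "end": "End", "return": "Return",
    "short": "LocalVarDef", "long": "LocalVarDef",
    "float": "LocalVarDef", "ref": "LocalVarDef",
}

# Order matters: longer or more specific patterns come first.
TOKEN_SPEC = [
    ("Comment",    r";[^\n]*"),
    ("String",     r'"[^"\n]*"'),
    ("Float",      r"\d+\.\d+"),                # assumed definition
    ("Integer",    r"\d+"),                     # assumed definition
    ("Identifier", r"[A-Za-z_][A-Za-z_0-9]*"),  # assumed definition
    ("RelOp",      r"<=|>=|!=|==|<|>"),
    ("BoolOp",     r"&&|\|\|"),
    ("AddOp",      r"[+-]"),   # Sign uses the same characters; the parser decides
    ("MultOp",     r"[*/%]"),
    ("LBracket",   r"\("),
    ("RBracket",   r"\)"),
    ("Comma",      r","),
    ("EndLine",    r"\n"),
    ("Skip",       r"[ \t\r]+"),
    ("Unknown",    r"."),      # anything else, one character at a time
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn script text into a list of (token_type, text) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind in ("Skip", "Comment", "Comma"):  # the { ignore } tokens
            continue
        if kind == "Identifier":
            kind = KEYWORDS.get(value.lower(), "Identifier")
        tokens.append((kind, value))
    tokens.append(("EndofProgram", ""))
    return tokens


if __name__ == "__main__":
    for token in tokenize("set x to 3 * 4.5 ; note\n"):
        print(token)
```

Because this sketch applies the stricter identifier rule, a script containing a variable named 123 would be tokenized as an Integer, which is exactly the compatibility cost mentioned in the Grammar Ambiguity section.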

Parser Grammar

The following grammar is used by the parser to turn the stream of tokens into a valid parse tree. Terminals (token types) are written in lowercase and non-terminals are capitalized; empty denotes the empty production. This is intended to be an LALR(1) grammar specification, although that has not been verified.

    Start          => scriptname identifier endline Program endofprogram
    Program        => OuterStateList
    OuterStateList => OuterStatement
    OuterStateList => OuterStatement OuterStateList 
    OuterStatement => LocalVar
    OuterStatement => BeginBlock endline StatementList end endline
    BeginBlock     => begin identifier 
    BeginBlock     => begin identifier identifier 
    BeginBlock     => begin identifier integer
    OuterStatement => empty
    LocalVar       => localvardef identifier endline
    StatementList  => Statement StatementList
    StatementList  => Statement 
    StatementList  => empty
    Statement      => set IDReference to SimpleExp endline
    Statement      => IfStatement StatementList ElseIfStates ElseState endif endline
    Statement      => FunctionCall
    Statement      => LocalVar
    Statement      => return endline
    FunctionCall   => IDReference ParamList endline
    IDReference    => identifier
    IDReference    => identifier.identifier
    ParamList      => ParamList Parameter
    ParamList      => Parameter
    Parameter      => identifier
    Parameter      => string
    Parameter      => integer
    Parameter      => float
    Parameter      => sign integer
    Parameter      => sign float
    IfStatement    => if Expression endline
    ElseIfStates   => ElseIfState ElseIfStates
    ElseIfStates   => ElseIfState
    ElseIfStates   => empty
    ElseIfState    => elseif Expression endline StatementList
    ElseState      => else endline StatementList
    ElseState      => empty
    Expression     => SimpleExp
    Expression     => SimpleExp relop SimpleExp
    Expression     => ( Expression )
    SimpleExp      => Term
    SimpleExp      => sign Term
    SimpleExp      => SimpleExp addop Term
    Term           => Factor
    Term           => Term multop Factor
    Factor         => identifier
    Factor         => integer
    Factor         => float
    Factor         => ( SimpleExp )
    Factor         => FunctionCall
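
To show how the SimpleExp, Term and Factor rules give multiplication a higher precedence than addition, here is a small Python sketch. It is a hand-written recursive-descent parser, not the LALR(1) table-driven parser the grammar is meant for, so the left-recursive rules are replaced by loops. The function names and the tuple-based tree are invented for this example, and the Factor => FunctionCall rule is left out because it needs the function definitions. Tokens are (type, text) pairs using the token type names from the Tokens section.

```python
def parse_simple_exp(tokens, pos=0):
    """SimpleExp => Term | sign Term | SimpleExp addop Term"""
    sign = None
    if pos < len(tokens) and tokens[pos][0] == "AddOp":
        sign = tokens[pos][1]          # a leading + or - acts as the sign
        pos += 1
    node, pos = parse_term(tokens, pos)
    if sign == "-":
        node = ("neg", node)
    while pos < len(tokens) and tokens[pos][0] == "AddOp":
        op = tokens[pos][1]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)       # left-associative
    return node, pos


def parse_term(tokens, pos):
    """Term => Factor | Term multop Factor"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "MultOp":
        op = tokens[pos][1]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    """Factor => identifier | integer | float | ( SimpleExp )"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[pos]
    if kind == "Integer":
        return int(value), pos + 1
    if kind == "Float":
        return float(value), pos + 1
    if kind == "Identifier":
        return value, pos + 1
    if kind == "LBracket":
        node, pos = parse_simple_exp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos][0] != "RBracket":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    raise SyntaxError(f"unexpected token {value!r}")


if __name__ == "__main__":
    # Tokens for: 1 + 2 * x
    tokens = [("Integer", "1"), ("AddOp", "+"), ("Integer", "2"),
              ("MultOp", "*"), ("Identifier", "x")]
    print(parse_simple_exp(tokens))
```

Each function returns the parsed subtree together with the index of the next unread token, so a caller such as a set statement parser could continue from there to check for the endline token.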

See Also