Oblivion Mod:Oblivion Grammar
Grammar Ambiguity
Although the Oblivion scripting language is simple, it contains a large amount of ambiguity, which prevents a simple or well-formed grammar from being created. For example, the following statements and expressions are all technically valid (they compile, although the result may not be as intended):
    short 123
    set 123 to 321
    set 123 to 123 * 123.456
    set LocalVar1 to GetAV Luck 10 9 8 * 7
    ; 789 is a NPC record
    set LocalVar1 to 789.GetAV Strength + 789
    short GetAV
    set GetAV to GetAV Strength + player.GetAV Luck + GetAV
    ; 123 is a local variable in script attached to NPC 789
    set 789.123 to 789.123 + 80
    set LocalVar1 to
There are several principal causes for this ambiguity:
- Variable names can be composed solely of digits, making them indistinguishable from numbers.
- Variable names can conflict with functions or references (not confirmed).
- Function calls have no start/stop tokens (like brackets in many languages). This makes it difficult, or impossible, to determine where a function's parameters end in an expression, especially since some function parameters are optional.
This sort of grammar ambiguity will prevent a parser from generating a syntax tree if a simple lexical analysis step is used. The issue is that the parser does not know if an identifier such as 123 or GetAV is a number, local, global, reference, or function token.
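The problem can be illustrated with a minimal Python sketch. The symbol tables below are hypothetical and simply mirror the declarations in the example statements above; the sketch also assumes identifiers are compared case-insensitively. Any word that maps to more than one kind cannot be classified by a context-free tokenizer.

```python
import re

# Hypothetical symbol tables mirroring the example statements above.
# A real compiler would fill these from declarations and game data.
FUNCTIONS = {"getav"}
LOCALS = {"123", "getav"}        # from "short 123" and "short GetAV"
REFERENCES = {"789", "player"}   # 789 is assumed to be an NPC record


def possible_kinds(word):
    """Return every token kind that a bare word could stand for."""
    key = word.lower()  # assumption: names are case-insensitive
    kinds = []
    if re.fullmatch(r"\d+", word):
        kinds.append("integer")
    if key in LOCALS:
        kinds.append("local")
    if key in REFERENCES:
        kinds.append("reference")
    if key in FUNCTIONS:
        kinds.append("function")
    return kinds


for word in ("123", "GetAV", "789"):
    print(word, possible_kinds(word))
```

Each of the three words printed here has two candidate kinds, which is exactly the situation a simple lexical analysis step cannot resolve on its own.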
The following restrictions help resolve these ambiguities and allow a simple grammar, such as LALR(1) or SLR, to be defined:
- Variables and editor IDs cannot be solely composed of digits.
- A stronger rule that variables and IDs cannot start with a digit might be used. This makes tokenizing and parsing much simpler but may interfere with existing scripts and plugins.
- Separating an identifier into a function, ID, or variable may have to be done at an earlier stage. Unfortunately this complicates the grammar and may prevent certain variable names from being used.
- Separate grammar rules are needed for function calls used in expressions.
- A function's definition must be known in order to parse its parameters properly when it is used in an expression.
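The last point can be sketched in Python as follows. The `SIGNATURES` table, its parameter counts, and the `split_call` helper are assumptions made for illustration and are not taken from the game's actual function list. Once the number of parameters is known, the end of a call can be found without bracket tokens.

```python
# Hypothetical signature table: name -> (required, optional) parameter counts.
# The counts are illustrative assumptions only.
SIGNATURES = {"getav": (1, 0), "additem": (2, 0)}

OPERATORS = {"+", "-", "*", "/", "%", "(", ")"}


def split_call(tokens, start):
    """Split the call at tokens[start] into (name, params, next_index)."""
    name = tokens[start]
    required, optional = SIGNATURES[name.lower()]
    params = []
    i = start + 1
    # Consume parameters until the signature is full or an operator appears.
    while i < len(tokens) and len(params) < required + optional:
        if tokens[i] in OPERATORS:
            break
        params.append(tokens[i])
        i += 1
    if len(params) < required:
        raise SyntaxError(f"{name} expects at least {required} parameter(s)")
    return name, params, i


tokens = "GetAV Luck 10 9 8 * 7".split()
print(split_call(tokens, 0))
```

With a one-parameter signature, the call ends after `Luck`, so the trailing `10 9 8 * 7` is left for the parser to accept or reject separately.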
Character Classes
Character classes are used by the tokenizer to turn a character stream into a token stream.
    CCDigit      = [0..9]
    CCAlpha      = [a..z] [A..Z]
    CCEndLine    = \n NULL
    CCComment    = ;
    CCWhiteSpace = space \t \r
    CCQuote      = "
    CCIDStart    = CCAlpha _
    CCID         = CCIDStart CCDigit
    CCBracket    = ( )
    CCStringChar = !CCEndLine !CCQuote
    CCEqual      = =
    CCDecimal    = .
    CCAddOp      = + -
    CCMultOp     = * / %
    CCRelOp      = = < > !
    CCComma      = ,
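One possible encoding of these classes is a table of character sets, sketched below in Python. The `CLASSES` dictionary and the `classes_of` helper are illustrative names, and `NULL` is assumed to mean the `\0` character. The sketch shows that a single character can belong to several classes at once, so the tokenizer still needs context to choose between them.

```python
import string

# Character classes from the listing, stored as sets of characters.
CLASSES = {
    "CCDigit": set(string.digits),
    "CCAlpha": set(string.ascii_letters),
    "CCEndLine": {"\n", "\0"},  # assumption: NULL means the \0 character
    "CCComment": {";"},
    "CCWhiteSpace": {" ", "\t", "\r"},
    "CCQuote": {'"'},
    "CCBracket": {"(", ")"},
    "CCEqual": {"="},
    "CCDecimal": {"."},
    "CCAddOp": {"+", "-"},
    "CCMultOp": {"*", "/", "%"},
    "CCRelOp": {"=", "<", ">", "!"},
    "CCComma": {","},
}
CLASSES["CCIDStart"] = CLASSES["CCAlpha"] | {"_"}
CLASSES["CCID"] = CLASSES["CCIDStart"] | CLASSES["CCDigit"]


def classes_of(ch):
    """Return the sorted names of every class that contains ch."""
    names = [name for name, members in CLASSES.items() if ch in members]
    # CCStringChar is defined by exclusion: not an end of line, not a quote.
    if ch not in CLASSES["CCEndLine"] and ch not in CLASSES["CCQuote"]:
        names.append("CCStringChar")
    return sorted(names)


print(classes_of("="))
print(classes_of("7"))
```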
Tokens
The tokenizer creates tokens from the character class definitions and passes them to the parser.
    Unknown
    LBracket     = (
    RBracket     = )
    AddOp        = + | -
    Sign         = + | -
    MultOp       = * | / | %
    RelOp        = < | > | <= | >= | != | ==
    BoolOp       = "&&" | "||"
    String       = [CCQuote] [CCStringChar]* [CCQuote]
    Identifier   =
    If           = if
    Elseif       = elseif
    Else         = else
    Endif        = endif
    Set          = set
    To           = to
    Scriptname   = scriptname | scn
    Begin        = begin
    End          = end
    Return       = return
    Comment      = [CCComment] [Chars]* [CCEndLine] { ignore }
    EndLine      = [CCEndLine]
    Integer      =
    Float        =
    EndofProgram =
    LocalVarDef  = short | long | float | ref
    Comma        = [CCComma] { ignore }
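A tokenizer along these lines might be sketched in Python with a single combined regular expression. Several details are assumptions, because the listing leaves them blank: `Integer` is taken to be a run of digits, `Float` digits with a decimal part, and `Identifier` a `CCIDStart` character followed by `CCID` characters (the stronger "cannot start with a digit" rule). Unlike the `Comment` definition above, this sketch leaves the end of line after a comment in place so that `EndLine` is still emitted. `Sign` cannot be told apart from `AddOp` at this stage, so only `AddOp` is produced.

```python
import re

KEYWORDS = {
    "if": "If", "elseif": "Elseif", "else": "Else", "endif": "Endif",
    "set": "Set", "to": "To", "scriptname": "Scriptname", "scn": "Scriptname",
    "begin": "Begin", "end": "End", "return": "Return",
    "short": "LocalVarDef", "long": "LocalVarDef",
    "float": "LocalVarDef", "ref": "LocalVarDef",
}

# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("Comment", r";[^\n]*"),
    ("String", r'"[^"\n]*"'),
    ("Float", r"\d+\.\d+"),                      # assumed definition
    ("Integer", r"\d+"),                         # assumed definition
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),   # assumed definition
    ("RelOp", r"<=|>=|!=|==|<|>"),
    ("BoolOp", r"&&|\|\|"),
    ("AddOp", r"[+-]"),
    ("MultOp", r"[*/%]"),
    ("LBracket", r"\("),
    ("RBracket", r"\)"),
    ("Comma", r","),
    ("EndLine", r"\n"),
    ("Skip", r"[ \t\r]+"),
    ("Unknown", r"."),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(source):
    """Turn script text into a list of (token type, text) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("Comment", "Comma", "Skip"):
            continue  # ignored tokens never reach the parser
        if kind == "Identifier":
            kind = KEYWORDS.get(text.lower(), "Identifier")
        tokens.append((kind, text))
    tokens.append(("EndofProgram", ""))
    return tokens


print(tokenize("set x to 3 * 2.5 ; note\n"))
```

Because the token list has no entry for the dot in references such as `player.GetAV`, this sketch reports it as `Unknown`.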
Parser Grammar
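The following grammar is used by the parser to turn the stream of tokens into a valid parse tree. Terminals (token types such as scriptname, identifier, and endline) appear in lowercase, apart from the EndOfProgram token, and non-terminals (such as Program and Statement) are capitalized. The grammar is intended to be LALR(1), although this has not been verified.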
    Start => scriptname identifier endline Program EndOfProgram
    Program => OuterStateList
    OuterStateList => OuterStatement
    OuterStateList => OuterStatement OuterStateList
    OuterStatement => LocalVar
    OuterStatement => BeginBlock endline StatementList end endline
    BeginBlock => begin identifier
    BeginBlock => begin identifier identifier
    BeginBlock => begin identifier integer
    OuterStatement => empty
    LocalVar => localvardef identifier endline
    StatementList => Statement StatementList
    StatementList => Statement
    StatementList => empty
    Statement => set IDReference to SimpleExp endline
    Statement => IfStatement StatementList ElseIfStates ElseState endif endline
    Statement => FunctionCall
    Statement => LocalVar
    Statement => return endline
    FunctionCall => IDReference ParamList endline
    IDReference => identifier
    IDReference => identifier.identifier
    ParamList => ParamList Parameter
    ParamList => Parameter
    Parameter => identifier
    Parameter => string
    Parameter => integer
    Parameter => float
    Parameter => sign integer
    Parameter => sign float
    IfStatement => if Expression endline
    ElseIfStates => ElseIfState ElseIfStates
    ElseIfStates => ElseIfState
    ElseIfStates => empty
    ElseIfState => elseif ExpList endline StatementList
    ElseState => else endline StatementList
    ElseState => empty
    Expression => SimpleExp Expression
    Expression => SimpleExp relop SimpleExp
    Expression => ( Expression )
    SimpleExp => Term
    SimpleExp => sign Term
    SimpleExp => SimpleExp addop Term
    Term => Factor
    Term => Term multop Factor
    Factor => identifier
    Factor => integer
    Factor => float
    Factor => ( SimpleExp )
    Factor => FunctionCall
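As a rough illustration of the SimpleExp, Term, and Factor rules, the following Python sketch parses that subset by hand. It uses recursive descent rather than an LALR(1) table, so the left-recursive productions are rewritten as loops, and it omits Factor => FunctionCall. The input format, a list of (token type, text) pairs, and the tuple-based tree are assumptions made for this example.

```python
def parse_simple_exp(tokens, pos=0):
    """SimpleExp => Term | sign Term | SimpleExp addop Term"""
    if pos < len(tokens) and tokens[pos][0] == "AddOp":
        # A leading + or - is treated as the sign of the first term.
        sign = tokens[pos][1]
        operand, pos = parse_term(tokens, pos + 1)
        node = ("sign", sign, operand)
    else:
        node, pos = parse_term(tokens, pos)
    # Left recursion becomes a loop, which keeps operators left-associative.
    while pos < len(tokens) and tokens[pos][0] == "AddOp":
        op = tokens[pos][1]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_term(tokens, pos):
    """Term => Factor | Term multop Factor"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "MultOp":
        op = tokens[pos][1]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    """Factor => identifier | integer | float | ( SimpleExp )"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    kind, text = tokens[pos]
    if kind in ("Identifier", "Integer", "Float"):
        return (kind, text), pos + 1
    if kind == "LBracket":
        node, pos = parse_simple_exp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos][0] != "RBracket":
            raise SyntaxError("expected )")
        return node, pos + 1
    raise SyntaxError(f"unexpected token {text!r}")


example = [("Integer", "1"), ("AddOp", "+"), ("Integer", "2"),
           ("MultOp", "*"), ("Identifier", "x")]
tree, next_pos = parse_simple_exp(example)
print(tree)
```

The multiplication ends up nested below the addition in the resulting tree, which reflects the precedence encoded by splitting the rules into SimpleExp and Term.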