Parser Stack Depth Exceeded

A Brief Introduction to LALR Parsing

Both Second Life and lslint "read" scripts using what is called an LALR Parser. What this parser does is take a sequential stream of tokens and decide whether they make sense in the order they're given. For example, you know what a function looks like: it might have a type, then it has a name, then arguments, and then the function body. Pretty simple, right? Let's look at one:

integer is_awesome(string name) {
  if ( name == "masa" )
    return TRUE;
  else
    return FALSE;
}

Here's what the parser sees:

INTEGER IDENTIFIER '(' STRING IDENTIFIER ')' '{'
  IF '(' IDENTIFIER EQ STRING_CONSTANT ')'
    RETURN INTEGER_TRUE ';'
  ELSE
    RETURN INTEGER_FALSE ';'
'}'

The conversion from the text you see to the stream of tokens that the parser sees is done by something called a lexer, which we won't get into. It's a little different from the way you see it, but close enough that you should be able to understand it. There is additional information associated with each token, such as the actual name of each IDENTIFIER, but it is not needed for syntax checking.

So, how does the parser know it's looking at a function? The parser reads one token at a time and puts it on a stack, and then it looks at the top of the stack and decides if it looks like anything it recognizes. When it does, it takes all those tokens that were part of what it recognized, and reduces them to one new token. Eventually, if the script is syntactically valid, it will be reduced to a single token called lscript_program.

The easiest thing to recognize is a type. When the token at the top of the stack is INTEGER or VECTOR or some other type name, the parser recognizes it as a type name and replaces it with a typename token. Let's walk through the example above, from the parser's perspective.

First, we have INTEGER. We put that on our stack, and then take a look. We immediately recognize this as a type, and replace it with a typename token. Now we take the next token, IDENTIFIER, and put it on the stack, so it now looks like this:

typename IDENTIFIER

Well, we know that a type followed by an identifier is only used in a couple of situations (variable, function, and parameter declarations), so we'll replace those two tokens with one name_type token. And we'll forgive a certain Linden Lab employee for choosing such backwards and ambiguous token names.

Next we have an opening parenthesis, making our stack look like this:

name_type '('

That doesn't really look like anything, so we'll keep going. We add STRING to our stack, change it for a typename, but don't recognize anything else. We add IDENTIFIER, and now we have something familiar: a type and an identifier again! We could replace it with a name_type token, but looking even further back we see that other name_type and opening parenthesis. We can be pretty sure we're looking at a function parameter, so we'll make it a function_parameter token instead.

Right Recursion -or- How Not To Write a Language Grammar

The parser can only match a fixed series of tokens, which has an important consequence. What if we had specified two or even three parameters for our function? Does the parser have a pattern for every possible number of parameters? This problem is solved by recursion, and there are two kinds: right recursion, which is bad, and left recursion, which is good.

For both types of recursion, when we have a token we want to match one or more of, we create another token to contain a whole group. In the case of fuction parameters, it is (intuitively) called function_parameters. This group token can match two different patterns: a function_parameter by itself, or a single function_parameter paired with another function_parameters group. This is why it's called recursion: the token can contain itself.

How they are paired determines which type of recursion it is, and this is where Second Life's parser gets into trouble. For left recursion, the token matches itself on the left side. So, a function_parameters token would match either a function_parameter token all by its lonesome, or an existing function_parameters group followed by another function_parameter.

When the parser ends up with a single function_parameter on the stack, it can immediately exchange it with a function_parameters token. If it reads another function parameter after that, the stack looks like this:

... function_parameters ',' function_parameter

In this case, the longer match wins, and instead of reducing just the single token at the end, we reduce the whole group. This way we are left with a single function_parameters, and can keep repeating this process until we run out of parameters to read.

When we use right recursion, the order is swapped: we match a single function_parameter followed by a group of function_parameters. This time, when we end up with a single function_parameter on the stack, we can't do anything yet. We don't know whether it's a parameter by itself or one followed by a group, so we have to keep reading. If we end up reading something that's not a function parameter, like a closing parenthesis, then we know there are no more parameters and can reduce the single parameter to a function_parameters group.

But what if there's another parameter? Then our stack looks like this:

... function_parameter ',' function_parameter

We still can't do anything! We have to keep reading parameters until we find something else before we do any reductions!

Suppose we read three more parameters giving us this messy stack:

... function_parameter ',' function_parameter ',' function_parameter
',' function_parameter ',' function_parameter

If, at this point, the next token is a closing brace, we can finally start reducing. The single parameter at the end is replace with a parameters group, which can then be paired with the preceding parameter to make a new group, and so on.

Second Life uses right recursion for every token group except statements, though as we will see later there is a special gotcha for 'if' statements. This means that each global variable and function takes up an entry on the parser stack until the first state is reached, and each state takes up an entry until the end of the script, every item in a list takes up an entry until the closing bracket, and so on. Any time you have a repeating syntactical element, each instance will take up at least one entry on the stack until the entire set is finished.

How Deep Does the Rabbit Hole Go?

On Windows and possibly Linux or non-Linden binaries for any platform, Second Life's parser stack is limited to 150 items. When the parser tries to push the 151st item onto the stack, Second Life will give a syntax error, and lslint will tell you "parser stack depth exceeded." Hopefully the error is a little bit less cryptic now.

Other compilers may not be able to detect the stack depth error because it is heavily dependent on the way that Second Life's parser is written. Using a different grammar or different parser generator would make it difficult or impossible to guess what Second Life's parser stack would look like at a given point in the script.

Hidden Right Recursion For Statements

Above I mentioned that statements are the only token group in Second Life's parser that do not use right recursion. If they did, you'd only be able to write a function or event handler that was 150 lines long, and the stack depth error would be a lot more common.

We can verify this by taking a look at the Second Life source code:

statements
  : statement                                   
  | statements statement                              
  ;

This means that a statements token matches either a single statement or a group of statements followed by a single statement. Since the group is on the left side, we know that this is left recursion, which is the good kind. So where is the problem?

Lets look at some of the things a statement can be:

statement
  : ';'
  ...
  | expression ';'
  | declaration ';'
  ...
  | IF '(' expression ')' statement
  | IF '(' expression ')' statement ELSE statement

A lot of possible statements are omitted, but this is enough to understand what a statement is and even where the problem is. If you've already spotted the problem, give yourself a high five: you've either written a parser before or you've got an IQ that's higher than most Second Life users can count.

We can see that, among other things, a statement can be a semicolon all by itself. This prevents a syntax error if you accidentally type two semicolons after a normal statement, and gives you a big headache if you accidentally do something like this:

if (TRUE == FALSE); // notice the semicolon!
  do_something();

The semicolon takes the role of the statement to complete the if, which basically makes it do nothing. The do_something() will always be executed, because it's not really part of the if statement - the lonely semicolon stole its spot.

A statement can also be an expression, which is something that produces a value, like a = b (the value produced is b) or do_something(), followed by a semicolon. It can be a declaration, like integer a or string password = "secret". No surprises here.

Now we get to the problem: the if statement. Remember when we had a token that could contain itself, and called it recursion? The same thing is hiding here, even though it's not a 'group' kind of token! Notice that the IF can contain statement -- and it's on the right side, which makes it the bad kind of recursion.

if ( number == 1 )
  do_something();
else if ( number == 2 )
  do_something_else();
else if ( number == 3 )
  do_yet_another_thing();
else if ( number == 4 )
  do_nothing();

Suppose the parser just got to the end of the above code. Can you guess what the stack looks like?

IF '(' expression ')'
  statement
ELSE IF '(' expression ')'
  statement
ELSE IF '(' expression ')'
  statement
ELSE IF '(' expression ')'
  statement

That's 22 items on the stack already, and we still can't reduce anything! We have to keep pushing tokens onto the stack until we find something that's not a statement and not ELSE before any one of our IF statements are completed.

How To Fix It

If you've got a stack depth error in a big chain of if statements, you need to break the chain somewhere. Suppose we got the error on the line containing number == 3 above. We could solve it this way:

if ( number == 1 )
  do_something();
else if ( number == 2 )
  do_something_else();
else
  jump if001continue001; // nothing above matched, continue to more checks
jump if001break001;    // otherwise, skip past them

@if001continue001;
if ( number == 3 )
  do_yet_another_thing();
else
  jump if001continue002; // supposing we had to break it up again,
jump if001break002;    // we have to make new labels for every jump,
                       // because only the first jump to a label works
@if001continue002;
if ( number == 4 )
  do_nothing();

@if001break001; @if001break002;

(note: the example here used to be incorrect, fixed 2007-09-03)

If you're getting the error inside a list, you can just break it up into two lists and join them with a +. Since global variables can't be initialized to expressions, if the error is happening in a global variable declaration, you'll need to set the variable at the beginning of your script instead. Example:

list my_big_list; // initialized in init_my_big_list()
    
init_my_big_list() {
  my_big_list = [1,2,3,4,5] + [6,7,8,9,0];
}

default {
  state_entry() {
    init_my_big_list();
    crash_grid();
  }
}

Finally, if you're getting the error in normal-looking code without a big chain you can easily break, you can try to make the stack smaller in other ways. You can move the function or state containing the code above other functions or states, so they won't be on the stack when it's parsed. You can reduce the number of global variables and functions if the error is not in a state, or the number of states if it is. If all else fails, you may have to split your script into two.