10.2 Lexical environments

2010-05-05

Scoping is one of the most difficult concepts to grasp in Ecmascript, especially when combined with closures. Even once you have mastered both, they will come back to haunt you eventually.

Scope in Ecmascript can be viewed as a logical tree. The root of the tree is a Lexical Environment whose outer Lexical Environment reference is null and which holds a reference to an Environment Record.

The root, or global scope, can be viewed as belonging to the main Program being executed.

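Whenever you enter something that has its own scope (only function, catch and with have their own scope), a new Lexical Environment is created. For catch and with, its outer Lexical Environment is set to the current Lexical Environment. For a function call, it is set to the Lexical Environment in which the function was created, which is what makes the scoping lexical. The new environment then becomes the current environment.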
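To make the "only function, catch and with" rule concrete, here is a minimal sketch using var declarations. The names blockVersusFunction, insideBlock and insideFunction are made up for this illustration. A plain block does not get an environment of its own, while a function call does.

```javascript
function blockVersusFunction() {
  if (true) {
    var insideBlock = 'visible'; // bound in the function's environment, not the block's
  }
  (function () {
    var insideFunction = 'hidden'; // bound in the inner function's own environment
  })();
  return {
    block: typeof insideBlock, // still reachable after the block ends
    fn: typeof insideFunction  // not reachable from out here
  };
}

var scopes = blockVersusFunction();
console.log(scopes.block, scopes.fn); // string undefined
```

The typeof operator is used for the second check because it reports 'undefined' for an unresolvable name instead of throwing.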

The Environment Record of a Lexical Environment records the "identifier bindings" that are created in the scope of that environment. So when you declare var hi, the Environment Record will contain an entry for "hi" and the memory address associated with it. Whenever you reference "hi" in your code, the interpreter asks the current Lexical Environment for its Environment Record, and asks that record whether it knows about a variable named "hi". If it does, the details are returned. Otherwise the outer Lexical Environment is asked the same thing, and so on until there is no outer Lexical Environment left (at the global scope), at which point a ReferenceError is thrown.
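The lookup described above can be imitated with ordinary objects. This is only a toy model: createEnvironment, declare and resolve are names invented for this sketch, and the real structures live inside the engine where script cannot reach them.

```javascript
// A toy Lexical Environment: a record of bindings plus a link to the outer environment.
function createEnvironment(outer) {
  return { record: {}, outer: outer }; // outer is null for the global environment
}

function declare(env, name, value) {
  env.record[name] = value;
}

// Walk from the current environment outwards until a record knows the name.
function resolve(env, name) {
  for (var current = env; current !== null; current = current.outer) {
    if (Object.prototype.hasOwnProperty.call(current.record, name)) {
      return current.record[name];
    }
  }
  throw new ReferenceError(name + ' is not defined');
}

var globalEnv = createEnvironment(null);
var functionEnv = createEnvironment(globalEnv);
declare(globalEnv, 'hi', 'from global');
declare(functionEnv, 'local', 'from function');

console.log(resolve(functionEnv, 'local')); // found in the current record
console.log(resolve(functionEnv, 'hi'));    // found one step out, in the global record
```

Note that resolve only ever follows the outer link, so a lookup that starts at globalEnv can never see 'local'.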

You should note that these are specification types only. There is no way to directly access these notions from within Ecmascript. But they are the reason closures work the way they do. Every function keeps a reference to the Lexical Environment in which it was created. It never drops this reference, which is why the function is able to access variables from its outer scopes long after those scopes seem to have ended.
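A small sketch of that last point, with makeCounter and next as made-up names: the call to makeCounter has long returned, yet the function it handed back can still read and update count.

```javascript
function makeCounter() {
  var count = 0; // bound in the environment created for this call
  return function () {
    count = count + 1; // resolved through the outer environment reference
    return count;
  };
}

var next = makeCounter(); // makeCounter itself is finished here
var first = next();
var second = next();
console.log(first, second); // 1 2
```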

A Lexical Environment can contain multiple inner Lexical Environments (define two functions within another function), but it has no reference to them.
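As a sketch of two inner functions hanging off the same outer environment (makeAccount, deposit and read are names chosen for this example): both inner functions resolve balance to the same binding, and a second call to makeAccount gets a fresh environment of its own.

```javascript
function makeAccount() {
  var balance = 0; // one binding, shared by both inner functions

  function deposit(amount) { balance = balance + amount; }
  function read() { return balance; }

  return { deposit: deposit, read: read };
}

var account = makeAccount();
var other = makeAccount(); // a second call, so a second, separate environment
account.deposit(10);
account.deposit(5);
console.log(account.read()); // 15
console.log(other.read());   // 0
```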

An important thing to note is that a Lexical Environment will survive (not be destroyed by garbage collection) as long as there's another Lexical Environment, or a function created in it, that's still pointing to it. In this model, any and all the variables in its Environment Record are kept alive as well, even if they are never referenced from an inner environment. Only after everything pointing to it is destroyed can an environment itself be up for destruction.

Let's look at some examples. First, a simple scoping example:

Code: (Ecma)
var s = 'fail';
function test() {
  try {
    foo();
  } catch(e) {
    debugger(s); // here!
  }
}

When the code reaches the catch statement, the scope chain will look like this. Note that the names are fictional, purely for demonstration (Environments don't actually have names):

Code: (tree)
<Lexical Environment>
  <name>global</name>
  <outer>null</outer>
  <Environment Record>
    <Identifier>
      <name>s</name>
      <Reference>...</Reference>
    </Identifier>
    <Identifier>
      <name>test</name>
      <Reference>...</Reference>
    </Identifier>
    <Identifier>
      <name>debugger</name>
      <Reference>...</Reference>
    </Identifier>
    (... many more global identifiers)
  </Environment Record>
  <Lexical Environment>
    <name>global.func</name>
    <outer>global</outer>
    <Environment Record>
    </Environment Record>
    <Lexical Environment>
      <name>global.func.catch</name>
      <outer>global.func</outer>
      <Environment Record>
        <Identifier>
          <name>e</name>
          <Reference>...</Reference>
        </Identifier>
      </Environment Record>
    </Lexical Environment>
  </Lexical Environment>
</Lexical Environment>

So when the debugger Identifier is called in the catch clause, there are three Lexical Environments: the global scope, the function scope and the catch scope. Execution reaches the catch clause because no Environment Record in the scope chain has a recorded binding for "foo", so a ReferenceError is thrown.

The scope chain consists of all the Lexical Environments, starting at the current one and following the path through each outer Lexical Environment, in that order. So the interpreter first looks up the Identifier "debugger": it is not in the catch scope, not in the function scope, and is finally found in the global scope. Then it looks up "s", also finding it in the global scope. It uses those References to execute the call to debugger.
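Here is a runnable variation on the example, as a sketch only. It returns the values instead of calling the text's debugger(...) helper, and the names message, tryCall and outcome are made up. It assumes nothing called foo exists anywhere in the program.

```javascript
var message = 'fail';

function tryCall() {
  try {
    foo(); // no record in the scope chain has a binding for foo
  } catch (e) {
    // e lives in the catch environment, message is found two steps out in the global one
    return { errorName: e.name, seen: message };
  }
}

var outcome = tryCall();
console.log(outcome.errorName, outcome.seen); // ReferenceError fail
```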

I think this illustrates clearly why global variables are considered to be so expensive. The interpreter has to search through every Lexical Environment in the scope chain before actually finding what you tried to reference.

Now let's take a shadowing example:

Code: (Ecma)
var s = 5;
function test() {
  var s = 6;
  debugger(s); // 6
}
test();
debugger(s); // 5

The function declares a variable with the same name as one already declared in the global scope. Traversing the scope chain finds the inner variable first and, since the two are distinct, any assignment inside the function changes only that variable, not the global one. Here's the tree while inside the function:

Code: (tree)
<Lexical Environment>
  <name>global</name>
  <outer>null</outer>
  <Environment Record>
    <Identifier>
      <name>s</name>
      <Reference>...</Reference>
    </Identifier>
    <Identifier>
      <name>test</name>
      <Reference>...</Reference>
    </Identifier>
    <Identifier>
      <name>debugger</name>
      <Reference>...</Reference>
    </Identifier>
    (... many more global identifiers)
  </Environment Record>
  <Lexical Environment>
    <name>global.func</name>
    <outer>global</outer>
    <Environment Record>
      <Identifier>
        <name>s</name>
        <Reference>...</Reference>
      </Identifier>
    </Environment Record>
  </Lexical Environment>
</Lexical Environment>
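The same shadowing can be checked with a slightly extended sketch (level, shadow and innerLevel are names picked for this version). The assignment inside the function only touches the inner binding, because that is the first one the lookup finds.

```javascript
var level = 5;

function shadow() {
  var level = 6;     // a distinct binding in the function's Environment Record
  level = level + 1; // resolves to the inner binding, the global one is untouched
  return level;
}

var innerLevel = shadow();
console.log(innerLevel, level); // 7 5
```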

Now let's look at a common problem with closures and loops. One of the first mistakes made by new Ecmascript programmers who have figured out that functions are objects is creating functions in a loop and calling them later. They either think that any pair of curly braces {} creates a new scope (which is very common) or don't know about closures yet.

Code: (Ecma)
function a(){
  var arr = [];
  for (var n=0; n<5; ++n) {
    arr[n] = function(){
      debugger(n);
    };
  }
}

At the end of the function, the scope will look like this (the five inner environments are shown as they would be once each function is called):

Code: (tree)
<Lexical Environment>
  <name>global</name>
  <outer>null</outer>
  <Environment Record>
    <Identifier>
      <name>a</name>
      <Reference>...</Reference>
    </Identifier>
    <Identifier>
      <name>debugger</name>
      <Reference>...</Reference>
    </Identifier>
    (... many more global identifiers)
  </Environment Record>
  <Lexical Environment>
    <name>global.func</name>
    <outer>global</outer>
    <Environment Record>
      <Identifier>
        <name>arr</name>
        <Reference>...</Reference>
      </Identifier>
      <Identifier>
        <name>n</name>
        <Reference>...</Reference>
      </Identifier>
    </Environment Record>
    <Lexical Environment>
      <name>global.func.func1</name>
      <outer>global.func</outer>
      <Environment Record></Environment Record>
    </Lexical Environment>
    <Lexical Environment>
      <name>global.func.func2</name>
      <outer>global.func</outer>
      <Environment Record></Environment Record>
    </Lexical Environment>
    <Lexical Environment>
      <name>global.func.func3</name>
      <outer>global.func</outer>
      <Environment Record></Environment Record>
    </Lexical Environment>
    <Lexical Environment>
      <name>global.func.func4</name>
      <outer>global.func</outer>
      <Environment Record></Environment Record>
    </Lexical Environment>
    <Lexical Environment>
      <name>global.func.func5</name>
      <outer>global.func</outer>
      <Environment Record></Environment Record>
    </Lexical Environment>
  </Lexical Environment>
</Lexical Environment>

What this tree clearly demonstrates is that there is just one n variable. All function scopes will resolve their "n" Identifier to the same Reference. So when any of the anonymous functions is called, it will output 5, rather than the expected 0 through 4.
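To see this for yourself, here is a sketch that returns the array so the functions can actually be called. All names (makeShared, makeSeparate, callAll, captured) are made up for the illustration. The second version shows one common way around the problem: since a function call creates a new Lexical Environment, wrapping each iteration in an immediately invoked function gives every closure its own binding.

```javascript
// All five closures share the single n in makeShared's environment.
function makeShared() {
  var arr = [];
  for (var n = 0; n < 5; ++n) {
    arr[n] = function () { return n; };
  }
  return arr;
}

// Each call of the wrapper creates a new environment with its own captured binding.
function makeSeparate() {
  var arr = [];
  for (var n = 0; n < 5; ++n) {
    arr[n] = (function (captured) {
      return function () { return captured; };
    })(n);
  }
  return arr;
}

function callAll(fns) {
  var out = [];
  for (var i = 0; i < fns.length; ++i) {
    out.push(fns[i]());
  }
  return out;
}

var sharedResults = callAll(makeShared());
var separateResults = callAll(makeSeparate());
console.log(sharedResults.join(','));   // 5,5,5,5,5
console.log(separateResults.join(',')); // 0,1,2,3,4
```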