for scoping

2021-01-22

I stumbled on a new gem yesterday. I'm not sure if I knew this when I was writing Tenko, but I had certainly forgotten about it until Preval started showing unexpected errors for it. The case in point is the special scope in the header of a "for" statement.

The simplest code example looks something like this:

Code:
let x = {};
for (let x in x) {}

This throws a ReferenceError at runtime. The problem is that the for header gets its own special scope. The x that is declared inside of it shadows the globally defined x.

The execution order of this code will process the right hand side first (and only once) and then assign keys of the resulting value to the left hand side.

TDZ


This means that the x to the right of in will be evaluated before the let x to the left of it. That means that x in this scope is not yet defined and still inside the so called "TDZ" (the Temporal Dead Zone).
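To see the header scope in action, here is a small sketch that wraps the loop so the error can be inspected. The helper errorNameOf is made up for this illustration and is not part of Tenko or Preval.

```javascript
// Hypothetical helper: run fn and report the name of whatever it throws.
function errorNameOf(fn) {
  try {
    fn();
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const shadowed = errorNameOf(() => {
  let x = { a: 1 };
  // The `x` after `in` resolves to the header's `let x`, which is still in the TDZ.
  for (let x in x) {}
});

const renamed = errorNameOf(() => {
  let x = { a: 1 };
  // With a different name on the left, the rhs resolves to the outer `x`.
  for (let key in x) {}
});

console.log(shadowed); // ReferenceError
console.log(renamed);  // no error
```

Only the name clash matters here: renaming the loop variable makes the rhs resolve to the outer binding again.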

In a nutshell, the TDZ means that variables declared with let or const will throw upon any interaction (read or write) until the actual line that declares them has been executed.

Short example:

Code:
x = 5;
let x = 10;

This throws because x is TDZ'd. Same for const. But this would have worked with the "legacy" var statement.
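A runnable version of that comparison, as a sketch. The function names are invented for the example, and the let case is wrapped in a try/catch so the script keeps going.

```javascript
function assignBeforeLet() {
  try {
    x = 5; // `x` is the `let` below: hoisted to this block but not yet initialized
    let x = 10;
    return x;
  } catch (err) {
    return err.name;
  }
}

function assignBeforeVar() {
  x = 5; // fine: `var x` is hoisted and already initialized to undefined
  var x = 10;
  return x;
}

const letResult = assignBeforeLet();
const varResult = assignBeforeVar();

console.log(letResult); // ReferenceError
console.log(varResult); // 10
```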

Normalization


I've been working on Preval and in particular on normalization cases. The point of normalization is to make every line of code as simple as possible, doing the least number of concrete atomic things possible.

In the case of the for-in loop, that would be for (x in y), so with a side-effect free identifier on either side. I haven't made up my mind yet whether that ought to be with or without a decl. I think with decl will end up being preferred but I'm not at that point yet.

I want to look at some examples of how we could simplify the for-in and for-of statements. I think they normalize roughly the same.

The base case looks like this:

Code:
let y = {};
for (let x in y) z();

// -->

let y = {};
for (let x in y) {
  z();
}

What if there's any complexity, for example in the lhs ("left hand side")? And yes, you can use properties as the loop target like this.

Code:
let y = {};
for (f().x in y) z();

// -->

let y = {};
for (let tmp in y) {
  f().x = tmp;
  z();
}

The whole lhs is evaluated for every iteration, so by pushing it into the loop body we can keep the original semantics while simplifying the header. Other steps will further normalize that but I'll omit those steps for this post as they are not very relevant here.
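The claim that the lhs runs once per iteration can be checked with a counter. This is only a sketch: the counter, the target object and the function names are assumptions made for the example.

```javascript
function runOriginal() {
  let calls = 0;
  const target = {};
  const f = () => { calls += 1; return target; };
  const y = { a: 1, b: 2, c: 3 };
  for (f().x in y) {}
  return { calls, last: target.x };
}

function runNormalized() {
  let calls = 0;
  const target = {};
  const f = () => { calls += 1; return target; };
  const y = { a: 1, b: 2, c: 3 };
  for (let tmp in y) {
    f().x = tmp; // the complex lhs now lives in the body
  }
  return { calls, last: target.x };
}

const originalRun = runOriginal();
const normalizedRun = runNormalized();

console.log(originalRun);   // { calls: 3, last: 'c' }
console.log(normalizedRun); // { calls: 3, last: 'c' }
```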

What happens if the rhs ("right hand side") is complex?

Code:
let y = () => {};
for (let x in y()) z();

// -->

let y = () => {};
{
  let tmp = y();
  for (let x in tmp) {
    z();
  }
}

The rhs is evaluated once for the whole loop, not once per iteration, so it goes outside of it.

The transform adds a block around the result since the single for-in statement has now become a decl plus a statement. This way the original position of the for can be replaced and we can easily fold up nested blocks later.

This transform should retain the original execution order.
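One way to convince yourself of that is to log every observable step and compare the traces. A minimal sketch, where the log array and the function names are made up for this illustration:

```javascript
function traceOriginal() {
  const log = [];
  const y = () => { log.push('rhs'); return { a: 1, b: 2 }; };
  const z = () => { log.push('body'); };
  for (let x in y()) z();
  return log.join(',');
}

function traceNormalized() {
  const log = [];
  const y = () => { log.push('rhs'); return { a: 1, b: 2 }; };
  const z = () => { log.push('body'); };
  {
    let tmp = y(); // hoisted out of the header, still runs exactly once
    for (let x in tmp) {
      z();
    }
  }
  return log.join(',');
}

const originalTrace = traceOriginal();
const normalizedTrace = traceNormalized();

console.log(originalTrace);   // rhs,body,body
console.log(normalizedTrace); // rhs,body,body
```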

The transforms for complex lhs and rhs are perfectly compatible:

Code:
let y = () => {};
for (f().x in y()) z();

// -->

let y = () => {};
{
  let tmp = y();
  for (let tmp2 in tmp) {
    f().x = tmp2;
    z();
  }
}

So, we're done. ... Riiiight?

No. There are two things we need to fix here.

TDZ


Obviously, as the article started out, there is the TDZ.

Code:
let x = {};
for (let x in x) {}

This triggers a runtime error because the rhs of in is scoped to the for header, not the outer scope. It's a super edge case because I haven't been able to come up with a reasonably useful use case for this pattern at all. Not yet, anyways. I think you can do something with a proxy and refer to the previous value of the iteration or whatever. But there are much simpler and better patterns to do the same thing so that's not "reasonable".

So if there's no reasonable reason to be using this kind of code, why would I want to care about it in Preval? Two reasons.

One: because it's a good obfuscation case to throw at tools like Preval to try to trip them up, and it can even happen unintentionally. Nasty bugs to debug.

And two: because I don't want this kind of problem to be around when I'm aware of it.

For the examples above the solution is not that hard. Keep in mind that the point here is to retain a TDZ error after transformation if one would occur in the original code.

Code:
let x = () => {};
for (let x in x()) f();

// -->

let x = () => {};
{
  const tmp = x();
  let x;
  for (x in tmp) {
    f();
  }
}

Note that the outlining is wrapped in a block. This way the outlined let x is block scoped, does not clash with the global declaration, and allows us to retain the TDZ error. All the while I'm still able to normalize the complexity away from the for header. Cool cool.
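The difference the outlined let x makes can be observed directly. In this sketch the helper errorNameOf and the variable names are invented for the example. It compares the original loop, the earlier rhs transform without the outlined decl, and the transform with it.

```javascript
// Hypothetical helper: run fn and report the name of whatever it throws.
function errorNameOf(fn) {
  try {
    fn();
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const original = errorNameOf(() => {
  let x = () => ({});
  for (let x in x()) {}
});

const withoutOutlinedDecl = errorNameOf(() => {
  let x = () => ({});
  {
    const tmp = x(); // resolves to the outer x, so the TDZ error is lost
    for (let x in tmp) {}
  }
});

const withOutlinedDecl = errorNameOf(() => {
  let x = () => ({});
  {
    const tmp = x(); // resolves to the block's own `let x`, still in the TDZ
    let x;
    for (x in tmp) {}
  }
});

console.log(original);            // ReferenceError
console.log(withoutOutlinedDecl); // no error
console.log(withOutlinedDecl);    // ReferenceError
```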

Patterns


Ahhh, crap. Enter patterns:

Code:
let x = () => {};
for (let [x] in x()) f();

// -->

let x = () => {};
{
  const tmp = x();
  for (let tmp2 in tmp) {
    let [x] = tmp2;
    f();
  }
}

Now we see why the shadowing was relevant to display all this time. In the transformed code above there will not be a runtime error. Instead it calls the global x and completes without error. Oops. This is how exploits happen.
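The lost error is easy to reproduce. A sketch, with a made-up helper errorNameOf and an object with a single key 'ab' standing in for whatever the global x returns:

```javascript
// Hypothetical helper: run fn and report the name of whatever it throws.
function errorNameOf(fn) {
  try {
    fn();
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const original = errorNameOf(() => {
  let x = () => ({ ab: 1 });
  for (let [x] in x()) {}
});

const inlinedPattern = errorNameOf(() => {
  let x = () => ({ ab: 1 });
  {
    const tmp = x(); // outer x: the pattern's x only exists inside the loop body
    for (let tmp2 in tmp) {
      let [x] = tmp2; // destructures the key string 'ab', so x is 'a'
    }
  }
});

console.log(original);       // ReferenceError
console.log(inlinedPattern); // no error
```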

How do we remedy this? We can try to outline the declaration rather than inline it, like we did before with the identifier declaration.

Code:
let x = () => {};
for (let [x] in x()) f();

// -->

let x = () => {};
{
  const tmp = x();
  let [x];
  for (let tmp2 in tmp) {
    [x] = tmp2;
    f();
  }
}

But that's illegal because a pattern declaration requires an initializer. And before you propose to throw an empty array or object at it, keep in mind that patterns can be nested to any depth and can have initializers with observable side-effects. It's a no-go.
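Both objections can be demonstrated in a few lines. This is a sketch: the counter gCalls and the body of g are assumptions for the example, and the illegal declaration is parsed through new Function so the rest of the script still loads.

```javascript
let declarationError = 'no error';
try {
  // A pattern declaration without initializer is rejected at parse time.
  new Function('let [x];');
} catch (err) {
  declarationError = err.name;
}

let gCalls = 0;
function g() {
  gCalls += 1;
  return 'default';
}

// Feeding the pattern an empty array parses fine, but it runs the default
// initializer, which is an observable side effect the original did not have
// at this point.
let [x = g()] = [];

console.log(declarationError); // SyntaxError
console.log(gCalls);           // 1
console.log(x);                // default
```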

The only option is to collect all the bindings that a pattern defines and to then pull them out, declare them in our wrapper block like before, and change the decl to an inline assignment pattern (like above). The semantics should be the same afterwards.

Code:
let x = () => {};
for (let [x = g(), {y, ...z}] in x()) f();

// -->

let x = () => {};
{
  const tmp = x();
  let x, y, z;
  for (let tmp2 in tmp) {
    [x = g(), {y, ...z}] = tmp2;
    f();
  }
}
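As a quick check that the TDZ error survives this version of the transform, here is a sketch with an invented helper errorNameOf and a stand-in g. The loop bodies are left empty since the error occurs before the first iteration.

```javascript
// Hypothetical helper: run fn and report the name of whatever it throws.
function errorNameOf(fn) {
  try {
    fn();
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

// Stand-in for the g() used in the pattern's default initializer.
function g() {
  return 'fallback';
}

const original = errorNameOf(() => {
  let x = () => ({ ab: 1 });
  for (let [x = g(), {y, ...z}] in x()) {}
});

const outlinedBindings = errorNameOf(() => {
  let x = () => ({ ab: 1 });
  {
    const tmp = x(); // the block's own x is still in the TDZ here
    let x, y, z;
    for (let tmp2 in tmp) {
      [x = g(), {y, ...z}] = tmp2;
    }
  }
});

console.log(original);         // ReferenceError
console.log(outlinedBindings); // ReferenceError
```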

I believe that should do it.
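For completeness, collecting the bindings of a pattern could look roughly like the sketch below. It assumes an ESTree-shaped AST and is not taken from Preval's actual code. The function collectBindingNames and the hand-written AST for the pattern [x = g(), {y, ...z}] are purely illustrative.

```javascript
// Walk a pattern node and collect the names of all bindings it declares.
function collectBindingNames(node, names = []) {
  if (node === null) return names; // holes in array patterns, like [, a]
  switch (node.type) {
    case 'Identifier':
      names.push(node.name);
      break;
    case 'ArrayPattern':
      node.elements.forEach((el) => collectBindingNames(el, names));
      break;
    case 'ObjectPattern':
      node.properties.forEach((prop) =>
        // Regular properties bind through their value, rest elements directly.
        collectBindingNames(prop.type === 'RestElement' ? prop : prop.value, names)
      );
      break;
    case 'AssignmentPattern':
      collectBindingNames(node.left, names); // the default value declares nothing
      break;
    case 'RestElement':
      collectBindingNames(node.argument, names);
      break;
    default:
      throw new Error('Unexpected node in pattern: ' + node.type);
  }
  return names;
}

// Hand-written AST for: [x = g(), {y, ...z}]
const id = (name) => ({ type: 'Identifier', name });
const pattern = {
  type: 'ArrayPattern',
  elements: [
    {
      type: 'AssignmentPattern',
      left: id('x'),
      right: { type: 'CallExpression', callee: id('g'), arguments: [] },
    },
    {
      type: 'ObjectPattern',
      properties: [
        { type: 'Property', key: id('y'), value: id('y'), shorthand: true },
        { type: 'RestElement', argument: id('z') },
      ],
    },
  ],
};

const bindingNames = collectBindingNames(pattern);
console.log(bindingNames.join(', ')); // x, y, z
```

The collected names are what would end up in the outlined let declaration inside the wrapper block.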