Translating yield to ES5

2014-02-16

The next version of JS, ES6, will ship with a couple of new gimmicks. One of them is the yield keyword. It's basically a tool you can use to pause a function mid-way, only to continue it at some later point. If you're thinking promises now, you are getting warm.

I was thinking about yield and how I would translate them. I think I've got a proper translation back to ES5 and wanted to share with you how to get there. If you're not too familiar with the concept yet, this might be an alternative way of learning it instead of those silly fibonacci examples I keep seeing everywhere :)

Let's start with a simple example. In it's most basic form, yield really just pauses a function for you to continue it later. You denote yielding functions with a star (*) I'm not actually sure where exactly you're supposed to use star, but I know this is at least one case, sorry for being lazy there :)

Code:
function* f() {
a();
yield;
b();
}

So you call f(), you get a new function back which is called an "iterator". In the current context that doesn't make sense though so let's just take it for what it is; an object with methods to start/continue running the function. The object will have five methods: next, send, throw, close, and iterate (source). I'm not sure why next and send could not have been combined to just next with (arg arity check to catch undefined). I'm also not sure why it's called next over continue (seeing how there's already throw, it can't be the keyword argument) but whatever. You call next to start a generator function AND to continue it from a yield.

Okay, how do we translate the above into ES5 terms?

Code:
function f() {
var started = false;
var finished = false;
function begin() {
a();
}
function rest() {
b();
}
return {
next: function(){
if (!started) {
started = true;
begin();
} else if (!finished) {
rest();
finished = true;
} else {
throw 'nope';
}
}.
// rest of api
};
}

We simply split up the function into two parts; the code before yield and the remainder. We return an object with the iterator API upon calling the function. The first next/send call would invoke begin, the second would invoke rest, and any subsequent calls would throw an error.

This translation is far being a generic solution though.

For example, we can improve this a bit by handling the context too. The context of the original call to f (so not a method of its return value) determines the context of the function, regardless of yielding:

Code:
function* f() {
this.x = true;
yield;
this.y = true;
}
var obj = {};
var g = f.call(obj);
g.next();
g.next();
obj; // {x:true, y:true}

You might say generators implicitly bind their initial calling context to the returned iterator function (the one you actually call next and send).

Code:
function f() {
var that = this;
var started = false;
var finished = false;
function begin() {
this.x = true;
}
function rest() {
this.y = true;
}
return {
next: function(){
if (!started) {
started = true;
begin.call(that);
} else if (!finished) {
finished = true;
rest.call(that);
} else {
throw 'nope';
}
}.
// rest of api
};
}
var obj = {};
var g = f.call(obj);
g.next();
g.next();
obj; // {x:true, y:true}

Note that any variable that we create in the translation process would have to be named in such a way that collisions are very unlikely. Anyways, context persists regardless of yield.

Talking about variables, they are now cut up in their own scopes too. That could be a problem:

Code:
function* f() {
var x = 1;
yield;
console.log(x);
}

In order to fix this we have to "hoist" all variables to the main function before we cut the main function into pieces. Effectively we'll be creating closures (but we're doing that anyways, so whatevs):

Code:
function f() {
var started = false;
var finished = false;
var x;
function begin() {
x = 1;
}
function rest() {
console.log(x);
}
return {
next: function(){
if (!started) {
started = true;
begin();
} else if (!finished) {
rest();
finished = true;
} else {
throw 'nope';
}
}.
// rest of api
};
}

Next, yield can return a value:

Code:
function* f() {
var x = a();
yield x;
b();
return 15;
}

// or just:

function* f() {
yield a();
b();
}

So when you call the next method (or send) it will return a value. This is an object (as far as I can tell) like { value, done }. Note that a return in the original function will also return a value and you can't really seem to distinct between the two (except perhaps that done will be true now). Let's translate this example.

In the following steps I'll leave out previous steps to keep the examples small. I'll recombine everything in the end for a complete picture.

Code:

function a(){ return "foo"; }
function f() {
var started = false;
var finished = false;
function begin() {
// if yield has an expression rhs put it in the function
// that preceeds it and return it. otherwise dont return.
return a();
}
function rest() {
b();
return 15; // this is the original return value of `f`
}
return {
next: function(){
if (!started) {
started = true;
return begin();
}

if (!finished) {
finished = true;
return rest();
}

throw 'nope';
}.
// rest of api
};

var g = f();
g.next(); // "foo"
g.next(); // 15
g.next(); // throws error
}

Besides returning a value, it can also receive an argument by using the send method instead of next:

Code:
function* f() {
var x = yield;
console.log(x);
}
var g = f();
g.next(); // first call cannot use send
g.send(15); // 15 is logged

By now the translation should be fairly straightforward...

Code:
function f() {
var started = false;
var finished = false;
var x;
function begin() {
a();
}
function rest(arg) {
x = arg;
b();
}
return {
next: ...,
send: function(arg){
if (!started) {
started = true; // before or after the throw? not sure...
if (arguments.length) throw 'no can do';
begin();
} else if (!finished) {
finished = true;
rest(arg); // can be undefined, dont care
} else {
throw 'nope';
}
}.
// rest of api
};
}

Multiple yields follow pretty much the same structure, just slightly more generic...

Code:
function* f(){
a();
yield;
b();
yield;
c();
}

Add more handlers, either hardcode them by name or put them in an array. We'll need to track which yield exactly has been yielding so far so we'll put all the sliced up functions in an array instead. This way we can just use the "current yield" as a pointer to the array. A good prep for when we cover arrays...

Code:
function f() {
var NOT_RAN = -1;
var currentYield = NOT_RAN;
var parts = [
function part1() {
a();
},
function part2() {
b();
},
function part3() {
c();
},
// etc
];
var FINISHED = parts.length-1;

return {
next: function(){
if (currentYield === FINISHED) {
throw error;
}
++currentYield;
parts[currentYield]();
}.
// rest of api
};
}

In some way it's even more elegant than the initial translation. Note that send would have an extra check to make sure currentYield was not NOT_RAN.

The generator itself can also accept arguments. These aren't that much different from usual functions in our translation. They're already properly scoped, so they don't need hoisting. The only thing we might need to take care of is arguments, but even that is questionable since ES6 encourages the use of the "spread" operator, which will replace the whole arguments thing in due time. Anyways, we can just copy the arguments object and make all references in the original code refer to this alias instead...

Code:
function* f(){
var x = arguments.length;
yield;
console.log(x);
}

// =>

function f() {
var started = false;
var finished = false;
var x;
var _arguments = arguments;
function begin() {
x = _arguments;
}
function rest() {
console.log(x);
}
return {
next: function(){
if (!started) {
started = true;
begin();
} else if (!finished) {
rest();
finished = true;
} else {
throw 'nope';
}
}.
// rest of api
};
}

Note that this doesn't preserve the magic link between arguments and the function parameters, but we could even solve that (kind of beyond the scope of this post though).

So far we've been using examples that run once and complete immediately. What about using yield inside a loop?

Code:
function* f(){
var list = [4,3,5,13,5,33,4];
for (var i=0; i a(i);
yield list[ i];
b(list[ i]);
}
return list.length;
}
var g = f();
g.next(); // 4
g.next(); // 3
g.next(); // 5
// ... etc
g.next(); // 4
g.next(); // 7
g.next(); // throws error

This is basically an iterator, but that's not important. As you can see it will pause the loop mid-way. This shouldn't really surprise you at this point, but it does mean we have to change the way we set the "finished" variable, because entering the last yield should not cause next() to always throw...

First convert the for to a while, which is fairly trivial:
Code:
var list = [4,3,5,13,5,33,4];
for (var i=0; i a(i);
yield list[ i];
b(list[ i]);
}
return list.length;

// =>

var list = [4,3,5,13,5,33,4];
var i = 0;
while (i a(i);
yield list[ i];
b(list[ i]);
++i;
}
return list.length;

Now we can more easily cut up the loop:

Code:
function head() {
var list = [4,3,5,13,5,33,4];
var i = 0;
}
function loop() {
if (i<list.length) {
return body1();
} else {
return tail();
}
}
function body1() {
a(i);
return list[ i];
}
function body2() {
b(list[ i]);
++i;
loop();
}
function tail() {
return list.length;
}

I hope you can recognize the original logic in the rewrite. It's still the same loop, just cut up in functions. We need those functions because if the loop loops after yielding we don't want to execute the initialization part again as well. So we'll need one function for the code before the loop, and one function for the condition and first part of the loop. We replace the loop keyword (while) with an if because the actual looping is now done through yielding and repeatitively calling functions.

We could merge body1 into loop, but I want the slicing to be clear for now.

Let's first make a custom implementation that deals with this transformation:

Code:
function f() {
var started = false;
var finished = false;
var i;
var list;

// functions from above, head, loop, body1, body2, tail
// `var` keywords are removed (hoisted above)
...
// tail has to be modified too:
function tail() {
finished = true; // otherwise loop() might finish without us noticing
return list.length;
}

return {
next: function(){
if (!started) {
started = true;
head();
return loop();
}
if (!finished) {
loop();
}

throw error;
}.
// rest of api
};
}

If yield was updating a variable, loop would have to be changed like the example above, to accept a parameter and store this in the closured variable. So I won't do that here.

The case for for-in is a bit more complex because of the iteration order and being unable to access something like nextKey() (See ES5 spec, 12.6.4 step 6.a), combined with the mutability of the object during a loop. However, IIRC the order of a for-in is preserved to the start of such loop, and only removed keys affect the loop in that they are not visited. New keys are not also visited. I don't think the order really changes (though the spec is weary on going into that at all).

Note that there are a few grey areas in terms of support. For example, the spec does not require whether or not new properties are also walked. It does not mention the case of removing and then adding a property again. In the approach below we won't walk new properties, but if a property is deleted and added back we'll visit it (and in the same order as it would have been without such actions, in case that it matters).

Anyways, we can probably transform that loop to a while like this:

Code:
for (var key in obj) ...

// =>

var keys = [];
for (var key in obj) keys.push(k);
while (key = key.shift()) if (key in obj) {
...
}

Doing a for-in first means we can preserve the order as the browser would otherwise have used it.

While Object.keys does preserve this order, it only returns own properties. Object.prototype.getOwnPropertyNames also returns own properties, but also returns non-enumerable properties, which is not what for-in does at all. So no dice. We could traverse the prototype manually by using (and relying on) __proto__ or Object.getPrototypeOf(o), but we might as well loop it.

These loops were actually just the simple examples, of course. Can we make this generic for the more complex cases? Multiple loops, nested loops, multiple yields, etc. Sure we can! It just involves a bit more rewriting. Rather than hardcoding the yield induced function boundaries, we'd also log the index of the yield being invoked. That way we can continue from it when we continue the function and we won't have to worry about these kinds of loops:

Code:
function* f(n){
a();
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
n = yield n;
}
d();
}
e();
if (++n % 5) {
f();
n = yield n;
g();
}
h();
}
i();
}

I've added bogus function calls in between each step to make the rewrite process more obvious below.

The problem in the above loop is that starting from one yield does not guarantee jumping to the next yield, or any at all. But if we put all those functions in an array like we did a few examples ago, and have an index of the last invoked yield, we can just use that to fix all the things...

Note that this function can only be quit by calling .throw() on it.

Let's slice up this last example and translate it to a more generic approach... This time we won't cut up each loop into various head, body, tail parts but instead we'll define the function as if each yielding position was the start of the code. This winds down to repeating the remaining body of the loop and then calling a function the does the entire loop again. Since we don't have to worry about scopes, this should be fine.

So if a loop yields, a "continuation" function for it first has to invoke the remainder of the original body of that loop from that yield onwards, as well as the whole loop itself after it. It has to do the same for any loop that wraps it in the original function. More granular slicing could help prevent code duplication here. But we won't do that in our example.

Code:
function f(n) {
var finished = false;
var lastIndex = 0;
var parts = [
function(){ // start function. entire code because the while "might" be falsy
a();
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
}
// essentially dead code, but our transformation won't be smart about it
i();
finished = true;
},
function(newN){ // what the original code would execute from the first yield onward
// update n for doing send()
n = newN;
// remainder of inner loop
d();
// (entire) inner loop
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
// remainder outer loop
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
// entire outer loop
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
}
// dead code, again
i();
finished = true;
},
function(newN){ // what the original code would execute from the second yield onward
// update n for doing send()
n = newN;
// remainder of outer loop
g();
h();
// outer loop again
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
}
// essentially dead code, but our transformation won't be smart about it
i();
finished = true;
}
];

return {
next: function(){
if (finished) {
throw error;
}

return parts[lastIndex]();
},
send: function(n){
if (!lastIndex) {
throw error;
}
if (finished) {
throw error;
}

return parts[lastIndex](n);
},
// rest of api
};
}

Initially it will run the first function of the parts array, which essentially is the original function, with yield occurences changed.

If it ever reaches the first yield, it will remember that by setting the lastIndex to 1. When the function is continued it will invoke parts[lastIndex], which will continue running the code of that function. Since it's a loop, it would have to repeat the loop as well. This all the way to the outer most loop. It's hard to abstract this because you'd have to make a difference between the end of a sub-abstraction, and the explicit return for a yield.

Anyways, that's not to say you can't be smarter about this. It's just not as trivial as it might seem :)

yield is actually not a statement, but it's an expression. Kind of a weird choice in my humble oppinion, but that's how it is. This means you can yield in any expression, returning a value with it and passing on a value to it. So basically it's an operator, unlike any operator we've had so far.

Code:
function* f(x){
console.log(yield x);
}

This translation falls well in line with the above rewrites though. If nothing else, we can use temporary "anonymous" variables for them.

I've already decided that I'll mark the yield yield keyword in a clear bright outstanding color in any syntax highlighting in the future. Simply because it'll have that much impact and the extra star to the function keyword won't suffice.

I'm not even sure why that star is important to have, or what it solves. Tools can statically determine that a function will yield, so it's for humans. But in larger functions, the yield can still be "hidden" in the code. The star feels like useless overhead to me. But I've missed the discussions covering it so perhaps there is a compelling reason for it.

Okay I think that's about it. We've covered yield, multiple yields, context, variable scoping, returning a value with yield, passing a value to yield, loops, and the complex nested loop case. Let's put it all together in a convoluted example :)

Code:
function* f(n){
this.a();
while (yield n) {
var x = this.b(n);
while (x % 13 > 6) {
this.c();
if (++n % 3) {
n = yield n;
}
this.d();
}
this.e();
if (++n % 5) {
this.f();
n = yield n;
this.g();
n = yield n;
this.h();
}
this.i();
}
this.j();
return 'ended';
}

This example sports a couple of yields, function argument, context, local variable, loops, nested loops, yield expression, yield argument, and yield value. Let's translate it into an uninterpretable piece of junk!

Code:
function f(n) {
var finished = false;
var that = this;
var lastIndex = 0;

var x;
var parts = [
function(){
this.a();
lastIndex = 1;
return n;
},
function f1(tmp){
if (tmp) {
var x = this.b(n);
while (x % 13 > 6) {
this.c();
if (++n % 3) {
lastIndex = 2;
return n;
}
this.d();
}
this.e();
if (++n % 5) {
this.f();
lastIndex = 3;
return n;
}
this.i();

// while (yield n)
lastIndex = 1;
return n;
}

this.j();
finished = true;
return 'ended';
},
function f2(){
this.d();
while (x % 13 > 6) {
this.c();
if (++n % 3) {
lastIndex = 2;
return n;
}
this.d();
}
this.e();
if (++n % 5) {
this.f();
lastIndex = 3;
return n;
}
this.i();

// while (yield n)
lastIndex = 1;
return n;
},
function f3(newN){
n = newN;
this.g();
lastIndex = 4;
return n;
},
function f4(newN){
n = newN;
this.h();
this.i();
// while (yield n)
lastIndex = 1;
return n;
},
];

return {
next: function(){
if (finished) {
throw error;
}

return parts[lastIndex].call(that);
},
send: function(x){
if (!lastIndex) {
throw error; // not yet started
}
if (finished) {
throw error;
}
return parts[lastIndex].call(that, x);
},
};
}

Manual labour, errors might be contained. But that's the main gist of it.

I can probably transform this into a library to do this for me. I'm aware that there are a couple of libraries out there that already do this, as well as environments where you can run ES6 code. However, I just wanted to do this myself because a: it's an interesting challenge, and b: it's a good learning experience to try and cover all bases. I learned a lot about yield today just by writing this blog post.

I do wonder whether we don't also need a kind of yield that transgresses the call stack. For example, I'm trying to think of a good way to "pause" my own JS parser, ZeParser, and turn it into a streaming parser. For that to work, I would have to be able to pause the tokenizer at the point where it wants to read the next token and it's missing input. Instead of throwing an error it would simply yield. However, that would not be sufficient because I'd have to patch all the possible call stacks that lead up to this point in a similar way. This would have such a huge impact to the code that I'm not likely to take that path. Unless I can automate it, of course. I think you can write some simple logic to make a yield trickle through to some function and automatically write in the required code.

But it would be nice if we just had some way to define like a "yield fence", which is pretty much like a try-catch for yielding. Then rather than yielding just one function, it would yield all the functions up to this fence. Or in my case, it would freeze the entire parser until the network loaded more bytes of the script so the parser can continue.

None of these code snippets have actually been tested. I currently intend to write a small lib to do this transformation automatically and some fiddles or whatever to play with them. It might never happen so don't hold your breath :)

Oh and I realized that it probably won't work with some of the more irky constructs like, of course, eval and with. I can work around them but that too is way outside the scope of this blog post :)

Ok. So this was fun. Hope it was learnful/helpful/interesting to you too :)