The next version of JS, ES6, will ship with a couple of new gimmicks. One of them is the
yield
keyword. It's basically a tool you can use to pause a function mid-way, only to continue it at some later point. If you're thinking promises now, you are getting warm.
I was thinking about
yield
and how I would translate them. I think I've got a proper translation back to ES5 and wanted to share with you how to get there. If you're not too familiar with the concept yet, this might be an alternative way of learning it instead of those silly fibonacci examples I keep seeing everywhere :)
Let's start with a simple example. In it's most basic form,
yield
really just pauses a function for you to continue it later. You denote yielding functions with a star (
*
)
I'm not actually sure where exactly you're supposed to use star, but I know this is at least one case, sorry for being lazy there :)function* f() {
a();
yield;
b();
}
So you call
f()
, you get a new function back which is called an "iterator". In the current context that doesn't make sense though so let's just take it for what it is; an object with methods to start/continue running the function. The object will have five methods:
next
,
send
,
throw
,
close
, and
iterate
(
source). I'm not sure why
next
and
send
could not have been combined to just
next
with (arg arity check to catch
undefined
). I'm also not sure why it's called
next
over
continue
(seeing how there's already
throw
, it can't be the keyword argument) but whatever. You call
next
to start a generator function AND to continue it from a
yield
.
Okay, how do we translate the above into ES5 terms?
function f() {
var started = false;
var finished = false;
function begin() {
a();
}
function rest() {
b();
}
return {
next: function(){
if (!started) {
started = true;
begin();
} else if (!finished) {
rest();
finished = true;
} else {
throw 'nope';
}
}.
// rest of api
};
}
We simply split up the function into two parts; the code before
yield
and the remainder. We return an object with the iterator API upon calling the function. The first
next/send
call would invoke
begin
, the second would invoke
rest
, and any subsequent calls would throw an error.
This translation is far being a generic solution though.
For example, we can improve this a bit by handling the context too. The context of the original call to
f
(so not a method of its return value) determines the context of the function, regardless of yielding:
function* f() {
this.x = true;
yield;
this.y = true;
}
var obj = {};
var g = f.call(obj);
g.next();
g.next();
obj; // {x:true, y:true}
You might say generators implicitly bind their initial calling context to the returned iterator function (the one you actually call
next
and
send
).
function f() {
var that = this;
var started = false;
var finished = false;
function begin() {
this.x = true;
}
function rest() {
this.y = true;
}
return {
next: function(){
if (!started) {
started = true;
begin.call(that);
} else if (!finished) {
finished = true;
rest.call(that);
} else {
throw 'nope';
}
}.
// rest of api
};
}
var obj = {};
var g = f.call(obj);
g.next();
g.next();
obj; // {x:true, y:true}
Note that any variable that we create in the translation process would have to be named in such a way that collisions are very unlikely. Anyways, context persists regardless of
yield
.
Talking about variables, they are now cut up in their own scopes too. That could be a problem:
function* f() {
var x = 1;
yield;
console.log(x);
}
In order to fix this we have to "hoist" all variables to the main function before we cut the main function into pieces. Effectively we'll be creating closures (but we're doing that anyways, so whatevs):
function f() {
var started = false;
var finished = false;
var x;
function begin() {
x = 1;
}
function rest() {
console.log(x);
}
return {
next: function(){
if (!started) {
started = true;
begin();
} else if (!finished) {
rest();
finished = true;
} else {
throw 'nope';
}
}.
// rest of api
};
}
Next,
yield
can return a value:
function* f() {
var x = a();
yield x;
b();
return 15;
}
// or just:
function* f() {
yield a();
b();
}
So when you call the
next
method (or
send
) it will return a value. This is an object (
as far as I can tell) like
{ value, done }
. Note that a
return
in the original function will also return a value and you can't really seem to distinct between the two (except perhaps that
done
will be
true
now). Let's translate this example.
In the following steps I'll leave out previous steps to keep the examples small. I'll recombine everything in the end for a complete picture.
function a(){ return "foo"; }
function f() {
var started = false;
var finished = false;
function begin() {
// if yield has an expression rhs put it in the function
// that preceeds it and return it. otherwise dont return.
return a();
}
function rest() {
b();
return 15; // this is the original return value of `f`
}
return {
next: function(){
if (!started) {
started = true;
return begin();
}
if (!finished) {
finished = true;
return rest();
}
throw 'nope';
}.
// rest of api
};
var g = f();
g.next(); // "foo"
g.next(); // 15
g.next(); // throws error
}
Besides returning a value, it can also receive an argument by using the
send
method instead of
next
:
function* f() {
var x = yield;
console.log(x);
}
var g = f();
g.next(); // first call cannot use send
g.send(15); // 15 is logged
By now the translation should be fairly straightforward...
function f() {
var started = false;
var finished = false;
var x;
function begin() {
a();
}
function rest(arg) {
x = arg;
b();
}
return {
next: ...,
send: function(arg){
if (!started) {
started = true; // before or after the throw? not sure...
if (arguments.length) throw 'no can do';
begin();
} else if (!finished) {
finished = true;
rest(arg); // can be undefined, dont care
} else {
throw 'nope';
}
}.
// rest of api
};
}
Multiple yields follow pretty much the same structure, just slightly more generic...
function* f(){
a();
yield;
b();
yield;
c();
}
Add more handlers, either hardcode them by name or put them in an array. We'll need to track which
yield
exactly has been yielding so far so we'll put all the sliced up functions in an array instead. This way we can just use the "current yield" as a pointer to the array. A good prep for when we cover arrays...
function f() {
var NOT_RAN = -1;
var currentYield = NOT_RAN;
var parts = [
function part1() {
a();
},
function part2() {
b();
},
function part3() {
c();
},
// etc
];
var FINISHED = parts.length-1;
return {
next: function(){
if (currentYield === FINISHED) {
throw error;
}
++currentYield;
parts[currentYield]();
}.
// rest of api
};
}
In some way it's even more elegant than the initial translation. Note that
send
would have an extra check to make sure
currentYield
was not
NOT_RAN
.
The generator itself can also accept arguments. These aren't that much different from usual functions in our translation. They're already properly scoped, so they don't need hoisting. The only thing we might need to take care of is
arguments
, but even that is questionable since ES6 encourages the use of the "spread" operator, which will replace the whole
arguments
thing in due time. Anyways, we can just copy the
arguments
object and make all references in the original code refer to this alias instead...
function* f(){
var x = arguments.length;
yield;
console.log(x);
}
// =>
function f() {
var started = false;
var finished = false;
var x;
var _arguments = arguments;
function begin() {
x = _arguments;
}
function rest() {
console.log(x);
}
return {
next: function(){
if (!started) {
started = true;
begin();
} else if (!finished) {
rest();
finished = true;
} else {
throw 'nope';
}
}.
// rest of api
};
}
Note that this doesn't preserve the magic link between
arguments
and the function parameters, but we could even solve that (kind of beyond the scope of this post though).
So far we've been using examples that run once and complete immediately. What about using
yield
inside a loop?
function* f(){
var list = [4,3,5,13,5,33,4];
for (var i=0; i a(i);
yield list[ i];
b(list[ i]);
}
return list.length;
}
var g = f();
g.next(); // 4
g.next(); // 3
g.next(); // 5
// ... etc
g.next(); // 4
g.next(); // 7
g.next(); // throws error
This is basically an iterator, but that's not important. As you can see it will pause the loop mid-way. This shouldn't really surprise you at this point, but it does mean we have to change the way we set the "finished" variable, because entering the last
yield
should not cause
next()
to always throw...
First convert the
for
to a
while
, which is fairly trivial:
var list = [4,3,5,13,5,33,4];
for (var i=0; i a(i);
yield list[ i];
b(list[ i]);
}
return list.length;
// =>
var list = [4,3,5,13,5,33,4];
var i = 0;
while (i a(i);
yield list[ i];
b(list[ i]);
++i;
}
return list.length;
Now we can more easily cut up the loop:
function head() {
var list = [4,3,5,13,5,33,4];
var i = 0;
}
function loop() {
if (i<list.length) {
return body1();
} else {
return tail();
}
}
function body1() {
a(i);
return list[ i];
}
function body2() {
b(list[ i]);
++i;
loop();
}
function tail() {
return list.length;
}
I hope you can recognize the original logic in the rewrite. It's still the same loop, just cut up in functions. We need those functions because if the loop loops after yielding we don't want to execute the initialization part again as well. So we'll need one function for the code before the loop, and one function for the condition and first part of the loop. We replace the loop keyword (
while
) with an
if
because the actual looping is now done through yielding and repeatitively calling functions.
We could merge
body1
into
loop
, but I want the slicing to be clear for now.
Let's first make a custom implementation that deals with this transformation:
function f() {
var started = false;
var finished = false;
var i;
var list;
// functions from above, head, loop, body1, body2, tail
// `var` keywords are removed (hoisted above)
...
// tail has to be modified too:
function tail() {
finished = true; // otherwise loop() might finish without us noticing
return list.length;
}
return {
next: function(){
if (!started) {
started = true;
head();
return loop();
}
if (!finished) {
loop();
}
throw error;
}.
// rest of api
};
}
If
yield
was updating a variable,
loop
would have to be changed like the example above, to accept a parameter and store this in the closured variable. So I won't do that here.
The case for
for-in
is a bit more complex because of the iteration order and being unable to access something like
nextKey()
(See ES5 spec,
12.6.4 step 6.a), combined with the mutability of the object during a loop. However, IIRC the order of a
for-in
is preserved to the start of such loop, and only removed keys affect the loop in that they are not visited. New keys are not also visited. I don't think the order really changes (though the spec is weary on going into that at all).
Note that there are a few grey areas in terms of support. For example, the spec does not require whether or not new properties are also walked. It does not mention the case of removing and then adding a property again. In the approach below we won't walk new properties, but if a property is deleted and added back we'll visit it (and in the same order as it would have been without such actions, in case that it matters).
Anyways, we can probably transform that loop to a while like this:
for (var key in obj) ...
// =>
var keys = [];
for (var key in obj) keys.push(k);
while (key = key.shift()) if (key in obj) {
...
}
Doing a
for-in
first means we can preserve the order as the browser would otherwise have used it.
While
Object.keys
does preserve this order, it only returns
own properties.
Object.prototype.getOwnPropertyNames
also returns own properties, but also returns non-enumerable properties, which is not what
for-in
does at all. So no dice. We could traverse the prototype manually by using (and relying on)
__proto__
or
Object.getPrototypeOf(o)
, but we might as well loop it.
These loops were actually just the simple examples, of course. Can we make this generic for the more complex cases? Multiple loops, nested loops, multiple yields, etc. Sure we can! It just involves a bit more rewriting. Rather than hardcoding the
yield
induced function boundaries, we'd also log the index of the
yield
being invoked. That way we can continue from it when we continue the function and we won't have to worry about these kinds of loops:
function* f(n){
a();
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
n = yield n;
}
d();
}
e();
if (++n % 5) {
f();
n = yield n;
g();
}
h();
}
i();
}
I've added bogus function calls in between each step to make the rewrite process more obvious below.
The problem in the above loop is that starting from one
yield
does not guarantee jumping to the next
yield
, or any at all. But if we put all those functions in an array like we did a few examples ago, and have an index of the last invoked
yield
, we can just use that to fix all the things...
Note that this function can only be quit by calling .throw()
on it.Let's slice up this last example and translate it to a more generic approach... This time we won't cut up each loop into various head, body, tail parts but instead we'll define the function as if each yielding position was the start of the code. This winds down to repeating the remaining body of the loop and then calling a function the does the entire loop again. Since we don't have to worry about scopes, this should be fine.
So if a loop yields, a "continuation" function for it first has to invoke the remainder of the original body of that loop from that
yield
onwards, as well as the whole loop itself after it. It has to do the same for any loop that wraps it in the original function. More granular slicing could help prevent code duplication here. But we won't do that in our example.
function f(n) {
var finished = false;
var lastIndex = 0;
var parts = [
function(){ // start function. entire code because the while "might" be falsy
a();
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
}
// essentially dead code, but our transformation won't be smart about it
i();
finished = true;
},
function(newN){ // what the original code would execute from the first yield onward
// update n for doing send()
n = newN;
// remainder of inner loop
d();
// (entire) inner loop
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
// remainder outer loop
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
// entire outer loop
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
}
// dead code, again
i();
finished = true;
},
function(newN){ // what the original code would execute from the second yield onward
// update n for doing send()
n = newN;
// remainder of outer loop
g();
h();
// outer loop again
while (true) {
b();
while (n % 13 > 6) {
c();
if (++n % 3) {
lastIndex = 1;
return n;
}
d();
}
e();
if (++n % 5) {
f();
lastIndex = 2;
return n;
g();
}
h();
}
// essentially dead code, but our transformation won't be smart about it
i();
finished = true;
}
];
return {
next: function(){
if (finished) {
throw error;
}
return parts[lastIndex]();
},
send: function(n){
if (!lastIndex) {
throw error;
}
if (finished) {
throw error;
}
return parts[lastIndex](n);
},
// rest of api
};
}
Initially it will run the first function of the
parts
array, which essentially is the original function, with
yield
occurences changed.
If it ever reaches the first
yield
, it will remember that by setting the
lastIndex
to
1
. When the function is continued it will invoke
parts[lastIndex]
, which will continue running the code of that function. Since it's a loop, it would have to repeat the loop as well. This all the way to the outer most loop. It's hard to abstract this because you'd have to make a difference between the end of a sub-abstraction, and the explicit
return
for a
yield
.
Anyways, that's not to say you can't be smarter about this. It's just not as trivial as it might seem :)
yield
is actually not a statement, but it's an expression. Kind of a weird choice in my humble oppinion, but that's how it is. This means you can
yield
in any expression, returning a value with it and passing on a value to it. So basically it's an operator, unlike any operator we've had so far.
function* f(x){
console.log(yield x);
}
This translation falls well in line with the above rewrites though. If nothing else, we can use temporary "anonymous" variables for them.
I've already decided that I'll mark the
yield
yield keyword in a clear bright outstanding color in any syntax highlighting in the future. Simply because it'll have that much impact and the extra star to the
function
keyword won't suffice.
I'm not even sure why that star is important to have, or what it solves. Tools can statically determine that a function will yield, so it's for humans. But in larger functions, the yield can still be "hidden" in the code. The star feels like useless overhead to me. But I've missed the discussions covering it so perhaps there is a compelling reason for it.
Okay I think that's about it. We've covered
yield
, multiple yields, context, variable scoping, returning a value with
yield
, passing a value to
yield
, loops, and the complex nested loop case. Let's put it all together in a convoluted example :)
function* f(n){
this.a();
while (yield n) {
var x = this.b(n);
while (x % 13 > 6) {
this.c();
if (++n % 3) {
n = yield n;
}
this.d();
}
this.e();
if (++n % 5) {
this.f();
n = yield n;
this.g();
n = yield n;
this.h();
}
this.i();
}
this.j();
return 'ended';
}
This example sports a couple of yields, function argument, context, local variable, loops, nested loops, yield expression, yield argument, and yield value. Let's translate it into an uninterpretable piece of junk!
function f(n) {
var finished = false;
var that = this;
var lastIndex = 0;
var x;
var parts = [
function(){
this.a();
lastIndex = 1;
return n;
},
function f1(tmp){
if (tmp) {
var x = this.b(n);
while (x % 13 > 6) {
this.c();
if (++n % 3) {
lastIndex = 2;
return n;
}
this.d();
}
this.e();
if (++n % 5) {
this.f();
lastIndex = 3;
return n;
}
this.i();
// while (yield n)
lastIndex = 1;
return n;
}
this.j();
finished = true;
return 'ended';
},
function f2(){
this.d();
while (x % 13 > 6) {
this.c();
if (++n % 3) {
lastIndex = 2;
return n;
}
this.d();
}
this.e();
if (++n % 5) {
this.f();
lastIndex = 3;
return n;
}
this.i();
// while (yield n)
lastIndex = 1;
return n;
},
function f3(newN){
n = newN;
this.g();
lastIndex = 4;
return n;
},
function f4(newN){
n = newN;
this.h();
this.i();
// while (yield n)
lastIndex = 1;
return n;
},
];
return {
next: function(){
if (finished) {
throw error;
}
return parts[lastIndex].call(that);
},
send: function(x){
if (!lastIndex) {
throw error; // not yet started
}
if (finished) {
throw error;
}
return parts[lastIndex].call(that, x);
},
};
}
Manual labour, errors might be contained. But that's the main gist of it.
I can probably transform this into a library to do this for me. I'm aware that there are a couple of libraries out there that already do this, as well as environments where you can run ES6 code. However, I just wanted to do this myself because a: it's an interesting challenge, and b: it's a good learning experience to try and cover all bases. I learned a lot about
yield
today just by writing this blog post.
I do wonder whether we don't also need a kind of
yield
that transgresses the call stack. For example, I'm trying to think of a good way to "pause" my own JS parser, ZeParser, and turn it into a streaming parser. For that to work, I would have to be able to pause the tokenizer at the point where it wants to read the next token and it's missing input. Instead of throwing an error it would simply yield. However, that would not be sufficient because I'd have to patch all the possible call stacks that lead up to this point in a similar way. This would have such a huge impact to the code that I'm not likely to take that path. Unless I can automate it, of course. I think you can write some simple logic to make a yield trickle through to some function and automatically write in the required code.
But it would be nice if we just had some way to define like a "yield fence", which is pretty much like a
try-catch
for yielding. Then rather than yielding just one function, it would yield all the functions up to this fence. Or in my case, it would freeze the entire parser until the network loaded more bytes of the script so the parser can continue.
None of these code snippets have actually been tested. I currently intend to write a small lib to do this transformation automatically and some fiddles or whatever to play with them. It might never happen so don't hold your breath :)
Oh and I realized that it probably won't work with some of the more irky constructs like, of course,
eval
and
with
. I can work around them but that too is way outside the scope of this blog post :)
Ok. So this was fun. Hope it was learnful/helpful/interesting to you too :)