While the new and upcoming ES6 specification will bring the
yield
keyword to the JavaScript landscape, this new feature doesn't actually enable something that can't already be done. After speculating about how I would do make the translation, well, I did. And it took a slightly different approach, of course.
I've dubbed it
UnYield. Here are some links:
The demo:
http://unyield.qfox.nl/The integration tests:
http://unyield.qfox.nlGithub:
http://github.com/qfox/unyieldSo let me explain how it actually does its job...
First the boilerplate. The
function* g(){}
becomes a regular function like this:
function g() {
var _γ_that = this;
var _γ_ielded = true;
var _γ_nextId = 0;
var _γ_ieldValue_N; // one var for each occurrence of yield
// any local variables are hoisted here too
var _γ_funcs = [
function _γ_f0(){ ... }
];
// one function for each occurrence yield
_γ_funcs[1] = function _γ_f1(_γ_ieldValue){ ... };
return {
next: function(v){
if (arguments.length > 1) throw "next() only accepts zero or one arguments...";
if (!_γ_ielded) throw 'unable to next, iterator finished';
return {value:_γ_funcs[_γ_nextId].call(_γ_that, v), _γ_ielded: _γ_ielded,_γ_nextId: _γ_nextId,done:!_γ_ielded};
},
};
}
That's a lot of boilerplate. Let's go over it.
First of all, note that
_γ_
is just a configurable prefix. It's an attempt to prevent variable collisions, obviously.
var _γ_that = this;
The context of the iterator is and stays whatever it was when you called the generator function. So if you call
x.g();
, whenever you call
.next()
the context will be
x
. We store the context in
_γ_that
to preserve it.
var _γ_ielded = true;
var _γ_nextId = 0;
We need to track whether the function yielded, and if so, which occurrence of
yield
caused it. We need to know because we need to return a value for
done
. We need to know which
yield
becase we need to continue from it.
var _γ_ieldValue_1, _γ_ieldValue_1, _γ_ieldValue_1, _γ_ieldValue_...;
These contain the return value for a
yield
, obviously. Initally I used one variable because that seemed to suffice. But then there's the case of for example
(yield 1) + (yield 2)
, where you'll need to do two returns and then have to remember both return values. In hindsight I didn't support multiple yields in the same expression, so perhaps I could drop this part for UnYield now.
var _γ_funcs = [
function _γ_f0(){ ... }
];
I use
_γ_funcs
like a jump table. When the code code encounters a
yield
it actually returns instead. It remembers that it returned from that specific
yield
and uses that id to continue from the right function when you call
.next()
. Hence the jump table.
The root function is
f0
. It's dumbly the original function, except any statement that contained
yield
is now completely replaced by a
return
. Other than that, no funny stuff. It's possible that the original code never hits
yield
and the function will need to return regularly if that's the case. However, in most cases, any occurrence of
yield
will result in dead code :) Which is fine. All in all,
f0
will be the closest resemblence to the original function.
_γ_funcs[1] = function _γ_f1(_γ_ieldValue){ ... };
These are the continuation functions, one for each
yield
. In a nutshell they just contain the code starting at the statement that contained
yield
. This is prefixed by some idle boilerplate to prevent certain syntax errors. In case of loops, the code is followed by each loop, recursively so to the top. I'll get back to that in a sec :)
return {
next: function(v){
if (arguments.length > 1) throw "next() only accepts zero or one arguments...";
if (!_γ_ielded) throw 'unable to next, iterator finished';
return {value:_γ_funcs[_γ_nextId].call(_γ_that, v), _γ_ielded: _γ_ielded,_γ_nextId: _γ_nextId,done:!_γ_ielded};
},
};
This is the iterator API. Well, the important part of it. There's also a
.throw()
method I didn't do. But it's kind of trivial to add if you need it.
Note that my
yield
doesn't require arguments, Kyle told me the spec does require it. But that just means my approach is overaccepting. And that is fine :) But the tests will mostly not have arguments. Again, that should be fine too as they're in those cases not testing the arguments. Ok whatever, let's move on.
The first thing that UnYield will do is normalize all loops to regular
while
loops. This is a trivial operation for all the loops in JavaScript except the
for-in
. Though theoretically impossible to transform into a
while
, it can be done tentively to work mostly as intended. Yeah okay that sounds silly. Let's just look at how it's transformed:
for (var key in foo) code;
// becomes =>
var key;
var tmp, keys = [], expr = foo;
for (tmp in expr) keys.push(tmp);
while (keys.length) {
if (keys[0] in expr) { // dont iterate deleted props
key = keys.shift(); // postpone assignment to prevent clobbering the previous value
code;
}
}
// actual =>
var key ;
var _γ_key, _γ_keys_6 = [], _γ_expro_6 = ( foo);
for (_γ_key in _γ_expro_6) _γ_keys_6.push(_γ_key);
while (_γ_keys_6.length)
if (_γ_keys_6[0] in _γ_expro_6 && !void( key = _γ_keys_6.shift()))
code;
As you can see it gathers all properties first, in the same order as
for-in
returns them. Then it becomes a regular loop which we can safely
yield
. Caveats in this approach are: If a key is deleted and re-added it is still visited but shouldn't be, if the object contains ES5 getters, this transformation is "observable" since the getter will be called before actually looping, it's essentially a memory leak as long as the function does not complete. The memory leak is a potential problem, the rest are just edge cases imho.
The variant without the
var
keyword is similar, except the variable isn't explicitly declared.
As you can see, in the actual result I kind of hack the assignment of the key to the original variable. This prevents the requirement of having to wrap the sub-statement in curly braces. Each variable gets its own index to prevent nested
for-in
s (or closures) to clash. The wrapping of the rhs (
foo
above) is perhaps premature, but it is possible to have a comma there and that would trigger a syntax error.
Let's build the continuation functions, these are actually kind of interesting. They do what the name implies, it continues the original function by executing only the code that would come after the
yield
. There's no real jump/goto in JavaScript so I simply construct a bunch of functions. In hindsight I probably could have made due with a huge
switch
, which would have prevented the code duplication. I noticed Regenerator does it that way and it reminds me of how we did
goto
's in the AS3 bytecode VM for Uxebu (
there the
switch
actually was my idea, so yeah :)).
Since I cut the original function up in several functions I might also cut away the head of a statement. This means I'll have to mend those cuts.
For most statements this is not a problem. Things like
return
,
throw
, etc. which don't have sub-statements. They are either replaced in full by the return stuff or the yield expression turns into a variable holding the return value.
The
if
is fairly easy, though a simple
{
won't mend it completely because the
else
will need the proper
if
. So if the
yield
is in an
if
, it will be preceeded by
if (true)
. This way
else
will never throw a syntax error.
The
switch
statement is only more complex by having more boilerplate. To make sure
case
s,
default
, and
break
s in the original
switch
don't start throwing syntax errors or break to the wrong statement we do two things: We create a new variable to which we assign a fresh object. We only care about the unique reference of this object. Second, we'll prefix the code with
switch (switchVar) { case switchVar:
. This way the remaining code will not cause syntax errors because it's still in a (dud)
switch
.
When yielding inside a
try
we simply prefix the continuation function with
try {
. However, inside a
catch
we only prefix a
{
. This is not generically correct because it will break
finally
, but I had already decided not to bother with
finally
anyways. It's not too difficult but adds a bit of boilerplate to detect a
yield
vs a
return
or
throw
to prevent triggering the
finally
. Not impossible, just didn't do it. I prefix
catch
with only a
{
because it prevents me from having to
throw
, or something even more elaborate. Note that my approach also breaks accessing the catch variable in the continuation function, because it disappears.
If you wanted to make sure a
yield
inside the
catch
of a
try/catch/finally
doesn't throw a syntax error, you could prefix the whole thing with
try { throw throwValue_1; } catch (catchVar_1) { var e = catchVar_1;
, where
throwValue_1
was assigned the catch variable. This would also fix accessing the catch variable ;) This isn't exactly the same thing due to catch-scoping works in JavaScript, but close enough.
A block, those are
just curly braces, are in fact prefixed with a
{
. Probably the easiest to fix up :)
That leaves
with
and
while
. Remember that I've already normalized
for
and
do-while
to regular
while
s at this point.
Obviously
with
needs to be patched up using the original object. The original occurrence of
with
is transformed to assign it's parameters to this fresh variable.
with (x+y+z) ...
becomes
with (withVar = (x+y+z)) ...
. Then in the continuation functions we just prefix a
with (withVar)
and are done.
The most annoying in this approach is a loop. If you start mid-way a loop you'll have to continue running all the statements in that loop. Then you have to include the entire loop again, in case you ... loop. This can cause a lot of dumb repetition. Ironically, we'll prefix all loop continuations with a
for
. I make use of the three parts of the
for
so that I won't have to wrap the body in an extra layer of curlies.
for (var whileOnce = true, broke = false; whileOnce; whileOnce = false)
Yeah, as you can see, it makes sure the loop can be entered but doesn't ever repeat. At the end of the body, I'll slice out the entire loop from
f0
and copy it here. If the loop is nested, the same will be repeated at the end of the outer loop.
while (a) while (b) yield x;
// =>
for (var whileOnce = true, broke = false; whileOnce; whileOnce = false)
yieldValue;
while (b)
return x;
while (a)
while (b)
return x;
I noticed later that Regenerator uses a
switch
to work around this problem. It prevents this code duplication by wrapping the original code in an infinite loop and switch to the right points in the code at the end of a loop or when continuing after a yield. I like it. I didn't go for it because I thought it'd be easier without a
switch
. Note sure if that's true :)
Hmmm I think that's about it for statements.
Initially, UnYield finds all occurrences of
yield
in the input. It will slice out the keyword and it's argument and remember two values; a string that represents the return statement for this
yield
when it actually causes a yield, and a string representing the given yield value when continueing from yielding. When yielding, the entire statement is replaced with the return statement, otherwise only the yield expression is replaced with the yield variable.
The replacement return statement does three things:
return (yielded=true, nextId = 1, <value>);
It will set
yielded
to
true
. This identifies this returning as a yield, and not the end of the function. In short, this determines the
done
value.
The next id is set to the id of the
yield
that's causing it. When
.next()
is called this continuation function is invoked.
Lastly, the yield value is returned so that we can return it as the
value
property.
There's two special cases left to discuss:
break
and
continue
. These are context bound keywords.
break
can only appear within a
switch
,
loop
, or with a label within that label.
continue
can actually only appear in a loop, regardless of label. In the continuation functions I need to know whether the loop was shorted by either keyword. And I'll need to know whether to invoke the loop again of course. Doing
break
was kind of easy now since I'm prefixing everything with a dud loop. However,
continue
posed a small problem now because I did not want to actually continue the
current loop, but the loop that would need to follow. So instead I'll break anyways and simply not set the
broke
var to
true
. That way the current loop is broken, but the original loop which follows will still be entered.
At the beginning of each continuation function the
yielded
variable is set to
false
. So if there was no yielding, this var stays
false
and I'll know that it actually ended.
Mmmm yeah that's about it.
It was fun to create this demo. Learned a great deal about generators and
yield
, that can never be a bad thing. Maybe I'll translate some of the other ES6 features into ES5 semantics as well, as a good exercise.
The demo: http://unyield.qfox.nl/
The integration tests: http://unyield.qfox.nl
Github: http://github.com/qfox/unyield