Nope, not a blog post on the tabs-vs-spaces war. This is a blog post about my demo for this year's js1k!
Alright. So I'm writing a demo for js1k. The first time my entry was
a brainfuck interpreter. It was more of a joke than anything else, but I still had fun. This time I figured I'd target another language. My entries can't compete for a prize anyways, so instead of fancy, I just want to tackle a fun language and see if I can get it under 1k. Oboy!
As you might have guessed, this time I picked
Whitespace. The language with three significant characters, all in the spectrum of whitespace. Whitespace was actually an April fools joke. But as it goes with these things, it stuck around. I doubt anybody uses it for real, but it is Turing complete so theoretically it's as sound as JS or C. Probably just not as efficient :p
You can find the demo
here.
What surprised me most about the language is the fact that it's a compiled language. A fact that's not listed on the
formal website anywhere I might add. I had written a big part of the interpreter when I suddenly questioned the mechanics of a jump forward. How do you jump to a forward label if the language is interpreted. ... MEH. Luckily the compilation step turned out to be a relatively simple step. But the conversion from whitespace to a number is still costly (in js1k terms).
The core compiler started out as a nice switch. It actually started out as a very long switch, with one
case
for every op. But I generalized a bit and ended up with this. You basically have three different cases: defining a label, ops with an argument (literal) and ops without. This translates into:
switch (id) {
case MARK:
var id = getNumber();
labels[id] = program.length;
break;
case PUSH:
case COPY:
case SLICE:
case CALL:
case JUMP:
case JZERO:
case JNEG:
program.push(id, getNumber());
break;
default:
program.push(id);
// if not one of these...
// DUP, SWAP, POP, RET, EOF, ADD, SUB, MUL, DIV, MOD, PUT, GET, PUTC, PUTN, GETC, GETN
// throw Error('Unable to proceed, how in the world did you manage this');
}
As you can see, I dropped syntax error detection. You could add it but at an obvious cost.
So the next step is to take a look at this structure and seriously wonder whether it might not be a little too explicit. The answer is of course, "yes! by George it's huge". We defined those constants as numbers, since that's the shortest anyways. Let's use that fact. So we have three different cases so let's just replace the
switch
with an
if
.
if (id==MARK) labels[getNumber()] = program.length;
else if (id in {0:0,4:4,5:5,7:7,8:8,9:9,10:1}) program.push(id, getNumber());
else program.push(id);
Much better. I use the object literal trick for a quick reduction of
x==5||x==6||x==8||...
. Whether you can use that trick obviously depends on your environment but in this case we are only using numbers and in any case, we know for certain that our Object will not have been messed with. This is fine and it works too.
Can we go shorter? Yes! There are two things we can still do, both are fairly simple. The first is "of course" rewriting the statements to expressions.
if
-
else
can be rewritten with
&&
or
?:
. As a bonus you can usually embed/combine the expression with other code to save another semi-colon somewhere.
id == MARK ? labels[getNumber()] = program.length :
id in {0:0,4:4,5:5,7:7,8:8,9:9,10:1} ? program.push(id, getNumber()) :
program.push(id);
The other thing we can do is make use of the numbers-as-ids. Rather than the arbitrary order in which I defined the ops right now, I could also define them such that the specific ops I check for are all consecutive. That way I can reduce my check to simply checking whether the value of the id is below the first "other" op id.
id == 7 ? labels[getNumber()] = program.length : id < 7 ? program.push(id, getNumber()) : program.push(id);
Just look at that. The win is huge!
Looking back at this, I could have also done this minor tweak:
id == 7 ? labels[getNumber()] = program.length : (program.push(id, getNumber()), id < 7 ? program.pop():0);
In the end that would've been ok because getNumber()
is not actually executing any code. the pop()
is much shorter than the push(id)
. It's just slightly less efficient. But nothing you'd notice in this context.Note that it is calling
getNumber
twice. These are actually the only two calls to
getNumber
in the entire program. Therefor I will change the code a bit so that I can inline that function to get rid of the
function(){}
overhead. Let's turn to the number parsing function first...
Raw, numbers in the Whitespace language consist of spaces and tabs and are delimited by a newline. They can start at arbitrary points, so you can't really go ahead and massively parse them out ahead of time, unfortunately. There are seven ops that actually need to parse a number but due to the optimizations above we only actually need to parse them twice. And as I said, we'll make it so that we only need to parse them once. But first the function itself. My starting point was this:
// numbers are spaces and tabs, delimited with newlines
function getNumber(){
// first char always signifies flag
var flag = input[pc++];
// we know the input will be spaces and tabs, delimited by a newline
// we will change the spaces and tabs into zeroes and ones.
// then we'll apply parseInt, which will delimit itself at the newline
// we can't do it all at once because we also need to move the PC :/
var p = /[10]+/.exec(input.slice(pc).replace(/ /g,'0').replace(/\t/g,'1'))[0];
var n = parseInt(p,2);
pc += p.length+1; // +1 for delimiter
// dont forget to apply the flag
return flag == ' ' ? n : -n;
}
This has already been trimmed of a bit of bloat. But we can do much better :) I want to trim the size of applying the flag. First of all, I'd like to immediately apply this flag in the
parseInt
line. Some minor tweaks to the
pc
refs are enough here.
function getNumber(){
var p = /[10]+/.exec(input.slice(++pc).replace(/ /g,'0').replace(/\t/g,'1'))[0];
var n = (input[pc-1]==' '?1:-1) * parseInt(p,2);
pc += p.length+1; // +1 for delimiter
return n;
}
Well, that return seems like a waste now. Maybe we should do apply the length computation before getting the new value? That way we can combine that expression into the return statement. Maybe we can even combine both expressions! That would come at the cost of a double
p.length
though, but you know what? That's not a problem! Partially because
p.length
will become
p[X]
after minification anyways! So let's merge all those lines into one return statement. Note that we still have to cache the parsed string because we need to refer to that twice (converting and moving the PC).
function getNumber(){
var p = /\S+/.exec(input.slice(++pc).replace(/ /g,0).replace(/\t/g,1))[0];
return (input[(pc+=p.length+1)-(p.length+2)]==' '?1:-1) * parseInt(p,2);
}
Beautiful! The last step here is to inline the function. Since it basically consists of two expressions (ignore the return keyword), we can combine into a single expression and maybe merge them with the existing code as well!
// parse a number
id<8 && (p = /[10]+/.exec(input.slice(++pc).replace(/ /g,'0').replace(/\t/g,'1'))[0], p = (input[(pc += p.length+1)-(p.length+2)]==' '?1:-1) * parseInt(p,2));
// compilation
id== 7 ? labels[p] = program.length : id<7 ? program.push(id, p) : program.push(id);
As you can see I've merged the two expressions into a single one. It will use the same variable for the result. This is fine because we don't care about the parsed string part after parsing, only while parsing. So after that expression finishes,
p
will hold the parsed number value. We just have to hide it behind a condition because we can't always move the PC forward. Luckily that condition is really short now so that comes in handy again. But if we are going to use the condition twice, can't we really mix and merge them? Yes, we can!
// only parse a number for certain ops
id<8 ?
(
// first actually parse a substr
p = /[10]+/.exec(input.slice(++pc).replace(/ /g,'0').replace(/\t/g,'1'))[0],
// get the value and move the PC
p = (input[(pc += p.length+1)-(p.length+2)]==' '?1:-1) * parseInt(p,2),
// register a label or push ops+parameter on the stack
id== 7 ? labels[p] = program.length : program.push(id, p)
)
// op is not followed by number
: program.push(id);
Look at that... we've merged the huge switch and the rather big function into a single expression. The double length still bothers me so let's move it away. I see there's still some space near the end... ;)
// get the value
q = (input[pc-1]==' '?1:-1) * parseInt(p, 2),
// move the PC
pc += p.length + 1,
I moved the changing of the PC out. I used a fresh variable for assigning the actual number value, because I now need the
p
for determining the length of the parsed string. No biggie, plenty of (short!) variables left.
Let's take a look at the core of the interpreter:
if (!next) stack.push(program[pc++]);
if (next==1) stack.push(stack[program[pc++]]);
if (next==2) stack.splice(a = program[pc++], stack.length-(a+1));
if (next==3) callstack.push(pc), pc = labels[program[pc++]];
if (next==4) pc = labels[program[pc++]];
if (next==5) stack.pop() ? ++pc : pc = labels[program[pc]];
if (next==6) stack.pop()<0 ? pc = labels[program[pc]] : ++pc;
// 7 = MARK, compile time only
if (next==8) stack.push(stack[stack.length-1]);
if (next==9) stack.push(stack.pop(a = stack.pop()),a);
if (next==10) stack.pop();
if (next==11) pc = callstack.pop();
if (next==12) pc = input.length;
if (next==13) stack.push(stack.pop() + stack.pop());
if (next==14) stack.push(stack.pop() - stack.pop());
if (next==15) stack.push(stack.pop() * stack.pop());
if (next==16) stack.push(stack.pop() / stack.pop());
if (next==17) stack.push(stack.pop() % stack.pop());
if (next==18) heap[stack.pop(a = stack.pop())] = a;
if (next==19) stack.push(heap[stack.pop()]);
if (next==20) bufOut += String.fromCharCode(stack.pop());
if (next==21) bufOut += stack.pop();
if (next==22) {
stack.push(bufIn.charCodeAt(0));
bufIn = bufIn.slice(1); // can do? move as param to charCodeAt ?
}
if (next==23) {
stack.push(a = parseInt(bufIn));
bufIn = bufIn.slice(String(a).length);
}
Or what's left of it anyways :) Besides rewriting the
if
's to
&&
or even
?:
expressions, there are actually still two things we can golf down. For one, we know that
next
is going to contain one of the op constants, which is a number from
0
to
23
. Anything else is an error, but we don't bother with those yet. So look closer at the repetitive
next==4
checks. We can't turn it back into a switch because that's much bigger, but we can cut down at least one byte per
if
under the assumption that the next check always checks for a bigger op constant than the previous
if
. And it does! So...
if (!next) stack.push(program[pc++]);
if (next<2) stack.push(stack[program[pc++]]);
if (next<3) stack.splice(a = program[pc++], stack.length-(a+1));
if (next<4) callstack.push(pc), pc = labels[program[pc++]];
if (next<5) pc = labels[program[pc++]];
if (next<6) stack.pop() ? ++pc : pc = labels[program[pc]];
if (next<7) stack.pop()<0 ? pc = labels[program[pc]] : ++pc;
// 7 = MARK, compile time only
if (next<9) stack.push(stack[stack.length-1]);
if (next<10) stack.push(stack.pop(a = stack.pop()),a);
if (next<11) stack.pop();
if (next<12) pc = callstack.pop();
if (next<13) pc = input.length;
if (next<14) stack.push(stack.pop() + stack.pop());
if (next<15) stack.push(stack.pop() - stack.pop());
if (next<16) stack.push(stack.pop() * stack.pop());
if (next<17) stack.push(stack.pop() / stack.pop());
if (next<18) stack.push(stack.pop() % stack.pop());
if (next<19) heap[stack.pop(a = stack.pop())] = a;
if (next<20) stack.push(heap[stack.pop()]);
if (next<21) bufOut += String.fromCharCode(stack.pop());
if (next<22) bufOut += stack.pop();
if (next<23) {
stack.push(bufIn.charCodeAt(0));
bufIn = bufIn.slice(1); // can do? move as param to charCodeAt ?
}
if (next<24) {
stack.push(a = parseInt(bufIn));
bufIn = bufIn.slice(String(a).length);
}
However this can only be used in conjunction with
if
-
else if
structures. But we will get to that once we rewrite the entire thing to an expression. Just looking at the above should make the reptitive nature of five ops quite obvious. So let's just combine them, yeah?
if (next>12&&next<18)b=stack.pop(a=stack.pop()),stack.push(next==13?a+b:next==14?a-b:next==15?a*b:next==16?a/b:a%b);
We can probably even do better than this (using eval, for example), but let's save that for later.
Let's convert everything to an expression now (I've reverted the comparisons for now):
!next && stack.push(program[pc++]);
next==1 && stack.push(stack[program[pc++]]);
next==2 && stack.splice(a = program[pc++], stack.length-(a+1));
next==3 && (callstack.push(pc), pc = labels[program[pc++]]);
next==4 && (pc = labels[program[pc++]]);
next==5 && (stack.pop() ? ++pc : pc = labels[program[pc]]);
next==6 && (stack.pop()<0 ? pc = labels[program[pc]] : ++pc);
// 7 = MARK, compile time only
next==8 && stack.push(stack[stack.length-1]);
next==9 && stack.push(stack.pop(a = stack.pop()),a);
next==10 && stack.pop();
next==11 && (pc = callstack.pop());
next==12 && (pc = input.length);
(next>12&next<18) && (b=stack.pop(a=stack.pop()),stack.push(next==13?a+b:next==14?a-b:next==15?a*b:next==16?a/b:a%b));
next==18 && (heap[stack.pop(a = stack.pop())] = a);
next==19 && stack.push(heap[stack.pop()]);
next==20 && (bufOut += String.fromCharCode(stack.pop()));
next==21 && (bufOut += stack.pop());
next==22 && (
stack.push(bufIn.charCodeAt(0)),
bufIn = bufIn.slice(1) // can do? move as param to charCodeAt ?
);
next==23 && (
stack.push(a = parseInt(bufIn)),
bufIn = bufIn.slice(String(a).length)
);
Let's change this into an
if
-
else
stream with
?:
.
!next ? stack.push(program[pc++]) :
next==1 ? stack.push(stack[program[pc++]]) :
next==2 ? stack.splice(a = program[pc++], stack.length-(a+1)) :
next==3 ? (callstack.push(pc), pc = labels[program[pc++]]) :
next==4 ? (pc = labels[program[pc++]]) :
next==5 ? (stack.pop() ? ++pc : pc = labels[program[pc]]) :
next==6 ? (stack.pop()<0 ? pc = labels[program[pc]] : ++pc) :
// 7 = MARK, compile time only
next==8 ? stack.push(stack[stack.length-1]) :
next==9 ? stack.push(stack.pop(a = stack.pop()),a) :
next==10 ? stack.pop() :
next==11 ? (pc = callstack.pop()) :
next==12 ? (pc = input.length) :
(next>12&&next<18) ? (b=stack.pop(a=stack.pop()),stack.push(next==13?a+b:next==14?a-b:next==15?a*b:next==16?a/b:a%b)) :
next==18 ? (heap[stack.pop(a = stack.pop())] = a) :
next==19 ? stack.push(heap[stack.pop()]) :
next==20 ? (bufOut += String.fromCharCode(stack.pop())) :
next==21 ? (bufOut += stack.pop()) :
next==22 ? (
stack.push(bufIn.charCodeAt(0)),
bufIn = bufIn.slice(1) // can do? move as param to charCodeAt ?
) :
(
stack.push(a = parseInt(bufIn)),
bufIn = bufIn.slice(String(a).length)
);
Applying the lesser then trick:
!next ? stack.push(program[pc++]) :
next<2 ? stack.push(stack[program[pc++]]) :
next<3 ? stack.splice(a = program[pc++], stack.length-(a+1)) :
next<4 ? (callstack.push(pc), pc = labels[program[pc++]]) :
next<5 ? (pc = labels[program[pc++]]) :
next<6 ? (stack.pop() ? ++pc : pc = labels[program[pc]]) :
next<7 ? (stack.pop()<0 ? pc = labels[program[pc]] : ++pc) :
// 7 = MARK, compile time only
next<9 ? stack.push(stack[stack.length-1]) :
next<10 ? stack.push(stack.pop(a = stack.pop()),a) :
next<11 ? stack.pop() :
next<12 ? (pc = callstack.pop()) :
next<13 ? (pc = input.length) :
// 13,14,15,16,17:
next<18 ? (b=stack.pop(a=stack.pop()),stack.push(next==13?a+b:next==14?a-b:next==15?a*b:next==16?a/b:a%b)) :
next<19 ? (heap[stack.pop(a = stack.pop())] = a) :
next<20 ? stack.push(heap[stack.pop()]) :
next<21 ? (bufOut += s.fromCharCode(stack.pop())) :
next<22 ? (bufOut += stack.pop()) :
next<23 ? (
stack.push(bufIn.charCodeAt(0)),
bufIn = bufIn.slice(1) // can do? move as param to charCodeAt ?
) :
(
stack.push(a = n(bufIn,10)),
bufIn = bufIn.slice(s(a).length)
);
Whooo, much better :) Minified:
while(f<e[y])q=e[f++],q?q<2?l[C](l[e[f++]]):q<3?l.splice(r=e[f++],l[y]-(r+1)):q<4?(m[C](f),f=g[e[f++]]):q<5?f=g[e[f++]]:q<6?l[E]()?++f:f=g[e[f]]:q<7?l[E]()<0?f=g[e[f]]:++f:q<9?l[C](l[l[y]-1]):q<10?l[C](l[E](r=l[E]()),r):q<11?l[E]():q<12?f=m[E]():q<13?f=v[y]:q<18?(s=l[E](r=l[E]()),l[C](q==13?r+s:q==14?r-s:q==15?r*s:q==16?r/s:r%s)):q<19?n[l[E](r=l[E]())]=r:q<20?l[C](n[l[E]()]):q<21?p+=t.fromCharCode(l[E]()):q<22?p+=l[E]():q<23?(l[C](o.charCodeAt(0)),o=o.slice(1)):(l[C](r=u(o,10)),o=o.slice(t(r)[y])):l[C](e[f++])
But we can do even better. Instead of just caching the push and pops, we could
bind
them and reduce the left part of that expression from four to just one character. It occurs a few times, so the overhead of the binding is less than the saved space!
k=[].pop.bind(stack = [])
l=[].push.bind(stack)
...
!next ? l(program[pc++]) :
next<2 ? l(stack[program[pc++]]) :
next<3 ? stack.splice(a = program[pc++], stack.length-(a+1)) :
next<4 ? (callstack.push(pc), pc = labels[program[pc++]]) :
next<5 ? (pc = labels[program[pc++]]) :
next<6 ? (k() ? ++pc : pc = labels[program[pc]]) :
next<7 ? (k()<0 ? pc = labels[program[pc]] : ++pc) :
...
As a bonus, we can save another char by assigning the stack variable inside the
bind
call. There are 12 places where
stack.push
was used. Minified that would've been
a[ b]
so four characters. Add the overhead for caching
push
and we're to a
(12*4)+8=56
for the old method (at best) and
12+17=29
for the new method. Almost twice because we also do it for
pop
, which only has 8 occurrences.
Turned out later that while it might be smaller, it didn't work well with "eval packers". So I ended up not using this trick.At the start of the interpreter the PC is reset to
0
. The important snippet here is:
callstack = [];
heap = [];
pc = 0;
while (pc<program.length) {
next = program[pc++];
Now, I've been looking at that for a while and I figured that we could initialize
pc
to an empty array. The first time it enters the while loop, the array will be coerced to a number, resulting in
0
. That's what we want anyways so that's ok. The next line applies the
++
post-fix (!) incremental operator on
pc
. So we have two more things to check. First of all, the post-fix operator will cause the array to be coerced to a number before it is returned. So we get
0
again, which is again what we expected. The incrementor then changes the value of
pc
to an actual number, so we don't have to check anything else after that. That means we can initialize the pc to an (arbitrary) array.
We can also move the initialization step to the callstack. This is actually fine because no program should
RET
without having done a
CALL
. It does mean that the
callstack
will always have one value lingering, but we don't care about that sort of thing in js1k :D So let's merge it to heck!
callstack = [pc = heap = []];
Note that the heap is not affected by this. The coercion and incremental operator do not mess with the actual value of the
pc
, they only replace it. So now, the interpreter intialization part looks like this:
k=[].pop.bind(stack = []);
l=[].push.bind(stack);
bufIn = bufOut = '';
callstack = [pc = heap = []];
Look at those
bind
lines, they use an empty array as a shortcut to get to the
push
and
pop
function objects. Those arrays don't matter at all, so let's use that space!
k=[bufIn = bufOut = ''].pop.bind(stack = []);
l=[callstack = [pc = heap = []]].push.bind(stack);
In the interpreter, the
GETC
op currently uses two expressions.
next<23 ? (
l(bufIn.charCodeAt(0)),
bufIn = bufIn.slice(1) // can do? move as param to charCodeAt ?
) :
You might have spotted the comment before. And the answer is "yes, we can!". charCodeAt
expects one parameter. That param is
converted to a number. If the number value is actually
NaN
, it will return
+0
instead. Perfect! We pass on whatever string is left in bufIn, which is either going to be
NaN
or the empty string which is
0
.
There's one issue that had me scared for a second; order of changing variables. If we move the
bufIn = bufIn.slice(1)
part to the
charCodeAt
call, the
bufIn
will be changed before
charCodeAt
is evaluated. Won't it apply it's operation on the _new_ value of
bufIn
? No! Because it is a string, so a primitive value, it is immediately passed on. So what happens is that in the whole expression, we call
l
(which is our bound
push
) with one parameter; the
charCodeAt
result. JS will first get the primitive value of
bufIn
before starting to apply the
charCodeAt
method to that value. But before that actually happens, the value held by the
bufIn
is changed (to remove the first char). This change is not reflected by the call to
charCodeAt
that's still going on though. So it will still simply return the first byte. Oof! We basically do the same with the expressions that follow, for
GETN
(move the assignment to be the third parameter of
parseInt
) so the result is:
next<23 ? l(bufIn.charCodeAt(bufIn = bufIn.slice(1))) :
l(m = n(bufIn,10, bufIn = bufIn.slice(s(m).length))) ;
Let's move up a little. We will find these lines:
// create mapping for op->id
map = {};
// labels are the only pesky reason for compilation :/
labels = {};
Normally, you'd never think to share these objects. But in this case, we know exactly what kind of things are going to be mapped and what kind of labels we're gonna use. The ops are always whitespace strings and the labels are always numbers. It's perfectly safe to merge these into one single object! As a bonus, we can just use the same variable for both so we can drop one of the initializers and replace the other var completely. Nice!
...
After this I kind of stopped writing this blog post and did some more puzzling. By now I've completed
my entry, let me quickly walk you through some interesting revelations I went through before finally achieving the desired compression :)
I figured out an awesome way of compressing the interpreter part of the code even better. We basically have a list of expressions. Based on a ranged number we execute a certain expression. So I converted that to an object literal that I would reference with the next op to execute. The value of that key in the object literal would be a string that I would run through
eval
. Note that
eval
still has access to local variables / closures (and the fact that no
var
keyword was used anywhere means all variables are invariantly global anyways...), so there was no problem with that.
After having moved the code to an object literal, the next quite obvious step was to change that to an array literal that was directly accessed and evalled. It's beautiful if it weren't such a huge hack :)
T='n=k(m=k()),'
while (h<i[C])
eval([
'l(i[h++])',
'l(o[i[h++]])',
'o.splice(m=i[h++],o[C]-(m+1))',
'r.push(h),h=a[i[h++]]',
'h=a[i[h++]]',
'k()?++h:h=a[i[h]]',
'k()<0?h=a[i[h]]:++h',
,
'l(o[o[C]-1])',
'l(k(m=k()),m)',
'k()',
'h=r.pop()',
'h=d[C]',
T+'l(m+n)',
T+'l(m-n)',
T+'l(m*n)',
T+'l(m/n)',
T+'l(m%n)',
't[k(m=k())]=m',
'l(t[k()])',
'g+=s.fromCharCode(k())',
'g+=k()',
'l(c.charCodeAt(c=c.slice(1)))',
'l(m=y(c,10,c=c.slice(s(m)[C])))'
][i[h++]]);
Just look at it! Every entry in the array corresponds with the arbitrary ID of an op of the language. (Mark is a compile-time only op but we still need to fill up that space, thanks ES5!)
This worked fine. I added a few more cachings of substrings too. But in the end I was unable to achieve the desired compression.
The entire excersise was not in vain though. As you can see in the final result I still use a similar structure, except I use functions instead of
eval
.
while (h<i.length)
[
function(s){o.push(i[h++])}, // push
function(s){o.push(o[o.length-i[h++]]|0)}, // copy
function(s){
// either is fine...
// m=((i[h++]+1)<o.length?i[h-1]+1:o.length);
o.splice(-Math.min(i[h++]+1,o.length), Math.min(i[h++]+1,o.length)-1)
}, // slice
function(s){r.push(h),h=a[i[h]]}, // call
function(s){h=a[i[h]]}, // jump
function(s){o.pop()?++h:h=a[i[h]]}, // jzero
function(s){o.pop()<0?h=a[i[h]]:++h}, // neg
, // mark
function(s){o.push(o[o.length-1]|0)}, // dup
function(s){o.push(o.pop()|0,o.pop()|0)}, // swap
function(s){o.pop()}, // pop
function(s){h=r.pop()+1}, // ret
function(s){(h=d.length)}, // eof
function(s){n=o.pop(m=o.pop()|0)|0,o.push(m+n)}, // add
function(s){n=o.pop(m=o.pop()|0)|0,o.push(m-n)}, // sub
function(s){n=o.pop(m=o.pop()|0)|0,o.push(m*n)}, // mul
function(s){n=o.pop(m=o.pop()|0)|0,o.push(Math.round(m/n))}, // div
function(s){n=o.pop(m=o.pop()|0)|0,o.push(m%n)}, // mod
function(s){t[o.pop(m=o.pop())|0]=m|0}, // put
function(s){o.push(t[o.pop()]|0)}, // get
function(s){b.children[2].innerHTML+=String.fromCharCode(o.pop()|0)}, // putc
function(s){b.children[2].innerHTML+=o.pop()|0}, // putn
function(s){t[o.pop()|0]=c.charCodeAt(0),c=c.slice(1)}, // getc
function(s){t[o.pop()|0]=(m=parseInt(c,10)),c=c.slice(c.indexOf(m)+String(m).length)} // getn
]
[i[h++]]()
Before you g.. "But that thing is so huge, it will never fit!", ugh. Yes, that's the golfer in you speaking. But we're moving on to "
eval packers" now. That means we're doing arbitrary string compression. In this game, repetition is key. If you look closer you'll see that all functions have a parameter. But none actually use it. In fact, all the way down, you see that none is ever passed on. The reason for this parameter is that the signature of the function now looks the same as the other functions used in my entry. There was in fact only one function that used a parameter. But because all the functions look like this, the whole string
function(s){
can now safely be replaced by one character (in the eval string, of course). It's the same reason why the arithmetic expressions are also very reptitive. They're all replaced and you won't feel a thing.
Notice that I also have an alternative to using
Math.min
in the
slice
. I kept it there in hopes that compression might be more friendly to that version. Remember, we're talking in single bytes here; Las Vegas style. Gains are very irregular so having multiple ways of doing the same thing is invaluable.
I actually thought about replacing all variables with a variables that only use the same character.
x
,
xx
, etc... I haven't done so, but I imagine it might actually work for the eval packer...
Let's see, what else. Oh yeah. So Whitespace only uses spaces, tabs and newlines. And while these are all one character in binary data, in js source strings they are not. Most notably, there's no way to put a newline in a js source string in one character (you have to use
\n
). Meh! At some point I decided for a couple of reasons that I would use hearts as the source, rather than actual whitespace. But of course the whitespace still needed to be accepted and properly handled (as this _is_ a Whitespace engine). The hearts is nothing more than switching out character values. Whether you have a language of spaces, tabs and newlines, or you have a language of spaces, open hearts and closed hearts; at the core you have three characters and together they signify something.
So I wanted this in. Of course you could specify these in unicode (
\uxxxx
) but that's rather expensive. Note that Closure will always convert them to this notation. Very annoying :( But you can also inline them as unicode values. JS source is UTF16 so that's actually fine. However, these unicode hearts are actually three bytes. Meh! And while I've done my utter best to keep this entry as pure as possible, I had to concede here. Inlining a demo with hearts was just killing me. Mind you, I wanted to add the counting example from the website:
♥♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥♥ ♥ ♥♥ ♥ ♥♥♥♥♥
(Note that the ♥ would be a newline in Whitespace)
But as you can see, that's 22 hearts right there. That amounts to 66 bytes! A total waste of 44 bytes. The same basically goes for the string of ops (a | delimited string of ops, that doubly serves as a regular expression). That also contains 21 hearts. Total waste of 86 bytes, yikes! Even the eval packers did not help me here (might be a good hint!). Too many hearts were lingering. So in the end I resorted to a very tiny modification:
f=function(s){ v = s.replace(/\n/g,'♥').replace(/ /g,'♡'); }
// became
f=function(s){ v = s.replace(/\n|/g,'♥').replace(/ /g,'♡'); }
This is the function that takes input and converts newlines and tabs to hearts. It's output is then put in the textarea so you actually see them too. In the source, I changed all the black hearts to character 1 (
), which probably doesn't even show up in html. The function runs over both the example and the ops string and converts the ones back to black hearts and the tabs to white hearts. This actually pushed me over the 1024 byte border eventually.
Another simple trick I discovered this time is shown in the function above. Note that it doesn't return a value. Instead it stores it in a global variable. This actually saves a few bytes when you don't use the function too often. The return takes 6 or 7 bytes, depending on what you return (you can leave out the space for stuff like array and object literals). The assignment of the result of the function takes one extra byte (one way or the other). The assignment to the global variable takes you two bytes. So you can do this 4 or 5 times before it starts becoming more expensive. It's a minor trick, but hey :)
Of course I started out with caching everything. Any property that could be minified was going to be minified. A fun fact for myself is that my
Zeon.js was actually pretty damn efficient here. Compared to
UglifyJS and
Closure Compiler. Those are my tools of the js1k trade :) Zeon just strips whitespace, renames variables and caches property names (all of these are optional). The property name caching was particularly nice. None of the other engines seemed to do this. But whatever, it's a little thing. The gains by closure or uglify after zeon were very small. So that meant that I did a decent job at hand tuning and zeon did a decent job at removing whitespace and renaming variables.
In the end I made
14 working work in progress copies (that's my VC, deal with it :). In case you want to see them, I put the whole project in
a github repo. If I feel like I have too much time on my hands, I'll create it into a proper version. "For what it's worth". Note that this might never happen.
Last I checked, all the ops were working properly. Check
v8 for a set of tests. You can use this version to test and see debugging output in your console. Stuff like stack and what not. I needed this because some of the examples were simply not working. I fixed them this way, obviously should've done that from the start but hey.
Had a lot of fun
this entry. Thanks everyone for making this js1k rock once again. We're almost at a 100 submissions. Super awesome :D