Let me start by stating the obvious: asm.js is NOT designed to be written manually. It was designed to be a "compilation target", the output of translating from one language to another. The syntax of asmjs is a subset of ES5 and contains a lot of mandatory overhead for typing. Additionally it is very restrictive in typing and memory; you'll have to do many memory related things yourself. The tradeoff is that this allows browsers to infer types much better and compile the code in a highly efficient manner. Additionally, to some degree, you control the GC.
Do not proceed if you're learning to code, JS or otherwise. You have been warned. There are only a handful of real world applications where you'd want to write asmjs and "if you have to ask", yours probably isn't one of them.
Okay let's get going. I'm going to try and explain the
latest asm.js spec (from 2014, at the time of writing) in more human readable form. I hope :)
Module wrapper
All
asm.js
code is wrapped in a "CommonJS" kind of way, pre-dating ES6, where you have a function that acts as your scope. Everything happens inside the scope. This function gets a global object (your regular
window
, for example) with all its built-ins. You can access parts of them and, when you do,
asm.js
will know their types and compile them. We'll get back to this.
// from the asmjs.org website
function MyAsmModule() {
"use asm";
//...
return {
// functions to expose
};
}
Nothing spectacular except that the "directive prologue"
"use asm";
will ask the browser to interpret the entire body of this function as an "asm.js module". You don't need to repeat the prologue for every function inside the module, that is inferred. The good part, by design, is that the code will run with regular JS semantics if the browser does not support asmjs. In some cases it may just not be as efficient. And even if asmjs is supported but the compilation failed, the module should still be available to you albeit probably a bit slower.
The asmjs module is defined as accepting up to three parameters:
stdlib
,
foreign
, and
heap
. You can use different names or omit params right-to-left, if you want. Inside the module we can interpret the heap as being of various types but at its core it's just an
ArrayBuffer
(a fixed length unsigned byte array). The length has to be a power of
2
of at least
2^12
(beyond
2^24
multiples of
2^24
are accepted too), so
2^16
works but
300
does not. So yes, you are responsible for any memory management inside asmjs modules. Creepy.
function MyAsmModule(stdlib, foreign, heap) {
"use asm";
//...
return {
// functions to expose
};
}
Keep in mind you can't use
eval
as a var name anywhere. Also, any ASI rules from regular JS apply.
The browser does not call this module to initialize it, you do this yourself like
let mymod = MyAsmModule(window, funcs, new ArrayBuffer(0x10000));
.
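As a rough sketch of the whole round trip, defining and initializing a module could look like the following. The names AdderModule and add are made up for illustration, globalThis stands in for window so the snippet also runs outside a browser, and the 64KiB buffer size is just an assumption that should satisfy the length rules.

```javascript
// Hypothetical module: one exported function that adds two ints.
function AdderModule(stdlib, foreign, heap) {
  "use asm";

  function add(a, b) {
    a = a | 0;          // parameter annotation: signed int
    b = b | 0;          // parameter annotation: signed int
    return (a + b) | 0; // int + int is intish, so cast back to signed
  }

  return { add: add };
}

// You initialize the module yourself, the browser never does it for you.
const adder = AdderModule(globalThis, {}, new ArrayBuffer(0x10000));
const sum = adder.add(2, 3);
console.log(sum); // 5
```

If the engine rejects the module as asm.js, the same call should still work with regular JS semantics.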
Heap
Your
asm.js
code works on one "heap". Meaning one large fixed consecutive area of memory for you to do with as you please. The memory is essentially an
ArrayBuffer
on which you can put certain "viewports" (
TypedArray
) which interpret the memory in certain formats like
int
or
float
.
The heap is like a scratch pad insofar as the buffer persists between calls. You can write input data into it and read output from it by maintaining a reference to the buffer externally.
The size is fixed and while there
was a proposal to allow growing the buffer, this proposal was knifed because heuristics in v8 caused problems. Thanks. The only way to grow now is to create a new buffer, copy the old buffer into it (using
.set()
) and use a new module call with that buffer. Note that wasm does allow growing memory.
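A minimal sketch of both ideas, the scratch pad and the "grow by copying" trick, assuming a made-up SumModule that adds up the first few ints on the heap. The view name HEAP32 and the buffer sizes are illustrative choices, not something the spec prescribes.

```javascript
// Hypothetical module: sums the first `count` 32bit ints stored on the heap.
function SumModule(stdlib, foreign, heap) {
  "use asm";
  var HEAP32 = new stdlib.Int32Array(heap);

  function sum(count) {
    count = count | 0;
    var i = 0;
    var total = 0;
    for (i = 0; (i | 0) < (count | 0); i = (i + 1) | 0) {
      // i << 2 is the byte offset, >> 2 turns it back into an Int32 index
      total = (total + (HEAP32[i << 2 >> 2] | 0)) | 0;
    }
    return total | 0;
  }

  return { sum: sum };
}

// Keep a reference to the buffer so input can be written from the outside.
const buffer = new ArrayBuffer(0x10000);
new Int32Array(buffer).set([10, 20, 30]);
const small = SumModule(globalThis, {}, buffer);
const total = small.sum(3);

// "Growing": allocate a bigger buffer, copy the old bytes, link a new module.
const bigger = new ArrayBuffer(buffer.byteLength * 2);
new Uint8Array(bigger).set(new Uint8Array(buffer));
const large = SumModule(globalThis, {}, bigger);
const totalAfterGrow = large.sum(3);
console.log(total, totalAfterGrow); // 60 60
```

Any state kept in module globals is lost when linking the new module this way; only the heap contents are carried over.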
Types
Now for annotating. Pretty much everything must be annotated and there are a few types we can take into account for this:
-
void
(undefined)
-
double
(regular js numbers)
-
signed
(32bit integers where the most significant bit is a sign flag)
-
unsigned
(32bit integers without sign flag)
-
int
(either
signed
or
unsigned
but unspecified otherwise)
-
fixnum
(any
int
that would have the same value as
signed
and
unsigned
, so any non-negative literal that does not use the most significant bit)
-
intish
(the result of an
int
operation and must be explicitly casted back to
signed
or
unsigned
)
-
double?
(for operations that produce either a
double
or
undefined
, these must be casted back to a number)
-
float
(32bit floating number)
-
float?
(for operations that produce either a
float
or
undefined
, these must be casted back to a number)
-
floatish
(the result of a
float
operation and must be explicitly casted back to
float
with
Math.fround
)
-
extern
(any value exposed outside of asmjs scope)
From these, you'll only actively use
signed
,
unsigned
, and
double
. And a lot of that. There are no strings, no arrays, no objects, no higher level types. You're on your own here.
You'll need to explicitly annotate any function arguments, return types, and any use of variables. Annotations are in the form of (otherwise) excessive source code from which the environment can 100% determine one of the above types. In particular:
-
signed
is forced by
foo | 0
-
unsigned
is forced by
foo >>> 0
-
double
is forced by
+foo
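To make the three casts a bit more concrete, here is a small sketch. CoercionModule and its function names are invented for this example; the point is only where the casts go.

```javascript
// Hypothetical module showing the signed, unsigned and double casts.
function CoercionModule(stdlib) {
  "use asm";

  function increment(x) {
    x = x | 0;                            // signed
    return (x + 1) | 0;                   // intish result cast back to signed
  }

  function isHighBitSet(x) {
    x = x | 0;
    // x >>> 0 reinterprets the same 32 bits as unsigned before comparing
    return ((x >>> 0) > 2147483647) | 0;
  }

  function half(x) {
    x = x | 0;
    return +(+(x | 0) / 2.0);             // signed to double, then divide
  }

  return { increment: increment, isHighBitSet: isHighBitSet, half: half };
}

const coercions = CoercionModule(globalThis);
console.log(coercions.increment(41));     // 42
console.log(coercions.isHighBitSet(-1));  // 1
console.log(coercions.half(5));           // 2.5
```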
Number literals
Number literals become a
double
by having a dot in them (
2.0
),
fixnum
if they are below
1<<31
, and
unsigned
if they are between
1<<31
and
1<<32
because there's no other way to interpret a positive 32bit number with that most significant bit set. You explicitly get a
float
by doing
fround(2.0)
where
fround
must be the built-in
Math.fround
function. Any number literal that is beyond 32bit (
signed
or
unsigned
) is deemed syntactically invalid.
You can negate a number literal by prefixing
-
to it. Note that you still have to do explicit casting with
+
or
|0
in that case. The result is a signed int if the value has no dot (
.
) and fits in 32bit. Otherwise it is a
double
or invalid.
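A quick sketch of what those literal forms look like in practice. LiteralsModule and its function names are hypothetical; each function just returns one kind of literal.

```javascript
// Hypothetical module: each function returns a different kind of literal.
function LiteralsModule(stdlib) {
  "use asm";
  var fround = stdlib.Math.fround; // must be imported under this exact name

  function answer() {
    return 42;          // no dot: an int literal, returned as signed
  }
  function ratio() {
    return 0.5;         // has a dot: double
  }
  function negative() {
    return -7;          // negated literal without a dot: signed
  }
  function single() {
    return fround(0.1); // float, rounded to 32bit precision
  }

  return { answer: answer, ratio: ratio, negative: negative, single: single };
}

const literals = LiteralsModule(globalThis);
console.log(literals.answer(), literals.ratio(), literals.negative());
console.log(literals.single() === 0.1); // false, 0.1 does not survive 32bit rounding
```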
Variables
Module globals, vars declared inside the main scope of the module, can only have a fixed number of initializers;
- as an int (signed?)
var x = 5;
- as a double
var x = 5.0;
- as a float
var x = fround(5.0);
, in this case the name must be
"fround"
and the number must contain a dot
- "library import"
var x = stdlib.foo
, note that
stdlib
must be the same name of the first param of the module and
foo
can only be
one of a handful of names
- math import, similar to library import but with
Math
added
var x = stdlib.Math.floor;
- external imports
--
var x = foreign.foo
imports a value as read-only function
--
var x = foreign.foo|0
imports a value as read-write int
--
var x = +foreign.foo
imports a value as a read-write double
- heap viewports
var x = new stdlib.Uint16Array(heap)
, where
stdlib
and
heap
are matching module param names and the result is a certain known
ArrayBufferView
view on the module buffer
Variables in a function can only be defined in three different ways;
- as an int (signed?)
var x = 5;
- as a double
var x = 5.0;
- as a float
var x = fround(5.0);
, in this case the name must be
"fround"
and the number must contain a dot
You can't initialize a var with another var or other kinds of expression. I think this restriction is a little silly but I suppose it reduces the number of combinatory edge cases.
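Here is a sketch of how that plays out, assuming a made-up CounterModule. Module globals and locals both start from a literal, and the "real" value is assigned afterwards.

```javascript
// Hypothetical module with int and double globals plus a Math import.
function CounterModule(stdlib) {
  "use asm";
  var count = 0;                   // module global int
  var scale = 1.5;                 // module global double
  var floor = stdlib.Math.floor;   // math import

  function tick(step) {
    step = step | 0;
    var next = 0;                  // a local can only start from a literal...
    next = (count + step) | 0;     // ...the expression is assigned afterwards
    count = next;
    return count | 0;
  }

  function scaled() {
    return +floor(+(count | 0) * scale);
  }

  return { tick: tick, scaled: scaled };
}

const counter = CounterModule(globalThis);
const firstTick = counter.tick(2);
const secondTick = counter.tick(2);
const scaledCount = counter.scaled();
console.log(firstTick, secondTick, scaledCount); // 2 4 6
```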
Functions
Functions are the only place for logic. They can have arguments, which must each be cast explicitly on the first lines of the function. They can have a return value, whose type may not be implied (unlike JS).
The spec doesn't mention var statements
as a valid statement type but they are
defined above.
All var statements in a function must be grouped at the top of a function. This fact is defined a little obscurely in the spec. You can see it in
the function type annotations. You can use multiple statements as long as they are grouped. Note that you'll have to initialize the vars with a literal for typing (see above). You can assign the result of an expression to them after that.
Arguments
Function arguments must be typed explicitly on the first lines of a function.
//... asmjs header
function foo(a, b) {
a = a | 0;
b = fround(b);
// ...
}
Note that the casting of the parameters is actually part of the syntax for declaring an
asm.js
function. It can't happen elsewhere in the function and is not otherwise inferred even if it could be.
Returns
Actually, the last example is incorrect because the return type must also be declared. It can have one of five types:
-
return +x;
(returns a
double
)
-
return x|0;
(returns a
signed
)
-
return 5;
and
return -5;
(returns the literal; signed without a dot, double with one)
-
return func(x);
(only when
func
is one of a set of standard functions, other functions must be casted)
-
return;
(
void
, function result can not be assigned)
Note that if you return a variable or defined (non-standard) function the type is not inferred from that value, you must always explicitly cast it.
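Putting arguments and returns together, a complete function could look like the sketch below. GeometryModule, square, hypot and sign are names I made up; note how the call to the module's own square function has to be cast even though it obviously returns a double.

```javascript
// Hypothetical module: typed parameters and explicitly typed returns.
function GeometryModule(stdlib) {
  "use asm";
  var sqrt = stdlib.Math.sqrt;

  function square(x) {
    x = +x;                 // parameter annotation: double
    return +(x * x);        // return annotation: double
  }

  function hypot(a, b) {
    a = +a;
    b = +b;
    // calls to non-standard functions must be cast at the call site
    return +sqrt(+square(a) + +square(b));
  }

  function sign(n) {
    n = n | 0;
    if ((n | 0) < 0) return -1;  // every return uses the same type: signed
    if ((n | 0) > 0) return 1;
    return 0;
  }

  return { square: square, hypot: hypot, sign: sign };
}

const geometry = GeometryModule(globalThis);
console.log(geometry.hypot(3, 4)); // 5
console.log(geometry.sign(-12));   // -1
```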
Logic
In an
asm.js
module, regular JS logic like
if
and blocks can only appear inside functions and are actually illegal in the module global space. From what I can see, most of the regular ES3 syntax is valid with a touch of ES5. This includes
if
,
if-else
, blocks,
return
,
while
,
do
,
for(;;)
(but NOT
for-in
!),
break
,
continue
, labels, and
switch
with
case
and an optional
default
. Expression statements, basically anything that yields a value, are also allowed.
Switches
Cases in a
switch
can only have signed integer literals as values. The condition must also yield a signed int.
They ought to be compiled as a jump table so keep that in mind. If you use numbers that are too big you'll get an error about this. This actually confused me at first because Firefox, at the time, threw "asm.js type error: all switch statements generate tables; this table would be too big". At first I thought that meant there were too many cases. A quick twitter convo revealed that the real cause was using
case
values that were simply way too big and I interpreted the error message in a wrong way.
I'm not sure if
default
can appear anywhere but last case in a
switch
like it can in JS but who ever does that, anyways. There's a good chance you didn't even know about this edge case before reading it now ;) The asm.js spec is not ruling this out either way, and since JS explicitly allows it but jump tables don't work that way... well, you're on your own here.
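For reference, a sketch of a loop and a well behaved switch: small signed literals as cases and the default in last position, which sidesteps the question above. LogicModule, classify and sumTo are hypothetical names.

```javascript
// Hypothetical module: a switch with small case values and a for(;;) loop.
function LogicModule() {
  "use asm";

  function classify(n) {
    n = n | 0;
    var kind = 0;
    switch (n | 0) {          // the condition must yield a signed int
      case -1: kind = 10; break;
      case 0: kind = 20; break;
      case 1: kind = 30; break;
      default: kind = 99;     // kept last to stay on the safe side
    }
    return kind | 0;
  }

  function sumTo(n) {
    n = n | 0;
    var i = 0;
    var total = 0;
    for (i = 1; (i | 0) <= (n | 0); i = (i + 1) | 0) {
      total = (total + i) | 0;
    }
    return total | 0;
  }

  return { classify: classify, sumTo: sumTo };
}

const logic = LogicModule();
console.log(logic.classify(-1), logic.classify(7)); // 10 99
console.log(logic.sumTo(10));                       // 55
```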
Expressions
A few particulars;
- Since there are no objects as a concept, the only property access you can do is "as an array" on the heap. Return type for this access depends on the type of heap view being accessed.
- You can't assign to variables not explicitly defined in the module or the current function
-
~~x
is explicitly understood to convert an arg to
signed
if it is a
double
or
float?
(but not double? or float?)
-
+foo()
will be a
double
if the function returns a
double
- There is
an explicit list of input-output types for most binary operators. If the types listed are "super types" they can be substituted by a valid sub-type and yield the same (as listed) type as a result.
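- addition and subtraction of
int
operands result in an
intish
type; as far as I can tell the
1<<20
limit in the spec counts the operands in one chained expression rather than capping their values. The same operators on
double
operands yield a double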
-
x ? y : z
is valid and works as in JS; returns the value from either operand, but unlike JS the operands must have the same type
- compound assignments don't work, that's stuff like
|=
^=
etc. You have to write them out explicitly.
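A few of these particulars in one sketch (ExprModule and its function names are invented for illustration): the ~~ truncation, a ternary whose operands share a type, and a compound assignment written out by hand.

```javascript
// Hypothetical module exercising ~~, the ternary and a spelled out |=.
function ExprModule() {
  "use asm";

  function truncate(x) {
    x = +x;
    return ~~x | 0;                       // double to signed, rounding toward zero
  }

  function max(a, b) {
    a = a | 0;
    b = b | 0;
    return ((a | 0) > (b | 0) ? a : b) | 0; // both operands are int
  }

  function setFlag(bits, flag) {
    bits = bits | 0;
    flag = flag | 0;
    bits = bits | flag;                   // instead of bits |= flag
    return bits | 0;
  }

  return { truncate: truncate, max: max, setFlag: setFlag };
}

const expr = ExprModule();
console.log(expr.truncate(3.9), expr.truncate(-3.9)); // 3 -3
console.log(expr.max(2, 9));                          // 9
console.log(expr.setFlag(4, 1));                      // 5
```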
Imports
In the header you can have "standard library imports". This means you're assigning a property from that global object you pass on to a local variable in the global module space, so;
function module(stdlib) {
var fround = stdlib.Math.fround;
}
The spec explicitly only allows
certain properties from the
global
or
global.Math
to be imported this way. (Spoiler;
Infinity
,
NaN
, and most of
Math
).
You can also import things from the
foreign
argument (the second argument of the module wrapper) in two forms.
- you can import it as is and in that case it becomes an immutable function
- you can import it casted
foreign.foo|0
(as
int
) or
+foreign.foo
(as
double
), and the var will be mutable.
function module(stdlib, foreign) {
var func = foreign.foo; // immutable function
var a = foreign.a | 0; // mutable int
var b = +foreign.b; // mutable double
}
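To show those imports doing actual work, here is a sketch that assumes a foreign object with a report function, an offset int and a ratio double. All of these names, and ImportModule itself, are made up for the example.

```javascript
// Hypothetical module using one stdlib import and three foreign imports.
function ImportModule(stdlib, foreign) {
  "use asm";
  var abs = stdlib.Math.abs;
  var report = foreign.report;     // immutable function
  var offset = foreign.offset | 0; // mutable int, copied at link time
  var ratio = +foreign.ratio;      // mutable double, copied at link time

  function run(x) {
    x = x | 0;
    var result = 0;
    result = ((abs(x | 0) | 0) + offset) | 0;
    report(result | 0);            // call out to regular JS
    return +(+(result | 0) * ratio);
  }

  return { run: run };
}

const reported = [];
const imports = ImportModule(globalThis, {
  report: function (value) { reported.push(value); },
  offset: 3,
  ratio: 0.5
});
const runResult = imports.run(-5);
console.log(runResult, reported); // 4 [ 8 ]
```

The int and double imports are read once when the module is initialized, so changing foreign.offset afterwards should not affect the module.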
Heap
You can declare various ways of accessing the
heap
, the third parameter of the module wrapper, by wrapping the heap in one of
the valid ArrayBufferView
types.
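A small sketch of two views over the same heap, assuming a made-up ViewsModule. As far as I can tell byte-sized views may be indexed with a plain int while wider views need the shift or a literal index, but treat that as my reading of the spec.

```javascript
// Hypothetical module: an Int32 view and a Uint8 view over the same bytes.
function ViewsModule(stdlib, foreign, heap) {
  "use asm";
  var BYTES = new stdlib.Uint8Array(heap);
  var WORDS = new stdlib.Int32Array(heap);

  function writeWord(value) {
    value = value | 0;
    WORDS[0] = value;          // literal index into the 32bit view
  }

  function readByte(offset) {
    offset = offset | 0;
    return BYTES[offset] | 0;  // same memory, seen one byte at a time
  }

  return { writeWord: writeWord, readByte: readByte };
}

const views = ViewsModule(globalThis, {}, new ArrayBuffer(0x10000));
views.writeWord(0x04030201);
const bytes = [0, 1, 2, 3].map(function (i) { return views.readByte(i); });
console.log(bytes); // the four bytes 1..4, in the platform's byte order
```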
Jump tables
The spec allows you to define a function jump table as a module global. This is kind of an array of functions. Every function in a table should have the same signature, so the same argument types and the same return type.
function mymod() {
"use asm";
function a(){ return 5; }
function b(){ return 20; }
function go(n) {
n = n | 0;
return jmp[ n & 1 ]() | 0;
}
// !Important! The jump table goes AFTER functions and BEFORE the export...
var jmp = [ a, b ];
return {go:go};
}
I don't think the spec is very clear on these tables so let me point out some caveats;
- the table can only be declared in the global module space
- the declaration goes
after the functions and
before the export
- accessing the function requires the
& 1
(or any int literal mask);
|
is not allowed
- the table length must be
a power of 2
, so
len=2^n
for some
n
Note that array literals don't exist as such. The type of
jmp
is actually an "immutable table".
I think the way to access them is a little confusing. The
& 1
way of accessing a function may not look the same as
|0
but keep in mind that
&
is not a "logical and". You can do
x & 0xff
(where
0xff
is the table length minus one, so a table of 256 functions) which is effectively the same as
|0
except it also makes sure the number can't spill over the array length. Since the table length is known at compile time and the right number must be a literal, the compiler can confirm and "prove" that the table access never exceeds the length.
jmp[index & 1](); // for a jump table with 2 functions
jmp[index & 1023](); // for a jump table with 1024 functions
jmp[index & 31](); // for a jump table with 32 functions
// etc...
I would probably expect it to be
jmp[ n >>> 0 & 0xff ]
to make sure the index can't end up negative. But perhaps I missed something that already ensures this.
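For what it's worth, a quick experiment suggests the mask alone is enough, at least under regular JS semantics: a bitwise and with a non-negative literal can never produce a negative number. The sketch below uses a made-up OpsModule with a table of four functions.

```javascript
// Hypothetical module: a jump table of four functions with one signature.
function OpsModule() {
  "use asm";

  function add(a, b) { a = a | 0; b = b | 0; return (a + b) | 0; }
  function sub(a, b) { a = a | 0; b = b | 0; return (a - b) | 0; }
  function bitAnd(a, b) { a = a | 0; b = b | 0; return (a & b) | 0; }
  function bitOr(a, b) { a = a | 0; b = b | 0; return (a | b) | 0; }

  function apply(op, a, b) {
    op = op | 0;
    a = a | 0;
    b = b | 0;
    return ops[op & 3](a, b) | 0; // mask is table length minus one
  }

  // The table goes after the functions and before the export.
  var ops = [add, sub, bitAnd, bitOr];

  return { apply: apply };
}

const ops = OpsModule();
console.log(ops.apply(0, 6, 3));  // 9, add
console.log(ops.apply(1, 6, 3));  // 3, sub
console.log(ops.apply(-1, 6, 3)); // 7, because -1 & 3 is 3, which selects bitOr
console.log(ops.apply(4, 6, 3));  // 9, because 4 & 3 wraps around to add
```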
Exports
An
asm.js
module exposes its functions by returning them on an object literal or by returning one function, just like most CommonJS modules.
function mymod() {
"use asm";
function a(){ return 20; }
return a;
}
function mymod() {
"use asm";
function a(){ return 20; }
function b(){ return 30; }
return {foo:a, bar:b};
}
Syntax
Those are the basics to reading and writing a valid
asm.js
program. Keep in mind that the actual syntax only accepts a subset of that of actual JS. In fact, JS means ES5 here as it predates ES6 so there is no
let
,
const
, etc. Everything is
var
and
function
.
Validation
There's a validator that's part of the asmjs repo: https://github.com/dherman/asm.js/blob/master/lib/asm.js
Firefox actually gives some sensible feedback when running asmjs code (line numbers are helpful), though some messages are a bit cryptic at times.
And here is a nice validation suite where you can paste your code and it'll tell you what's up: http://turtlescript.github.cscott.net/asmjs.html
Conclusion
The fact that the code works with and without explicit asm.js support is an important selling point. It means you can have asmjs conforming code that'll still work everywhere and, as Chrome is demonstrating, mostly without significant slowdowns.
I think the asm.js syntax does force you to think a little closer about your code. That should never really hurt. Except perhaps your brain.