JSpath

2012-08-08

In the DOM there's a mechanism for uniquely identifying elements from the root of the document. HTML effectively translates down to a tree of nodes, where each node contains other nodes, plain text, or both. The system is called XPath and it works very well, especially for automated tools. CSS selectors point into the same tree in a similar spirit.

While working on Zeon I built the same kind of system for JS source. It occurred to me that every variable, function, and structure has a unique point in the source that can be referenced. My previous blog post reminded me that I never really published this system, so let's correct that now.

I'm not claiming this is completely new. The concept certainly isn't, but at least I haven't heard of anyone identifying parts of JS source code with reference points like these. So let's hope it's useful :)

JSpath references parts of the source through their scope structure. This works because scoping in JS can be determined statically: scopes are fixed by the source. We refer to the names of variables, properties of variables, functions, objects, and arrays, all relative to their scope hierarchy.

Each scope starts with a forward slash (/). Since scopes have a fixed order in the source code, we can refer to them by a zero-based index. So / is the global scope, and /1/3/ is the fourth scope (function or catch) of the second scope (which must be a function) of the global scope.
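To make the index walk concrete, here is a minimal sketch that resolves a scope path against a hand-built scope tree. The tree shape (objects with a `name` and an ordered `children` array) and the `resolveScope` helper are hypothetical; a real tool would build the tree from a parser.

```javascript
// Hypothetical helper: resolve a scope path like "/1/3/" against a scope tree
// whose nodes keep their child scopes in source order.
function resolveScope(root, path) {
  // "/" has no indices and is the global scope itself.
  const indices = path.split("/").filter((part) => part !== "").map(Number);
  let scope = root;
  for (const i of indices) {
    if (!Number.isInteger(i) || i >= scope.children.length) {
      return null; // no such scope
    }
    scope = scope.children[i];
  }
  return scope;
}

// Hypothetical tree: global has two scopes, the second has four of its own.
const globalScope = {
  name: "global",
  children: [
    { name: "a", children: [] },
    {
      name: "b",
      children: [
        { name: "b0", children: [] },
        { name: "b1", children: [] },
        { name: "b2", children: [] },
        { name: "b3", children: [] },
      ],
    },
  ],
};

console.log(resolveScope(globalScope, "/").name);     // global
console.log(resolveScope(globalScope, "/1/3/").name); // b3
console.log(resolveScope(globalScope, "/0/5/"));      // null
```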

Variables are referred to by their name and an optional zero-based occurrence index. If the index is left out, the first occurrence is implied, so /foo is the same as /foo[0]. For example, /1/foo[3] is the fourth occurrence of the variable foo in the second scope of global.
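A small sketch of the variable rule, assuming the occurrences of identifiers in one scope are already available as a list in source order. `parseVariable` and `findOccurrence` are hypothetical names, and identifiers are simplified to ASCII letters, digits, and underscores.

```javascript
// Hypothetical helper: split "foo[3]" into a name and a zero-based index.
function parseVariable(segment) {
  const match = /^([A-Za-z_][A-Za-z0-9_]*)(?:\[(\d+)\])?$/.exec(segment);
  if (!match) return null;
  // A missing index means the first occurrence, so "foo" equals "foo[0]".
  return { name: match[1], index: match[2] === undefined ? 0 : Number(match[2]) };
}

// Hypothetical helper: pick the referenced occurrence of a name in one scope.
function findOccurrence(occurrences, segment) {
  const ref = parseVariable(segment);
  if (!ref) return null;
  const sameName = occurrences.filter((o) => o.name === ref.name);
  return sameName[ref.index] || null;
}

// Identifier occurrences of one scope, in source order (made-up data).
const occurrences = [
  { name: "foo", line: 1 },
  { name: "bar", line: 2 },
  { name: "foo", line: 3 },
  { name: "foo", line: 5 },
  { name: "foo", line: 8 },
];

console.log(findOccurrence(occurrences, "foo").line);    // 1
console.log(findOccurrence(occurrences, "foo[0]").line); // 1
console.log(findOccurrence(occurrences, "foo[3]").line); // 8
console.log(findOccurrence(occurrences, "foo[4]"));      // null
```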

Functions, array literals, and object literals also have a unique position in the source. Note that although functions are automatically also scopes, the scope hierarchy is always addressed with numbers, not with the function syntax. For functions you use parentheses, for objects curly brackets, and for arrays square brackets. Like variables, these literal structures are referred to by their occurrence in the source code. For nested objects and arrays, this is the order in which their opening brackets occur.

So /(2) is the third function in global, /{2} is the third object literal in global, /0/[1] is the second array literal in the first scope of global.
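The "order of the opening bracket" rule can be sketched as a sort on start offsets, with a separate counter per kind of literal. The input shape (`kind` plus `start` offset) and the `literalPaths` helper are assumptions for illustration, not part of any parser API.

```javascript
// Bracket pairs used in JSpath for each kind of literal.
const BRACKETS = { function: ["(", ")"], object: ["{", "}"], array: ["[", "]"] };

// Hypothetical helper: number the literals of one scope per kind, ordered by
// the offset of their opening bracket.
function literalPaths(scopePath, literals) {
  const counters = { function: 0, object: 0, array: 0 };
  return literals
    .slice() // don't reorder the caller's array
    .sort((a, b) => a.start - b.start)
    .map((lit) => {
      const [open, close] = BRACKETS[lit.kind];
      const index = counters[lit.kind]++;
      return scopePath + open + index + close;
    });
}

const src = 'var obj = {x: [1,2,3, {y: "hi"}]};';
const literals = [
  { kind: "object", start: src.lastIndexOf("{") }, // inner {y: "hi"}
  { kind: "array", start: src.indexOf("[") },
  { kind: "object", start: src.indexOf("{") },     // outer object
];

console.log(literalPaths("/", literals)); // [ '/{0}', '/[0]', '/{1}' ]
```

The outer object opens first, so it gets {0} even though the inner object is nested inside an array that is itself inside the outer object.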

Properties are notated with dots, just like in regular JS. The same goes for properties of object literals. So /foo.bar matches foo.bar and /{0}.foo matches the foo key in var x = {foo:5};.

Dynamic property access is a bit tricky, so we refer to it with a tilde (~). The tilde "skips" one set of square brackets and whatever they wrap. So /foo~.bar matches foo['hello'+world].bar and /foo~~.bar matches foo['hello'+world][n].bar.
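A sketch of how a tool could produce the dot and tilde suffix from a chain of member accesses. The chain format (`computed` flag plus `name`) is an assumption loosely modeled on how JS parsers usually describe member expressions; `memberSuffix` is a hypothetical name. This sketch turns every bracketed access into a tilde, whatever is inside the brackets.

```javascript
// Hypothetical helper: dotted access keeps its name, each bracketed
// (computed) access becomes a "~" that skips it.
function memberSuffix(chain) {
  return chain.map((step) => (step.computed ? "~" : "." + step.name)).join("");
}

// foo.bar
console.log("/foo" + memberSuffix([{ computed: false, name: "bar" }]));
// /foo.bar

// foo['hello'+world].bar
console.log("/foo" + memberSuffix([{ computed: true }, { computed: false, name: "bar" }]));
// /foo~.bar

// foo['hello'+world][n].bar
console.log("/foo" + memberSuffix([
  { computed: true },
  { computed: true },
  { computed: false, name: "bar" },
]));
// /foo~~.bar
```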

Just like object and array literals, strings also have unique positions. If for whatever reason you want to refer to them, use a quote followed by the index of occurrence. For strings we don't distinguish between quote types, both in the source (the order of strings is determined by every quoted string in a scope, whatever its quotes) and in the JSpath. So /'1 matches "foo" in var x = ['oh', "foo"];. Note that, unlike with functions etc., the index is not wrapped.
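A naive sketch of string numbering that treats single and double quotes alike. `stringPaths` is a hypothetical helper, and it deliberately ignores comments, regex literals, and scope boundaries, so it only illustrates the ordering rule on a simple snippet.

```javascript
// Hypothetical helper: list quoted strings in order of occurrence,
// regardless of quote type. Not a real tokenizer.
function stringPaths(source, scopePath = "/") {
  const results = [];
  let i = 0;
  while (i < source.length) {
    const quote = source[i];
    if (quote === "'" || quote === '"') {
      let j = i + 1;
      while (j < source.length && source[j] !== quote) {
        j += source[j] === "\\" ? 2 : 1; // skip escaped characters
      }
      results.push({ path: scopePath + "'" + results.length, text: source.slice(i, j + 1) });
      i = j + 1;
    } else {
      i++;
    }
  }
  return results;
}

for (const s of stringPaths(`var x = ['oh', "foo"];`)) {
  console.log(s.path, s.text);
}
// /'0 'oh'
// /'1 "foo"
```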

Since the forward slash is taken by scopes, we use the caret (^) to refer to regular expressions. For example, in a source consisting of just function f(){ return /foo/.lastIndex; }, the path /0/^0.lastIndex refers to the lastIndex property of that regex.
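Regexes, strings, and numbers share the same shape in JSpath: a one-character sigil, an unwrapped index, and optional dotted properties. A sketch of a parser for such segments, with `parseSigilSegment` as a hypothetical name and property names simplified to ASCII identifiers:

```javascript
// Sigils for literals whose index is not wrapped in brackets.
const SIGIL_KINDS = { "'": "string", '"': "string", "^": "regex", $: "number" };

// Hypothetical helper: parse segments like "^0.lastIndex", "'1" or "$3".
function parseSigilSegment(segment) {
  const match = /^(['"^$])(\d+)((?:\.[A-Za-z_]\w*)*)$/.exec(segment);
  if (!match) return null;
  return {
    kind: SIGIL_KINDS[match[1]],
    index: Number(match[2]),
    // ".a.b" becomes ["a", "b"]; no properties gives an empty list
    props: match[3] ? match[3].slice(1).split(".") : [],
  };
}

console.log(JSON.stringify(parseSigilSegment("^0.lastIndex")));
// {"kind":"regex","index":0,"props":["lastIndex"]}
```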

For numbers (any kind of number literal, hex or decimal) we use the dollar sign ($). So /$0 is the first number in the global scope.
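A rough sketch of number numbering, assuming a plain regex scan is good enough for a demo. `numberPaths` is a hypothetical helper; it does not skip numbers inside strings, comments, or regexes, and it does not know about scopes.

```javascript
// Hypothetical helper: list hex and decimal number literals in order.
function numberPaths(source, scopePath = "/") {
  const pattern = /\b(?:0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\b/g;
  return (source.match(pattern) || []).map((text, i) => ({ path: scopePath + "$" + i, text }));
}

for (const n of numberPaths("var a = 0x1F, b = 42, c = 3.5;")) {
  console.log(n.path, n.text);
}
// /$0 0x1F
// /$1 42
// /$2 3.5
```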

Catch scopes are a bit tricky in that they only record exactly one variable and there cannot be a scope nested in a catch scope. To make a catch variable stand out, we therefore replace the forward slash with an exclamation mark. So /1!e is the e variable of the catch in function foo(){} try {} catch(e){}, where the function is scope 0 and the catch is scope 1.
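A sketch of parsing the catch notation, assuming the catch step is always the last scope step in the path and may be followed by the catch variable with an optional occurrence index. `parseCatchPath` is a hypothetical name.

```javascript
// Hypothetical helper: parse "/1!e", "/0!e[1]" or "/2/0!" style paths.
function parseCatchPath(path) {
  const match = /^\/((?:\d+\/)*)(\d+)!(?:([A-Za-z_]\w*)(?:\[(\d+)\])?)?$/.exec(path);
  if (!match) return null;
  const outer = match[1].split("/").filter((part) => part !== "").map(Number);
  return {
    scopes: [...outer, Number(match[2])], // the last index is the catch scope
    catchVar: match[3] || null,
    index: match[4] === undefined ? 0 : Number(match[4]),
  };
}

console.log(JSON.stringify(parseCatchPath("/1!e")));
// {"scopes":[1],"catchVar":"e","index":0}
```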

Another shortcut is replacing class.prototype.method with class#method. This has become an accepted way of writing inherited instance methods, so it makes sense to write them this way here too. Note that you won't get to use it very often with JS source these days, since assigning an object literal to the prototype (Foo.prototype = {...}) can't make use of it.
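A sketch of converting between the shorthand and the long form, with `expandHash` and `compressPrototype` as hypothetical names. A trailing # on its own is read as the prototype object itself.

```javascript
// Hypothetical helper: "Foo#bar" -> "Foo.prototype.bar", "Foo#" -> "Foo.prototype".
function expandHash(path) {
  return path.replace(/#/g, ".prototype.").replace(/\.prototype\.$/, ".prototype");
}

// Hypothetical helper: the reverse; ".prototypes" and similar are left alone.
function compressPrototype(path) {
  return path.replace(/\.prototype(\.|$)/g, "#");
}

console.log(expandHash("/Foo#bar"));                 // /Foo.prototype.bar
console.log(compressPrototype("/Foo[1].prototype")); // /Foo[1]#
```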

In general, a JSpath should take the shortest path to an element. So don't use /[1]{0} to refer to the first object in an array literal ([]; [{x:1}];). Keep in mind that the object literal is still relative to the scope, so /{0} is the way to refer to that object.

Some examples. I'll try to show you all the JSpaths in some snippets, in order of occurrence:

Code:
function foo(){
  var bar = function boo(){
    return function(){};
  };
}

// =>

/
/(0)
/foo
/foo[0]
/0/
/0/bar
/0/bar[0]
/0/(0)
/0/0/
/0/0/boo
/0/0/boo[0]
/0/0/(0)
/0/0/0/
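The function and scope entries in a listing like this can be generated by a simple recursive walk. This sketch works on a hand-built tree of nested functions (a hypothetical shape standing in for real parser output), emits only function and scope paths (no variables), and assumes there are no catch scopes, because only then does a function's index match its scope index.

```javascript
// Hypothetical helper: emit "(i)" for each function and "i/" for its scope.
function functionPaths(functions, scopePath = "/", out = [scopePath]) {
  functions.forEach((fn, i) => {
    out.push(scopePath + "(" + i + ")"); // the function literal
    const innerScope = scopePath + i + "/";
    out.push(innerScope);                // the scope it creates
    functionPaths(fn.functions, innerScope, out);
  });
  return out;
}

// function foo(){ var bar = function boo(){ return function(){}; }; }
const tree = [
  { name: "foo", functions: [
    { name: "boo", functions: [
      { name: null, functions: [] },
    ] },
  ] },
];

console.log(functionPaths(tree).join(" "));
// / /(0) /0/ /0/(0) /0/0/ /0/0/(0) /0/0/0/
```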


Code:
var obj = {x: [1,2,3, {y: "hi"}]};
obj.z = 5;

// =>

/
/obj
/obj[0]
/{0}
/{0}.x
/[0]
/[0][0]
/$0
/[0][1]
/$1
/[0][2]
/$2
/[0][3]
/{1}
/{1}.y
/'0
/"0
/obj[1]
/obj[1].z
/$3


Code:
[1,2,{'a':[x,y,z],'b':1,c:"hi"},4].forEach(function(n){ alert(n); });

// =>

/
/[0]
/[0][0]
/$0
/[0][1]
/$1
/[0][2]
/{0}
/{0}.a
/[1]
/[1][0]
/[1][1]
/[1][2]
/{0}.b
/$2
/{0}.c
/'0
/"0
/[0][3]
/$3
/[0].forEach
/(0)
/0/
/0/n
/0/n[0]
/alert
/alert[0]
/0/n[1]


Code:
function Foo(){}
Foo.prototype = {
  bar: function(){}
};

// =>

/
/(0)
/Foo
/Foo[0]
/0/
/Foo[1]
/Foo[1].prototype
/Foo[1]#
/{0}
/{0}.bar
/(1)
/1/


Code:
try {
  var foo;
} catch (e) {
  var bar = e;
  var e;
}

// =>

/
/foo
/foo[0]
/0!
/0!e
/0!e[0]
/bar
/bar[0]
/0!e[1]
/e
/e[0]

Now, this would probably need some work to be turned into a formal spec, but I'm not going there. I kind of doubt anyone will really use this system anyway. But if you find it useful and are going to use it in some project, please tell me :)

Hope you liked it.