So I wrote an ECMAScript parser in JavaScript (which is itself ECMAScript). It should be able to parse any valid ES5 script, including proper detection of regular expression literals and automatic semicolon insertion (ASI).
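Regular expression detection is tricky because a `/` can start a regex literal or be the division operator, and a lexer on its own can't always tell which. A common shortcut, sketched below, looks only at the previous significant token. This is not necessarily what this parser does. The keyword list and the token format (plain strings, `null` at the start of input) are assumptions for illustration. The heuristic guesses wrong after the `)` in `if (x) /re/.test(y)`, after a block's closing `}`, and after postfix `++`/`--`, which is why proper detection needs the grammar's context.

```javascript
// Keywords after which an expression (and so possibly a regex) must follow.
// Hypothetical list for this sketch.
var KEYWORDS_BEFORE_EXPRESSION = [
  "return", "typeof", "instanceof", "in", "new", "delete",
  "void", "throw", "case", "do", "else"
];

// Heuristic: does a "/" following `prevToken` start a regex literal?
// `prevToken` is the previous significant token as a string, or null at the
// start of input. Comments and whitespace are assumed to be skipped already.
function slashStartsRegex(prevToken) {
  if (prevToken === null) return true;
  // Identifiers, keywords, numbers and strings usually end a value...
  if (/^[\w$]|^\.\d|^["']/.test(prevToken)) {
    // ...unless the word is a keyword that expects an expression next.
    return KEYWORDS_BEFORE_EXPRESSION.indexOf(prevToken) !== -1;
  }
  // Closing brackets usually end a value (wrong for `if (x) /re/`).
  if (prevToken === ")" || prevToken === "]" || prevToken === "}") return false;
  // After other punctuators and operators, assume an expression follows
  // (wrong for postfix ++ and --).
  return true;
}

console.log(slashStartsRegex("x"));      // false: x / y
console.log(slashStartsRegex("="));      // true:  a = /re/
console.log(slashStartsRegex("return")); // true:  return /re/
```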
The script is not the fastest parser implementation, but it makes it easy to change the underlying language rules, i.e. the context-free grammar (CFG).
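To illustrate why keeping the rules as a CFG makes them easy to change, here is a minimal sketch of a grammar stored as plain data plus an exhaustive backtracking recognizer. It is not the CFG parser from the earlier post; the toy grammar, the `parse` function and the terminal name `num` (standing in for whatever a lexer would emit for a number) are hypothetical. It also tracks the furthest token it managed to match, which gives an approximate location when parsing fails. Like any top-down LL-style parser, it loops forever on left-recursive rules, so the grammar is written right-recursively.

```javascript
// A grammar as plain data: each nonterminal maps to a list of alternatives,
// each alternative is a sequence of symbols. Symbols that are not keys of the
// grammar are terminals, matched literally against the token list.
var grammar = {
  Expr: [["Term", "+", "Expr"], ["Term"]],
  Term: [["num"], ["(", "Expr", ")"]]
};

// Exhaustive search: try every alternative and collect every position where
// a symbol can end. `furthest` is the index just past the deepest token any
// terminal matched, used as an approximate error location.
function parse(grammar, tokens, start) {
  var furthest = 0;

  function addUnique(list, value) {
    if (list.indexOf(value) === -1) list.push(value);
  }

  function matchSymbol(symbol, pos) {
    if (!grammar.hasOwnProperty(symbol)) {
      if (tokens[pos] === symbol) {
        furthest = Math.max(furthest, pos + 1);
        return [pos + 1];
      }
      return [];
    }
    var ends = [];
    grammar[symbol].forEach(function (alternative) {
      matchSequence(alternative, 0, pos).forEach(function (end) {
        addUnique(ends, end);
      });
    });
    return ends;
  }

  function matchSequence(symbols, i, pos) {
    if (i === symbols.length) return [pos];
    var ends = [];
    matchSymbol(symbols[i], pos).forEach(function (next) {
      matchSequence(symbols, i + 1, next).forEach(function (end) {
        addUnique(ends, end);
      });
    });
    return ends;
  }

  // The input is valid if the start symbol can end exactly at the last token.
  var ok = matchSymbol(start, 0).indexOf(tokens.length) !== -1;
  return { ok: ok, furthest: furthest };
}

console.log(parse(grammar, ["(", "num", "+", "num", ")"], "Expr")); // { ok: true, furthest: 5 }
console.log(parse(grammar, ["num", "+", "+"], "Expr"));             // { ok: false, furthest: 2 }

// Changing the language is a data edit: allow unary minus.
grammar.Term.push(["-", "Term"]);
console.log(parse(grammar, ["-", "num"], "Expr").ok); // true
```

Trying every alternative at every position is also what makes this kind of exhaustive search slow as inputs grow.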
You can find the parser at http://esparser.qfox.nl. It uses a CFG parser I blogged about yesterday, the Unicode ranges I blogged about a few months back, and a lexer and core parser I will blog about in the near future.
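The Unicode ranges mentioned above boil down to checking whether a character's code falls inside one of many sorted ranges. A minimal sketch using binary search follows. The table is a tiny hypothetical excerpt, not the real ES5 identifier table, and the `[first, last]` pair format and function names are assumptions. Because it reads `charCodeAt(0)`, it only ever sees 16-bit UTF-16 code units.

```javascript
// Hypothetical excerpt of a sorted table of [first, last] code point ranges
// allowed to start an identifier. The real table, derived from the Unicode
// letter categories, is far larger.
var ID_START_RANGES = [
  [0x0024, 0x0024], // $
  [0x0041, 0x005A], // A-Z
  [0x005F, 0x005F], // _
  [0x0061, 0x007A], // a-z
  [0x00AA, 0x00AA], // feminine ordinal indicator
  [0x00B5, 0x00B5], // micro sign
  [0x00C0, 0x00D6], // Latin-1 letters before the multiplication sign
  [0x00D8, 0x00F6], // Latin-1 letters before the division sign
  [0x0391, 0x03A1]  // Greek capitals Alpha to Rho
];

// Binary search: true if `code` falls inside one of the sorted ranges.
function inRanges(ranges, code) {
  var lo = 0, hi = ranges.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (code < ranges[mid][0]) hi = mid - 1;
    else if (code > ranges[mid][1]) lo = mid + 1;
    else return true;
  }
  return false;
}

function isIdentifierStart(ch) {
  return inRanges(ID_START_RANGES, ch.charCodeAt(0));
}

console.log(isIdentifierStart("\u00E9")); // true  (e with acute)
console.log(isIdentifierStart("\u00D7")); // false (multiplication sign)
console.log(isIdentifierStart("1"));      // false
```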
Features:
- Full (custom) ES5 parser (LL)
- Tells you approximately where it stopped parsing
- Gives you an extended look into how JavaScript is parsed
- Properly applies ASI
- Unicode character sets supported (for identifiers, etc.)
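A quick way to see what a conforming engine does with ASI is to let it compile a snippet through the standard `Function` constructor and run it. The helper name `run` is made up for this sketch. It shows a restricted production (a line break right after `return`) and a case where no semicolon is inserted because the next line can continue the expression.

```javascript
// Compile a snippet as a function body with the engine's own parser and run it.
function run(body) {
  return new Function(body)();
}

// Restricted production: a line terminator right after `return`
// inserts a semicolon, so the value on the next line is never returned.
var withNewline = run("return\n42"); // undefined
var sameLine = run("return 42");     // 42

// No semicolon is inserted when the next line can continue the expression:
// `1\n(2)` parses as the call `1(2)`, which throws a TypeError at runtime.
var threw = false;
try {
  run("var a = 1\n(2)\nreturn a");
} catch (e) {
  threw = e instanceof TypeError;
}

console.log(withNewline, sameLine, threw); // undefined 42 true
```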
Known issues:
- Slow (parse time grows quickly as the input gets larger, but come on, it's a full exhaustive search)
  - Update (2010-12): currently parses jQuery (78k) in Chrome in about 25s
- The recursion limit is reached quickly (due to the nature of the ES5 spec, this happens quickly with long comments)
- No Unicode support for characters beyond the 0xFFFF range; JS strings hold them as surrogate pairs, so they can't be parsed as single characters :)
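To see why characters beyond 0xFFFF are a problem: JavaScript strings are sequences of 16-bit UTF-16 code units, so such a character occupies two positions (a surrogate pair), and `charCodeAt` or a regex character class sees each half separately. Combining the pair by hand is possible, as in this sketch, but every character-level check in a lexer would have to do the same. The helper `codePointAtES5` is a hypothetical name; ES5 has no built-in for this (`String.prototype.codePointAt` only arrived in ES2015).

```javascript
// Return the code point starting at index i, combining a surrogate pair
// by hand. Lone surrogates and BMP characters are returned unchanged.
function codePointAtES5(str, i) {
  var hi = str.charCodeAt(i);
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
    var lo = str.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
  }
  return hi;
}

var s = "\uD835\uDC00"; // U+1D400 MATHEMATICAL BOLD CAPITAL A
console.log(s.length);                            // 2 (two code units)
console.log(codePointAtES5(s, 0).toString(16));   // "1d400"
console.log(codePointAtES5("A", 0).toString(16)); // "41"
```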
Usage:
- Syntax highlighting
- Learning specification
- Checking ASI
- Checking operator precedence
- Compiler
- Validator
- Fun
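For checking operator precedence, the idea is to make the implicit grouping of an expression explicit. The sketch below does this with precedence climbing, a different technique from the LL grammar the parser uses. It covers only a hypothetical subset of ES5 binary operators (in their relative ES5 precedence order) and operands that are identifiers, numbers or parenthesized expressions; `PRECEDENCE`, `tokenize` and `parenthesize` are made-up names.

```javascript
// Binding power of a few ES5 binary operators (higher binds tighter).
var PRECEDENCE = { "||": 1, "&&": 2, "==": 3, "<": 4, "+": 5, "-": 5, "*": 6, "/": 6 };

// Split the source into numbers, identifiers, operators and parentheses.
function tokenize(src) {
  return src.match(/\d+|[A-Za-z_$][\w$]*|\|\||&&|==|[<+\-*\/()]/g) || [];
}

// Return the expression with every binary operation wrapped in parentheses.
function parenthesize(src) {
  var tokens = tokenize(src), pos = 0;

  function primary() {
    var tok = tokens[pos++];
    if (tok === "(") {
      var inner = expression(0);
      pos++; // skip the closing ")"
      return inner;
    }
    return tok;
  }

  // Precedence climbing: keep consuming operators that bind tighter than minPrec.
  function expression(minPrec) {
    var left = primary();
    while (pos < tokens.length && PRECEDENCE[tokens[pos]] > minPrec) {
      var op = tokens[pos++];
      // Left-associative: the right operand may only contain tighter operators.
      var right = expression(PRECEDENCE[op]);
      left = "(" + left + " " + op + " " + right + ")";
    }
    return left;
  }

  return expression(0);
}

console.log(parenthesize("a + b * c"));        // (a + (b * c))
console.log(parenthesize("a - b - c"));        // ((a - b) - c)
console.log(parenthesize("a || b && c == d")); // (a || (b && (c == d)))
```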
I will write more about the problems of creating this parser soon. It's a very busy time for me right now, though, so for now just have fun with it :) Expect more soon...