Deferred document.write

2014-03-08

The most difficult client-side problem to fix at Surfly is without doubt syncing document.write. There are so many idiosyncrasies regarding docwrite that it should come as no surprise that use of it is highly discouraged. However, for Surfly we have to simulate it in order to sync its side effects. This leads to rather interesting but very hard to solve problems. And if you try to ask anybody anything related to it, you'll have to go through a mandatory set of "WHYYYY ARE YOU USING DOCWRITE????" responses first. Even from people that should know me better than that. But one can never hear enough of that...

TL;DR? document.write isn't always a synchronous action and can lead to very strange script threading behavior. I'll describe the problem and explain how I'm working around this as part of simulating it for/under Surfly.

By nature, document.write is a synchronous operation. You call it, it writes the arguments. It actually interprets the string you pass on as if it was injected into the static html. Well, not really, but almost. This of course enables you to do crazy stuff, all in good fun I'm sure.

Since you can docwrite any html, as a whole or in parts, you can of course also write new script tags. As it happens, once you complete a script tag it is immediately executed. This is at least one oddity in the scripting model because it allows one script to actually pause the thread of another script. Something that's not otherwise possible in the ES5 world.

Code:
console.log("a");
document.write('<script>console.log("b");<\/script>');
console.log("c");
// => abc

The written script is unwound first and afterwards execution of the old script continues. The inception case of course also counts. A docwrite that writes a script that docwrites another script etc.

Code:
console.log("a");
document.write(
'<script>'+
'console.log("b");'+
'document.write(\'<script>console.log("c");<\\/script>\');'+
'console.log("d");'+
'<\/script>'
);
console.log("e");
// => abcde

Now this part we can easily sync. We can hook into the call and determine the target element where the write will happen, up to some degree anyways. The DOM will be up to date regardless of these docwrites or other dynamic alterations.
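
A minimal sketch of what hooking into the call could look like. The names hookWrite and onWrite are made up for this example, and fakeDoc is only a stand-in so the snippet runs outside a browser; in a page you would pass document instead. Figuring out the target element is left out entirely.

```javascript
// Wrap doc.write so every call is reported before it is performed.
// `doc` is anything with a write() method; in a page it would be `document`.
function hookWrite(doc, onWrite) {
  const original = doc.write;
  doc.write = function () {
    // document.write concatenates all of its arguments.
    const html = Array.prototype.join.call(arguments, '');
    onWrite(html);
    return original.apply(doc, arguments);
  };
  // Hand back a function that restores the original method.
  return function unhook() {
    doc.write = original;
  };
}

// Stand-in document (an assumption for this sketch) that remembers writes.
const fakeDoc = {
  written: [],
  write: function () {
    this.written.push(Array.prototype.join.call(arguments, ''));
  }
};

const seen = [];
const unhook = hookWrite(fakeDoc, function (html) { seen.push(html); });
fakeDoc.write('<b>', 'hi', '</b>'); // reported to the hook, then written
unhook();
fakeDoc.write('not seen');          // written, but no longer reported
```

The hook only observes and then delegates, so the write itself still happens exactly when the page asked for it.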

However. There exists a case where docwrite is actually asynchronous. In fact, there exists a case where all future docwrites are blocked while the invoking script continues. This is where it gets a bit dirty.

If you docwrite an external script tag it will block all future docwrites until the resource is fetched. This will not block the currently invoking script, though. Only the docwrite calls are deferred. When the writes do happen, they happen as if you called document.write right there and then.

Code:
console.log("a");
document.write('<script src=foo><\/script>');
document.write('<script>console.log("b");<\/script>');
console.log("c");
// => acb (!)

It doesn't really matter what the link is, it will block until the browser resolves the resource by either downloading it or by concluding that this won't happen. This actually blocks the html parser too, so static content that follows it won't be shown until this point either.

Once docwrite unblocks, which will always happen after the script that invoked it in the first place, it will resolve the docwrites as if they were called right then. If one of these deferred docwrites writes an external script tag, it will block docwrite again. That includes any deferred writes that were left on the queue or even content that followed an external tag in the same docwrite call.

Code:
document.write('A<script src=foo><\/script>B');
// A shows, B only shows after foo resolves

Of course, the same goes for static html after the script tag.

Code:
<script>
document.write('<script src=foo><\/script>');
</script>
B
// B won't show until much later when foo loads or errors

Our problem is that we have to figure out which element is the "active node", the one where the docwrite will actually start writing when you call it. We can do a fairly good job at this with regular docwrites. It's hard to get it perfect because you simply do not know whether a tag is currently opened or closed. But that's a different story.

These async docwrites introduce a new problem though. Because when the function is called, the DOM could very well not be the same as when the writing actually happens. This will lead us to target the wrong element as the parent and screw up syncing. So how can we solve this?

First step is to parse everything that is docwritten. This sounds complicated, but really isn't at this point. While parsing html can be pretty complex, we at least know that static html does not interfere with the process because all code that executes originates from a <script> tag. So when code executes, we'll know the last static html was the closing script tag.

So we'll skip the "html is hard" part. This is solved by a good parser and outside of the scope of this post. We can process all docwrites and search for an external script tag (that is closed). When we encounter it, we can put all the other writes on a stack, which is pretty much what happens anyways.
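
To make that concrete, here is a rough sketch of the bookkeeping. The regex is a deliberately naive stand-in for the "good parser" (it ignores comments, unusual quoting and tags that are split over several writes), and hasClosedExternalScript and onDocwrite are names invented for the example.

```javascript
// Naive check: does this chunk of written html contain an external script
// tag that is also closed? A real implementation would use an html parser.
function hasClosedExternalScript(html) {
  return /<script\b[^>]*\ssrc\s*=[^>]*>[\s\S]*?<\/script\s*>/i.test(html);
}

const queue = [];    // writes that arrived while docwrite was blocked
let blocked = false; // set once an external script tag has been written

// Returns true when the write goes through right away, false when it is
// put on hold because an earlier external script is still being fetched.
function onDocwrite(html) {
  if (blocked) {
    queue.push(html);
    return false;
  }
  if (hasClosedExternalScript(html)) blocked = true;
  return true;
}

const first = onDocwrite('<script src=foo><\/script>');
const second = onDocwrite('<script>console.log("b");<\/script>');
```

Here the first write goes through and flips the flag, and the second one ends up on the queue. When to empty that queue is the actual problem.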

When would you unwind the stack? We could try window.onload, but this has two problems. 1: you destroy the document because it will have been closed at the time of onload, and 2: if there's another docwrite that blocks you won't get another onload, so then what would you wait for?

We could write a script tag after the external script, but that leaves us with an extra element in the DOM which the user code might not expect. We could clean that up but this gets messy quickly.

The only viable alternative is to use the dom0 onerror and onload handlers of the external script tags that are docwritten.

Code:
document.write('<script src="foo"><\/script>');
// =>
document.write('<script src="foo" onload="unblockDocwrite();" onerror="unblockDocwrite();"><\/script>');

This works but you have to be careful not to overwrite existing dom0 handlers. Luckily you can easily and safely prefix a function call as long as you end with a semi and no whitespace after that. HTML being forgiving actually helps in this case.
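
A sketch of that prefixing, under a few assumptions: the open tag is passed in as a string that ends with ">", existing handlers are quoted, and the injected call contains no double quotes. prefixHandlers is a made-up name and unblockDocwrite is the same placeholder as above.

```javascript
// Add `call` to the onload and onerror attributes of a script open tag,
// in front of whatever handler code is already there.
function prefixHandlers(openTag, call) {
  let tag = openTag;
  ['onload', 'onerror'].forEach(function (name) {
    const existing = new RegExp('(\\s' + name + '\\s*=\\s*)(["\'])', 'i');
    if (existing.test(tag)) {
      // Quoted handler present: slip our call in right after the opening quote.
      tag = tag.replace(existing, function (match, attr, quote) {
        return attr + quote + call;
      });
    } else {
      // No handler yet: add one just before the closing `>`.
      tag = tag.slice(0, -1) + ' ' + name + '="' + call + '">';
    }
  });
  return tag;
}

const plain = prefixHandlers('<script src="foo">', 'unblockDocwrite();');
const withHandler = prefixHandlers(
  '<script src="foo" onload="init()">',
  'unblockDocwrite();'
);
```

For the second tag the existing init() call survives and simply runs after the injected one.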

It does have a bigger problem though; if the external script does any docwrites these should happen before buffered docwrites of the outer script. But since you wait for that script's onload you won't unblock the queue until the onload/onerror handlers are fired. And they fire after the external script runs. As a result you would be queuing the docwrites from the external script tag to be flushed AFTER the currently buffered ones, which is the wrong order. Note that this would also be a problem for appending an extra script tag to unblock because it would also run this code after the external script tag.

Code:
// foo.js:

document.write('A');

// html:

document.write('<script src="foo.js"><\/script>');
document.write('B');

// output should be: AB

We somehow have to know that the external script tag has resolved before the script actually gets executed. This is a tricky one because we must do so while blocking but without executing code. And scripts are in this context the only way to load resources synchronously. We can't even use <link> here (covered below) because it won't actually block docwrite.

We could write a marker at the end of each inline script tag that would clear the blocking flag. While this might not be true right there and then, it'd still work because the first possible docwrite from that point onwards will originate from the external script. This still leaves us with an extra script element that we have to deal with somehow. It also won't get us around back-to-back docwrite blocking, when two consecutive docwrites cause docwrite to block:

Code:
<script>
document.write('<script src=foo.js><\/script>');
document.write('<script src=bar.js><\/script>');
document.write('A');
</script>
<script>unblock();</script>

Now we'll unblock for foo.js but not for bar.js :(

Super inception case: we have a root script that does two docwrites, the first one blocks. The external resource does something similar and the second external resource does another write. What would be the order? Outer to inner.

Code:
// external1:

document.write('<script src="external2"><\/script>');
document.write('B');
document.write('C');

// external2:
document.write('A');

// html:
document.write('<script src="external1"><\/script>');
document.write('D');
document.write('E');

// output: ABCDE

Oh and in case you think this is an artificial case; you're wrong. Advertisement networks pull all kinds of crazy shit to try and ensure that their content is delivered. Such crazy :( Note that this usually happens implicitly. Including a couple of ad networks on the same page consecutively can easily get you into this kind of trouble.

So the above means that if the external resource causes a docwrite block, these docwrites should be queued before all the other buffered docwrites but still in order of appearance. If we only added an unblock at the end of a script tag then the docwrite for C would still appear at the end of the queue rather than the start.

There appears to be no event that fires between the loading of a script and actually executing it. IE has onreadystatechange, but none of the others do. This means we can't really set a flag or fix the order before executing the external script. All we have is a hook right after the external script finishes. Too late.

One way of circumventing this problem is to load a proxy script. This script would in turn, synchronously, docwrite the actual script after clearing the docwrite block flag. Obviously, the downside is making two requests for one script.

Code:
document.write('<script src="foo"><\/script>');
// =>
document.write('<script src="proxy.js?foo"><\/script>');
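
Both halves of that could look roughly like this. The url scheme (real url as the query string), the function names and unblockDocwrite are all assumptions for the sake of the example, not what Surfly actually serves.

```javascript
// Hypothetical url scheme: the real script url travels as the query string.
function toProxyUrl(src) {
  return 'proxy.js?' + encodeURIComponent(src);
}

// What the proxy could send back for that request: clear the block flag,
// then synchronously docwrite the script that was actually asked for.
function proxyBody(src) {
  const safeSrc = src.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  const tag = '<script src="' + safeSrc + '"><\/script>';
  // JSON.stringify gives us a correctly quoted JavaScript string literal.
  return 'unblockDocwrite();document.write(' + JSON.stringify(tag) + ');';
}

const proxied = toProxyUrl('foo');
const body = proxyBody('foo');
```

Because the generated body clears the flag before it writes the real tag, any docwrite from the real script is treated as a fresh write rather than a deferred one.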

This is a solution that almost works for us because we're already on a proxy. And while we do try to preserve cacheability as much as possible, we do tend to break things every now and then. Although one could argue that we could use this proxy script for any remote resource, thereby preserving cacheability again. It won't actually work because we can't be certain that the fetched script will actually be JavaScript.

A variation on this workaround is to have the proxy file actually return the requested file. You could pull this off without sacrificing caching by consistently doing this for any external js file.

However, in our case we're not always sure about the contents of a script file. There are some shoddy rules when a script tag is actually interpreted as JavaScript and when it's not. The type attribute obviously plays an important role and there are various values for this attribute alone that are inconsistently acceptable for being JS.

This means we can't just use either of these solutions because we can never be certain that some file is indeed JavaScript, and not some resource like JSON or a template being loaded in a crazy way (this happens).

I'm running out of ideas actually. Can't use events. Can't re/write trailing new tags. Can't safely rewrite the remote resource. Can we perhaps rewrite the whole external script part in the docwrite? Change it to an inline script and somehow gain control that way? Yes. Maybe.

We could replace the script tag with a custom inline script that synchronously fetches the JavaScript through AJAX and invokes it.

Code:
document.write('<script src="foo"><\/script>');
// =>
document.write(
'<script>'+
'var script = GET("proxy?foo");'+
'// aaand...'+
'<\/script>'
);

And then what? Docwrite an inline script with the fetched contents? Won't be generic; there are differences in an inline script versus an external script. If nothing else, there are events tied to the script element that would not fire for inline code. What's worse is that there would be two script tags now, since our custom element would be part of the DOM now.

Similarly, doing innerHTML, updating the textContent of an existing script element, or injecting a new script element by DOM API is not going to cut it. Eval will change the semantics too much to be a reliable replacement for this. It's almost a pity we can't just document.write and extend the current script tag with new script contents ;) Almost. And also the script tag has to be closed by definition in order to be executed anyways.

(If you know a bit about Surfly you'd know that we could do anything like this on the server side anyways, the problem is not so much in accessing the script, the problem is knowing whether it's a script that the browser will execute in the first place. So the last bit was mainly a thought experiment.)

There is another possibility. One whose problem surface wrt data resource hacks is probably negligible. If we pass on a flag to the proxy through url parameters, we could instruct the proxy to rewrite the JavaScript in such a way that the first document.write call in the fetched script would unset the flag. This is different from simply prepending such a mechanism to the script because it reduces the risk of modifying data resources. If the script can't be parsed, nothing is updated. If the script contains no document.write, like a JSONP file, nothing is updated.
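
A toy version of such a rewrite, to show the shape of it. It uses plain string search where the real thing would work on a parse tree, so it can be fooled by strings and comments, and it picks the first call in source order, which is not necessarily the first one to execute. flagFirstDocwrite, makeUnblocker and __unblockDocwrite are hypothetical names.

```javascript
// Rewrite only the first `document.write(` so that it clears the block flag
// before writing. Sources without such a call are returned untouched.
function flagFirstDocwrite(source) {
  const needle = 'document.write(';
  const at = source.indexOf(needle);
  if (at === -1) return source; // e.g. JSON or JSONP: nothing is updated
  return source.slice(0, at) +
    '__unblockDocwrite(document).write(' +
    source.slice(at + needle.length);
}

// Runtime helper the rewritten script would call: clears the flag and hands
// back the document so the original call still runs with the right `this`.
function makeUnblocker(state) {
  return function __unblockDocwrite(doc) {
    state.blocked = false;
    return doc;
  };
}

const rewritten = flagFirstDocwrite('document.write("A");document.write("B");');
const untouched = flagFirstDocwrite('callback({"a":1})');
```

Only the first call is touched; the second document.write and the JSONP-style input come out exactly as they went in.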

It's of course possible for docwrite to be invoked without explicitly using document.write. To this end you could extend the above to add an argument to the first proxy call or whatever. Or, worst case, you could decide to always prepend the mini bootstrap but only if the document was parsed at all, and perhaps only if it actually contained any calls. You could think of additional heuristics to reduce the error surface.

Okay, so let's look past this. Let's say the proxy is able to write some call that would reset the docwrite block flag. How would you deal with the queueing of new scripts? Because as we demonstrated, deferred docwrites inside the external script would still fire before deferred docwrites in the static html. Recursively true for other externals.

If we keep just one queue and push any deferred docwrites onto it, we'll end up with the wrong order. In the last real example we'd be getting DE BC A.

We'll have to create two queues instead. One queue acts as we had it already. When new docwrites are deferred, they are pushed on this stack. The other queue will have "stale" docwrites, from contexts that are finished. In the above example the static script tag would first add two docwrites (D and E) to the main queue. The end of the script is reached and the script thread will yield, waiting for the external resource to load. When it loads, it will first call our prepended mini bootstrap which pushes the main queue to the front of the stale queue. It then proceeds to add docwrites for B and C to the main queue before the script thread yields again. The innermost external script, external2, will again do the mini bootstrap. So the main queue is empty and the stale queue is now BCDE, because BC are pushed to it in order, but in front of the existing DE. The innermost script will write A and then the onload event will flush both queues. First the main queue, which is empty, and then the stale queue. This will cause the proper order of ABCDE to be printed as a result.

If the external script was never loaded because of a 404 or any other reason, the main queue will still be flushed before the stale queue, so the order is still preserved properly.
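
The two queues can be replayed with plain arrays. This is a simulation rather than browser code: write stands in for the hooked document.write, bootstrap for the prepended mini bootstrap and loaded for the onload/onerror handler. The pending counter is an assumption added for this sketch, so that the load handler of external1 does not flush while external2 is still outstanding.

```javascript
function createDeferredWriter(output) {
  let blocked = false; // true while an external script is outstanding
  let pending = 0;     // external scripts written but not yet loaded/errored
  let main = [];       // writes deferred by the script running right now
  let stale = [];      // writes deferred by scripts that already finished

  function isExternalScript(html) {
    return /<script\b[^>]*\ssrc\s*=/i.test(html);
  }

  function write(html) {
    if (blocked) { main.push(html); return; }
    output.push(html);
    if (isExternalScript(html)) { blocked = true; pending++; }
  }

  // The mini bootstrap that runs first thing in a fetched script.
  function bootstrap() {
    stale = main.concat(stale); // main queue goes in front of the stale one
    main = [];
    blocked = false;
  }

  // onload/onerror of an external script.
  function loaded() {
    pending--;
    if (pending > 0) return; // a nested external script is still on its way
    blocked = false;
    // Main queue first, then stale; stop as soon as a write blocks again.
    while (!blocked && (main.length || stale.length)) {
      write(main.length ? main.shift() : stale.shift());
    }
  }

  return { write: write, bootstrap: bootstrap, loaded: loaded };
}

const out = [];
const w = createDeferredWriter(out);

// html:
w.write('<script src="external1"><\/script>');
w.write('D');
w.write('E');
// external1 arrives and runs:
w.bootstrap();
w.write('<script src="external2"><\/script>');
w.write('B');
w.write('C');
w.loaded(); // external1 is done, external2 is still pending
// external2 arrives and runs:
w.bootstrap();
w.write('A');
w.loaded();

const text = out.filter(function (s) { return s.charAt(0) !== '<'; }).join('');
// text is "ABCDE"
```

If external1 never loads, the bootstrap calls simply don't happen and loaded flushes D and E from the main queue, which matches the 404 case.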

Ok this should cover script tags. But are there more tags that block in static html, and if so, what about them? I'm glad you asked! Yes, and they're not a problem. Not our problem anyways.

The only other things that block anything are, obviously, style sheets. These are actually kind of weird even in the context of docwrite. Normally an external stylesheet blocks script tags if it encounters them. Not content though, so it's a bit more lenient than scripts in that regard. The reason is of course that it won't matter for "dumb" content. Scripts might need to know certain visual aspects about the DOM though.

If you docwrite an external stylesheet it actually partially blocks. The tag will be written and the element created as usual, initiating the fetch. Your current scripting thread will still proceed to execute though. So this is actually a state that browsers have been trying to prevent by making external stylesheets block. Weird.

What's weirder, although kind of rational in a way, is that while this script will proceed to run, any future scripts will not and are still blocked by the external stylesheet. The exception is of course docwritten inline script tags.

Code:
console.log(1);
document.write('<link href=foo.css rel=stylesheet>');
console.log(2);
document.write('<script>console.log(3);<\/script>');
console.log(4);
// => 1234

The absolutely screwed up part about this is that docwrites are only visually blocked. That is, the writes happen but they don't show up visually. But when you query innerHTML, they are there. That's pretty weird and undoubtedly a mechanism that tries to prevent "flashes" as much as possible. I'm not sure if you'll succeed in that either way though. This will also cause a flash if the stylesheet is delayed long enough.

For us, this doesn't pose a problem. The docwrites still happen at the time of calling the function.

Code:
document.write('<link href="foo.css" rel="stylesheet">');
document.write('foo');
console.assert(document.body.innerHTML.indexOf('foo') !== -1); // passes

And the same for writing inline script tags btw.

Code:
document.write('<link href="foo.css" rel="stylesheet">');
document.write('<script>document.write(\'foo\');<\/script>');
console.assert(document.body.innerHTML.indexOf('foo') !== -1); // passes

(Note that an inline style that uses @import exhibits the same behavior)

So for us, this is not a problem.

There are no other tags that might block docwrite. If there are I would really like to know.

I think that catches everything regarding deferred docwrites. Wasn't that fun?

I've only covered docwrites originating from script tags while the document is loading. There are other oddities from events while a page is loading, or when docwriting when the document is "closed". But I'll save that bed time story for another night.

If you can dig these sorts of problems you should totally look for a job at Surfly, we try to solve these kinds of problems on a daily basis.

I'm not sure if many people will actually read this all the way through, and still get what I've tried to explain. But I hope you learned something from it regardless :) Now, let's get back to 2014.