Convert double quoted strings

2015-08-25

to single quoted strings in JS source code, generically. For a project I wanted to convert the double quoted strings in a JavaScript source file, containing a whole bunch of tests, so that they were all single quoted strings. Partly for consistency, and partly because I simply prefer single quoted strings in my JS source code. The file wasn't consistent because I copied parts of it from the JSON.stringify output of certain automated pages, and that of course only uses double quotes. Since these were CSS test cases, I had to take care not to mangle escaped quotes in the process.

To be specific: I wanted to convert the double quoted strings in this file with test cases (mixed single and double quoted strings) to single quotes, like this version (single quoted strings only).

You may wonder why that's so difficult. Well, let's first set the scene a bit. This is the kind of source code we are working with:

Code:
var value = ["foo", "'bar", "\\'baz"];

Now one difficulty in this blog post is making clear that when I say "change this string" I mean a string that contains a piece of source code, including the quotes that delimit the string literal. That gets notoriously confusing once backslashes enter the mix. For example, to change the string "foo" to 'foo', we have to consider the quotes as well: cut them away and replace them with single quotes. It's cumbersome because on screen you're working with strings that look like "foo" in the source code, but to get them like that your own source is something like '"foo"' or "\"foo\"". The question "what happens to the backslashes?" when looking at related code is a super pain in the ass on its own.
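
To make that concrete, here is a tiny sketch (the variable names are only for illustration). Both literals below produce the same five characters, quotes included, and that five character text is the "string" this post keeps talking about.

```javascript
// Two ways to write the source text "foo" (with its quotes) as a JS string.
var a = '"foo"';
var b = "\"foo\"";

// Both hold the same 5 characters: " f o o "
var same = a === b;
var len = a.length;
```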

Anyways. To convert "foo" to 'foo' is simple. Replace the quotes.

To convert "'foo" is relatively simple as well. You can do a simple string replacement like this:

Code:
value.replace(/'/g, '\\\'')

// or

'"\'foo"'.replace(/'/g, '\\\'')

// or

"\"'foo\"".replace(/'/g, '\\\'')

// (all the same)

And this will work fine, as you'd expect. But now consider the example where you have a string that contains an escaped quote: "foo\'bar". And in an attempt to squash confusion, this is what you'd type in your console to get this value: '"foo\\\'bar"'. Yeahhhh. Now, quiz question: does the above approach still hold?

The answer should be implied: absolutely not. You'll end up adding another backslash in front of the quote, but since there's already a backslash there you'll have created a double backslash (which is legal, it's an escaped backslash) and left the quote unescaped (which is not legal, because it ends the string early). In effect you've created a syntax error. So if this was your file:

Code:
var x = "foo\'bar";

Then after applying the fix above, you'd end up with this file:

Code:
var x = 'foo\\'bar';

And that'll throw an error. I hope that clarifies the initial problem.
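
The failure can be reproduced in a few lines. This is only a sketch: `naiveToSingle` and `parses` are hypothetical helper names, and `parses` uses `new Function` purely to check whether a piece of text is syntactically valid, without running it.

```javascript
// Hypothetical helper: the naive conversion described above.
// Swap the outer quotes and put a backslash before every single quote.
function naiveToSingle(src) {
  return "'" + src.slice(1, -1).replace(/'/g, "\\'") + "'";
}

// Hypothetical helper: true if `code` parses as a JS expression.
function parses(code) {
  try {
    new Function('return ' + code + ';');
    return true;
  } catch (e) {
    return false;
  }
}

// The source text "foo\'bar": a double quoted string with an escaped quote.
var src = '"foo\\\'bar"';

// The naive approach turns it into the text 'foo\\'bar'
var converted = naiveToSingle(src);

var before = parses(src);      // the original text is a valid literal
var after = parses(converted); // the converted text is not
```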

Now you may be tempted to think: I'll solve that with a regular expression. Easy. Well, no. Sure, it's easy to search for a single quote. It's even fairly easy, though less trivial, to search for a single quote that has either no backslashes or a double backslash preceding it. You can even take start and end into account. It's actually doable to detect whether the number of backslashes is even or odd, in order to decide whether you should inject an extra backslash. The problem is that you need this to be "global", to match multiple times in the same haystack, and that's where shit hits fans. A naive regex is bound to start a match midway through a run of consecutive backslashes and screw up.

In short: I'm not convinced it's impossible to solve this problem with a regular expression; I just decided that doing it in a loop would let me finish this task quicker. And the result would probably be faster too, since regular expressions can be relatively slow in JS and manual string manipulation can save you there. Though on this scale it's probably irrelevant.
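
For completeness, here is a sketch of one way a regular expression might cope, under the assumption that the input is a well-formed double quoted literal. It is not the approach taken by the script in this post, and `regexToSingle` is a made-up name. The idea is to let the pattern consume every backslash together with the character it escapes, so a match can never begin halfway through an escape; any single quote that is matched on its own must then be unescaped.

```javascript
// Hypothetical helper, not the script from this post.
// `src` is source text of a double quoted string, quotes included.
function regexToSingle(src) {
  var body = src.slice(1, -1);
  // Alternative 1: a backslash plus whatever character follows it (kept as-is).
  // Alternative 2: a bare single quote (gets a backslash in front).
  var out = body.replace(/\\[\s\S]|'/g, function (match) {
    return match === "'" ? "\\'" : match;
  });
  return "'" + out + "'";
}

// Text "foo\'bar" stays escaped exactly once: 'foo\'bar'
var escapedOnce = regexToSingle('"foo\\\'bar"');

// Text "\\'" (escaped backslash, bare quote) becomes '\\\''
var afterDoubleBackslash = regexToSingle('"\\\\\'"');
```

Because the regex engine resumes after each consumed escape pair, the global flag no longer lands in the middle of a run of backslashes.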

Anyways. Here's the script to transform a string containing a double quoted string as source code to a string containing a single quoted string as source code, while properly escaping single quotes and ignoring already escaped single quotes. Oh I can already see the crowd go buck wild when doing a talk with sentences like that; it's the realtime recompilation of running JavaScript talk(s) all over again...

Code:
// `value` should be a double quoted string, including the quotes
// we are going to change it to a single quoted string
// to accomplish that we must escape all _unescaped_ single quotes
var value = '"foo\'bar"';
// if the string did not contain a single quote, it's very simple
if (value.indexOf('\'') < 0) {
  value = '\'' + value.slice(1, -1) + '\'';
} else if (value.indexOf('\\') < 0) {
  // no backslashes so trivial replace is simple. this optimization may actually hurt perf :)
  value = '\'' + value.slice(1, -1).replace(/'/g, '\\\'') + '\'';
} else {
  // string contains single quotes and backslashes (we could
  // rule out the backslash-quote combo too, but whatever)
  // get the string value; cut away the outer double quotes
  var inp = value.slice(1, -1);
  // collect each character step by step. this is "slow" but probably fine for our purpose
  var out = '';
  // loop through the string value, search for single quotes
  for (var i = 0; i < inp.length; ++i) {
    if (inp[i] === '\'') {
      // check if the quote is escaped
      // it is escaped if it's preceded by an _odd_ number of backslashes
      var j = i - 1;
      // check two characters and go back as long as they're both backslashes
      while (j > 0 && inp[j] === '\\' && inp[j - 1] === '\\') j -= 2;
      // all pairs of backslashes are skipped now, so inp[j] is a backslash only
      // if the count was odd. if it is not a backslash (or we ran off the start)
      // the quote is _unescaped_. in that case _add_ a backslash
      if (inp[j] !== '\\') {
        out += '\\';
      }
    }
    out += inp[i];
  }
  value = '\'' + out + '\'';
}
// value should now contain source code that is a single quoted string with
// every single quote properly escaped without changing the resulting value
// if both input and output were evalled and compared.
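
If you want to poke at the parity rule in isolation, the following sketch wraps it in a function. Note that this is a variation and not the script above verbatim: it counts the backslashes in front of each quote instead of stepping back two at a time, and `toSingleQuoted` is a name made up for this example.

```javascript
// Hypothetical helper. `src` is source text of a double quoted string,
// including its quotes; the result is source text of a single quoted string.
function toSingleQuoted(src) {
  var inp = src.slice(1, -1);
  var out = '';
  for (var i = 0; i < inp.length; ++i) {
    if (inp[i] === "'") {
      // count the backslashes directly in front of this quote
      var n = 0;
      while (i - 1 - n >= 0 && inp[i - 1 - n] === '\\') ++n;
      // an even count (including zero) means the quote is not escaped yet
      if (n % 2 === 0) out += '\\';
    }
    out += inp[i];
  }
  return "'" + out + "'";
}

var plain = toSingleQuoted('"foo"');           // no quotes inside
var bare = toSingleQuoted("\"a'a\"");          // text "a'a"
var escaped = toSingleQuoted('"a\\\'a"');      // text "a\'a"
var doubled = toSingleQuoted('"a\\\\\'a"');    // text "a\\'a"
```

The bare and the already escaped quote end up as the same output text, which is what you want: both inputs evaluate to the same three characters.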

Here's an input file and expected result when you apply the algorithm above to each string of it.

Code:
var strings = [
"",
"'",
"\'",
"\\'",
"\\\'",
"\\\\'",
"\\\\\'",
"\\\\\\'",
"a",
"a'",
"a\'",
"a\\'",
"a\\\'",
"a\\\\'",
"a\\\\\'",
"a\\\\\\'",
"'a",
"\'a",
"\\'a",
"\\\'a",
"\\\\'a",
"\\\\\'a",
"\\\\\\'a",
"a'a",
"a\'a",
"a\\'a",
"a\\\'a",
"a\\\\'a",
"a\\\\\'a",
"a\\\\\\'a",
];

->

Code:
var strings = [
'',
'\'',
'\'',
'\\\'',
'\\\'',
'\\\\\'',
'\\\\\'',
'\\\\\\\'',
'a',
'a\'',
'a\'',
'a\\\'',
'a\\\'',
'a\\\\\'',
'a\\\\\'',
'a\\\\\\\'',
'\'a',
'\'a',
'\\\'a',
'\\\'a',
'\\\\\'a',
'\\\\\'a',
'\\\\\\\'a',
'a\'a',
'a\'a',
'a\\\'a',
'a\\\'a',
'a\\\\\'a',
'a\\\\\'a',
'a\\\\\\\'a',
];
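
A quick way to convince yourself that the two lists describe the same values is to evaluate a literal from each and compare the results. The sketch below does that for a handful of the pairs above. It assumes an engine that supports `String.raw` (ES2015), which is used only so the source text can be written without another layer of escaping; `evalLiteral` is a hypothetical helper.

```javascript
// [input source text, expected output source text], taken from the lists above.
var pairs = [
  [String.raw`"'"`, String.raw`'\''`],
  [String.raw`"\'"`, String.raw`'\''`],
  [String.raw`"\\'"`, String.raw`'\\\''`],
  [String.raw`"a\\\'a"`, String.raw`'a\\\'a'`]
];

// Hypothetical helper: evaluate source text that is a string literal.
function evalLiteral(src) {
  return new Function('return ' + src + ';')();
}

// True when every input literal evaluates to the same value as its output.
var allEqual = pairs.every(function (pair) {
  return evalLiteral(pair[0]) === evalLiteral(pair[1]);
});
```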

I've developed a tool that makes it very easy to find and select all the double quoted strings in a script in such a way as you'd use string.replace(). That's what I used combined with the above to search on {STRING & /^"/} and get this diff as a result :)

But more on that tool later.