How strcut saved me

2015-01-19

Yesterday I accidentally discovered that PHP has a function called mb_strcut. It's a substr that takes byte offsets but floors them to multi-byte character boundaries. Now frankly, I can't think of many proper use cases for this. Either you treat the content as binary or as text. But I'm sure there are some reasons for needing it. And at least I found one.

Last year I introduced a new compo to JS1k which allowed 2k demos. It wasn't a huge success, but it did get a few demos. Now, while rewriting the details pages for all demos, I wanted to mark the excess over 1k in a different color. Take for example this one.

When displayed as plain text, the demo sources are obviously HTML-encoded. But in order to wrap part of the source in a tag I needed to slice off the first 1024 bytes, HTML-encode them, and then take the rest and HTML-encode that as well. The second part then gets wrapped in a <span> so I can apply a color to it. On the page it will look and behave like a single piece of text.

The danger here? If you cut in the middle of a multi-byte character you may end up with two characters that shouldn't be there at all, right before and after the span opening tag. There's no security risk involved, I think, since each part is individually escaped after the slice. But still...
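To make the problem concrete, here is a tiny sketch of a plain byte-based cut going wrong. The sample string is made up for illustration, and I'm assuming the source is UTF-8.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9), so this string is three bytes long.
$source = "a\xC3\xA9";

// Cut after two bytes, which lands in the middle of "é".
$head = substr($source, 0, 2); // "a" plus the first byte of "é"
$tail = substr($source, 2);    // the second byte of "é" on its own

// Neither half is valid UTF-8 anymore.
var_dump(mb_check_encoding($head, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($tail, 'UTF-8')); // bool(false)
```

Both halves now carry a stray byte, and those end up right next to the span tag.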

This is exactly what mb_strcut helps you with. You give it an offset and a length in bytes, and when either one lands inside a multi-byte character it gets floored to the start of that character. This does mean mb_strcut may not return exactly the number of bytes I asked for, but at least it won't clobber anything. I suppose it can give you a perhaps unexpected result when you try to cut at the middle byte of a three-byte character, but you should be aware of such cases anyway when using the mb_* functions.
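Roughly, the split could look like the sketch below. This is not the actual JS1k code: the function name mark_excess, the "excess" class name and the $limit parameter are all made up for illustration, and I'm assuming the sources are UTF-8.

```php
<?php
// Hypothetical helper: split $source at (at most) $limit bytes without
// cutting through a multi-byte character, then escape each part on its own.
function mark_excess($source, $limit = 1024)
{
    // mb_strcut() takes byte offsets but never splits a character, so the
    // head may come out a little shorter than $limit bytes.
    $head = mb_strcut($source, 0, $limit, 'UTF-8');

    // Resume at the byte where the head really ended, not at $limit.
    // The cast covers older PHP versions where substr() can return false.
    $tail = (string) substr($source, strlen($head));

    $html = htmlspecialchars($head, ENT_QUOTES, 'UTF-8');
    if ($tail !== '') {
        $html .= '<span class="excess">'
            . htmlspecialchars($tail, ENT_QUOTES, 'UTF-8')
            . '</span>';
    }
    return $html;
}

// A limit of 3 bytes lands in the middle of "é" (0xC3 0xA9), so the cut
// moves back to just before it.
echo mark_excess("ab\xC3\xA9<cd", 3), "\n";
// ab<span class="excess">é&lt;cd</span>
```

The tail starts at strlen($head) rather than at the limit itself, so a character that got pushed out of the first part shows up whole in the second.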

I wonder if there's a similar function slated for ES6...