The natural representation of text in JavaScript is as a sequence of 16-bit unsigned integers. Any algorithm dealing with these numbers directly must access them through the String.charCodeAt method, leading to a huge drop in performance compared to languages with direct numeric access to their strings.

// Convert the string s to an array of 16-bit character codes.
function binl(s) {
  var l = [], i;
  for (i = 0; i < s.length; i++)
    l[i] = s.charCodeAt(i);
  return l;
}

For neatness, most applications will use a conversion function similar to the one shown, which returns an array of integers corresponding to the character values of a string s. The function shown accepts the full 16-bit character range without discrimination. A faster implementation of JavaScript might allow access to the 8-bit halves of these numbers; however, such a technique severely hinders performance if written in JavaScript itself. If strict 8-bit output is desired, one can mask each 16-bit value with the additional instruction "l[i] &= 255;". (Note that curly braces would then be required, since the loop body would span two statements.)
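As a minimal sketch, the masked variant might look like this (the name binl8 is my own, not from any library):

// Sketch: binl() with 8-bit masking; note the braces now
// required around the two-statement loop body.
function binl8(s) {
  var l = [], i;
  for (i = 0; i < s.length; i++) {
    l[i] = s.charCodeAt(i);
    l[i] &= 255;  // keep only the low 8 bits
  }
  return l;
}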

JavaScript also handles string concatenation badly: building a string by repeated concatenation takes time that grows quadratically, not linearly, with the result's length. Every concatenation operation in fact requires the generation of three new strings. If a contains "bar" and b contains "foo," then a += b will duplicate a and b into temporary values, concatenate these into a single string, and replace a with that new string. (It would seem that the third temporary string is unnecessary, as one might write the concatenation directly into a. This is not possible, however, since in the general case the rule holds that "every expression is also a value." JavaScript could still be improved by allowing the third temporary value simply to "move over" the variable a.)
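Concretely, with the values from the example:

var a = 'bar', b = 'foo';
a += b;  // behaves like a = a + b: a brand-new string "barfoo"
         // is built and assigned; the old value of a is discarded.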

// Naive repetition: concatenate a onto s, x times.
function sx(a, x) {
  var s = '';
  while (x-- > 0) s += a;
  return s;
}

A naive implementation of the string repetition function, such as the one shown, will run headlong into this issue. The total time grows quadratically with the number of repetitions of the string a--and since s is constantly growing, every iteration costs more than the one before. A recursive definition of the function is much better: it doubles the string at each step and recurses on the remainder, following the principle of repeated squaring.

// Faster repetition: double s until the next doubling would
// overshoot x, then recurse on whatever remainder is left.
function sx(a, x) {
  var s = a, t;
  for (t = 2; t <= x; t *= 2) s += s;
  if ((t = x % (t / 2)) > 0) s += sx(a, t);
  return s;
}

This function is defined only under the exception x != 0; called with zero, it still returns one copy of a. (I do not deem it necessary to insert a guard clause for this case.)
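If a caller might pass zero anyway, a trivial wrapper restores the expected behavior--this is my own addition, not part of the original code:

// Sketch: a defensive wrapper for the x == 0 case.
function sxSafe(a, x) {
  return x > 0 ? sx(a, x) : '';  // empty string for zero repetitions
}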


Now then--using these functions, we will evaluate the performance of the binl() function. How long can a string be before JavaScript bogs down? How short can it be and still be efficient? What is the very best size? The answers depend partly on the scripting engine, and partly on JavaScript/ECMAScript itself. For fun, we will first compare the two sx() functions. (We're going to be using the timer() from timer.js.)

// Wrap f in a function that returns the time, in milliseconds,
// that a call to f(x) takes.
function timer(f) {
  return function(x) {
    var n1 = new Date();
    f(x);
    return new Date() - n1;
  };
}

function sx(a, x) {
  var s = '';
  while (x-- > 0) s += a;
  return s;
}

var timeit = timer(function(x) { sx('-', x); });
var strsiz, total, i;

for (strsiz = 4; strsiz < Math.pow(4, 6) + 1; strsiz *= 4) {
  total = 0;
  for (i = 0; i < 10; i++) total += timeit(strsiz);
  document.write('<li>', strsiz * 10, ' bytes, ',
      total + ' ms, ', (strsiz * 10 * 1000 / total),
      ' B/s.', '</li>\n');
}

For the first sx() function:

Of course, an especially bad browser will only compound the problem. Internet Explorer on the Macintosh has notoriously troublesome string operations. When I tried the test, version 5.2 on Mac OS X 10.1.5 went entirely white. (On Windows or Mac OS 9 it would have plummeted into the gray sea of crashdom.) The only trials it could finish were the first seven.

It looks like the better browser (in this case, Chimera 0.5) has an optimum size of only 256 bytes at a time. (We run ten trials to get the total time and average speed--the 2560 bytes comes from ten runs of 256 bytes each.) The worse browser peaks at only 64 bytes, and both dwindle rapidly as the length increases.

(Try it yourself: text card 1)

Quite the opposite is true with the second sx() function.

In fact, it looks like it got off to a rather rough start--for the first six tests you get nothing more definite than a random bounce. It is worth noting that the second function does not exceed the speed of the first on very small strings, nor did we expect it to--theoretically, the first should be slightly quicker, because its job is less complicated--but that "benefit" applies only to insignificant sample sizes.

The same thing for Internet Explorer:

Now, that looks far better!

(Try it yourself: text card 2)


Ahh, let's look at the binl() function now--well, after a moment--the lines mean: take a deep breath and have some pizza. :-)

I'm going to use the same timing code, and almost the same setup--this time I will omit the eighth trial. (And of course, I will use the faster sx() function to create the test strings fed to binl().)
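In sketch form, the setup looks something like this--the exact size range here is illustrative--with the string built outside the timed call, so that only the conversion is measured:

// Sketch of the binl() timing setup (size range illustrative).
var str, strsiz, total, i;
var timeit = timer(function(x) { binl(str); });

for (strsiz = 4; strsiz < Math.pow(4, 6) + 1; strsiz *= 4) {
  str = sx('-', strsiz);  // built once, with the fast sx()
  total = 0;
  for (i = 0; i < 10; i++) total += timeit(strsiz);
  document.write('<li>', strsiz * 10, ' bytes, ',
      total + ' ms, ', (strsiz * 10 * 1000 / total),
      ' B/s.', '</li>\n');
}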

The Macintosh Internet Explorer fares even worse with long strings.

(Try it yourself: text card 3)

Well, you can see how very slow this process is. In just about any algorithm--be it encryption, or what have you--this "binarizing" conversion step will have the heaviest cost.

What we can conclude is that the maximum length of a string in binary operations should be between 64 bytes (IE) and 256 bytes (Chimera/Mozilla/Netscape). These browsers and others are likely to go on displaying these characteristics--the behavior is almost built into JavaScript. If possible, program designers should craft their algorithms to process binary data only in small pieces, as sketched below.
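For instance, a conversion-heavy routine might accept its input in pieces no larger than the sweet spot. The names in this sketch (processChunked, handleBlock) are hypothetical, not from any library:

// Sketch: feed a long string to a binary routine in small
// pieces instead of converting it all at once.
function processChunked(s, handleBlock, chunk) {
  chunk = chunk || 256;  // the sweet spot observed above
  for (var i = 0; i < s.length; i += chunk)
    handleBlock(binl(s.substring(i, i + chunk)));
}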