Joseph K. Myers

Tuesday, May 21, 2002

Wrap

(Wrapping the way it's supposed to be.)

A little blab:

The wrapping of text is one of those _irritating_ problems. It feels so inhuman to devote time to programming an operation whose solution is so obvious to us, yet so untranslatable into computer code!

And worse, the question of "How to wrap the Right Way" still provokes heated disagreement. I know, because I almost disagreed with myself! (For more about wrapping, see ../reference/writing-and-wrapping.txt.)

I had to define the mythical Right Thing:

(1.) Linebreaks only replace whitespace. Therefore, an 80-character non-white-space URL will not be "wrapped" (e.g., broken at 72 columns--which just happens to render it unusable). Neither will a long wordle consisting of some-big-long-crazy-thing-taking-up-far-too-much-space-....

(2.) Inserted linebreaks belong as close as possible to the right margin. Well--obviously.

(3.) Extra space is not "collapsed." (By the way, this is another reason not to do double spacing (double indentation is fine).)

By following these rules, we do not change text, we merely reformat it. Isn't that all wrapping is supposed to do, anyway? As an added benefit, since each inserted linebreak takes the place of exactly one whitespace character, the total size of text wrapped according to such an algorithm is unchanged--which is marvelous and good, to say the least.

I've said enough about that. At least I hope so.

Other details and facts:

Wrap uses buffers on a basis similar to rr (see rr.txt).

In short, wrap reads a buffer, and then loops (with the inestimable "for"; if you don't get that, you're not a programmer--sorry) over what it has just read, scanning it a second time. During this second pass, it keeps track of lines and spaces, and when a line grows too long (that is, past the maximum allowed margin), an old Whitespace Character (if present) is exchanged for a nice new Line Feed.

And thus, a new line is born. And so on, and so forth.
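
For the curious, here is a minimal sketch of that second reading. It is not the code from wrap.c; the name wrap_in_place and its parameters are invented for illustration. It assumes that only spaces and tabs count as whitespace and that every byte, tabs included, takes up one column.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the code from wrap.c.
 * Wraps buf[0..len) in place at a margin of cols columns.
 * Assumptions: only ' ' and '\t' count as whitespace, and every
 * byte (tabs included) occupies one column. */
void wrap_in_place(char *buf, size_t len, size_t cols)
{
    size_t line_start = 0;  /* index of the first byte of the current line */
    size_t last_space = 0;  /* index of the latest whitespace on this line */
    int have_space = 0;     /* nonzero once last_space is valid */
    size_t i;

    assert(cols > 0);

    for (i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            /* an existing linebreak starts a fresh line */
            line_start = i + 1;
            have_space = 0;
            continue;
        }
        if (buf[i] == ' ' || buf[i] == '\t') {
            last_space = i;
            have_space = 1;
        }
        /* buf[i] is byte number (i - line_start + 1) of the line, so the
         * line is past the margin once i - line_start reaches cols */
        if (i - line_start >= cols && have_space) {
            buf[last_space] = '\n';   /* one byte swapped, none added */
            line_start = last_space + 1;
            have_space = 0;
        }
    }
}
```

With cols set to 7, the text "aaa bbb ccc ddd" comes out as the two lines "aaa bbb" and "ccc ddd". A long run without whitespace is left alone until the next whitespace character turns up, and the buffer is exactly as long afterwards as it was before.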

Don't get WooCed by the WCs, or LooFed by the LFs! :-) (Sorry, there are hardly any jokes on this topic, so I just had to invent one.)

Naming:

Well, duh! It's wrap; that's its name, and that's what it does. Try it and see!

Download:

wrap.tar.gz (644 bytes)

As usual, as typical, as an idiot, and for your laziness, "a Makefile is provided." It is simple. After "make"ing, install the program into /usr/local/bin, or somewhere else that suits your fancy.

Usage is:

wrap [cols] [< input] [> output]

The optional argument cols is used to indicate a margin, if different from the default of 72. The maximum value of cols is 999, and of course, don't use a negative number.
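
A minimal sketch of how such an argument might be read without <stdio.h> or any library parsing. This is not the code from wrap.c: parse_cols, DEFAULT_COLS and MAX_COLS are names made up here, and quietly falling back to the default on bad input is an assumption, since nothing above says what wrap really does in that case.

```c
#include <assert.h>
#include <stddef.h>

enum { DEFAULT_COLS = 72, MAX_COLS = 999 };

/* Hypothetical helper, not the code from wrap.c.
 * Returns the margin named by arg, or DEFAULT_COLS when arg is missing,
 * empty, not a plain decimal number, zero, or larger than MAX_COLS.
 * Falling back to the default on bad input is an assumption. */
int parse_cols(const char *arg)
{
    int n = 0;
    const char *p;

    if (arg == NULL || *arg == '\0')
        return DEFAULT_COLS;

    for (p = arg; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return DEFAULT_COLS;       /* rejects "-5", "7x", and so on */
        n = n * 10 + (*p - '0');
        if (n > MAX_COLS)
            return DEFAULT_COLS;       /* stop before n can overflow */
    }

    assert(n >= 0 && n <= MAX_COLS);
    return n > 0 ? n : DEFAULT_COLS;
}
```

A main function could then call parse_cols(argc > 1 ? argv[1] : NULL) and hand the result to the wrapping loop.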

Performance is not so phenomenal, but probably faster than anything else. (Yeah, easy for me to say.)

At any rate (pun not *necessarily* intended), wrap wraps at a rate of 10 MB of random garbage per 0.380 (or later, 0.370) sec., which works out to roughly 26 to 27 MB per second. (Findings may reflect a one-time event.)

I don't even use the <stdio.h> library.

Testing:

Since a known "good" file must be used in order to verify proper wrapping, no test is included in the standard "make" procedure. However, you may easily download an additional file, wrap-tests.tar.gz (9044 bytes), and perform a test according to the included instructions.

A techie note:

When writing this code, I needed to write to the middle of a buffer. I had trouble finding out how. This is the way:

read(fd, &buf[i], l); // read up to l bytes from fd into buf, starting at index i

Here fd is a file descriptor, buf is a char[], and i and l are int-egers.
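
The same trick, wrapped up as a small function. This is a sketch only: append_read and its parameters used and cap are invented names, and it assumes a POSIX system with <unistd.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the code from wrap.c.
 * buf holds cap bytes in total, and the first `used` of them are
 * already filled. Reads more input in right behind them.
 * Returns what read() returns: the number of bytes read (possibly
 * fewer than requested), 0 at end of input, or -1 on error. */
ssize_t append_read(int fd, char *buf, size_t used, size_t cap)
{
    assert(used <= cap);
    /* &buf[used] is the same address as buf + used */
    return read(fd, &buf[used], cap - used);
}
```

Note that read() may hand back fewer bytes than were asked for, so the caller has to add the return value to its own count of used bytes instead of assuming the buffer is now full.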

Another strange thought on performance:

It appears that we must write out alternate buffer sizes, one of which may be less than cols characters long. The minimum buffer *must* contain at least cols characters in order for wrapping to be feasible, and the default size of the buffer is 4096. Splitting it in two therefore gives, at worst, a performance comparable to input/output with a smaller buffer size of 2048, regardless of the "hammering" action (in the case of a faintly possible 1-byte read-write) which is so disturbing to some less thoughtful book writers.

A conspiracy theory:

The wrap.c code used to be 666 bytes.

(But now it's less.)

http://www.myersdaily.org/joseph/unix/wrap.txt