On-line Leo III User Manuals

The following describes the rationale at the start of the digitisation of the manuals.

In due course the text will be updated to use the past tense.

The actual on-line manuals adhere very closely to this pattern, and should remain comprehensible for decades to come. If you comb out the HTML tags, you get a text file that closely approximates the original manual, but without the cross-referencing or the (very few) pictures.
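Combing out the tags can be done with nothing more than Python's standard library. The sketch below is an illustration, not the tool used for the project: it keeps only the character data of the page and turns entities such as `&amp;` back into ordinary characters, so the layout inside `<PRE>` survives unchanged.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the character data, discarding every tag."""

    def __init__(self) -> None:
        # convert_charrefs=True turns &amp;, &lt; etc. back into characters
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    """Return the plain text of an HTML page, whitespace preserved."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags('<PRE><A NAME="s1.1">1.1</A> Intro &amp; more\n</PRE>'))
```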

Rationale

What I want to achieve is a simple, searchable HTML document that renders much like the original, though not necessarily identically. In addition, on-line cross-referencing makes the on-line manual easier to use than paper. We have this both for Computer Code actions, in the form of the manual’s Appendix B, and for Intercode actions; there is no equivalent manual section for the latter, so that cross reference actually takes the form of the interim Intercode Translator itself. Also, the contents list of each manual has hot links.

I have deliberately gone for very simple use of HTML in the hope that it will not be sensitive to the emergence of new versions, and will render OK even in 50 years' time -- an assertion which I have no chance of testing :-). By deliberately not specifying fonts (except monospaced, as a result of <PRE>), the rendering is left to the reader's choice and will not misfire owing to the absence of a particular font; e.g. my Ubuntu laptop has different fonts from my Win98 system.

The stuff that comes out of word processors as HTML is very verbose and often invokes features specific to the WP software in use. On the time scale that I am thinking about, the nature of screens and their resolution are likely (certain) to change, and the word processor software will certainly be different.

The stuff that we are producing even looks fairly good on my smartphone — at least until you try to zoom in, when the lines fall off the edges.

With Geoff’s diligence on Appendix A of Volume I, we now have all the fancy characters, but only by specifying the file as UTF-8. If we need fancy characters elsewhere, we may need to extend this technique, but for most parts of the manual, we can stick to ASCII and use little bits of HTML decoration to do the fancy bits.
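For keeping a file pure ASCII while still showing the odd fancy character, one option (an assumption here, not necessarily what was done for Appendix A) is to write each non-ASCII character as a numeric character reference, which any browser renders as the intended glyph. Python can do this with the built-in `xmlcharrefreplace` error handler; the function name `to_ascii_html` is hypothetical.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return text as pure ASCII HTML, fancy characters as &#NNN; references."""
    # Escape the three characters that are special in HTML text first
    escaped = html.escape(text, quote=False)
    # Replace every non-ASCII character with a decimal character reference
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_html("a < b × c"))  # multiplication sign becomes &#215;
```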

Methodology

We made raw scans (see above), courtesy of Leeds University’s rather excellent sheet-feeding scanner.

These scans were given the OCR treatment using Nuance software to produce both a Word4Win document and a plain text document. I tried working from the W4W document and found it impossible, because Word was continually getting uptight about bullet points and acting on other wrong assumptions made by the scanner. (I would like a scanner that simply recognises the monospaced original and creates an ASCII file retaining the layout. If anyone has such a facility, please let me know.)

After a few experiments, we operated as follows:

I made .TXT files that preserve the original’s layout (unless it is just plain wrong), skipping past bits that involve tables, using a custom program. Volunteer editors then cleaned these up, keeping the layout, often using the W4W (or OO) spell checker as an aid. Underlines were ignored at this stage, and mathematical bits were flagged with two exclamation marks.
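Because the flag is just two exclamation marks, finding every spot that still needs attention at the HTML stage is a one-line search. A minimal sketch (the function name `find_flags` is hypothetical):

```python
def find_flags(text: str, marker: str = "!!") -> list[tuple[int, str]]:
    """Return (line number, line) for every line carrying the marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


if __name__ == "__main__":
    sample = "ordinary text\nx = !! square root of y\nmore text"
    for number, line in find_flags(sample):
        print(number, line)
```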

I wrote a formatting program to put in the underlines (most of them and a few extra duds), and create name anchors for the cross referencing.
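The formatting program itself is not described here, so the following is only a guess at how the anchoring part might work. It assumes (hypothetically) that a heading line starts with a section number such as `3.2` and that cross references read "section 4.3"; the first gets an `<A NAME>` target and the second an `<A HREF>` link to it. Like the original program, a rule this crude will produce a few extra duds, e.g. a body line that happens to begin with a number.

```python
import re

# Hypothetical convention: a heading line begins with a section number.
HEADING = re.compile(r"^(\s*)(\d+(?:\.\d+)*)(\s)")
# Hypothetical convention: cross references are written "section 4.3".
XREF = re.compile(r"\bsection (\d+(?:\.\d+)*)", re.IGNORECASE)


def add_anchors(line: str) -> str:
    """Add a NAME anchor to a numbered heading and HREF links to references."""
    # Turn "section 4.3" into a link to the anchor named "s4.3"
    line = XREF.sub(lambda m: f'<A HREF="#s{m.group(1)}">{m.group(0)}</A>', line)
    # Make the section number at the start of a heading line an anchor target
    return HEADING.sub(
        lambda m: f'{m.group(1)}<A NAME="s{m.group(2)}">{m.group(2)}</A>{m.group(3)}',
        line,
        count=1,
    )


if __name__ == "__main__":
    print(add_anchors("3.2 Tables"))
    print(add_anchors("as in section 4.3, the"))
```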

The HTML file then went on the web server, and proof readers were invited.

I made image files for the diagrams and slotted them into the HTML.

The font you get is whatever you have your browser set to for monospaced text. (I think that IE uses Courier, but I do not have IE, so I cannot check.) The idea is for the HTML to be almost identical to the .TXT file, making the minimum of assumptions. I reckon it looks pretty much like the manual on my browser. I've put in green lines for page boundaries, but not left lots of gap, as the material often flows from one page to the next, and reading it in the browser is then better than reading the actual printed manual.
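A minimal sketch of wrapping a .TXT file into such a page, under two assumptions that are not from the original: that page boundaries in the text are marked with a form-feed character, and that the file is plain text with no hand-inserted tags (so it is safe to escape). The markup for the green rule is likewise hypothetical. Since `<HR>` may not sit inside `<PRE>`, the sketch closes and reopens the `<PRE>` at each page boundary.

```python
import html

# Hypothetical markup for the thick green page-boundary line.
PAGE_RULE = '<HR style="height:4px; border:none; background-color:green">'


def top_and_tail(text: str, title: str) -> str:
    """Wrap plain manual text in minimal HTML, one <PRE> block per page."""
    # Assumption: a form feed (\f) separates the pages of the original.
    pages = text.split("\f")
    # <HR> is a block element, so close the <PRE> before it and reopen after.
    joiner = "\n</PRE>\n" + PAGE_RULE + "\n<PRE>\n"
    body = joiner.join(html.escape(page, quote=False) for page in pages)
    return (
        "<HTML>\n<HEAD>\n"
        '<META http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "<TITLE>" + html.escape(title) + "</TITLE>\n"
        "</HEAD>\n<BODY>\n<PRE>\n" + body + "\n</PRE>\n</BODY>\n</HTML>\n"
    )


if __name__ == "__main__":
    print(top_and_tail("page one\fpage two & more", "LeoCC"))
```

No font is specified anywhere, so the browser's own monospaced font is used, in keeping with the approach described above.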

Such simple HTML is likely to be renderable in 50 years' time, and if not, just looking at the text looks a lot like the original. I suppose that Bill Gates will introduce htmlx, but Tim Berners-Lee will keep HTML working.

The printed manual uses one and a half line spacing in lots of places. Insertion of blank lines has been done according to the taste of the particular editor.

Fancy bits

Within preformatted text you can use various HTML tags. So far I have used: At the text stage, we just put in a couple of exclamation marks where there was mathematical fancy stuff (or any other fancy stuff). The tricky bits got sorted (mostly) at the HTML stage.

Tables

I tried using an HTML table in section 3.2, but regretted it. It did not always align with the caption, so I redid it in the style of section 4.3. At the text stage, I just laid out the table with no lines. There is a big table of numbers in Appendix A, which I intend to insert as an image from the scan. It is impossible to proof read, and there seems to be no value in having the numbers searchable.

How I made LeoCC.htm

This was done from the OCR plain text file. Although its layout is only very approximate, I wrote a little program to deal with over-length lines, then used a spell checker, before going through the file to put back the layout. I did not often bother to keep the line breaks in their original places. I also inserted little bits of HTML to put a thick green line between pages of the original. I just used my favourite plain text editor, the one I normally use for program source text; the same approach works with Notepad on Windows or gedit on GNU/Linux. Put in a bit of HTML top-and-tail and you have a web page that looks a lot like the original manual.
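The over-length-line program is not described further, so the following is only a guess at what such a step might look like: fold any line wider than a chosen width (72 columns here, an assumed figure, not taken from the manual) at word boundaries, keeping the line's leading indentation on the continuation lines. Lines that already fit are left exactly as they are.

```python
import textwrap


def fold_long_lines(text: str, width: int = 72) -> str:
    """Fold lines longer than width at word boundaries; leave others alone."""
    out: list[str] = []
    for line in text.splitlines():
        if len(line) <= width:
            out.append(line)
            continue
        # Reuse the line's own leading indentation on its continuation lines
        indent = line[: len(line) - len(line.lstrip())]
        out.extend(textwrap.wrap(line, width=width, subsequent_indent=indent))
    return "\n".join(out)


if __name__ == "__main__":
    print(fold_long_lines("short\n" + "word " * 20, width=20))
```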