
Supplement 3: Using MSG Files to Control File Import

How Does MegaDots Import Tagged Files?

MegaDots has a program called MSGFILE.EXE to deal with many forms of tagged files. If you encounter a file whose tags are radically different from those used in HTML files, you can still import it, provided you are willing to learn about the tags and can figure out what styles and attributes you want them to have in MegaDots.

The first application of this approach has been to import SGML files used by the IRS for their internal documentation. In a short period of time I was able to figure out what the tags were, what they meant, and what MegaDots styles to change them into.

MSGFILE is a program you can run at the DOS command line. It takes three parameters: the source data file, the target file, and an MSG file. An MSG file is a kind of rules file: it tells MSGFILE what to do with the tags. If you add a fourth parameter (it does not matter what it is), then unknown tags are saved as hidden text.

The output file is an ASCII file which MegaDots will recognize as one containing MegaDots commands. If you export a MegaDots file to "ASCII line" with MegaDots markup, you get a file with a layout similar to that of the output file from MSGFILE.

The MSGFILE program ignores the DTD in your file. Changing the DTD will not affect how MSGFILE reads the data.

For example, suppose you have a file called JOHN.HTM. You can import it into MegaDots by typing MEGA JOHN.HTM <Enter>. Or you can import it with the following two commands (as long as you copy HTML.MSG into the current directory):


MSGFILE JOHN.HTM JOHN.TMP HTML.MSG <Enter>
MEGA JOHN.TMP <Enter>

The first line creates a MegaDots marked-up ASCII file called JOHN.TMP. The second line imports the temporary file.
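The two-step import above can also be scripted. Here is a minimal sketch in Python that only builds the two command lines; it assumes MSGFILE and MEGA take their arguments exactly as described above. The function names and the "KEEP" value used for the optional fourth parameter are hypothetical choices for illustration, not part of MegaDots.

```python
def msgfile_command(source, target, msg_file, keep_unknown_tags=False):
    """Build the MSGFILE argument list: source file, target file, MSG file."""
    command = ["MSGFILE", source, target, msg_file]
    if keep_unknown_tags:
        # Any fourth parameter turns on hidden-text saving; "KEEP" is arbitrary.
        command.append("KEEP")
    return command


def import_commands(source, msg_file="HTML.MSG"):
    """Return the two commands: convert to a temporary file, then open it."""
    stem = source.rsplit(".", 1)[0]
    temp_file = stem + ".TMP"
    return [msgfile_command(source, temp_file, msg_file), ["MEGA", temp_file]]


if __name__ == "__main__":
    for command in import_commands("JOHN.HTM"):
        print(" ".join(command))
```

The commands are returned as lists so that they could be handed to a process launcher on a system where the two programs are installed.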

A Practical Examination into HTML.MSG

Take a look at HTML.MSG and/or NIMAS.MSG in your MegaDots directory. These files control the importing of HTML files and NIMAS files into MegaDots. If you change the HTML.MSG file, you will change how MegaDots imports HTML files. HTML.MSG is an ASCII text file, and it can be examined, edited, and changed with any ASCII text editor.

The first line in the file contains the long name of the file type. The next line usually says "Style continuation: yes". The third line is blank.

The rest of the file is in a three column format. You need at least 2 spaces to separate the columns.

The first column is the tag (case is not important). The second column gives the MegaDots command. Here case is important. The third column comes from a restricted list of phrases that describes the kind of MegaDots operation.
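To make the layout concrete, here is a minimal sketch in Python that reads text in the format just described: three header lines, then three columns separated by two or more spaces. It is an illustration only, not how MSGFILE itself works. The function name `parse_msg` and the sample file name "Sample tagged file" are hypothetical, and the sketch assumes that no column contains a run of two or more spaces internally.

```python
import re

# Hypothetical sample: long name, "Style continuation" line, blank line, one rule.
SAMPLE = (
    "Sample tagged file\n"
    "Style continuation: yes\n"
    "\n"
    "&reallybigdash;  --  text\n"
)


def parse_msg(text):
    """Split MSG text into the long file-type name and a list of rules."""
    lines = text.splitlines()
    long_name = lines[0].strip() if lines else ""
    rules = []
    # The first three lines are the header, so the rules start at the fourth.
    for line in lines[3:]:
        if not line.strip():
            continue
        # Columns are separated by at least two spaces.
        columns = re.split(r" {2,}", line.strip())
        if len(columns) != 3:
            continue  # Skip lines that do not fit the three-column layout.
        tag, command, operation = columns
        # Tag case is not important, so normalize it; command case is kept.
        rules.append((tag.lower(), command, operation))
    return long_name, rules


if __name__ == "__main__":
    name, rules = parse_msg(SAMPLE)
    print(name)
    for rule in rules:
        print(rule)
```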

Common MSG Commands

Unusual MSG Commands

Using TEXTCHK

A long time ago, I wrote a program called TEXTCHK. It was designed to help people write and read ICADD files (crude tagged files generated by book publishers primarily to meet the requirements of the state of Texas). This program can diagnose problems with tags. It can also fix quite a few problems.

TEXTCHK is used to help import ICADD files into MegaDots. It is a virtually undocumented program included with your copy of MegaDots. It is very handy when you are faced with a new tagged file and need to understand it.

Just type TEXTCHK input output <Enter>. Here the input is a new file you need to analyze. The output is a report about the tag usage. You get a lot of useless information in the report. But you do get a tag census: a list of the tags and the number of times they show up in the text. The report will divide tags into "legal" and "illegal". Ignore this distinction. The program was written to look at how well a file measured up to the ICADD standard. We are interested in all the tags; we do not care whether a tag is in the ICADD list or not.

Once you have a list of tags, you can search for them in the original tagged file to figure out what they mean. I have found that I can analyze a half-megabyte file in an hour or two and build an .MSG file for it.
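If TEXTCHK is not at hand, the idea of a tag census is easy to approximate. The sketch below, in Python, is not TEXTCHK and does not reproduce its report; it simply counts tag names, assuming tags are written as `<name ...>` or `</name>`. Opening and closing tags are counted together under one name. The names `TAG_PATTERN` and `tag_census` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tag is "<" or "</", a name starting with a letter, then
# anything up to the next ">".
TAG_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def tag_census(text):
    """Count how often each tag name appears, ignoring case and attributes."""
    return Counter(match.group(1).lower() for match in TAG_PATTERN.finditer(text))


if __name__ == "__main__":
    sample = "<H1>Title</H1><p>One</p><P>Two</P>"
    for name, count in tag_census(sample).most_common():
        print(name, count)
```

Sorting the census by frequency gives a sensible order for looking tags up in the source file: the most common ones usually matter most for the .MSG file.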

Accented Letters

Many files contain markup like &aacute; for "an a acute". MSGFILE.EXE knows about these codes and handles them properly for MegaDots. This handling is hard coded, and you do not have any way of adjusting it. If you have &xxx; markup which is not handled correctly, contact David Holladay.

Other Special Symbols

Let's say you have a file that contains &reallybigdash; (this is a code not recognized by the MSGFILE program). Let's say you want this to be a dash, which in MegaDots is represented by a double hyphen. Create a line in the .MSG file that has the following three columns, separated by at least two spaces: &reallybigdash;  --  text (the third column indicates that the code &reallybigdash; is being replaced with the double hyphen).
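The effect of such a replacement rule can be sketched in a few lines of Python. This only illustrates the substitution described above, under the assumption that codes are matched without regard to case, as tags are; it is not the MSGFILE implementation, and the function name `apply_text_rules` is hypothetical.

```python
import re


def apply_text_rules(source, rules):
    """Replace each code with its replacement text, ignoring the code's case."""
    for code, replacement in rules:
        pattern = re.compile(re.escape(code), re.IGNORECASE)
        # A function replacement keeps backslashes in the replacement literal.
        source = pattern.sub(lambda _match, text=replacement: text, source)
    return source


if __name__ == "__main__":
    rules = [("&reallybigdash;", "--")]
    print(apply_text_rules("Wait&reallybigdash;what?", rules))
```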

What does Excess Emphasis Have to Do with This?

If you have a tag that you do not do anything to in the .MSG file, it will show up in the "output" file. What happens to that tag in MegaDots? It is either thrown away, or it is kept as "hidden text". What is hidden text? It is text enclosed with the emphasis of "hidden". To create some hidden text, mark text as a block, and issue the control-F H command. This is not in the menu, so the hidden text is a hidden command within MegaDots. Hidden text does not show up in WYSIWYG. It does show up in show markup. It does not show up in any inkprint or braille output. It just lives there in the MegaDots file.

There is a very obscure question in the MegaDots preferences that influences how these tagged files are imported. Go to the "File Import" preferences, "default file". In that huge screen there is a question "excess emphasis". If you say "disallow", all unknown tags will be thrown away. If you say "allow", all unknown tags will be in the MegaDots file as hidden text.

If you export a MegaDots file with "hidden" tags to HTML, all the hidden tags are reconstructed as regular tags again. If you need to import tags, work with them in MegaDots, and then export them again, you will probably want to use this "hidden" feature of MegaDots.

UTF-8

UTF-8 is a system for encoding Unicode files in an 8-bit system. A UTF-8 file is just like a regular ASCII file, except for its unusual way of encoding accented letters and specialized typesetting characters.

MegaDots can read UTF-8 textfiles as long as they start with the UTF-8 prefix of "EF BB BF" (three bytes expressed as hexadecimal). MegaDots can now read HTML/XML files that use UTF-8 character encoding as long as the file is identified in a tag as using "UTF-8" encoding.
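Before importing a plain text file, it can be worth checking whether it starts with those three bytes. Here is a minimal sketch in Python; the function names are hypothetical, and `codecs.BOM_UTF8` is the standard library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_prefix(data):
    """Return True if the bytes start with the UTF-8 prefix EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_prefix(data):
    """Return the bytes with the UTF-8 prefix added if it is missing."""
    return data if has_utf8_prefix(data) else codecs.BOM_UTF8 + data


if __name__ == "__main__":
    print(has_utf8_prefix(b"\xef\xbb\xbfhello"))
    print(has_utf8_prefix(b"hello"))
```

Adding the prefix only makes sense if the file really is UTF-8; the sketch does not check that.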

If MegaDots finds a character that it cannot identify, it will convert the character into hexadecimal and save the character in an identifiable sequence. For example, the Unicode character "0816" is imported into MegaDots as ~[0816] (notice the added brackets and tilde). You can write a rules file to convert these into sequences that make the correct braille.
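Before writing such a rules file, it helps to know which placeholders actually occur. The Python sketch below works on text saved from MegaDots and lists the distinct ~[....] sequences; it can also substitute replacements from a table you supply. It assumes the placeholder always holds four to six hexadecimal digits, which goes beyond the single example given here. The names and the sample mapping are hypothetical, and this is not a MegaDots rules file.

```python
import re

# Assumption: a tilde, then four to six hexadecimal digits in square brackets.
PLACEHOLDER = re.compile(r"~\[([0-9A-Fa-f]{4,6})\]")


def find_placeholders(text):
    """Return the distinct hexadecimal codes found, upper-cased and sorted."""
    return sorted({match.group(1).upper() for match in PLACEHOLDER.finditer(text)})


def replace_placeholders(text, table):
    """Replace placeholders listed in the table; leave unknown ones untouched."""
    def substitute(match):
        return table.get(match.group(1).upper(), match.group(0))
    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    sample = "a ~[0816] b ~[0816] c ~[2014] d"
    print(find_placeholders(sample))
    # Hypothetical mapping: U+2014 (em dash) becomes a double hyphen.
    print(replace_placeholders(sample, {"2014": "--"}))
```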

UTF-8 can be a useful intermediate file format. For example, we were e-mailed a text file written in "code page 1250". We imported the file into Notepad, and found that Notepad was able to work out the correct accent marks. Notepad is able to export to a UTF-8 text file with the usual 3-byte identifier. MegaDots is able to import the UTF-8 file Notepad exported.
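The same conversion can be done without Notepad. The following Python sketch assumes you already know the file is in code page 1250; it decodes the bytes with the standard "cp1250" codec and re-encodes them with "utf-8-sig", which writes the 3-byte identifier at the start. The function name is hypothetical.

```python
def cp1250_to_utf8_with_prefix(data):
    """Convert code page 1250 bytes to UTF-8 bytes that start with EF BB BF."""
    text = data.decode("cp1250")
    # The "utf-8-sig" codec prepends the 3-byte UTF-8 identifier when encoding.
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    original = "Dvo\u0159\u00e1k".encode("cp1250")
    converted = cp1250_to_utf8_with_prefix(original)
    print(converted[:3].hex(" "))
```

To convert a file, read it in binary mode, pass the bytes through the function, and write the result in binary mode under a new name.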

Putting this all Together

This is how I would approach a project:

User MSG File

If you create a USER.MSG file, you can force MegaDots to make use of it. Here are the different ways of forcing MegaDots to import a file, making use of the USER.MSG file: