WARNING: This is a highly technical topic relating to strange characters which may appear in DBT after importing a Word file. You will require an understanding of Unicode and how Windows characters are coded.
If you find that an imported Word document contains strange characters, we advise re-importing the Word file after first ensuring that, in DBT's Global: Word Importer settings, "Unknown characters:" is set to "Output Unicode value".
After importing the file again, any character that DBT does not recognize should appear as follows:
Chapter One (U:25b6) Introduction
In this example, the Word file contains a bullet-type character called "Black right-pointing triangle". With over 40,000 possible Unicode characters, in a table that is constantly being extended, it is impossible to map them all to braille in DBT.
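To identify which character a placeholder such as (U:25b6) stands for, the hexadecimal value can be looked up in the Unicode character database. The following is a minimal Python sketch, not part of DBT; the function name is hypothetical, and the pattern assumes the placeholder always has the form "(U:" followed by four hexadecimal digits and ")", as in the example above.

```python
import re
import unicodedata

# Assumed placeholder form, taken from the single example "(U:25b6)".
PLACEHOLDER = re.compile(r"\(U:([0-9A-Fa-f]{4})\)")

def describe_placeholders(text):
    """Return (hex_value, unicode_name) pairs for each placeholder found in text."""
    results = []
    for match in PLACEHOLDER.finditer(text):
        hex_value = match.group(1).lower()
        char = chr(int(hex_value, 16))
        # Fall back to a marker string for code points without an assigned name.
        results.append((hex_value, unicodedata.name(char, "<unnamed>")))
    return results

print(describe_placeholders("Chapter One (U:25b6) Introduction"))
# [('25b6', 'BLACK RIGHT-POINTING TRIANGLE')]
```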
The following explains how such unusual characters may be mapped to braille characters.
Duxbury Systems, Inc.
November 30, 2001
The wrduni.txt file controls the mapping of Unicode and other specialized font characters into DUSCI characters when the Duxbury Braille Translator (DBT) imports a Microsoft Word file.
Unicode is an international character encoding standard; see http://www.unicode.org on the World Wide Web for details.
Within the Windows system, specially encoded single-byte fonts may also be used instead of double-byte Unicode for characters that cannot be expressed within the Windows standard single-byte (Latin-1) font.
DUSCI is the internal multi-byte character encoding standard used within DBT. It is based upon Unicode, but the encoding method is different. Whereas the Unicode values mapped by this file are always two bytes in length, DUSCI characters may be 1 or 2 bytes in length (and may theoretically be extended to 3 or more bytes if necessary to accommodate more characters in the future). The complete listing of currently assigned DUSCI character codes, together with the corresponding Unicode code values, is given in the "Character List" document, under "Help" in DBT.
The wrduni.txt file itself is a simple "ASCII text file." WordPad, or any other editor that can edit plain-text files, can be used to edit the file. When finished, be sure to save it as plain text, not in the native format of WordPad, Word, or any other word-processor program.
The file consists of a set of "sections," each section corresponding either to the first byte of the Unicode value being mapped or to a set of special font names.
In the first case, that is when mapping Unicode values, the section is headed by a line containing an asterisk and the initial byte value in hexadecimal, for example:
*1e
precedes the line(s) detailing the mappings for all Unicode values whose first byte has hexadecimal value 1e.
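As an illustration of how a Unicode value selects its section, the header can be derived from the high-order byte of the value. This is a hedged Python sketch; the function name is hypothetical, and the two-digit lowercase hexadecimal form is an assumption based on the "*1e" example.

```python
def section_header(code_point):
    """Return the section header line for a two-byte Unicode value."""
    if not 0x0000 <= code_point <= 0xFFFF:
        raise ValueError("expected a two-byte Unicode value")
    first_byte = code_point >> 8  # the high-order byte selects the section
    # Assumption: headers use two lowercase hex digits, as in "*1e".
    return f"*{first_byte:02x}"

print(section_header(0x1E02))  # *1e
print(section_header(0x25B6))  # *25
```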
In the second case, that is when mapping special single-byte font values, the section is headed by a line containing "*00:" and then a list of the font names that follow the same mapping. If there is more than one font name, they are separated by vertical bars (|), for example:
*00: Afallon|Cwrwgl|Heledd|Padarn|Teifryn
would head a section detailing mappings for certain Welsh fonts -- namely Afallon, Cwrwgl, Heledd, Padarn, and Teifryn. Note that the font name(s) must be spelled exactly as they appear in the system font list, including capitalization and any punctuation that is part of the name.
Following the last section only, there should be a line containing just a single asterisk; this line marks the end of the file.
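The three kinds of header line described above (Unicode section, font section, and end-of-file marker) could be told apart as in the following Python sketch. It is only an illustration of the format, not DBT's own reader; the function name and the returned tuples are hypothetical, and trimming spaces around each font name is an assumption.

```python
def parse_section_header(line):
    """Classify a header line as a Unicode section, a font section, or the end marker."""
    text = line.strip()
    if not text.startswith("*"):
        raise ValueError("not a section header: " + repr(line))
    body = text[1:]
    if body == "":
        # A lone asterisk marks the end of the file.
        return ("end", None)
    if body.startswith("00:"):
        # Font names are separated by vertical bars; spelling and case are kept as-is.
        fonts = [name.strip() for name in body[3:].split("|")]
        return ("fonts", fonts)
    # Otherwise the body is the first byte of the Unicode value, in hexadecimal.
    return ("unicode", int(body, 16))

print(parse_section_header("*1e"))
print(parse_section_header("*00: Afallon|Cwrwgl|Heledd|Padarn|Teifryn"))
print(parse_section_header("*"))
```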
Each line within a section gives the mapping for a single imported character. The mapping may yield one or several characters in DUSCI.
In the case of Unicode characters, the line begins with the value of the second byte of the imported character, in hexadecimal, followed by a colon and a space. Recall that the value of the first byte is given by the header line for the current section.
The mapped-to value or values then follow, either by giving the character(s) directly (if such characters are ASCII characters other than a vertical bar [|]) or by giving the code sequence expressed as three-digit decimal values each preceded by a vertical bar.
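The line format just described can be sketched in Python as follows. The function name is hypothetical, and the two sample lines are made up for illustration rather than taken from the shipped wrduni.txt. The sketch also assumes that a directly given ASCII character stands for its own ASCII code, and that direct characters and "|ddd" codes may be mixed on one line.

```python
def parse_detail_line(line):
    """Split a line of the form 'hh: value' into (byte value, list of mapped-to codes)."""
    key, sep, value = line.rstrip("\r\n").partition(": ")
    if not sep:
        raise ValueError("missing ': ' separator: " + repr(line))
    codes = []
    i = 0
    while i < len(value):
        if value[i] == "|":
            # A vertical bar introduces one three-digit decimal code value.
            digits = value[i + 1:i + 4]
            if len(digits) != 3 or not (digits.isascii() and digits.isdigit()):
                raise ValueError("expected three decimal digits after '|'")
            codes.append(int(digits))
            i += 4
        else:
            # Assumption: a direct ASCII character maps to its own code.
            codes.append(ord(value[i]))
            i += 1
    return int(key, 16), codes

# Hypothetical lines: both express the same single mapped-to code, 62 ('>').
print(parse_detail_line("b6: >"))     # (182, [62])
print(parse_detail_line("b6: |062"))  # (182, [62])
```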
In the case of special single-byte font characters, the line begins with the hexadecimal code value, a colon and a space. The mapped-to value(s) are then expressed in the same manner as for Unicode characters. Note that any unmapped characters are treated the same as if they were in the Windows standard (Latin-1) font, which corresponds to the first page (i.e. section "*00") of Unicode. That means it is necessary only to map those characters that are encoded differently from the standard font.
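The fallback rule for special fonts can be pictured with a small Python sketch. The function name and the sample table are hypothetical; the table simply stands for the parsed contents of one font section.

```python
def resolve_font_byte(font_table, byte_value):
    """Look up a single-byte font character, falling back to the standard font."""
    if byte_value in font_table:
        return ("mapped", font_table[byte_value])
    # Unmapped: treated as the standard (Latin-1) font, i.e. Unicode value 00xx,
    # which is handled by section "*00".
    return ("standard", byte_value)

# Hypothetical font section that remaps only byte e2.
sample_table = {0xE2: [119]}
print(resolve_font_byte(sample_table, 0xE2))  # ('mapped', [119])
print(resolve_font_byte(sample_table, 0x41))  # ('standard', 65)
```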
Some examples of detail mapping lines follow: