Missing or Wrong Characters
Normally you can expect all characters in the document you import to show up in DBT correctly. This page tells you how to cope if you find a problem with one or more characters.
If an imported Word document contains strange characters or is missing characters, recheck the settings in the DBT Global: Import Options dialog, and then re-import your file. The option to look at in this dialog is the Unknown Characters setting, which we advise you to set to "Output Unicode value". With that setting, every unrecognized character is replaced by its Unicode value, so that you can take steps to replace it appropriately for braille.
For more details on the dialog, see Global: Import Options.
You will find after re-importing the file that any character which DBT does not recognize shows up as a Unicode value in an easy to find format, as in this hypothetical example:
Chapter One (U:25b6) Introduction
In this example, the Word file contains a variety of "bullet" character called the "Black right-pointing triangle". Actually DBT recognizes and supports this character, but we are using it as an example to show what you will see for an unsupported character.
To look at character codes in more detail, see Comprehensive List of Supported Characters. If you search this list, you will see that U+25B6, the "Black right-pointing triangle", is among the Geometric Shapes, the series of codes from U+25A0 to U+25FF.
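If a re-imported file contains many such placeholders, it can help to list them outside DBT before deciding how to replace them. The following Python sketch is not part of DBT. It assumes the placeholder always has the form shown in the example above, "(U:" followed by hexadecimal digits and ")", and it uses Python's standard unicodedata module to look up the character name. The function names are hypothetical.

```python
import re
import unicodedata

# Assumption: placeholders always look like the example "(U:25b6)".
PLACEHOLDER = re.compile(r"\(U:([0-9A-Fa-f]{4,6})\)")


def find_unknown_characters(text):
    """Return (code point label, Unicode name) for each placeholder in text."""
    found = []
    for match in PLACEHOLDER.finditer(text):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            continue  # not a valid Unicode code point
        name = unicodedata.name(chr(code_point), "<no name>")
        found.append((f"U+{code_point:04X}", name))
    return found


def in_geometric_shapes(code_point):
    """True if the code point lies in the Geometric Shapes series U+25A0-U+25FF."""
    return 0x25A0 <= code_point <= 0x25FF


line = "Chapter One (U:25b6) Introduction"
print(find_unknown_characters(line))
print(in_geometric_shapes(0x25B6))
```

The label printed for each placeholder uses the same U+ notation as the list of supported characters, so it can be searched for there directly.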
Kinds of Characters Imported into DBT
- "Normal" - the lower ASCII character range (Unicode U+0020-U+007E). These characters pass through from the source file into DBT.
- Most Non-Roman Scripts - the character sets for Greek, Arabic, Cyrillic (Russian), Hindi, Thai, Bengali, etc. These character sets are handled by the DBT program.
- Han (Chinese, Japanese, and Korean) characters - during import, these scripts are processed through a series of scrub tables. These extra steps convert the Han character sequences to UTF-8, and then to the appropriate character set needed by DBT.
- Other Unicode sequences that convert to specific characters - are converted during import by the main UniMap file (unimap.txt). This file also deconstructs the Korean Hangul characters into their components.
- Other special conversions - are based on the use of specific fonts. These are handled by additional UniMap files.
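Only the first kind in the list above passes through unchanged, so a quick way to see which characters in a text depend on the rest of the import machinery is to list everything outside the lower ASCII range. The Python sketch below does just that. It is not part of DBT and does not say whether DBT supports a given character. Skipping tabs and line breaks is an assumption made here, and the function name is hypothetical.

```python
def outside_lower_ascii(text):
    """Return the distinct characters outside U+0020-U+007E, in order of first use."""
    seen = []
    for ch in text:
        if ch in "\t\r\n":
            continue  # assumption: tabs and line breaks are not of interest
        if not (0x20 <= ord(ch) <= 0x7E) and ch not in seen:
            seen.append(ch)
    return [f"U+{ord(ch):04X}" for ch in seen]


sample = "Chapter One \u25b6 Caf\u00e9"
print(outside_lower_ascii(sample))
```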
The complete machinery to import characters is complex. Nevertheless, some of the pieces are easy to understand. Here is a view into the process.
The "UniMap" Files
The examples below show how Unicode values, "normal" (lower ASCII) characters, and DBT codes are encoded in the UniMap files.
The following three lines from unimap.txt are examples of mapping a non-preferred Unicode character into the preferred Unicode character. The three elements on each line are the character code in, the character code out, and a comment field. The vertical bar marks the start of the comment field, which holds notes that the computer ignores. The unimap.txt file is itself in UTF-8 format, which can represent a wide range of Unicode characters, so the comment can show the conversion:
U+0089: {U+2030} | -> ‰
U+008A: {U+0160} | -> Š
U+00B5: {U+03BC} | micro -> μ
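To make the three elements concrete, here is a minimal Python sketch that splits lines like the three above into code in, output, and comment. It is based only on these examples, not on a specification of the file format: it assumes that the first colon ends the incoming code, that the first vertical bar starts the comment, and that each {U+XXXX} token in the output stands for one character. The function name is hypothetical.

```python
import re

# Assumption: an output token such as {U+2030} stands for a single character.
CODE_POINT_TOKEN = re.compile(r"\{U\+([0-9A-Fa-f]{4,6})\}")


def parse_unimap_line(line):
    """Split one mapping line into (code in, output text, comment), or None."""
    mapping, _, comment = line.partition("|")
    source_field, colon, output_field = mapping.partition(":")
    source = re.fullmatch(r"U\+([0-9A-Fa-f]{4,6})", source_field.strip())
    if not colon or source is None:
        return None  # not a mapping line in the form shown above
    output = CODE_POINT_TOKEN.sub(
        lambda m: chr(int(m.group(1), 16)), output_field.strip()
    )
    return int(source.group(1), 16), output, comment.strip()


for text in ("U+0089: {U+2030} | -> ‰",
             "U+008A: {U+0160} | -> Š",
             "U+00B5: {U+03BC} | micro -> μ"):
    code_in, output, comment = parse_unimap_line(text)
    print(f"U+{code_in:04X} -> {output!r}   (comment: {comment})")
```

Output text that is not a {U+XXXX} token is returned unchanged, so a line whose output contains other material is still split into its three fields.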
The Unicode character U+2654 is a White King chess symbol. This conversion needs to be encoded using DBT command codes:
U+2654: [q~@$<][i]white [i]chess [i]king[q~>] | ♔
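The output field in this line mixes bracketed DBT codes with ordinary text. The Python sketch below separates the two so that the structure is easier to see. It treats anything between square brackets as a code and everything else as text, and it attaches no meaning to the codes themselves. That bracket rule is an assumption drawn from this one example, and the function name is hypothetical.

```python
import re

# Assumption: a code is "[" ... "]" with no nested brackets; the rest is text.
TOKEN = re.compile(r"\[[^\]]*\]|[^\[]+")


def split_codes_and_text(output_field):
    """Return a list of ("code" | "text", piece) pairs in their original order."""
    tokens = []
    for piece in TOKEN.findall(output_field):
        kind = "code" if piece.startswith("[") else "text"
        tokens.append((kind, piece))
    return tokens


tokens = split_codes_and_text("[q~@$<][i]white [i]chess [i]king[q~>]")
for kind, piece in tokens:
    print(f"{kind}: {piece!r}")

# The plain words carried by the mapping, with the codes left out.
print("".join(piece for kind, piece in tokens if kind == "text"))
```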
In another UniMap file, welsh.txt, there are conversions for the Welsh "Afallon" font. The example below shows the conversion for the small w with dieresis, whose (incoming) code is the hexadecimal value 00be. The mapping is:
U+00BE: {U+1E85} | ¾ -> ẅ
As a final example, in the vietnamese.txt UniMap file, there are conversions for the Vietnamese "VNI-Times" font. The code value 00e4 is for a "combining" character that includes a circumflex accent and a dot below (tone mark). The appropriate mapping (one character code in, two character codes out) is:
U+00E4: {U+0302}{U+0323} | ä -> ̣̂
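The effect of such font-specific mappings can be imitated in a few lines of Python. The sketch below is an illustration, not DBT's implementation: the two small tables hold only the single Welsh and Vietnamese entries quoted above, the table and function names are hypothetical, and the letter "a" placed before the combining character is an assumed sample input. It relies on the standard str.translate method, which accepts a table that maps one code point to a string of any length.

```python
# One entry each, copied from the welsh.txt and vietnamese.txt examples above.
WELSH_AFALLON_SAMPLE = {0x00BE: "\u1E85"}
VNI_TIMES_SAMPLE = {0x00E4: "\u0302\u0323"}  # one code in, two codes out


def apply_font_map(text, font_map):
    """Replace each mapped character; unmapped characters pass through unchanged."""
    return text.translate(font_map)


def code_points(text):
    """List the characters of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(apply_font_map("\u00be", WELSH_AFALLON_SAMPLE)))
print(code_points(apply_font_map("a\u00e4", VNI_TIMES_SAMPLE)))
```

A table like this is only meaningful for text that was set in the matching font. Applied to other text, it would wrongly convert a genuine ¾ or ä.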