A code page is a widely recognized mapping between characters (and other symbols) and the numbers that represent them in the computer. Modern computer systems use Unicode, a single character set that includes virtually all characters in use. Before Unicode, a code page was typically limited to 256 characters, the number of values a single byte can hold. The solution then was to allow multiple code pages, each with a different group of characters.
The history and proliferation of code pages mirror the development of computers. One of the first code pages, EBCDIC, was invented by IBM. It is now rarely seen outside IBM mainframe systems. ASCII then became a standard, but it defines only the first 128 characters. Many code pages share those 128 characters at the beginning (called the low bit characters), but vary extensively in the next 128 characters (called the high bit characters).
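
The split between shared low bit characters and varying high bit characters can be seen with a short Python sketch. It is only an illustration and has nothing to do with how DBT works internally. The codec names (`cp1252`, `cp437`, `iso8859_5`, `koi8_r`) are Python's spellings for WINDOWS-1252, WINDOWS-437, ISO-8859-5, and KOI8-R, and the two sample bytes were chosen arbitrarily.

```python
# One low byte (0x41) and one high byte (0xE9), decoded under four code pages.
# The codec names are Python's spellings, not the DBT names.
CODE_PAGES = ["cp1252", "cp437", "iso8859_5", "koi8_r"]

low = {cp: bytes([0x41]).decode(cp) for cp in CODE_PAGES}
high = {cp: bytes([0xE9]).decode(cp) for cp in CODE_PAGES}

for cp in CODE_PAGES:
    print(f"{cp:10} 0x41 -> {low[cp]}   0xE9 -> {high[cp]}")
```

The low byte should come out as the letter A under every code page, while the high byte comes out as a different character under each one.
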
After ASCII was introduced, IBM created a new set of code pages. Originally Microsoft co-operated with IBM, and they used identical code pages. In the 1990s, the two companies stopped co-operating, and Microsoft made its own versions of key code pages. Before Unicode, various Asian nations also set up their own systems for handling their scripts on computers, and many of these systems are still popular today. At one point there was an Icelandic code page, and then another Icelandic code page was created just to add the Euro symbol.
This history explains why there are a lot of code pages, but it does not explain how to handle them. A document imported into DBT is interpreted according to its code page, which you can set on the Import File dialog that comes up automatically during the import process. Here are some basic suggestions on getting the correct code page.
DBT Name | Other Names | Language / Region |
---|---|---|
ISO-8859-1 | Latin-1 | Western European |
ISO-8859-2 | Latin-2 | Central European |
ISO-8859-3 | Latin-3 | South European and Esperanto |
ISO-8859-4 | Latin-4 | Baltic, old |
ISO-8859-5 | Cyrillic | Russian and related languages |
ISO-8859-6 | Arabic | Arabic, Farsi, and Urdu |
ISO-8859-7 | Greek | Greece |
ISO-8859-8 | Hebrew | Israel |
ISO-8859-9 | Latin-5 | Turkish |
ISO-8859-10 | Latin-6 | Nordic |
ISO-8859-11 | Thai | Thailand |
ISO-8859-13 | Latin-7 | Baltic, new |
ISO-8859-14 | Latin-8 | Celtic |
ISO-8859-15 | Latin-9 | Revised Western European |
KOI8-R | RFC 1489 | Russian, Bulgarian |
KOI8-U | RFC 2319 | Ukrainian |
WINDOWS-437 | Old MS-DOS character set | European |
WINDOWS-866 | Cyrillic | Russian and related languages |
WINDOWS-874 | Thai | Thailand |
WINDOWS-932 | 31J, MS version of Shift JIS | Japanese |
WINDOWS-936 | MS version of GB2312 | Simplified Chinese |
WINDOWS-949 | MS version of EUC-KR | Korean |
WINDOWS-950 | MS version of Big5 | Chinese |
WINDOWS-1250 | Microsoft Windows Central European | Central Europe |
WINDOWS-1251 | Microsoft Windows Cyrillic | Cyrillic |
WINDOWS-1252 | Microsoft Windows Western European | Western Europe |
WINDOWS-1253 | Microsoft Windows Greek | Greece |
WINDOWS-1254 | Microsoft Windows Turkish | Turkey |
WINDOWS-1255 | Microsoft Windows Hebrew | Israel |
WINDOWS-1256 | Microsoft Windows Arabic | Middle East |
WINDOWS-1257 | Microsoft Windows Baltic | Baltic countries |
WINDOWS-1258 | Microsoft Windows Vietnamese | Vietnam |
UTF-7 | Unicode (all characters) | Every region |
UTF-8 | Unicode (all characters) | Every region |
UTF 16BE | Unicode (all characters) | Every region |
UTF 16LE | Unicode (all characters) | Every region |
UTF 32BE | Unicode (all characters) | Every region |
UTF 32LE | Unicode (all characters) | Every region |
MacRoman | none | Western Europe |
MacJapanese Code Page | Shift JIS | Japanese |
MacChineseTrad Code Page | Big5 | Chinese |
MacKorean Code Page | EUC-KR | Korean |
MacArabic | none | Arabic |
MacArabic-Farsi | none | Arabic and Farsi |
MacHebrew | none | Hebrew |
MacGreek | none | Greek |
MacCyrillic | none | Cyrillic |
MacDevanagari | none | India, Hindi |
MacGurmukhi | none | India |
MacGujarati | none | India |
MacOriya | none | India |
MacBengali | none | India, Bangladesh |
MacTamil | none | India, Sri Lanka |
MacTelegu | none | India |
MacKanada | none | India |
MacMalayalam | none | India |
MacSingalese | none | India, Sri Lanka |
MacKhmer | none | Khmer, Cambodia |
MacThai | none | Thai |
MacLaotian | none | Laos |
MacGeorgian | none | Georgia |
MacArmenian | none | Armenia |
MacChineseSimp | EUC-CN | China |
MacTibetian | none | Tibet |
MacMongolian | none | Mongolia |
MacEthiopic | none | Ethiopia |
Mac-Central-European | none | Central Europe |
MacVietnamese | none | Vietnam |
MacExtArabic | none | Arabic Scripts |
MacSymbol | none | Any region |
MacDingbats | none | Any region |
MacTurkish | none | Turkey |
MacCroatian | none | Croatia |
MacIcelandic | Icelandic | Iceland |
MacRomanian | Romanian | Romania |
MacCeltic | Scottish | Scotland |
MacGaelic | Irish | Ireland |
MacKeyboardGlyphs | Odd Symbols | Every region |
ISO-2022-JP | Japanese | Japan |
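
When the code page of a file is unknown, one way to narrow down the choice outside DBT is to decode the raw bytes under a few candidates and look at the results. The following is a minimal Python sketch under stated assumptions: `try_code_pages` is a hypothetical helper, the sample text is invented, and the codec names are Python's (`cp1253` corresponds to WINDOWS-1253, `cp1252` to WINDOWS-1252). It does not describe how DBT's Import File dialog works.

```python
def try_code_pages(data: bytes, candidates: list[str]) -> dict[str, str]:
    """Return the text produced by each candidate that decodes without error."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this code page
    return results


# Hypothetical sample: Greek text saved as WINDOWS-1253 (Python name: cp1253).
sample = "Ελληνικά".encode("cp1253")

decoded = try_code_pages(sample, ["utf-8", "cp1253", "cp1252"])
for name, text in decoded.items():
    print(f"{name}: {text}")
```

A clean decode does not prove the code page is right. In this sample UTF-8 rejects the bytes outright, but WINDOWS-1252 accepts them and produces accented Latin letters instead of Greek, so the decoded text still has to be checked by someone who can read the language.
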