Code Pages in DBT

A code page is a widely recognized system for matching characters and other symbols with the numbers that represent them in the computer. Modern computer systems use Unicode, a single character set that includes virtually all characters in use. Before Unicode, a typical code page was limited to 256 characters, since each character was stored in a single byte. The solution then was to allow multiple code pages, each with a different group of characters.

The history and proliferation of code pages mirrors the development of computers. One of the first code pages, EBCDIC, was invented by IBM; it is now rarely seen outside IBM mainframe systems. ASCII then became a standard, but it defines only the first 128 characters. Many code pages share these 128 characters (called the low bit characters), but then vary extensively in the next 128 characters (called the high bit characters).
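
The split between low bit and high bit characters can be seen by decoding the same byte value under several code pages. The following is a minimal Python sketch, not part of DBT; the codec names (`iso8859_1`, `cp1251`, `iso8859_7`, `cp437`) are Python's names for ISO-8859-1, WINDOWS-1251, ISO-8859-7, and WINDOWS-437, and the helper `decode_byte` is a hypothetical name chosen for illustration.

```python
# One byte value, several code pages.
# The codec names are Python's, not DBT's.
CODE_PAGES = ["iso8859_1", "cp1251", "iso8859_7", "cp437"]


def decode_byte(value: int) -> dict[str, str]:
    """Return the character that each code page assigns to one byte value."""
    raw = bytes([value])
    return {name: raw.decode(name) for name in CODE_PAGES}


if __name__ == "__main__":
    # 0x41 is a low bit character; 0xE9 is a high bit character.
    for value in (0x41, 0xE9):
        print(f"byte 0x{value:02X}:", decode_byte(value))
```

The low bit byte 0x41 decodes to the same letter in all four code pages, while the high bit byte 0xE9 decodes to a different character in each one.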

After ASCII was introduced, IBM created a new set of code pages. Originally Microsoft co-operated with IBM, and they used identical code pages. In the 1990s, the two companies stopped co-operating, and Microsoft made its own versions of key code pages. Before Unicode, various Asian nations also set up their own systems for handling their scripts on computers, and many of these systems are still popular today. At one point there was an Icelandic code page, and then another Icelandic code page was created just to add the Euro symbol.
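
The same pattern of revising a code page to add the Euro symbol can be checked for the Western European pages. The sketch below is an illustration in Python rather than anything DBT does; it uses ISO-8859-1 and its revision ISO-8859-15 as the example instead of the Icelandic pages, and the function names `has_euro` and `euro_bytes` are hypothetical.

```python
def has_euro(codec: str) -> bool:
    """Report whether a code page can represent the Euro sign at all."""
    try:
        "\u20ac".encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def euro_bytes() -> dict[str, bytes]:
    """Encode the Euro sign in three encodings that include it."""
    names = ("iso8859_15", "cp1252", "utf_8")
    return {name: "\u20ac".encode(name) for name in names}


if __name__ == "__main__":
    print("ISO-8859-1 has Euro: ", has_euro("iso8859_1"))
    print("ISO-8859-15 has Euro:", has_euro("iso8859_15"))
    for name, raw in euro_bytes().items():
        print(name, raw.hex())
```

Each encoding stores the Euro sign as a different byte sequence, which is one reason a file read with the wrong code page shows the wrong symbol.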

This history explains why there are a lot of code pages, but it does not explain how to handle them. A document imported into DBT is interpreted according to its code page, which you can set on the Import File dialog that comes up automatically during the import process. Here are some basic suggestions on getting the correct code page.
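
Outside DBT, one rough way to narrow down the code page of a file is to decode its bytes with several candidates and inspect the results. This is a minimal Python sketch under stated assumptions: `candidate_decodings` is a hypothetical helper, the codec names are Python's, and nothing here reflects how DBT itself interprets a file.

```python
def candidate_decodings(data: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode data with each candidate and keep the ones that do not fail.

    A successful decode does not prove the code page is right: most
    single-byte code pages accept nearly any byte, so the resulting text
    still has to be read by someone who knows the language.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot be the right code page
    return results


if __name__ == "__main__":
    # Sample bytes: a Greek word saved in ISO-8859-7.
    sample = "Ελληνικά".encode("iso8859_7")
    for name, text in candidate_decodings(
        sample, ["utf_8", "iso8859_7", "iso8859_1"]
    ).items():
        print(f"{name}: {text}")
```

In this sample, UTF-8 is rejected outright, ISO-8859-7 yields readable Greek, and ISO-8859-1 decodes without error but produces accented Latin letters, which shows why the final choice needs a visual check.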

List of Code Pages (Character Sets)
DBT Name              Other Names                         Language / Region
ISO-8859-1            Latin-1                             Western European
ISO-8859-2            Latin-2                             Central European
ISO-8859-3            Latin-3                             South European and Esperanto
ISO-8859-4            Latin-4                             Baltic, old
ISO-8859-5            Cyrillic                            Russian and related languages
ISO-8859-6            Arabic                              Arabic, Farsi, and Urdu
ISO-8859-7            Greek                               Greece
ISO-8859-8            Hebrew                              Israel
ISO-8859-9            Latin-5                             Turkish
ISO-8859-10           Latin-6                             Nordic
ISO-8859-11           Thai                                Thailand
ISO-8859-13           Latin-7                             Baltic, new
ISO-8859-14           Latin-8                             Celtic
ISO-8859-15           Latin-9                             Revised Western European
KOI8-R                RFC 1489                            Russian, Bulgarian
KOI8-U                RFC 2319                            Ukrainian
WINDOWS-437           Old MS-DOS character set            European
WINDOWS-866           Cyrillic                            Russian and related languages
WINDOWS-874           Thai                                Thailand
WINDOWS-932           31J, MS version of Shift JIS        Japanese
WINDOWS-936           MS version of GB2312                Simplified Chinese
WINDOWS-949           MS version of EUC-KR                Korean
WINDOWS-950           MS version of Big5                  Chinese
WINDOWS-1250          Microsoft Windows Central European  Central Europe
WINDOWS-1251          Microsoft Windows Cyrillic          Cyrillic
WINDOWS-1252          Microsoft Windows Western European  Western Europe
WINDOWS-1253          Microsoft Windows Greek             Greece
WINDOWS-1254          Microsoft Windows Turkish           Turkey
WINDOWS-1255          Microsoft Windows Hebrew            Israel
WINDOWS-1256          Microsoft Windows Arabic            Middle East
WINDOWS-1257          Microsoft Windows Baltic            Baltic countries
WINDOWS-1258          Microsoft Windows Vietnamese        Vietnam
UTF-7                 Unicode (all characters)            Every region
UTF-8                 Unicode (all characters)            Every region
UTF 16BE              Unicode (all characters)            Every region
UTF 16LE              Unicode (all characters)            Every region
UTF 32BE              Unicode (all characters)            Every region
UTF 32LE              Unicode (all characters)            Every region
MacRoman              none                                Western Europe
MacJapanese           Code Page Shift JIS                 Japanese
MacChineseTrad        Code Page Big5                      Chinese
MacKorean             Code Page EUC-KR                    Korean
MacArabic             none                                Arabic
MacArabic-Farsi       none                                Arabic and Farsi
MacHebrew             none                                Hebrew
MacGreek              none                                Greek
MacCyrillic           none                                Cyrillic
MacDevanagari         none                                India, Hindi
MacGurmukhi           none                                India
MacGujarati           none                                India
MacOriya              none                                India
MacBengali            none                                India, Bangladesh
MacTamil              none                                India, Sri Lanka
MacTelegu             none                                India
MacKanada             none                                India
MacMalayalam          none                                India
MacSingalese          none                                India, Sri Lanka
MacKhmer              none                                Khmer, Cambodia
MacThai               none                                Thai
MacLaotian            none                                Laos
MacGeorgian           none                                Georgia
MacArmenian           none                                Armenia
MacChineseSimp        EUC-CN                              China
MacTibetian           none                                Tibet
MacMongolian          none                                Mongolia
MacEthiopic           none                                Ethiopia
Mac-Central-European  none                                Central Europe
MacVietnamese         none                                Vietnam
MacExtArabic          none                                Arabic Scripts
MacSymbol             none                                Any region
MacDingbats           none                                Any region
MacTurkish            none                                Turkey
MacCroatian           none                                Croatia
MacIcelandic          Icelandic                           Iceland
MacRomanian           Romanian                            Romania
MacCeltic             Scottish                            Scotland
MacGaelic             Irish                               Ireland
MacKeyboardGlyphs     Odd Symbols                         Every region
ISO-2022-JP           Japanese                            Japan
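
To experiment with a few of the code pages listed above outside DBT, the DBT names can be paired with the codec names used by a general-purpose language. The sketch below uses Python; the pairing in `DBT_TO_PYTHON` is an assumption made for illustration (it is not taken from DBT, and the Python codecs may differ in small details from DBT's tables), and `decode_as` and `unsupported_names` are hypothetical helper names.

```python
import codecs

# Assumed pairing of DBT names (from the list above) with Python codec names.
# Only a small subset is shown; this mapping is illustrative, not official.
DBT_TO_PYTHON = {
    "ISO-8859-1": "iso8859_1",
    "ISO-8859-15": "iso8859_15",
    "KOI8-R": "koi8_r",
    "WINDOWS-437": "cp437",
    "WINDOWS-932": "cp932",
    "WINDOWS-1252": "cp1252",
    "UTF-8": "utf_8",
    "UTF 16LE": "utf_16_le",
    "MacRoman": "mac_roman",
    "ISO-2022-JP": "iso2022_jp",
}


def unsupported_names() -> list[str]:
    """List the DBT names whose assumed Python codec is not available."""
    missing = []
    for dbt_name, codec_name in DBT_TO_PYTHON.items():
        try:
            codecs.lookup(codec_name)
        except LookupError:
            missing.append(dbt_name)
    return missing


def decode_as(data: bytes, dbt_name: str) -> str:
    """Decode bytes using the Python codec paired with a DBT name."""
    return data.decode(DBT_TO_PYTHON[dbt_name])


if __name__ == "__main__":
    print("Unsupported:", unsupported_names())
    print(decode_as("Привет".encode("koi8_r"), "KOI8-R"))
```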