MinEd Unicode Howto

Environment setup and usage of mined for Unicode text



Handling Unicode text with mined

Screen handling
Usually, mined auto-detects a UTF-8 terminal as well as the details of its Unicode support (such as double-width and combining characters, Arabic ligature joining, and which width data set it uses).
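To illustrate what such width data covers, here is a minimal Python sketch that estimates how many terminal columns a string occupies. It assumes the East Asian Width property from Python's `unicodedata` module (wide and fullwidth characters take two columns, combining marks none, everything else one). The function name `display_width` is hypothetical; this is not mined's code, and real terminals may use different width tables, which is exactly why mined detects them.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks overlay the preceding base character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth, e.g. CJK ideographs
        else:
            width += 1  # narrow, neutral and (by assumption) ambiguous-width
    return width


print(display_width("abc"), display_width("\u4e2d\u6587"), display_width("e\u0301"))
# prints: 3 4 1
```

Treating ambiguous-width characters as narrow is one of the points where terminals and width data sets differ.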

Character encoding
By default, mined automatically detects whether the text in an edited file is UTF-8 encoded (Unicode character set) or not (either 8-bit or CJK encoded); it also detects and preserves UTF-16.
Mined handles illegal UTF-8 sequences transparently, so if you accidentally open an 8-bit or CJK encoded file in UTF-8 mode, or a file with mixed parts, you can still edit the text without problems and will not lose any information. Non-UTF-8 byte codes are highlighted on the display with a background colour.
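To illustrate how a mixed file can be edited without losing information, here is a minimal Python sketch. The helper `classify` is a hypothetical heuristic (strict UTF-8 decoding plus a byte order mark check for UTF-16), not mined's actual detection logic. The sketch relies on Python's `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate so that the original bytes are restored exactly on saving; all function names are hypothetical.

```python
def classify(data: bytes) -> str:
    """Hypothetical heuristic: guess how a file's bytes are encoded."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "UTF-16"  # byte order mark present
    try:
        data.decode("utf-8")  # strict decoding fails on any illegal sequence
    except UnicodeDecodeError:
        return "8-bit or CJK"
    return "UTF-8"  # note: pure ASCII also lands here


def load(data: bytes) -> str:
    # Each undecodable byte becomes a lone surrogate U+DC80..U+DCFF.
    return data.decode("utf-8", errors="surrogateescape")


def save(text: str) -> bytes:
    # The same handler turns those surrogates back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


def non_utf8_positions(text: str) -> list:
    # Indices an editor could highlight as non-UTF-8 bytes.
    return [i for i, ch in enumerate(text) if "\udc80" <= ch <= "\udcff"]


mixed = "caf\u00e9 ".encode("utf-8") + b"caf\xe9"  # UTF-8 part, then Latin-1 part
text = load(mixed)
assert save(text) == mixed  # round trip loses nothing
print(classify(mixed), non_utf8_positions(text))
# prints: 8-bit or CJK [8]
```

UTF-16 text without a byte order mark is not recognised by this simplified sketch.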
While editing, you can switch the character encoding assumed for text interpretation with the encoding menu (left-click to toggle current and previous encoding, right-click to open menu).

Unicode display on non-Unicode terminal
Characters that cannot be displayed in the terminal's encoding are shown as a suitable replacement with a coloured background. Replacements are chosen so as to suggest the original character as closely as possible, with special indications for combining characters ('), quotation marks ("), dashes (-), the Euro symbol (E), etc., and by using the base character from the Unicode decomposition for accented and other precomposed characters.
Please consult the manual page, section Unicode display for details.
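A replacement strategy along these lines can be sketched in Python with the standard `unicodedata` module. The function `fallback_char`, its special-case table and the `?` used when no base character fits are assumptions for illustration only; mined's actual choices are described in its manual page. Latin-1 stands in for a non-Unicode terminal encoding.

```python
import unicodedata

# Hypothetical special cases modelled on the indications listed above.
SPECIAL = {
    "\u201c": '"', "\u201d": '"', "\u201e": '"',  # typographic quotation marks
    "\u2013": "-", "\u2014": "-",                  # en dash, em dash
    "\u20ac": "E",                                 # Euro sign
}


def fallback_char(ch: str, terminal_encoding: str = "iso-8859-1") -> str:
    """Pick a displayable stand-in for ch on a non-Unicode terminal."""
    try:
        ch.encode(terminal_encoding)
        return ch  # displayable as is, no replacement needed
    except UnicodeEncodeError:
        pass
    if unicodedata.combining(ch):
        return "'"  # combining mark
    if ch in SPECIAL:
        return SPECIAL[ch]
    base = unicodedata.normalize("NFD", ch)[0]  # base of a precomposed character
    try:
        base.encode(terminal_encoding)
        return base
    except UnicodeEncodeError:
        return "?"  # assumption: generic placeholder when nothing fits
```

On a Latin-1 terminal this maps U+0103 (a with breve) to a and the Euro sign to E; an editor would draw each such stand-in with a coloured background so it can be told apart from real text.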

Combining characters
Mined supports display and editing of combined characters consisting of a base character and one or more combining characters, in one of two modes:
  • Combined display mode: combined characters are displayed as they should appear. Navigation within a combined character is possible (Control-cursor-left/right); the character information display (HOP ESC u, or from the “?” Info menu) shows which part of the combined character (base or combining character) the cursor is on; and Mark/Copy/Paste and Control-Del act on the respective position.
  • Separated display mode: base character and combining characters are separated for explicit handling.
These modes can be selected, and the current one is indicated by the Combining display flag: ç for combined mode, ` for separated mode.
See the manual page, section Combining characters for details.
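The difference between the two modes can be sketched in Python: combined mode treats a base character plus its following combining marks as one unit, while separated mode treats every code point individually. The helper `combined_chars` is hypothetical and simplified: it relies on the canonical combining class from `unicodedata`, whereas full grapheme segmentation (Unicode UAX #29) also covers marks with class 0, such as many Indic vowel signs.

```python
import unicodedata


def combined_chars(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch  # attach the mark to the preceding base character
        else:
            groups.append(ch)  # start a new combined character
    return groups


word = "re\u0301sume\u0301"  # "résumé" built from e + U+0301 COMBINING ACUTE ACCENT
print(len(combined_chars(word)))  # combined view
print(len(list(word)))            # separated view
# prints: 6 then 8
```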

Bidirectional display
Mined auto-detects whether it is running in a terminal that supports Arabic (by checking LAM/ALEF ligature joining) and other right-to-left scripts (e.g. mlterm); alternatively, it can be told so with the command line parameter +UU.
The mined runtime support library contains a script mterm to invoke the mlterm terminal emulator with suitable parameters to set up bidi mode and a suitable font.
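For reference, the LAM/ALEF pair is the letter sequence U+0644 U+0627, which a terminal with Arabic shaping displays as one ligature glyph (also encoded for compatibility as U+FEFB). The Python sketch below shows that relationship, together with a hypothetical helper `has_rtl` that checks a line for strong right-to-left characters via their bidi class; it does not reproduce mined's terminal detection.

```python
import unicodedata

LAM_ALEF = "\u0644\u0627"  # ARABIC LETTER LAM followed by ARABIC LETTER ALEF


def has_rtl(text: str) -> bool:
    """True if text contains a strong right-to-left character."""
    # Bidi class R covers e.g. Hebrew letters, AL covers Arabic letters.
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# The isolated LAM-ALEF ligature decomposes (NFKC) into exactly this pair.
assert unicodedata.normalize("NFKC", "\ufefb") == LAM_ALEF
print(has_rtl("abc"), has_rtl("abc " + LAM_ALEF), has_rtl("\u05e9\u05dc\u05d5\u05dd"))
# prints: False True True
```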

CJK and 8 bit character set support on Unicode terminal
Mined's support for major CJK encodings is also best used in a UTF-8 terminal (unless you need the specific CJK input features of dedicated terminals); this setup is also well suited to editing text encoded in various 8-bit character sets.
See the mined features page for an overview of CJK support features.
See the manual page, sections Character encoding support and CJK support for details.
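One way to picture editing CJK or 8-bit encoded text on a UTF-8 terminal is transcoding: file bytes are decoded from their encoding for display, and UTF-8 input from the terminal is converted back when stored. The following minimal Python sketch uses hypothetical helper names and Python's built-in codecs (`gb18030`, `big5`, `euc_jp`, `euc_kr`, `koi8_r`); it uses strict error handling for brevity, so invalid bytes would raise an exception here.

```python
def file_to_terminal(data: bytes, file_encoding: str) -> bytes:
    """Decode file bytes and re-encode them as UTF-8 for display."""
    return data.decode(file_encoding).encode("utf-8")


def terminal_to_file(typed: bytes, file_encoding: str) -> bytes:
    """Convert UTF-8 input from the terminal back to the file's encoding."""
    return typed.decode("utf-8").encode(file_encoding)


samples = {
    "gb18030": "\u4e2d\u6587",                         # Chinese (GB)
    "big5": "\u4e2d\u6587",                            # Chinese (Big5)
    "euc_jp": "\u65e5\u672c\u8a9e",                    # Japanese
    "euc_kr": "\ud55c\uad6d\uc5b4",                    # Korean
    "koi8_r": "\u041f\u0440\u0438\u0432\u0435\u0442",  # 8-bit Cyrillic
}
for enc, sample in samples.items():
    stored = sample.encode(enc)
    shown = file_to_terminal(stored, enc)
    assert shown == sample.encode("utf-8")
    assert terminal_to_file(shown, enc) == stored  # saving restores the bytes
```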

Unicode environment setup

Quick and easy
Use the command uterm to invoke a UTF-8 enabled terminal with automatic selection of a suitable font for best coverage of Unicode characters.
  • The uterm script comes with the mined package; it is included in the mined runtime support library and may be installed in the path with the mined application.
  • Note: The uterm script assumes that a UTF-8 enabled version of xterm or rxvt-unicode is already installed on your system, as well as fonts suitable for your needs. If this is not the case on your system, follow the advice below.

On Windows / cygwin: Use the command wined to start mined in a mintty terminal configured to use UTF-8 and Windows look-and-feel.
(When running an X server, uterm works too, as described above.)
The cygwin 1.7 console also runs UTF-8 by default.

On Windows / Explorer: With the mined stand-alone Windows package installed, right-click on a text file to open its context menu, and select “MinEd”.

On Mac OS X: Preferably use xterm or iTerm 2, as the native Mac terminal application does not support the mouse.

Install suitable terminal
Mined is a text mode editor. Its UTF-8 display and input support is available with terminal emulators supporting UTF-8 and running in UTF-8 mode, like xterm (version >= 145), rxvt-unicode, mlterm, KDE konsole, gnome-terminal, Linux console, cygwin console, mintty, PuTTY.
  • If you don't have a recent version of xterm on your system, compile it yourself: invoke configure --enable-wide-chars (or use the script configure-xterm from the mined runtime support library), then invoke make. You may want to reduce the size of the resulting executable with strip xterm; then install it into your path, e.g. in $HOME/bin.
    Note: xterm, like mined, can be used to enable UTF-8 and Unicode support on legacy systems, even if they do not offer any “locale” support, and without needing root privilege.
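To check whether an installed xterm meets the version >= 145 requirement mentioned above, a small Python sketch can call xterm -version. It assumes the output contains a string of the form XTerm(NNN), where NNN is the patch number the requirement refers to; the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_PATCH = 145  # minimum xterm version named above


def parse_xterm_version(output: str) -> Optional[int]:
    """Extract the patch number from text like 'XTerm(379)' (assumed format)."""
    match = re.search(r"XTerm\((\d+)\)", output)
    return int(match.group(1)) if match else None


def xterm_is_recent_enough() -> bool:
    exe = shutil.which("xterm")
    if exe is None:
        return False  # xterm not installed or not in PATH
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    version = parse_xterm_version(result.stdout + result.stderr)
    return version is not None and version >= MIN_PATCH
```

If xterm_is_recent_enough() returns False, building xterm yourself as described above is the way to go.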

Install suitable fonts
Install Unicode fonts for your X server.
  • To check whether your X installation already provides Unicode fonts, invoke the command xlsfonts | grep iso10646. If this lists nothing, or if you cannot find a suitable font setup, do one of the following:
  • Automatic installation:
    The Mined runtime support library contains a script installfonts that downloads these fonts and installs them with your X server. Finally, it gives some hints on how to add them to your permanent font configuration.
  • Manual installation:
    • Retrieve some of the following fonts:
      1. UCS fonts for X with their CJK supplement from Markus Kuhn's page Unicode fonts and tools for X11
      2. Adobe and B&H bitmap fonts from the same site which contain fixed width Courier and Lucida Typewriter fonts
      3. Unicode VGA font from Dmitry Bolkhovityanov's site
      4. Monospace Roman BDF fonts and their Oblique / Bold / Bold Oblique supplements from George Williams Unicode fonts page
    • The nicest-looking font in the UCS fonts archive mentioned above is the 10x20 size font; it is suitable for higher screen resolutions. Unfortunately, the CJK double-width fonts are not distributed in the corresponding 20x20 size, only in 18x18. The corresponding single-width font in 9x18 size, however, looks quite spindly and, for my taste, rather awkward.
      For this reason, I am providing a script to generate 20x20 CJK fonts automatically from the 18x18 UCS fonts distributed for X servers. It is bdf18to20 and you find it in the mined runtime support library. Go into the directory where you unpacked the fonts and invoke the script.
    • Install the fonts with your X server: unpack them into a directory (e.g. $HOME/xfonts), go into that directory, invoke the mkfontdir command. Then make sure that the fonts are loaded into your X server, using the command xset +fp $HOME/xfonts; a suitable place to include this automatically would be your $HOME/.xinitrc X initialisation file if you have one.
      Note: If you are working in a network, make sure the xset command is invoked such that the X server has access to the given directory on the machine it is running on.
      Some X servers (e.g. Exceed on Windows) do not accept BDF fonts; use the “Compile Fonts” function of the configuration menu to install the fonts.
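To see which cell sizes of Unicode fonts your X server offers (for instance, whether a 10x20 single-width font and a matching 20x20 double-width font are both available), the output of xlsfonts can be parsed. This Python sketch assumes standard XLFD font names, in which field 7 is the pixel size and field 12 the average width in tenths of a pixel, and reports each cell size as width x height; the helper names are hypothetical.

```python
import shutil
import subprocess


def unicode_font_cells(listing: str) -> set:
    """Collect cell sizes like '10x20' of iso10646 fonts from xlsfonts output."""
    cells = set()
    for line in listing.splitlines():
        fields = line.strip().split("-")
        # A full XLFD name splits into 15 fields, the first one empty;
        # short aliases such as "10x20" are skipped.
        if len(fields) != 15 or fields[13].lower() != "iso10646":
            continue
        pixel_size, avg_width = fields[7], fields[12]
        if pixel_size.isdigit() and avg_width.isdigit() and int(pixel_size) > 0:
            cells.add(f"{int(avg_width) // 10}x{pixel_size}")
    return cells


def installed_unicode_fonts() -> set:
    exe = shutil.which("xlsfonts")
    if exe is None:
        return set()  # X client tools not installed
    result = subprocess.run([exe], capture_output=True, text=True)
    return unicode_font_cells(result.stdout)
```

If the result contains 10x20 but only 18x18 for double width, generating the 20x20 fonts with bdf18to20 as described above fills the gap.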

Start terminal in UTF-8 mode
Invoke a terminal window in UTF-8 mode and configure it to use fonts sufficient to display the text you want to edit.
  • Invoke xterm with suitable resource configuration or command line parameters.
    • I recommend invoking xterm with the script uterm from the mined runtime support library.
    • Alternatively, invoke xterm -u8 or xterm -en UTF-8 to enforce UTF-8 mode, depending on system configuration; the option +lc may also be needed.

    Mined detects UTF-8 terminal mode automatically (exception: cygwin 1.7 UTF-8 console after rlogin or telnet). So it will work even if your locale environment is not configured properly.
    Note: xterm is quite touchy about configuring suitable matching fonts for single-width and double-width glyphs. If you are unlucky, CJK character display will result in garbage on the screen. My recommendation is to generate the 20x20 UCS fonts with my bdf18to20 script as mentioned above and configure xterm to use 10x20 – it will then automatically select one of the 20x20 fonts for double-width characters; if you have a preference among them, use the -fw command line option or the wideFont X resource (in your $HOME/.Xdefaults file). See the pattern file Xdefaults.mined in the mined runtime support library for suggestions of suitable entries. (Double-width font matching works much better with rxvt which even seems to scale double-width fonts in an acceptable way if needed.)
  • If you prefer rxvt, use rxvt-unicode and make sure to indicate using UTF-8 by setting a locale in your environment that is installed on your system, for example LC_ALL=en_US.UTF-8 urxvt on cygwin.
    Note: rxvt is quite touchy about requiring a known locale setting; it has no strict UTF-8 option that works reliably on all systems.
  • Note: For hints on how to configure the environment explicitly so that rxvt, konsole and other applications work with UTF-8, see the mined manual page (about LC_CTYPE and other environment variables). Neither xterm nor mined needs an accurate locale setting.
    For other terminals (e.g. mlterm), see their manual for how to configure UTF-8 mode.
  • Alternatively, you can start mined directly together with its own terminal window. For this purpose, the mined runtime support library contains the script umined. When starting xterm, this script also configures xterm on the fly to use the most recent version of Unicode width data built into xterm, rather than system-provided locale data (which may refer to an older version of Unicode), for handling wide and combining characters.
  • On a Windows system, you can also use the script wined or wined.bat which will invoke mined in a mintty terminal window, configured to use UTF-8 and Windows look-and-feel.
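Applications such as rxvt decide on UTF-8 from the locale environment. Under POSIX rules, the locale in effect for character handling comes from LC_ALL if it is set and non-empty, otherwise from LC_CTYPE, otherwise from LANG. This Python sketch (hypothetical helper names; falling back to the C locale when nothing is set is an assumption, since POSIX leaves that implementation-defined) checks whether that setting names a UTF-8 codeset. As noted above, xterm and mined do not depend on it.

```python
import os
from typing import Mapping


def effective_ctype(env: Mapping[str, str]) -> str:
    """Locale for LC_CTYPE: LC_ALL overrides LC_CTYPE, which overrides LANG."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # empty values count as unset
            return value
    return "C"  # assumption: default locale when nothing is set


def locale_is_utf8(env: Mapping[str, str]) -> bool:
    # "en_US.UTF-8@mod" -> codeset "UTF-8"; accept spellings like "utf8" too.
    codeset = effective_ctype(env).partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


print(locale_is_utf8(os.environ))
```

For example, LC_ALL=en_US.UTF-8 urxvt, as suggested above, makes locale_is_utf8 return True for the environment urxvt sees.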

Mined homepage and download.
Thomas Wolff