Character Set

Redet reads and writes UTF-8 Unicode. Whether regular expression matching works with characters outside the 7-bit ASCII range, in either the test data or the regular expression, depends on whether the program that Redet calls supports Unicode. Whether such characters are displayed properly in Redet windows depends on the fonts that you have installed.

Note that some programs that handle Unicode do so only in certain locale settings, while others do so regardless of the locale. The latter category includes Python and Pike. Programs that support Unicode only in certain locales include GNU ed, GNU grep, GNU sed and mawk. To test this, try zh_TW.UTF-8 (Taiwan Chinese in UTF-8 encoding) or es_ES.UTF-8 (Castilian in Spain with UTF-8 encoding) for a locale in which Unicode should be supported, and es_ES (Castilian in Spain, with default ISO-8859-1 encoding) for a locale in which Unicode is not supported.
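
The locale test described above can be sketched outside Redet with a short Python script. This is only an illustration, not how Redet itself invokes programs: the helper names (locale_env, matches_under_locale, python_matches_single_char) are hypothetical, and the result for grep depends on which locales are actually installed on your system. The script asks whether a line containing the single character U+00E9 is matched by the one-character pattern "."; a program that is decoding UTF-8 sees one character, while a program working byte by byte sees two.

```python
import os
import re
import shutil
import subprocess


def locale_env(locale_name):
    """Return a copy of the current environment with LC_ALL overridden."""
    env = dict(os.environ)
    env["LC_ALL"] = locale_name
    return env


def matches_under_locale(command, text, locale_name):
    """Feed text (encoded as UTF-8) to command on stdin under the given locale.

    Returns True if the command exits with status 0, which for grep
    means that at least one line matched.
    """
    result = subprocess.run(
        command,
        input=text.encode("utf-8"),
        env=locale_env(locale_name),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def python_matches_single_char(text):
    """Python 3 str patterns operate on code points regardless of locale."""
    return re.fullmatch(r".", text) is not None


if __name__ == "__main__":
    sample = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, two bytes in UTF-8
    print("python:", python_matches_single_char(sample))
    if shutil.which("grep"):
        # -x requires the whole line to match, so "." means "exactly one character".
        for name in ("es_ES.UTF-8", "es_ES"):
            matched = matches_under_locale(["grep", "-x", "."], sample + "\n", name)
            print(name + ":", matched)
```

If a named locale is not installed, the called program typically falls back to the C locale, so check the output of locale -a before drawing conclusions from the result.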

Perl can be made to handle Unicode in a variety of ways determined by the setting of an environment variable or command-line flag. Redet runs Perl in such a way as to use UTF-8 Unicode for all input and output, regardless of locale.

Non-ASCII characters can be entered using whatever entry methods the user's system provides or using a Unicode character map such as gucharmap. Widgets are provided for entering characters from the International Phonetic Alphabet, since these are scattered through several Unicode ranges and are therefore inconvenient to enter using a general-purpose Unicode character map. A widget is also provided for entering characters by their Unicode code point. Finally, it is possible to create custom character entry widgets by loading definitions from a file.
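
For readers unfamiliar with entry by code point, the following Python sketch shows the idea: a hexadecimal code point is converted to the character it denotes. It is not Redet's own code, and the function name char_from_codepoint and the accepted prefixes (U+ and 0x) are assumptions made for illustration. The three sample characters are all used in the International Phonetic Alphabet yet belong to three different Unicode blocks, which is the scattering mentioned above.

```python
import unicodedata


def char_from_codepoint(spec):
    """Convert a hexadecimal code point such as "U+0259" or "0259" to a character.

    Raises ValueError if spec is not valid hexadecimal or lies
    outside the Unicode range (0 to 10FFFF).
    """
    digits = spec.strip()
    if digits[:2].upper() == "U+" or digits[:2].lower() == "0x":
        digits = digits[2:]
    return chr(int(digits, 16))


if __name__ == "__main__":
    # U+0259 is in IPA Extensions, U+014B in Latin Extended-A,
    # and U+03B8 in Greek and Coptic.
    for spec in ("U+0259", "U+014B", "U+03B8"):
        ch = char_from_codepoint(spec)
        print(spec, ch, unicodedata.name(ch))
```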

As an aid to those working with Unicode, lists of Unicode ranges and general character properties are available from the Help menu.

