Re: Saprutum Script
From: John Cowan <cowan@...>
Date: Friday, May 11, 2001, 3:31
kam@CARROT.CLARA.NET scripsit:
> I'd be interested in staking out a bit of Unicode territory for these
> characters, but just what is and isn't a character??
>
> e.g. the combination "LI" (see p.3) could be treated as a "ligature" to
> be coded separately, or as "L" plus a diacritic "I", or as two
> separate characters with very close kerning. I assume there are
> guidelines for this sort of thing.
In general, ligatures should not be encoded separately UNLESS both
of the following apply:

  * both the ligatured and unligatured forms are fairly common; and
  * there is a semantic distinction between the ligatured and
    unligatured forms.
If either the ligatured form or the unligatured form is fairly rare,
then the ZERO WIDTH JOINER (ZWJ, U+200D) can be inserted to request a
ligature, or the ZERO WIDTH NON-JOINER (ZWNJ, U+200C) to prevent one.
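A minimal Python sketch of the ZWJ/ZWNJ approach, using the "LI" pair
from the question. The helper names request_ligature and
prevent_ligature are hypothetical; only the two code points are
standard. Whether a ligature actually appears depends on the font and
rendering engine; the stored text is still just the letters plus one
invisible format character.

```python
# Zero-width format characters (standard Unicode code points).
ZWJ = "\u200d"   # ZERO WIDTH JOINER: asks the renderer to join/ligate
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: asks the renderer not to ligate


def request_ligature(first: str, second: str) -> str:
    """Hypothetical helper: put ZWJ between two characters."""
    return first + ZWJ + second


def prevent_ligature(first: str, second: str) -> str:
    """Hypothetical helper: put ZWNJ between two characters."""
    return first + ZWNJ + second


if __name__ == "__main__":
    li = request_ligature("L", "I")
    plain = prevent_ligature("L", "I")
    # Show the code points actually stored in each string.
    print([f"U+{ord(c):04X}" for c in li])     # ['U+004C', 'U+200D', 'U+0049']
    print([f"U+{ord(c):04X}" for c in plain])  # ['U+004C', 'U+200C', 'U+0049']
```

Stripping the format character gives back plain "LI" in either case,
which is the point: the letters themselves are not re-encoded.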
If the difference between the ligatured and unligatured forms is
purely a matter of typographical style and does not affect legibility,
then the distinction can and should be left to markup rather than to
Unicode, which is a plain-text standard. An example is the oe-ligature
in French, which can always be written as plain "oe" instead.
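To illustrate the French case in Python: the character U+0153 (œ) has
no decomposition in the Unicode Character Database, so normalization
will never turn it into "oe"; writing plain "oe" is an ordinary text
substitution. The unligate_french helper is hypothetical and, to keep
it short, handles only lowercase œ.

```python
import unicodedata

OE_LIGATURE = "\u0153"  # LATIN SMALL LIGATURE OE

# No decomposition is recorded, so NFKD leaves the character unchanged.
print(repr(unicodedata.decomposition(OE_LIGATURE)))               # ''
print(unicodedata.normalize("NFKD", OE_LIGATURE) == OE_LIGATURE)  # True


def unligate_french(text: str) -> str:
    """Hypothetical plain-text fallback: spell lowercase oe-ligature as 'oe'.

    Capital U+0152 is deliberately not handled here (OE vs Oe depends on
    context).
    """
    return text.replace(OE_LIGATURE, "oe")


print(unligate_french("c\u0153ur"))  # coeur
```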
Finally, if the rules for ligaturing are fully automatic, there is no
need to represent the ligature in the text; it can be left to smart
rendering software. This is the case with Indic ligatures (conjuncts).
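For comparison, an Indic conjunct is stored as ordinary letters joined
by a virama, with no separate ligature character. This Python sketch
uses the Devanagari conjunct kssa (KA + VIRAMA + SSA) as an example of
my own choosing; how the conjunct is drawn is left to the font and
shaping engine.

```python
import unicodedata

# The conjunct is three ordinary code points in the stored text.
kssa = "\u0915\u094d\u0937"

for ch in kssa:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0915 DEVANAGARI LETTER KA
# U+094D DEVANAGARI SIGN VIRAMA
# U+0937 DEVANAGARI LETTER SSA
```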
Unicode itself does not always follow these rules, due to the need for
backward compatibility with existing character sets such as MacRoman.
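One concrete case, as I understand it: MacRoman has an "fi" ligature,
which Unicode encodes as U+FB01 LATIN SMALL LIGATURE FI, but with a
compatibility decomposition to "f" + "i". This Python sketch relies
only on the standard mac_roman codec and the unicodedata module.

```python
import unicodedata

# Byte 0xDE in MacRoman is the "fi" ligature.
fi = b"\xde".decode("mac_roman")
print(f"U+{ord(fi):04X} {unicodedata.name(fi)}")  # U+FB01 LATIN SMALL LIGATURE FI

# The compatibility decomposition says it is really "f" + "i",
# so NFKC normalization folds it back to two ordinary letters.
print(unicodedata.decomposition(fi))      # <compat> 0066 0069
print(unicodedata.normalize("NFKC", fi))  # fi

# Round trip: encoding back to MacRoman yields the original byte.
assert fi.encode("mac_roman") == b"\xde"
```

So the legacy character survives a round trip, while normalization
still lets software treat it as plain "fi".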
--
John Cowan cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter