Re: Saprutum Script
From: <kam@...>
Date: Friday, May 11, 2001, 23:55
On 10 May 2001 John Cowan <cowan@...> wrote:
> In general, ligatures should not be encoded separately UNLESS both
> of the following apply:
>
> both the ligatured and unligatured forms are fairly common;
>
> there is a semantic distinction between ligatured and
> unligatured forms.
>
This would then rule out all but the first 27 forms (page 1), except
perhaps for the inflexions, for which I might do some special pleading
as they are very common, have a degree of semantic (in)distinction and
should probably be ignored when indexing etc. There are however 21 of
them, which is rather a lot!
>
> If either the ligatured form, or the unligatured form, is fairly rare,
> then the Unicode character ZWJ can be inserted to create a
> ligature, or ZWNJ to prevent one.
>
The <e> in the "Romanised" transliteration would convert to ZWNJ (the
presence of [e] is predictable so it's not shown in the native script)
so that for instance "tetlam" (rendered TTL{-AM}) doesn't have a t-t
ligature, but "weqatti" (WQA{TT}I) does. When keying in text it's
sometimes necessary to insert a ZWNJ-<e> to prevent the end of a root
in -y -w -? -t or -r being taken as part of the inflexion. E.g. the
genitive of "kabir-" is "kabirim" which might be rendered with the
-irim inflexional glyph as though it were the gen. dual of "*kab-". This
is avoided by typing in "kabierim" or "kabireim" etc.
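The keying convention above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the function names are
hypothetical, the Latin letters stand in for whatever code points the
native letters would eventually get, and the separate handling of the
other vowels and of the inflexional glyphs is left out. All it shows is
that replacing every romanised <e> with ZWNJ (U+200C) breaks up sequences
that would otherwise be candidates for a ligature.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def romanised_to_encoded(word: str) -> str:
    """Replace each romanised <e> with ZWNJ (hypothetical helper).

    Other letters are passed through unchanged as stand-ins for the
    native-script characters, which is an assumption of this sketch.
    """
    return word.replace("e", ZWNJ)


def show(encoded: str) -> str:
    """Make the invisible ZWNJ visible as '|' for inspection."""
    return encoded.replace(ZWNJ, "|")


for word in ("tetlam", "weqatti", "kabirim", "kabierim", "kabireim"):
    print(word, "->", show(romanised_to_encoded(word)))
```

In "tetlam" the two t's end up separated by a ZWNJ, so a renderer would
not ligate them, while in "weqatti" they stay adjacent. Likewise
"kabierim" and "kabireim" no longer contain an unbroken "irim" for the
inflexional glyph to match, whereas plain "kabirim" still does.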
>
> Finally, if the rules for ligaturing are fully automatic, there is no
> need to represent the ligature; it can be left to smart rendering
> software. This is the case of Indic ligaturing.
>
That's probably the closest analogy to the way I treat vowels -- not
exactly diacritics like Hebrew or Arabic points, but still less prominent
than the consonants.
>
> Unicode itself does not always follow these rules, due to the need for
> backward compatibility with existing character sets such as MacRoman.
>
Does this mean that we're basically trying to code phonemes? And that each
font (where ligatures are involved) requires its own set of mapping
rules? Sanskrit must be a nightmare.
I came across a Unicode slot for the Phoenician alphabet, which is just
a "font" of the 22-consonant NW Semitic (i.e. Hebrew) script, even though
the letter forms look very different. This led me to think that sets of
glyphs were being coded rather than some more abstract entity.
Anyway, I won't argue the point so long as I can get my inflexions in :-)
Keith