Re: Saprutum Script

From:<kam@...>
Date:Friday, May 11, 2001, 23:55
On 10 May 2001 John Cowan <cowan@...> wrote:
> In general, ligatures should not be encoded separately UNLESS both
> of the following apply:
>
> both the ligatured and unligatured forms are fairly common;
>
> there is a semantic distinction between ligatured and
> unligatured forms.
This would then rule out all but the first 27 forms (page 1), except perhaps for the inflexions, for which I might do some special pleading as they are very common, have a degree of semantic (in)distinction and should probably be ignored when indexing etc. There are however 21 of them, which is rather a lot!
> If either the ligatured form, or the unligatured form, is fairly rare,
> then the Unicode character ZWJ can be inserted to create a
> ligature, or ZWNJ to prevent one.
The <e> in the "Romanised" transliteration would convert to ZWNJ (the presence of [e] is predictable, so it's not shown in the native script), so that for instance "tetlam" (rendered TTL{-AM}) doesn't have a t-t ligature, but "weqatti" (WQA{TT}I) does. When keying in text it's sometimes necessary to insert a ZWNJ-<e> to prevent the end of a root in -y, -w, -?, -t or -r being taken as part of the inflexion. E.g. the genitive of "kabir-" is "kabirim", which might be rendered with the -irim inflexional glyph as though it were the gen. dual of "*kab-". This is avoided by typing in "kabierim" or "kabireim" etc.
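As a rough illustration of the <e>-to-ZWNJ idea above, here is a minimal Python sketch. It assumes the conversion is nothing more than replacing every romanised <e> with U+200C (ZERO WIDTH NON-JOINER); the helper names `encode_romanised` and `show` are made up for this example, and the real mapping from Latin letters to Saprutum code points is not covered.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def encode_romanised(word: str) -> str:
    """Replace every romanised <e> with ZWNJ.

    Hypothetical helper: assumes [e] is never written in the native
    script and serves only to keep its neighbours from ligaturing.
    """
    return word.replace("e", ZWNJ)


def show(encoded: str) -> str:
    """Make the invisible ZWNJ visible as '|' for inspection."""
    return encoded.replace(ZWNJ, "|")


for word in ("tetlam", "weqatti", "kabirim", "kabierim", "kabireim"):
    print(word, "->", show(encode_romanised(word)))
```

In this sketch "tetlam" gets a ZWNJ between the two t's, so a renderer would not ligate them, while the "tt" of "weqatti" stays adjacent and is free to ligate. "kabirim" contains no ZWNJ, so nothing stops a renderer from picking the -irim glyph, whereas "kabierim" and "kabireim" each break that sequence.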
> Finally, if the rules for ligaturing are fully automatic, there is no
> need to represent the ligature; it can be left to smart rendering
> software. This is the case of Indic ligaturing.
That's probably the closest analogy to the way I treat vowels -- not exactly diacritics like Hebrew or Arabic points, but still less prominent than the consonants.
> Unicode itself does not always follow these rules, due to the need for
> backward compatibility with existing character sets such as MacRoman.
Does this mean that we're basically trying to code phonemes? That each font (where ligatures are involved) requires its own set of mapping rules -- Sanskrit must be a nightmare. I came across a Unicode slot for the Phoenician alphabet, which is just a "font" of the 22-consonant NW Semitic (i.e. Hebrew) script, even though the letter forms look very different. This led me to think that sets of glyphs were being coded rather than some more abstract entity. Anyway, I won't argue the point so long as I can get my inflexions in :-)

Keith