
Re: Conlang Unicode Font (was Re: Kamakawi Unicode Font Question)

From: Tristan McLeay <conlang@...>
Date: Saturday, March 8, 2008, 7:09
On 08/03/08 17:40:23, David J. Peterson wrote:
> Eric:
> <<
> How is it that such a font can produce ligatures when e.g. <f> is
> followed by <i> or <fl>, but not vary the look of <p> and <b>
> depending on what follows?
> >>
>
> There are separate characters (with unicode points) for the "fi"
> and "fl" ligatures. I guess this is what I'm missing. Certainly
> you can have a program that automatically does something
> like "if user types f + l, replace resultant string with unicode
> character FB02", but the ligature itself still has to have a point.
No, that's wrong. A document should *never* contain the Unicode character U+FB02. If it did, it would make searching a more difficult task than it needs to be (if you want to search for the word "waffle", you'd have to try with f-f-l, f-fl, ff-l, and ffl). Well, in this particular case Unicode demands that they're equivalent, as long as your search engine knows that...

A font format contains its own table of characters. Some of them are declared to have a particular Unicode code point, others aren't. Then you declare a set of rules in the font to say "in this context, don't use the default character, use this character instead". The way the font format identifies the characters is completely separate from the Unicode encoding: Unicode doesn't care ---at all--- what characters look like, in isolation or in context.

You can search all day in your character map for the variant (i.e. Icelandic) forms of "þ" and "ð" (thorn and eth) in Junicode (which is for Old English, and so has a different style). You'll never find them. Instead, you have to switch a toggle in the font dialog if you want them to look good for Icelandic or IPA, even though they're included in the same file. (If you looked them up in FontForge or another font editor, you'd find them sitting at the very end of the list of characters, without having a Unicode character code above them.)
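To illustrate the "waffle" search problem, here is a minimal Python sketch. It assumes the search tool folds compatibility characters with NFKC normalization before comparing; the helper names `fold` and `contains` are made up for this example and are not part of any library.

```python
import unicodedata


def fold(text: str) -> str:
    """Replace compatibility characters (such as ligatures) with their
    plain equivalents, then case-fold, so that searches match."""
    return unicodedata.normalize("NFKC", text).casefold()


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores ligature and case differences."""
    return fold(needle) in fold(haystack)


# Four ways the same word could end up encoded in a document.
variants = [
    "waffle",        # f-f-l: three plain letters
    "wa\ufb00le",    # ff-l:  U+FB00 LATIN SMALL LIGATURE FF, then l
    "waf\ufb02e",    # f-fl:  f, then U+FB02 LATIN SMALL LIGATURE FL
    "wa\ufb04e",     # ffl:   U+FB04 LATIN SMALL LIGATURE FFL
]

for v in variants:
    print(repr(v), "->", fold(v))
```

Without the folding step, a plain substring search for "waffle" would only match the first variant, which is why the ligature code points are best kept out of documents.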
> Tristan:
> <<
> Oh yes certainly you can! In fact, the very existence of initial,
> medial, final and stand-alone Arabic characters in Unicode is
> considered to be for historical reasons only, and you should never
> want to use them.
> >>
>
> I mean, I believe you, and the message came out correctly, but
> are you suggesting, then, that Unicode doesn't need to have a
> codepoint *anywhere* for *any* non-stand alone character of
> Arabic?
Yes.
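As a small check of this, the Python sketch below takes the four positional presentation forms of one Arabic letter (beh, chosen here only as an example) and shows that NFKC normalization maps each of them back to the single ordinary letter. It uses only the standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the only form a document needs

# Legacy presentation forms of the same letter.
presentation_forms = [
    "\ufe8f",  # isolated form
    "\ufe90",  # final form
    "\ufe91",  # initial form
    "\ufe92",  # medial form
]

for ch in presentation_forms:
    base = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")
```

The choice between initial, medial, final and isolated shapes is then left to the font and the text renderer, which pick the glyph from context.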
> If so, how does it avoid the above-mentioned problem?
Your explanation of the problem is wrong, as I hopefully explained above. The few Latin ligatures, all those Korean codepoints, combined accented Vietnamese letters, the Arabic characters, they're all there for backwards compatibility or for political reasons. Somewhere, there was an older character set that included them because the font technology of the time wasn't able to automatically combine forms. Unicode just inherited a lot of characters from that age that it would rather just ignore.

--
Tristan.


David J. Peterson <dedalvs@...>