Re: Conlang Unicode Font (was Re: Kamakawi Unicode Font Question)
From: Tristan McLeay <conlang@...>
Date: Saturday, March 8, 2008, 7:09
On 08/03/08 17:40:23, David J. Peterson wrote:
>
> Eric:
> <<
> How is it that such a font can produce ligatures when e.g. <f> is
> followed by <i> or <fl>, but not vary the look of <p> and <b>
> depending on what follows?
> >>
>
> There are separate characters (with unicode points) for the "fi"
> and "fl" ligatures. I guess this is what I'm missing. Certainly
> you can have a program that automatically does something
> like "if user types f + l, replace resultant string with unicode
> character FB02", but the ligature itself still has to have a point.
No, that's wrong. A document should *never* contain the Unicode
character U+FB02. If it did, searching would be harder than it needs
to be: to find the word "waffle" you'd have to try f-f-l, f-fl, ff-l,
and ffl. Well, in this particular case Unicode defines the ligatures
as compatibility-equivalent to the plain letters, so a search engine
that normalises its text will still match them, as long as it knows
to do that...
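To make the search problem concrete, here is a minimal sketch in Python. It assumes the search engine folds text with NFKC compatibility normalisation before comparing; the helper name fold() is mine, not anything from a real search tool.

```python
import unicodedata

# "waffle" spelled four ways: plain letters, and with the legacy ligature
# characters U+FB00 (ff), U+FB02 (fl) and U+FB04 (ffl).
spellings = [
    "waffle",        # f-f-l
    "waf\ufb02e",    # f-fl
    "wa\ufb00le",    # ff-l
    "wa\ufb04e",     # ffl
]

def fold(text):
    """Hypothetical search-side folding: replace compatibility characters
    (ligatures among them) with their plain equivalents."""
    return unicodedata.normalize("NFKC", text)

# As raw strings these are four different sequences of code points,
# so a naive substring search treats them as four different words.
distinct_raw = len(set(spellings))

# After folding, every spelling is the same six letters.
folded = [fold(s) for s in spellings]

print(distinct_raw, folded)
```

A search that skips the fold() step only finds the spelling it was given, which is exactly why the ligature characters don't belong in the document in the first place.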
A font contains its own table of glyphs. Some of them are declared to
have a particular Unicode code point, others aren't. Then you declare
a set of rules in the font to say "in this context, don't use the
default glyph, use this glyph instead". The way the font format
identifies its glyphs is completely separate from the Unicode
encoding: Unicode doesn't care *at all* what characters look like, in
isolation or in context. You can search all day in your character map
for the variant (i.e. Icelandic) forms of "þ" and "ð" (thorn and eth)
in Junicode (which is for Old English, and so has a different style).
You'll never find them. Instead, you have to switch a toggle in the
font dialog if you want the text to look good for Icelandic or IPA,
even though the variant forms are included in the same file. (If you
looked the font up in FontForge or another font editor, you'd find
them sitting at the very end of the list of glyphs, without a Unicode
character code above them.)
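A toy model of that arrangement, in Python. Everything in it is made up for illustration: the glyph names, the rule tables and the shape() function are hypothetical and don't come from Junicode or from any real font format. It only shows the idea of glyphs without code points being reached through rules.

```python
# Hypothetical glyph table: glyph name -> Unicode code point, or None
# when the glyph has no code point of its own.
GLYPHS = {
    "f": 0x0066,
    "i": 0x0069,
    "l": 0x006C,
    "thorn": 0x00FE,
    "f_i": None,              # ligature glyph
    "f_l": None,              # ligature glyph
    "thorn.icelandic": None,  # variant form, only reachable via a rule
}

# Character -> default glyph, built only from glyphs that have a code point.
CMAP = {chr(cp): name for name, cp in GLYPHS.items() if cp is not None}

# Substitution rules, keyed on glyph names rather than on characters.
LIGATURES = {("f", "i"): "f_i", ("f", "l"): "f_l"}
ICELANDIC_VARIANTS = {"thorn": "thorn.icelandic"}

def shape(text, icelandic=False):
    """Turn a string into a list of glyph names. The text is never modified."""
    glyphs = [CMAP[ch] for ch in text]
    if icelandic:  # the "toggle in the font dialog"
        glyphs = [ICELANDIC_VARIANTS.get(g, g) for g in glyphs]
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

print(shape("fil"))                     # ['f_i', 'l']
print(shape("\u00fe"))                  # ['thorn']
print(shape("\u00fe", icelandic=True))  # ['thorn.icelandic']
```

The stored text is still plain f, i, l or þ in every case; only the list of glyphs handed to the renderer changes.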
> Tristan:
> <<
> Oh yes certainly you can! In fact, the very existence of initial,
> medial, final and stand-alone Arabic characters in Unicode is
> considered to be for historical reasons only, and you should never
> want to use them.
> >>
>
> I mean, I believe you, and the message came out correctly, but
> are you suggesting, then, that Unicode doesn't need to have a
> codepoint *anywhere* for *any* non-stand alone character of
> Arabic?
Yes.
> If so, how does it avoid the above-mentioned problem?
Your explanation of the problem is wrong, as I hopefully explained
above. The few Latin ligatures, all those Korean codepoints, the
precomposed accented Vietnamese letters, the Arabic presentation
forms: they're all there for backwards compatibility or for political
reasons. Somewhere, there's an older character set that included them
because the font technology of the time wasn't able to combine forms
automatically. Unicode just inherited a lot of characters from that
age that it would rather ignore.
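You can see how Unicode itself treats these inherited characters with a few lines of Python. This is only a sketch using the standard unicodedata module; the sample characters (the letter beh, "ệ" and "한") are my own picks.

```python
import unicodedata

# The four Arabic presentation forms of the letter beh:
# isolated, final, initial, medial (U+FE8F..U+FE92).
beh_forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]

# Compatibility normalisation folds all four back to the one real
# letter, U+0628; choosing the shape is left to the font.
beh_folded = {unicodedata.normalize("NFKC", ch) for ch in beh_forms}

# A precomposed Vietnamese letter (U+1EC7) decomposes into a base
# letter plus two combining marks...
viet = unicodedata.normalize("NFD", "\u1ec7")

# ...and a precomposed Korean syllable (U+D55C) into three jamo.
hangul = unicodedata.normalize("NFD", "\ud55c")

print([hex(ord(c)) for c in beh_folded])
print([hex(ord(c)) for c in viet])
print([hex(ord(c)) for c in hangul])
```

In each case the precomposed or positional character carries no information that the plain sequence plus a capable font couldn't supply.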
--
Tristan.