Re: OT: Unicode 5.0
From: Andreas Johansson <andjo@...>
Date: Tuesday, January 10, 2006, 0:13
Quoting Jonathyn Bet'nct <jonrelay@...>:
> On 1/9/06, John Vertical <johnvertical@...> wrote:
> > ...At risk of threadjack accusations, I'll use the opening to also fire a
> > question that's been bothering me for a while - Why does Unicode include
> > several characters multiple times? There are 6561 different ways to write
> > "THAI POEM". If capital alpha is different from capital ay just because it's
> > used in a different alphabet to write a different language, isn't (eg)
> > Icelandic "A" also a different character then? Are they really purposely
> > randomly tagging unnecessary etymological/usage information to symbols, or
> > is it that they just fudged it up initially (for whatever political reasons)
> > and can't fix it at this stage any more?
>
> This is because Icelandic uses the same /script/ as English. Greek
> uses a different /script/, therefore capital alpha gets its own
> encoding, while Icelandic ay is encoded as the same as English ay.
> Unicode stresses the distinctions between script, language (many of
> which may use the same script), and glyph variants (which are left to
> the realm of fonts, not text encodings).
Icelandic is sometimes considered a separate script from Latin, presumably because
it includes the Runic-derived thorn. Now, I think the Unicoders made the right
decision not to treat it as separate, but the line between variants of the same
script and different scripts is not always clear-cut.
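
For anyone who wants to poke at this, here is a minimal sketch using Python's standard unicodedata module. It only looks up the official character names, which carry the script: the look-alike capitals sit at separate code points, while thorn is named as a Latin letter. The variable names are mine, and the 3 ** 8 reading of John's 6561 is an assumption on my part (three look-alike scripts, Latin, Greek and Cyrillic, for each of the eight letters in "THAI POEM"), not something he spelled out.

```python
import unicodedata

# Visually similar capitals encoded separately, one per script.
lookalikes = ["\u0041", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A
names = [unicodedata.name(ch) for ch in lookalikes]

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X}  {name}")

# Thorn is encoded as a Latin letter; there is no separate Icelandic script.
thorn_name = unicodedata.name("\u00DE")
print(f"U+00DE  {thorn_name}")

# Assumption: each of the 8 letters in "THAI POEM" has a look-alike
# in 3 scripts (Latin, Greek, Cyrillic), giving 3 ** 8 spellings.
letters = [c for c in "THAI POEM" if c != " "]
ways = 3 ** len(letters)
print(ways)
```

The names come straight from the Unicode character database that ships with Python, so the script membership shown here is Unicode's own classification rather than anything inferred from the glyph shapes.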
Andreas