Re: OT: Unicode 5.0
From: Tim May <butsuri@...>
Date: Tuesday, January 10, 2006, 0:26
Jonathyn Bet'nct wrote at 2006-01-09 15:42:41 (-0800)
> On 1/9/06, John Vertical <johnvertical@...> wrote:
> > ...At risk of threadjack accusations, I'll use the opening to
> > also fire a question that's been bothering me for a while - Why
> > does Unicode include several characters multiple times? There are
> > 6561 different ways to write "THAI POEM". If capital alpha is
> > different from capital ay just because it's used in a different
> > alphabet to write a different language, isn't (eg) Icelandic "A"
> > also a different character then? Are they really purposely
> > randomly tagging unnecessary etymological/usage information to
> > symbols, or is it that they just fudged it up initially (for
> > whatever political reasons) and can't fix it at this stage any
> > more?
>
> This is because Icelandic uses the same /script/ as English. Greek
> uses a different /script/, therefore capital alpha gets its own
> encoding, while Icelandic ay is encoded as the same as English ay.
Furthermore, capital alpha and capital ay have different lower-case
forms, and the same consideration can force separate encodings even
within a single script. Witness U+00D0 LATIN CAPITAL LETTER ETH vs.
U+0110 LATIN CAPITAL LETTER D WITH STROKE vs. U+0189 LATIN CAPITAL
LETTER AFRICAN D.
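
To make the lower-case point concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name lower_forms and the choice of sample characters are illustrative assumptions; the mappings themselves come from whatever Unicode database ships with the interpreter.

```python
import unicodedata

# Three look-alike Latin capitals, plus Latin A and Greek alpha for comparison.
CAPITALS = ["\u00d0", "\u0110", "\u0189", "A", "\u0391"]

def lower_forms(chars):
    """Return (capital code point, capital name, lower code point, lower name)."""
    rows = []
    for ch in chars:
        low = ch.lower()  # each of these capitals maps to a single lower-case character
        rows.append((
            f"U+{ord(ch):04X}", unicodedata.name(ch),
            f"U+{ord(low):04X}", unicodedata.name(low),
        ))
    return rows

for row in lower_forms(CAPITALS):
    print(*row, sep="  ")
```

The three capitals that look nearly identical map to three different lower-case letters (U+00F0, U+0111, U+0256), which a single shared capital could not do.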
> Unicode stresses the distinctions between script, language (many of
> which may use the same script), and glyph variants (which are left to
> the realm of fonts, not text encodings).
>
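
The 6561 figure quoted earlier follows directly from this per-script encoding. A rough Python sketch, assuming the count refers to the Latin, Greek and Cyrillic look-alikes of each of the eight letters (the LOOKALIKES table and the spellings helper are my own assumptions, not anything defined by Unicode):

```python
from itertools import product

# Latin, Greek and Cyrillic capitals that look alike, one row per letter.
# Assumption: these are the look-alikes the 6561 figure was counting.
LOOKALIKES = {
    "T": "T\u03a4\u0422",  # Latin T, Greek TAU, Cyrillic TE
    "H": "H\u0397\u041d",  # Latin H, Greek ETA, Cyrillic EN
    "A": "A\u0391\u0410",  # Latin A, Greek ALPHA, Cyrillic A
    "I": "I\u0399\u0406",  # Latin I, Greek IOTA, Cyrillic Byelorussian-Ukrainian I
    "P": "P\u03a1\u0420",  # Latin P, Greek RHO, Cyrillic ER
    "O": "O\u039f\u041e",  # Latin O, Greek OMICRON, Cyrillic O
    "E": "E\u0395\u0415",  # Latin E, Greek EPSILON, Cyrillic IE
    "M": "M\u039c\u041c",  # Latin M, Greek MU, Cyrillic EM
}

def spellings(text):
    """Every way to spell text by picking one look-alike per character."""
    # Characters without an entry (here, the space) keep a single option.
    pools = [LOOKALIKES.get(ch, ch) for ch in text]
    return ["".join(combo) for combo in product(*pools)]

variants = spellings("THAI POEM")
print(len(variants))  # 3 ** 8 = 6561
```

Three scripts and eight letters give 3 ** 8 distinct code point sequences that can render almost identically.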
> Unicode certainly has fudged a bunch of stuff up initially, and
> unfortunately they can't fix it now. (One thing in particular, I
> think they should have encoded small caps a long time ago. One of
> the proposals that was linked to included a small-cap F and S, and
> mentioned that the only other small caps left unencoded were Q and
> X. Interesting, I thought, so I went on a hunt for all the small
> caps (other than F, Q, S, and X). I could only find a handful of
> them, and they're randomly dotted all over the place: Latin
> Extended A, IPA Extensions, Letterlike Symbols, etc. But anyway,
> enough of my rant.)
>
U+0262 LATIN LETTER SMALL CAPITAL G
U+026A LATIN LETTER SMALL CAPITAL I
U+0274 LATIN LETTER SMALL CAPITAL N
U+0280 LATIN LETTER SMALL CAPITAL R
U+028F LATIN LETTER SMALL CAPITAL Y
U+0299 LATIN LETTER SMALL CAPITAL B
U+029C LATIN LETTER SMALL CAPITAL H
U+029F LATIN LETTER SMALL CAPITAL L
U+1D00 LATIN LETTER SMALL CAPITAL A
U+1D04 LATIN LETTER SMALL CAPITAL C
U+1D05 LATIN LETTER SMALL CAPITAL D
U+1D07 LATIN LETTER SMALL CAPITAL E
U+1D0A LATIN LETTER SMALL CAPITAL J
U+1D0B LATIN LETTER SMALL CAPITAL K
U+1D0D LATIN LETTER SMALL CAPITAL M
U+1D0F LATIN LETTER SMALL CAPITAL O
U+1D18 LATIN LETTER SMALL CAPITAL P
U+1D1B LATIN LETTER SMALL CAPITAL T
U+1D1C LATIN LETTER SMALL CAPITAL U
U+1D20 LATIN LETTER SMALL CAPITAL V
U+1D21 LATIN LETTER SMALL CAPITAL W
U+1D22 LATIN LETTER SMALL CAPITAL Z
(None of these is actually in Latin Extended-A (you may be thinking
of U+0138 LATIN SMALL LETTER KRA) or Letterlike Symbols (which don't
count as letters). But I certainly agree that it would have been
more convenient to have encoded them all together at the beginning.)
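
A list like the one above can be regenerated by scanning character names rather than hunting through the code charts. This is only a sketch: the function name small_capitals and the decision to scan just the Basic Multilingual Plane are my assumptions, and the result depends on the Unicode version bundled with your Python, so a newer database may report more letters than the 22 listed here.

```python
import re
import unicodedata

# Match only single-letter small capitals, not names such as "... SMALL CAPITAL OE".
PATTERN = re.compile(r"LATIN LETTER SMALL CAPITAL ([A-Z])")

def small_capitals(limit=0x10000):
    """Map each letter A-Z to the code point of its small capital, if encoded."""
    found = {}
    for cp in range(limit):
        # Unnamed code points (unassigned, surrogates, ...) yield the default "".
        name = unicodedata.name(chr(cp), "")
        match = PATTERN.fullmatch(name)
        if match:
            found[match.group(1)] = cp
    return found

caps = small_capitals()
for letter in sorted(caps):
    print(f"{letter}  U+{caps[letter]:04X}")

missing = sorted(set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - set(caps))
print("no small capital found for:", "".join(missing))
```

Grouping the hits by block shows that these letters are split between IPA Extensions (U+0250..U+02AF) and Phonetic Extensions (U+1D00..U+1D7F).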