
Re: OT: Unicode 5.0

From: Tim May <butsuri@...>
Date: Tuesday, January 10, 2006, 0:26
Jonathyn Bet'nct wrote at 2006-01-09 15:42:41 (-0800)
 > On 1/9/06, John Vertical <johnvertical@...> wrote:
 > > ...At risk of threadjack accusations, I'll use the opening to
 > > also fire a question that's been bothering me for a while - Why
 > > does Unicode include several characters multiple times? There are
 > > 6561 different ways to write "THAI POEM". If capital alpha is
 > > different from capital ay just because it's used in a different
 > > alphabet to write a different language, isn't (eg) Icelandic "A"
 > > also a different character then? Are they really purposely
 > > randomly tagging unnecessary etymological/usage information to
 > > symbols, or is it that they just fudged it up initially (for
 > > whatever political reasons) and can't fix it at this stage any
 > > more?
 >
 > This is because Icelandic uses the same /script/ as English. Greek
 > uses a different /script/, therefore capital alpha gets its own
 > encoding, while Icelandic ay is encoded as the same as English ay.

Furthermore, they have different lower-case forms, which can cause
similar situations even within a single script.  Witness U+00D0 LATIN
CAPITAL LETTER ETH vs. U+0110 LATIN CAPITAL LETTER D WITH STROKE vs.
U+0189 LATIN CAPITAL LETTER AFRICAN D.
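
For anyone who wants to check the lower-case point, here is a minimal sketch using Python's standard unicodedata module. The CAPITALS list and the lowercase_pairs helper are my own names for illustration; Latin A and Greek Alpha are included only for comparison with the three D-like capitals above.

```python
import unicodedata

# Latin A and Greek Alpha for comparison, then the three look-alike
# capitals named above.
CAPITALS = ["\u0041", "\u0391", "\u00D0", "\u0110", "\u0189"]

def lowercase_pairs(chars):
    """Return (capital name, lower-case name) for each character."""
    return [(unicodedata.name(c), unicodedata.name(c.lower())) for c in chars]

for upper, lower in lowercase_pairs(CAPITALS):
    print(f"{upper}  ->  {lower}")
```

Each capital maps to a different lower-case letter, which is the property that would be lost if the similar-looking capitals were unified.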

 > Unicode stresses the distinctions between script, language (many of
 > which may use the same script), and glyph variants (which are left to
 > the realm of fonts, not text encodings).
 >
 > Unicode certainly has fudged a bunch of stuff up initially, and
 > unfortunately they can't fix it now. (One thing in particular, I
 > think they should have encoded small caps a long time ago. One of
 > the proposals that was linked to included a small-cap F and S, and
 > mentioned that the only other small caps left unencoded were Q and
 > X.  Interesting, I thought, so I went on a hunt for all the small
 > caps (other than F, Q, S, and X). I could only find a handful of
 > them, and they're randomly dotted all over the place: Latin
 > Extended A, IPA Extensions, Letterlike Symbols, etc. But anyway,
 > enough of my rant.)
 >

U+0262 LATIN LETTER SMALL CAPITAL G
U+026A LATIN LETTER SMALL CAPITAL I
U+0274 LATIN LETTER SMALL CAPITAL N
U+0280 LATIN LETTER SMALL CAPITAL R
U+028F LATIN LETTER SMALL CAPITAL Y
U+0299 LATIN LETTER SMALL CAPITAL B
U+029C LATIN LETTER SMALL CAPITAL H
U+029F LATIN LETTER SMALL CAPITAL L
U+1D00 LATIN LETTER SMALL CAPITAL A
U+1D04 LATIN LETTER SMALL CAPITAL C
U+1D05 LATIN LETTER SMALL CAPITAL D
U+1D07 LATIN LETTER SMALL CAPITAL E
U+1D0A LATIN LETTER SMALL CAPITAL J
U+1D0B LATIN LETTER SMALL CAPITAL K
U+1D0D LATIN LETTER SMALL CAPITAL M
U+1D0F LATIN LETTER SMALL CAPITAL O
U+1D18 LATIN LETTER SMALL CAPITAL P
U+1D1B LATIN LETTER SMALL CAPITAL T
U+1D1C LATIN LETTER SMALL CAPITAL U
U+1D20 LATIN LETTER SMALL CAPITAL V
U+1D21 LATIN LETTER SMALL CAPITAL W
U+1D22 LATIN LETTER SMALL CAPITAL Z

(None of these is actually in Latin Extended-A (you may be thinking
of U+0138 LATIN SMALL LETTER KRA) or in Letterlike Symbols (which don't
count as letters).  But I can certainly agree that it would have been
more convenient to have encoded them all together at the beginning.)
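
A list like the one above can be rebuilt by scanning character names. This is a rough sketch, assuming Python's unicodedata module; find_small_caps and PATTERN are hypothetical names of mine. The result depends on the Unicode version shipped with your Python, so a newer database may report letters beyond the 22 listed here.

```python
import re
import unicodedata

# Match only single-letter small capitals, e.g. "LATIN LETTER SMALL CAPITAL G",
# not names such as "LATIN LETTER SMALL CAPITAL AE".
PATTERN = re.compile(r"LATIN LETTER SMALL CAPITAL ([A-Z])")

def find_small_caps(limit=0x10000):
    """Map each base letter to the code point of its small capital (BMP only)."""
    found = {}
    for cp in range(limit):
        # Unnamed or unassigned code points yield "" and never match.
        match = PATTERN.fullmatch(unicodedata.name(chr(cp), ""))
        if match:
            found[match.group(1)] = cp
    return found

for letter, cp in sorted(find_small_caps().items(), key=lambda item: item[1]):
    print(f"U+{cp:04X} LATIN LETTER SMALL CAPITAL {letter}")
```

Comparing each code point against the block ranges (Latin Extended-A is U+0100..U+017F, Letterlike Symbols is U+2100..U+214F) confirms that none of the matches falls in either block.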
