Re: OT: Unicode 5.0
From: Paul Bennett <paul-bennett@...>
Date: Tuesday, January 10, 2006, 0:17
On Mon, 09 Jan 2006 18:42:41 -0500, Jonathyn Bet'nct <jonrelay@...>
wrote:
> Unicode stresses the distinctions between script, language (many of
> which may use the same script), and glyph variants (which are left to
> the realm of fonts, not text encodings).
See also the Variation Selectors, which tell a different story, and the
Rubric brackets proposed for Egyptian.
> Unicode certainly has fudged a bunch of stuff up initially, and
> unfortunately they can't fix it now.
They *could* fix it, by the same act of administrative fiat that created
Unicode in the first place: make up a new standard with a new name. If
it's superior enough, it will become preferred (if I hear one person so
much as mutter the word "qwerty" from the peanut gallery, I shall smite
thee, for that is an utter fabrication).
My own suggestions?
Purge all characters that are transparently a base character plus one or
more combining diacritics, obviously allowing fonts to store precomposed
versions of any combination the font author desires, just not at
codepoints within the defined standard -- some of this goes on already,
but it ought to be the rule rather than the exception.
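As a rough illustration of which characters count as "transparently a base character plus one or more combining diacritics", here is a minimal Python sketch built on the standard unicodedata module. The helper name is_purgeable is hypothetical, and using canonical decomposition (NFD) as the test is only an assumption about what the suggestion would cover.

```python
import unicodedata

def is_purgeable(ch: str) -> bool:
    """Hypothetical test: does ch canonically decompose into a base
    character followed only by combining marks?"""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        return False  # no multi-character canonical decomposition
    base, marks = decomposed[0], decomposed[1:]
    # combining() returns 0 for non-combining characters
    return (unicodedata.combining(base) == 0
            and all(unicodedata.combining(m) != 0 for m in marks))

print(is_purgeable("\u00e9"))  # e with acute: decomposes to e + U+0301
print(is_purgeable("e"))       # plain base letter: nothing to purge
```

Under this test a Hangul syllable would not qualify, since it decomposes into jamo rather than into a base plus combining marks; whether such cases should also be purged is left open here.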
Likewise, use ZWJ, ZWNJ and Variation Selectors to encode ligatures,
digraphs, and presentation forms, and encode the composed forms outside
the standard.
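For the ligature case, a small Python sketch of the idea: the existing precomposed "fi" ligature already has a compatibility decomposition to plain letters, and the joiner characters can carry the join or no-join request in the text itself. Whether a given renderer or font honours the request is not something this code can show; it only builds the strings.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, a precomposed presentation form
plain = unicodedata.normalize("NFKD", ligature)  # compatibility decomposition

requested = "f" + ZWJ + "i"    # ask for a ligature, if the font has one
suppressed = "f" + ZWNJ + "i"  # ask for the letters to stay separate

for label, text in [("plain", plain), ("requested", requested),
                    ("suppressed", suppressed)]:
    print(label, [f"U+{ord(c):04X}" for c in text])
```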
Having purged the needless characters, order all remaining glyphs by
script name (alphabetically), and by glyph name (alphabetically) within
each script, including combining characters and spacing modifier letters
(which should have a less silly name). Leave at least one full row (plus a
fractional row to bring the total range to a full number of rows) at the
end of each script, just in case.
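The padding rule can be sketched in a few lines of Python. Everything here is illustrative: the function name allocate is made up, the row size of 256 code points is an assumption, and the character counts in the usage lines are invented rather than real script sizes.

```python
ROW = 256  # assumed row size in code points

def allocate(scripts, row=ROW):
    """Lay out scripts alphabetically, each rounded up to whole rows
    plus one full spare row. `scripts` maps script name -> character count."""
    layout = {}
    start = 0
    for name in sorted(scripts):
        used = scripts[name]
        rows = -(-used // row) + 1  # ceiling division, then one spare row
        layout[name] = range(start, start + rows * row)
        start += rows * row
    return layout

# Invented counts, purely for illustration.
for name, block in allocate({"Latin": 300, "Greek": 256}).items():
    print(f"{name}: U+{block.start:04X}..U+{block.stop - 1:04X}")
```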
Replace the U+FEFF byte-order mark (and its byte-swapped counterpart,
U+FFFE) with a mark that encodes the version number of the standard being
adhered to, to allow for future bugfixes.
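For reference, this is roughly how the current mark is used in UTF-16: the reader looks at the first two bytes and infers the byte order from which way round U+FEFF arrived. The function name detect_utf16_order is hypothetical, and the sketch deliberately ignores UTF-32 and UTF-8. It shows only today's mechanism, not any format for a version mark.

```python
import codecs

def detect_utf16_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading U+FEFF, if present."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown"

sample = "\ufeffhi"
print(detect_utf16_order(sample.encode("utf-16-be")))
print(detect_utf16_order(sample.encode("utf-16-le")))
print(detect_utf16_order(b"hi"))
```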
Paul