Re: OT: TECH: Dumb Unicode question
From: Mark J. Reed <markjreed@...>
Date: Friday, November 21, 2003, 18:00
On Fri, Nov 21, 2003 at 10:39:24AM -0500, Mark J. Reed wrote:
> A dumb question into whose answer perhaps Mr. Cowan or someone else
> has some insight - how did Unicode end up with such an odd number of
> code points?
To attempt an answer to my own question: it occurs to me that it is
probably because of the surrogate encoding scheme. The number of
surrogates was chosen to give a nice round number of
surrogate-encodable code points (1,024 x 1,024 = 1,048,576 = 0x100000,
i.e. 16 planes of 65,536 each) - or, more accurately, to make the
surrogates themselves take up a whole number of code blocks - and
when that nice round number is added to the 65,536 code points of the
non-surrogate-encoded BMP, you get the odd total of 1,114,112
(0x110000).
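For anyone who wants to check the arithmetic, here is a minimal Python
sketch. The variable names are made up for illustration; the only facts
assumed are the surrogate ranges D800-DBFF (high) and DC00-DFFF (low)
and a plane size of 0x10000.

```python
# Illustrative names only; the ranges are the standard surrogate blocks.
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1,024 lead values
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1,024 trail values
BMP_SIZE = 0x10000                       # 65,536 code points in plane 0

# Every (high, low) combination names one supplementary code point.
pair_count = len(HIGH_SURROGATES) * len(LOW_SURROGATES)

# How many whole planes those pairs cover.
supplementary_planes = pair_count // BMP_SIZE

# BMP plus everything reachable through surrogate pairs.
total_code_points = BMP_SIZE + pair_count

print(pair_count, hex(pair_count))
print(supplementary_planes)
print(total_code_points, hex(total_code_points))
```

The two surrogate ranges together span 0x800 values, which is where the
"whole number of code blocks" point comes from.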
Note that the total of 1,114,112 possible Unicode code points
includes the 2,048 surrogates, which are unavailable for use as
characters even in a plain UCS-4/UTF-32 environment, where no
surrogate encoding is involved.
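A rough way to see this in practice, assuming a Python 3 interpreter
(where sys.maxunicode is 0x10FFFF and the UTF-32 codec refuses lone
surrogates). The helper name encodes_as_utf32 is hypothetical, written
just for this sketch.

```python
import sys

TOTAL_CODE_POINTS = sys.maxunicode + 1   # 0x110000 on Python 3
SURROGATES = range(0xD800, 0xE000)       # the 2,048 reserved values

# Code points left over once the surrogates are set aside.
usable = TOTAL_CODE_POINTS - len(SURROGATES)


def encodes_as_utf32(cp):
    """Return True if the code point can be written out as UTF-32."""
    try:
        chr(cp).encode("utf-32-be")
        return True
    except UnicodeEncodeError:
        return False


print(usable)
print(encodes_as_utf32(0x1F600))  # an ordinary supplementary character
print(encodes_as_utf32(0xD800))   # a surrogate, rejected by the codec
```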
-Mark