Re: OT: TECH: Dumb Unicode question

From:	Mark J. Reed <markjreed@...>
Date:	Friday, November 21, 2003, 18:00


On Fri, Nov 21, 2003 at 10:39:24AM -0500, Mark J. Reed wrote:
> A dumb question into whose answer perhaps Mr. Cowan or someone else
> has some insight - how did Unicode end up with such an odd number of
> code points?
To attempt an answer to my own question: it occurs to me that
it is probably because of the surrogate encoding scheme. The
number of surrogate pairs was chosen to give a nice round number
of surrogate-encodable code points (1,024 x 1,024 = 1,048,576 = 0x100000,
or 16 planes of 65,536 each) - or, more accurately, to make the
surrogates themselves take up a whole number of code blocks - and when
that nice round number is added to the 65,536 code points of the
non-surrogate-encoded BMP, you get the odd total.
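
The arithmetic can be checked with a short Python sketch. The constant names are mine, chosen for illustration; the ranges are the standard high (lead) and low (trail) surrogate ranges.

```python
# Surrogate arithmetic behind the Unicode code point total.
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1,024 lead values
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1,024 trail values
PLANE_SIZE = 0x10000                     # 65,536 code points per plane (the BMP is one plane)

# Every (high, low) combination addresses one supplementary code point.
pairs = len(HIGH_SURROGATES) * len(LOW_SURROGATES)

# The pairs fill a whole number of planes above the BMP.
supplementary_planes = pairs // PLANE_SIZE

# BMP plus everything reachable through surrogate pairs.
total_code_points = PLANE_SIZE + pairs

print(pairs, hex(pairs))                          # 1048576 0x100000
print(supplementary_planes)                       # 16
print(total_code_points, hex(total_code_points))  # 1114112 0x110000
```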

And that total of 1,114,112 possible Unicode code points
includes the 2,048 surrogates, which are unavailable for use as
characters even in a UCS-4/UTF-32 environment, where no surrogate
pairs are needed at all.
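
As a rough illustration of that last point, here is a sketch that counts the surrogates and shows Python's own UTF-32 codec refusing a lone one. The helper `is_surrogate` is a name I made up for this example, and the codec behaviour shown is that of Python 3, not of every UTF-32 implementation.

```python
TOTAL_CODE_POINTS = 0x110000  # U+0000 through U+10FFFF


def is_surrogate(cp: int) -> bool:
    """True if cp falls in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


# Count the surrogates, then subtract them from the total.
surrogates = sum(1 for cp in range(TOTAL_CODE_POINTS) if is_surrogate(cp))
usable = TOTAL_CODE_POINTS - surrogates

print(surrogates)  # 2048
print(usable)      # 1112064

# Python 3's UTF-32 codec rejects a lone surrogate, even though
# UTF-32 never needs surrogate pairs to represent anything.
try:
    chr(0xD800).encode("utf-32-be")
    lone_surrogate_encodable = True
except UnicodeEncodeError:
    lone_surrogate_encodable = False

print(lone_surrogate_encodable)  # False
```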

-Mark
