
Re: OT: TECH: Dumb Unicode question

From: Mark J. Reed <markjreed@...>
Date: Friday, November 21, 2003, 18:00
On Fri, Nov 21, 2003 at 10:39:24AM -0500, Mark J. Reed wrote:
> A dumb question into whose answer perhaps Mr. Cowan or someone else
> has some insight - how did Unicode end up with such an odd number of
> code points?
To attempt to answer my own question - it occurs to me that it is probably because of the surrogate encoding scheme. The number of surrogate pairs was chosen to give a nice round number of surrogate-encodable code points (1,024 x 1,024 = 1,048,576 = 0x100000, i.e. 16 planes) - or more accurately, to make the surrogates themselves take up a whole number of code blocks - and when that nice round number is added to the 65,536 code points of the non-surrogate-encoded BMP, you get the odd total of 1,114,112.

Note that the 1,114,112 figure includes the 2,048 surrogates themselves, which are unavailable for use as characters even in a completely unencoded UCS-4/UTF-32 environment.

-Mark
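
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The constant and function names are my own for illustration and are not taken from the Unicode Standard; the surrogate ranges (U+D800..U+DBFF and U+DC00..U+DFFF) and the pair-decoding formula are the standard UTF-16 ones.

```python
# Illustrative sketch of the code point arithmetic; names are made up here.
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1,024 leading surrogates
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1,024 trailing surrogates
BMP_SIZE = 0x10000                       # 65,536 code points in plane 0

# Every (high, low) combination addresses one supplementary code point.
surrogate_pairs = len(HIGH_SURROGATES) * len(LOW_SURROGATES)
supplementary_planes = surrogate_pairs // BMP_SIZE

# The "odd" total: the BMP plus everything reachable via surrogate pairs.
total_code_points = BMP_SIZE + surrogate_pairs

# The surrogates occupy code points but can never be characters.
surrogate_count = len(HIGH_SURROGATES) + len(LOW_SURROGATES)
non_surrogate_code_points = total_code_points - surrogate_count


def decode_surrogate_pair(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it encodes."""
    if high not in HIGH_SURROGATES or low not in LOW_SURROGATES:
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(f"{surrogate_pairs:,} pairs = {supplementary_planes} planes")
print(f"{total_code_points:,} code points in total")
print(f"{non_surrogate_code_points:,} once the surrogates are excluded")
print(hex(decode_surrogate_pair(0xDBFF, 0xDFFF)))  # highest pair
```

The last line decodes the highest possible pair, which is where the upper limit of U+10FFFF comes from.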