Re: Tech: Unicode (was...)
From: John Cowan <cowan@...>
Date: Saturday, May 8, 2004, 18:04
Philippe Caquant scripsit:
> Oh. Let's try to summarize.
All correct.
> - it seems that inside the ranges we mentioned before,
> some sub-ranges are never used (D8.00 to DB.FF = 1,024
> codes in every range of 65,536 codes = 1.5625%)
Only in Plane 0, not the other planes. (1 plane = 65536 codes)
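
To make the arithmetic concrete, here is a minimal Python sketch. The names (HIGH_SURROGATES, plane_of, and so on) are mine, chosen only for illustration. It checks the quoted 1,024-code figure, which covers the high surrogates D800..DBFF; the full surrogate block D800..DFFF is 2,048 codes. It also checks that the block lies entirely in Plane 0.

```python
PLANE_SIZE = 0x10000                      # 65,536 code points per plane

# Illustrative names, not taken from any library.
HIGH_SURROGATES = range(0xD800, 0xDC00)   # D800..DBFF, the sub-range quoted above
ALL_SURROGATES = range(0xD800, 0xE000)    # D800..DFFF, high and low halves together


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that a code point belongs to."""
    return code_point >> 16


def is_surrogate(code_point: int) -> bool:
    """True only for the reserved surrogate code points in Plane 0."""
    return code_point in ALL_SURROGATES


# Share of Plane 0 taken by the high surrogates alone, as a percentage.
high_share = len(HIGH_SURROGATES) / PLANE_SIZE * 100

# Every surrogate code point sits in Plane 0.
surrogate_planes = {plane_of(cp) for cp in ALL_SURROGATES}

# The same offset in Plane 1 is an ordinary code point and encodes normally,
# whereas a lone surrogate is rejected by Python's strict UTF-8 encoder.
plane1_bytes = chr(0x1D800).encode("utf-8")
try:
    chr(0xD800).encode("utf-8")
    lone_surrogate_rejected = False
except UnicodeEncodeError:
    lone_surrogate_rejected = True
```
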
> - others will do what they like, compressing single
> bits, dividing them in two, or painting them different
> colors to differentiate them, we don't care, provided
> we get our information back correctly.
Nobody ever uses anything except UTF-8, UTF-16, and maybe
UTF-32 inside programs. For serialization into octets,
UTF-16 has a problem: which octet comes first? To solve this, we can
prefix U+FEFF to the text, which works because U+FFFE is a noncharacter;
if we read FFFE, we need to swap octets thereafter.
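
A rough Python sketch of that byte-order check follows. The function name decode_utf16_with_bom is hypothetical, and falling back to big-endian when no mark is present is an assumption of this sketch, not something stated above. The codecs constants BOM_UTF16_BE and BOM_UTF16_LE are part of the standard library.

```python
import codecs


def decode_utf16_with_bom(data: bytes) -> str:
    """Decode UTF-16 octets, using a leading U+FEFF to pick the byte order.

    Hypothetical helper for illustration only.
    """
    if data[:2] == codecs.BOM_UTF16_BE:
        # Octets FE FF: we read FEFF, so the text is already in our order.
        return data[2:].decode("utf-16-be")
    if data[:2] == codecs.BOM_UTF16_LE:
        # Octets FF FE: read as a big-endian unit this is FFFE, which can
        # never be a real character, so the octets must be swapped.
        return data[2:].decode("utf-16-le")
    # No mark at all: this sketch assumes big-endian.
    return data.decode("utf-16-be")


text = "caf\u00e9"
big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Both serializations decode to the same text once the mark is honoured.
assert decode_utf16_with_bom(big) == text
assert decode_utf16_with_bom(little) == text
```

Python's own "utf-16" codec performs the same detection, so in practice data.decode("utf-16") is enough; the explicit version only spells out the swap.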
--
What is the sound of Perl? Is it not the sound of a [Ww]all that people
have stopped banging their head against? --Larry
John Cowan  jcowan@reutershealth.com  http://www.ccil.org/~cowan