
Re: Tech: Unicode (was...)

From:John Cowan <cowan@...>
Date:Saturday, May 8, 2004, 18:04
Philippe Caquant scripsit:

> Oh. Let's try to summarize.
All correct.
> - it seems that inside the ranges we mentioned before,
> some sub-ranges are never used (D8.00 to DB.FF = 1,024
> codes in every range of 65,536 codes = 1.5625%)
Only in Plane 0, not the other planes. (1 plane = 65536 codes)
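
To make the arithmetic concrete, here is a small Python sketch that recomputes the figures quoted above and checks which plane the D800..DBFF range falls in. The constant and function names are my own, chosen for illustration; nothing here comes from the original thread.

```python
# Illustrative sketch; all names below are hypothetical.
PLANE_SIZE = 0x10000  # 65,536 code points per plane

# The sub-range mentioned in the quoted message.
RANGE_FIRST = 0xD800
RANGE_LAST = 0xDBFF


def plane_of(code_point: int) -> int:
    """Return the number of the plane that a code point belongs to."""
    return code_point // PLANE_SIZE


count = RANGE_LAST - RANGE_FIRST + 1
share = 100 * count / PLANE_SIZE

print(count)  # 1024
print(share)  # 1.5625
print(plane_of(RANGE_FIRST), plane_of(RANGE_LAST))  # 0 0
print(plane_of(0x1D800))  # 1: the same offset in Plane 1 is not part of the range
```

The last line shows the point of the reply: the range lies entirely in Plane 0, so the same offset within any other plane is unaffected.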
> - others will do what they like, compressing single
> bits, dividing them in two, or painting them different
> colors to differentiate them, we don't care, provided
> we get our information back correctly.
Nobody ever uses anything except UTF-8, UTF-16, and maybe UTF-32 inside programs. For serialization into octets, UTF-16 has a problem: which octet first? To solve this, we can prefix FEFF to the text, which works because FFFE is reserved; if we read FFFE, we need to swap octets thereafter.

-- 
What is the sound of Perl?  Is it not the       John Cowan
sound of a [Ww]all that people have stopped     jcowan@reutershealth.com
banging their head against?  --Larry            http://www.ccil.org/~cowan
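
The FEFF/FFFE check described in the reply above could be sketched in Python roughly as follows. The helper names `detect_utf16_order` and `decode_utf16` are hypothetical, and the fallback to big-endian when no mark is present is an assumption of this sketch, not something stated in the message.

```python
from typing import Tuple

# Illustrative sketch; function names are hypothetical.


def detect_utf16_order(data: bytes) -> Tuple[str, bytes]:
    """Return (byte order, remaining payload) based on a leading FEFF mark."""
    first_two = data[:2]
    if first_two == b"\xfe\xff":
        # The octets arrive as FE FF: most significant octet first.
        return "big", data[2:]
    if first_two == b"\xff\xfe":
        # We "read FFFE": the octets are swapped, so the text is little-endian.
        return "little", data[2:]
    # Assumption: with no mark at all, treat the text as big-endian.
    return "big", data


def decode_utf16(data: bytes) -> str:
    """Decode serialized UTF-16 octets, honoring a leading FEFF if present."""
    order, payload = detect_utf16_order(data)
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return payload.decode(codec)


big = b"\xfe\xff" + "conlang".encode("utf-16-be")
little = b"\xff\xfe" + "conlang".encode("utf-16-le")

print(decode_utf16(big))
print(decode_utf16(little))
```

Both calls should give back the same text regardless of which octet order the writer used. In practice Python's built-in `"utf-16"` codec already performs this check when decoding, so the hand-written version is only meant to show the mechanism.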