Re: Unicode vs The Rest Of The World (Again) (was Re: Re: Le tilde a-t-il été utilisé en français?)

From:Garth Wallace <gwalla@...>
Date:Saturday, May 1, 2004, 0:38
Paul Bennett wrote:
> On Fri, 30 Apr 2004 14:35:00 -0700, Garth Wallace <gwalla@...>
> wrote:
>
>> Mark J. Reed wrote:
>>
>>> Which would be bad enough if it were just the more typical "8-bit
>>> characters get munged; you must use 7-bit encoding methods" problem.
>>> But that's not the case. The mail server understands the various MIME
>>> 8-to-7-bit encoding techniques, reverses them, and *then* does
>>> the replacement anyway just as if the message arrived in 8-bit mode.
>>
>>
>> Are you sure about that? I've gotten Unicode messages from the list
>> without problems before. I think it may be some people's clients doing
>> weird conversions.
>
>
> It's only a small subset of Unicode that gets mangled, rather than every
> character (we've seen it on the Georgian alphabet, notably), at least with
> UTF-8. UTF-8 is not merely raw Unicode, but rather a set of multi-byte
> codes, only some of which lie within the deadly 128-150 range.

Ah, so it's only the Unicode characters whose UTF-8 encoding contains bytes matching ASCII control characters with the 8th bit set that get mangled. Okay.
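
A rough Python sketch of that point: it lists which bytes of a character's UTF-8 encoding fall in the 128-150 range quoted above. The range is taken from the quoted message and is an assumption about the list server, not a verified description of it; `DEADLY` and `deadly_bytes` are hypothetical names used only for illustration.

```python
# Byte range called "deadly" in the quoted message (128-150 inclusive).
# Assumption: this is what the mail server mangles; not verified here.
DEADLY = range(128, 151)

def deadly_bytes(ch):
    """Return the UTF-8 bytes of ch that fall inside the DEADLY range."""
    return [b for b in ch.encode("utf-8") if b in DEADLY]

# U+00E9 (e acute), U+00C0 (A grave), U+10D0 (Georgian letter an)
for ch in ("\u00e9", "\u00c0", "\u10d0"):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", encoded.hex(), deadly_bytes(ch))
```

Under that assumption, U+00E9 encodes to bytes outside the range and would pass through untouched, while U+00C0 and the Georgian letter each contain at least one byte inside it.
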

> Should anyone post in pure UTF-16, I imagine the problem might manifest
> itself more often, especially if they use the right (or wrong?) Unicode
> pages.

Yeah, UTF-16 interpreted as ASCII would be chock-full of nulls.
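
To illustrate the nulls, a small Python sketch that counts the 0x00 bytes in a string's encoding. `null_count` is a hypothetical helper, and little-endian UTF-16 without a byte-order mark is an arbitrary choice for the demonstration.

```python
def null_count(text, encoding):
    """Count the 0x00 bytes in text when encoded with the given encoding."""
    return text.encode(encoding).count(0)

sample = "tilde"
# In UTF-16, every ASCII letter occupies two bytes, one of which is 0x00.
print(null_count(sample, "utf-16-le"))
# In UTF-8, ASCII letters are single non-zero bytes.
print(null_count(sample, "utf-8"))
```
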
