Re: Unicode vs The Rest Of The World (Again) (was Re: Re: Le tilde a-t-il été utilisé en français? [Was the tilde used in French?])
From: Garth Wallace <gwalla@...>
Date: Saturday, May 1, 2004, 0:38
Paul Bennett wrote:
> On Fri, 30 Apr 2004 14:35:00 -0700, Garth Wallace <gwalla@...>
> wrote:
>
>> Mark J. Reed wrote:
>>
>>> Which would be bad enough if it were just the more typical "8-bit
>>> characters get munged; you must use 7-bit encoding methods" problem.
>>> But that's not the case. The mail server understands the various MIME
>>> 8-to-7-bit encoding techniques, reverses them, and *then* does
>>> the replacement anyway just as if the message arrived in 8-bit mode.
>>
>>
>> Are you sure about that? I've gotten Unicode messages from the list
>> without problems before. I think it may be some people's clients doing
>> weird conversions.
>
>
> It's only a small subset of Unicode that gets mangled, not every
> character (we've seen it on the Georgian alphabet, notably), at least with
> UTF-8. UTF-8 is not raw Unicode code points but a variable-length
> multi-byte encoding, and only some of its bytes lie within the deadly
> 128-159 range.
Ah, so it's only the characters whose UTF-8 encoding contains a byte
matching an ASCII control character with the 8th bit set (0x80-0x9F)
that get mangled. Okay.
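Here is a rough way to check which characters are at risk. It is a Python sketch that lists the code points in a block whose UTF-8 form includes a byte in 0x80-0x9F. The helper names are my own. It also assumes the server mangles exactly that byte range, which is a guess about its behavior and has not been confirmed.

```python
# Sketch: which code points have a UTF-8 byte in the C1 range 0x80-0x9F?
# Assumption: the server mangles exactly those bytes; the helper names are made up.

C1_RANGE = range(0x80, 0xA0)  # ASCII control codes 0x00-0x1F with the 8th bit set


def c1_bytes(ch: str) -> list[int]:
    """Return the bytes of ch's UTF-8 encoding that fall in 0x80-0x9F."""
    return [b for b in ch.encode("utf-8") if b in C1_RANGE]


def affected(start: int, end: int) -> list[str]:
    """Characters in the inclusive code point range whose UTF-8 form has a C1 byte."""
    return [chr(cp) for cp in range(start, end + 1) if c1_bytes(chr(cp))]


if __name__ == "__main__":
    # Georgian block, U+10A0..U+10FF: every code point encodes as E1 82 xx or E1 83 xx
    print(len(affected(0x10A0, 0x10FF)), "of 96 Georgian code points")  # 96 of 96
    # Latin-1 Supplement, U+00A0..U+00FF: only U+00C0..U+00DF hit the range
    print(len(affected(0x00A0, 0x00FF)), "of 96 Latin-1 code points")  # 32 of 96
    print([hex(b) for b in "\u10D0".encode("utf-8")])  # ['0xe1', '0x83', '0x90']
```

Under that assumption, every Georgian letter would be hit, because its middle byte is always 0x82 or 0x83. Most accented Latin letters would get through. For example, é encodes as C3 A9 and is safe, but É encodes as C3 89 and is not. That matches the "small subset" Paul describes.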
> Should anyone post in pure UTF-16, I imagine the problem might manifest
> itself more often, especially if they use the right (or wrong?) Unicode
> pages.
Yeah, UTF-16 interpreted as ASCII would be chock-full of nulls, at least
for mostly-ASCII text, where every character's high byte is 0x00.
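Here is a small sketch that makes the nulls visible. The helper names are made up. It encodes text as UTF-16LE, which stops Python from adding a byte-order mark, and counts the zero bytes. Plain ASCII text comes out half nulls. Georgian produces none, so how bad things look depends on which Unicode blocks you write in.

```python
# Sketch: what UTF-16 looks like to a reader expecting one byte per ASCII char.
# "utf-16-le" is used so Python doesn't prepend a byte-order mark; helper names are made up.


def utf16_null_count(text: str) -> int:
    """Count the 0x00 bytes in text's UTF-16LE encoding."""
    return text.encode("utf-16-le").count(0)


def as_ascii_view(text: str) -> str:
    """Render the UTF-16LE bytes as an ASCII reader might: NUL as \\0,
    other non-printable bytes as '.'."""
    out = []
    for b in text.encode("utf-16-le"):
        if b == 0:
            out.append("\\0")
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(".")
    return "".join(out)


if __name__ == "__main__":
    print(utf16_null_count("Hello"))  # 5: one NUL per ASCII character
    print(as_ascii_view("Hello"))  # H\0e\0l\0l\0o\0
    print(utf16_null_count("\u10D0\u10D1"))  # 0: Georgian high bytes are 0x10
```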