Re: Unicode vs The Rest Of The World (Again) (was Re: Re: Le tilde a-t-il été utilisé en français?)
From: | Paul Bennett <paul-bennett@...> |
Date: | Friday, April 30, 2004, 23:19 |
On Fri, 30 Apr 2004 14:35:00 -0700, Garth Wallace <gwalla@...>
wrote:
> Mark J. Reed wrote:
>> Which would be bad enough if it were just the more typical "8-bit
>> characters get munged; you must use 7-bit encoding methods" problem.
>> But that's not the case. The mail server understands the various MIME
>> 8-to-7-bit encoding techniques, reverses them, and *then* does
>> the replacement anyway just as if the message arrived in 8-bit mode.
>
> Are you sure about that? I've gotten Unicode messages from the list
> without problems before. I think it may be some people's clients doing
> weird conversions.
It's only a small subset of Unicode that gets mangled, rather than every
character (we've seen it on the Georgian alphabet, notably), at least with
UTF-8. UTF-8 does not send raw code points; it encodes each non-ASCII
character as a multi-byte sequence, and only some of those bytes lie within
the deadly 128-150 range.
Should anyone post in pure UTF-16, I imagine the problem might manifest
itself more often, especially if they use the right (or wrong?) Unicode
pages.
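
To make the byte-range point concrete, here is a minimal Python sketch. It assumes the "deadly" range is bytes 128 through 150 inclusive, as stated above; the exact bounds the Listserv software uses have not been verified here, and the helper name deadly_bytes is made up for illustration.

```python
# Assumption: the affected range is bytes 128-150 inclusive, taken from the
# message text, not from any Listserv documentation.
DEADLY = range(128, 151)


def deadly_bytes(text, encoding="utf-8"):
    """Return the encoded bytes of `text` that fall inside DEADLY."""
    return [b for b in text.encode(encoding) if b in DEADLY]


# Georgian letter AN (U+10D0) is E1 83 90 in UTF-8; 0x83 and 0x90 are in range.
print(deadly_bytes("\u10d0"))               # [131, 144]

# e-acute (U+00E9) is C3 A9 in UTF-8; neither byte is in range.
print(deadly_bytes("\u00e9"))               # []

# In big-endian UTF-16, U+8A9E is the byte pair 8A 9E; 0x8A is in range.
print(deadly_bytes("\u8a9e", "utf-16-be"))  # [138]
```

Under that assumption, every Georgian letter in U+10D0 to U+10FF is exposed in UTF-8, since each one carries the byte 0x83, while many accented Latin letters are not. In UTF-16 the exposure depends on the high and low bytes of each code unit, which is the sense in which some "pages" would fare worse than others.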
Cases where every non-ASCII character gets shown as two gibberish Latin-1
characters are almost certainly problems with a mail client, but the UTF-8
mangling problem has been fairly rigorously deduced to be the fault of the
Listserv software.
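
The client-side symptom is easy to reproduce. The sketch below is a hypothetical illustration, not a trace of any particular mail client: it encodes text as UTF-8 and then decodes the same bytes as Latin-1, which is the mistake a client makes when it ignores or misreads the declared charset.

```python
def misread_as_latin1(text):
    """Encode as UTF-8, then wrongly decode the resulting bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# e-acute (U+00E9) becomes the bytes C3 A9, which Latin-1 reads as
# two separate characters, U+00C3 and U+00A9.
garbled = misread_as_latin1("\u00e9")
print(len(garbled))                          # 2
print([hex(ord(c)) for c in garbled])        # ['0xc3', '0xa9']

# ASCII survives unchanged, which is why only non-ASCII text looks wrong.
print(misread_as_latin1("tilde"))            # tilde
```

The "two characters" pattern applies to code points from U+0080 to U+07FF, which take two bytes in UTF-8; characters further up the range, Georgian included, would show up as three.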
Paul