
Re: Unicode vs The Rest Of The World (Again) (was Re: Re: Le tilde a-t-il été utilisé en français?)

From: Paul Bennett <paul-bennett@...>
Date: Friday, April 30, 2004, 23:19

On Fri, 30 Apr 2004 14:35:00 -0700, Garth Wallace <gwalla@...>
wrote:

> Mark J. Reed wrote:
>> Which would be bad enough if it were just the more typical "8-bit
>> characters get munged; you must use 7-bit encoding methods" problem.
>> But that's not the case. The mail server understands the various MIME
>> 8-to-7-bit encoding techniques, reverses them, and *then* does
>> the replacement anyway just as if the message arrived in 8-bit mode.
>
> Are you sure about that? I've gotten Unicode messages from the list
> without problems before. I think it may be some people's clients doing
> weird conversions.

It's only a small subset of Unicode that gets mangled, rather than every character (we've seen it with the Georgian alphabet, notably), at least with UTF-8. UTF-8 is not merely raw Unicode, but rather a set of multi-byte codes, only some of whose bytes lie within the deadly 128-150 range. Should anyone post in pure UTF-16, I imagine the problem might manifest itself more often, especially if they use the right (or wrong?) Unicode pages.

Cases where every non-ASCII character gets shown as two gibberish Latin-1 characters are almost certainly problems with a mail client, but the UTF-8 mangling problem has been fairly rigorously deduced to be the fault of the Listserv software.

Paul
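
As a rough illustration of the point about UTF-8 above, the following sketch lists which encoded bytes of a character fall inside the 128-150 range mentioned in the message. The function name `bytes_in_range` and the sample characters are made up for this example, and the bounds are simply the ones quoted in the post; nothing here is checked against what the Listserv software actually does.

```python
def bytes_in_range(text, encoding="utf-8", low=128, high=150):
    """Return the byte values of the encoded text that lie in low..high inclusive.

    The default bounds 128..150 are the range quoted in the post (an
    assumption, not a verified property of any mail software).
    """
    return [b for b in text.encode(encoding) if low <= b <= high]


# Hypothetical sample characters: one Latin-1 letter, one Georgian, one Greek.
samples = {
    "e-acute (U+00E9)": "\u00e9",
    "Georgian an (U+10D0)": "\u10d0",
    "Greek Delta (U+0394)": "\u0394",
}

for name, ch in samples.items():
    utf8_hex = [hex(b) for b in ch.encode("utf-8")]
    print(name, utf8_hex,
          "UTF-8 hits:", bytes_in_range(ch),
          "UTF-16BE hits:", bytes_in_range(ch, "utf-16-be"))
```

Under these assumptions, the e-acute encodes to bytes 0xC3 0xA9 and has no byte in the range, while the Georgian letter encodes to 0xE1 0x83 0x90 and has two (131 and 144). That would fit with Georgian text being hit while much accented Latin text gets through.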
