
Re: Tech: Unicode (was...)

From: Mark J. Reed <markjreed@...>
Date: Friday, May 7, 2004, 13:36
On Fri, May 07, 2004 at 02:25:20AM -0700, Garth Wallace wrote:
> The problem is that your macros are redundant. There's already a way of
> getting 8-bit characters over a 7-bit communication channel: it's called
> MIME Content-Encoding: quoted-printable. Characters from 0-127 besides
> "=" are preserved intact, characters from 128-255 are an "=" followed by
> the hex code, and IIRC the equals sign is "==". MIME-aware mailreaders
> will decode it automatically.

The equals sign is not special-cased; it's passed as its hex code, =3D.

But, in case you've missed this discussion, somehow: THE MAILING LIST STILL WON'T PASS UNICODE THROUGH INTACT. The listserv software used on listserv.brown.edu, for whatever reason, strips the high bit off bytes in the decimal range 128-160, EVEN IF THEY ARE ENCODED AS QUOTED-PRINTABLE. Or base64.

You send a message with, say, Cyrillic yeru, U+044B. It is UTF-8 encoded and then QP-encoded, the result being =D1=8B. The listserv software turns it into =D1=0B, which is an illegal UTF-8 sequence, so the list recipients get gobbledygook.

The only way to pass Unicode through the listserv is to use an encoding mechanism that doesn't somehow involve bytes in that range encoded in a standard way. For instance, UTF-7 works, but not many mailers understand it. You could probably use some fancy footwork with SCSU to avoid that range of bytes, but you'd have to write a custom SCSU encoder, and still, not many mailers understand SCSU.

-Mark
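
The byte arithmetic in the message can be checked with a short Python sketch. It uses only the standard library (`quopri` and the built-in `utf-8` and `utf-7` codecs). The function `strip_high_bit` is a hypothetical stand-in that models the behaviour described for the list software (clearing the top bit of bytes 128-160); it is an assumption for illustration, not the listserv's actual code.

```python
import quopri


def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical model of the mangling described in the message:
    clear bit 7 of every byte in the decimal range 128-160."""
    return bytes(b & 0x7F if 128 <= b <= 160 else b for b in data)


yeru = "\u044b"                          # Cyrillic yeru, U+044B

# The equals sign is sent as its hex code, not doubled.
equals_qp = quopri.encodestring(b"=")    # b'=3D'

# UTF-8 encode, then quoted-printable encode.
utf8 = yeru.encode("utf-8")              # b'\xd1\x8b'
qp = quopri.encodestring(utf8)           # b'=D1=8B'

# Apply the modelled mangling to the decoded bytes and re-encode.
mangled = strip_high_bit(quopri.decodestring(qp))   # b'\xd1\x0b'
mangled_qp = quopri.encodestring(mangled)

# 0xD1 announces a two-byte sequence, but 0x0B is not a continuation
# byte, so the result no longer decodes as UTF-8.
try:
    mangled.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

# UTF-7 output is pure 7-bit ASCII, so the modelled mangling
# leaves it untouched and it round-trips.
utf7 = yeru.encode("utf-7")
survives = strip_high_bit(utf7).decode("utf-7") == yeru

print(equals_qp, qp, mangled_qp, still_valid, survives)
```

Only 0x8B falls in the affected range here; 0xD1 (decimal 209) passes through, which is why the damage shows up as =D1=0B rather than as two altered bytes.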


Henrik Theiling <theiling@...>