Re: Tech: Unicode (was...)

From:	Mark J. Reed <markjreed@...>
Date:	Friday, May 7, 2004, 13:36


On Fri, May 07, 2004 at 02:25:20AM -0700, Garth Wallace wrote:
> The problem is that your macros are redundant. There's already a way of
> getting 8-bit characters over a 7-bit communication channel: it's called
> MIME Content-Encoding: quoted-printable. Characters from 0-127 besides
> "=" are preserved intact, characters from 128-255 are an "=" followed by
> the hex code, and IIRC the equals sign is "==". MIME-aware mailreaders
> will decode it automatically.
The equal sign is not special-cased; it's passed as its hex code, =3D.
But, in case you've missed this discussion, somehow:

        THE MAILING LIST STILL WON'T PASS UNICODE THROUGH INTACT.

The listserv software used on listserv.brown.edu, for whatever reason,
strips the high bit off bytes in the decimal range 128-160, EVEN IF THEY
ARE ENCODED AS QUOTED-PRINTABLE.  Or base64.  You send a message with,
say, Cyrillic yeru, U+044B.  It is UTF-8 encoded and then QP-encoded,
the result being =D1=8B.   The listserv software turns it into =D1=0B,
which is an illegal UTF-8 sequence, so the list recipients get
gobbledygook.
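
A minimal sketch of that failure in Python, for anyone who wants to
reproduce it locally.  The strip_high_bit function is a hypothetical model
of what the listserv appears to do (clear the high bit of decoded bytes in
128-160); it is inferred from the symptoms, not taken from the listserv's
actual code.

```python
import quopri


def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical model of the fault: clear bit 7 of bytes 128-160."""
    return bytes(b & 0x7F if 128 <= b <= 160 else b for b in data)


def through_list(qp_body: bytes) -> bytes:
    """Decode a QP body, apply the assumed mangling, return the raw bytes."""
    return strip_high_bit(quopri.decodestring(qp_body))


utf8 = "\u044b".encode("utf-8")      # Cyrillic yeru as UTF-8 bytes
qp = quopri.encodestring(utf8)       # what the sender's mailer transmits
received = through_list(qp)          # what the list hands to recipients

print(utf8.hex(), qp, received.hex())
try:
    received.decode("utf-8")
except UnicodeDecodeError as err:
    # 0x0B is not a valid continuation byte after the lead byte 0xD1.
    print("recipient cannot decode:", err)
```

Only the second byte (0x8B = 139) falls in the affected range; the lead
byte 0xD1 = 209 passes through untouched, which is why the result is a
broken sequence rather than plain ASCII.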

The only way to pass Unicode through the listserv is to use an encoding
whose output never contains bytes in that range, since the standard
transfer encodings (QP, base64) do not protect them.  For instance, UTF-7
works, but not many mailers understand it.  You could probably use some
fancy footwork with SCSU to avoid that range of bytes, but you'd have to
write a custom SCSU encoder, and still, not many mailers understand SCSU.
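
A short Python sketch of why UTF-7 gets through: its output is pure 7-bit
ASCII, so there is nothing in the 128-160 range to damage.  As before,
strip_high_bit is only a hypothetical stand-in for the listserv's
behaviour, and the sample text is made up for illustration.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical model of the fault: clear bit 7 of bytes 128-160."""
    return bytes(b & 0x7F if 128 <= b <= 160 else b for b in data)


text = "yeru: \u044b"                 # sample text containing U+044B
utf7 = text.encode("utf-7")           # Python's built-in UTF-7 codec

# Every UTF-7 byte is below 128, so the assumed mangling cannot touch it.
print(all(b < 128 for b in utf7))

survived = strip_high_bit(utf7)
print(survived == utf7, survived.decode("utf-7") == text)
```

The catch is the one already mentioned: the recipient's mailer still has
to know how to decode UTF-7.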

-Mark

