Re: The mysterious substitution of question marks (was Re: Elves and Ill Bethisad)

From:	Mark J. Reed <markjreed@...>
Date:	Tuesday, October 21, 2003, 13:53


On Mon, Oct 20, 2003 at 11:20:15PM -0400, Tristan McLeay wrote:
> Are headers allowed to be in non-ASCII?
No.

> If they are, how can you tell what charset they're in before you've read them?
Exactly the reason they're not allowed to be non-ASCII.  (And you can't
go by the Content-Type header, because the spec allows headers to appear
in any order, so you may not have seen the Content-Type header by the
time you're processing one of the others.)  However, there is a special
form that lets you encode non-ASCII text in ASCII for use in headers.
It looks like this:

=?charset?encoding?text?=

Here "encoding" is a one-letter abbreviation for one of the two common
Content-Transfer-Encoding values: Q for quoted-printable, B for base64.
And "text" is not allowed to contain literal spaces (the Q form writes a
space as "_" or "=20"), so many mailers just emit the whole =?...?= bit
for every word that needs it.
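
As a rough sketch of how a reader could take that form apart, assuming
Python 3: the names ENCODED_WORD and decode_word below are made up for
illustration, and the pattern is deliberately simplified (no length
limits, no optional language tag), so treat it as an outline rather than
a complete RFC 2047 decoder.

```python
import base64
import re

# Matches one =?charset?encoding?text?= token (simplified on purpose).
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")

def decode_word(token):
    """Decode a single encoded word; return any other token unchanged."""
    match = ENCODED_WORD.fullmatch(token)
    if match is None:
        return token
    charset, encoding, text = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q: "_" stands for a space, "=XX" for the byte with hex value XX.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            text.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)
```

With this sketch, decode_word("=?iso-8859-1?Q?o=EDr?=") gives back "oír",
and a plain token such as "los" passes through untouched.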

For instance, when I sent out a message with the subject of
"¿Puedes oír los tambores, Fernando?", what actually got transmitted was this:

Subject: =?iso-8859-1?Q?=BFPuedes?= =?iso-8859-1?Q?o=EDr?= los tambores,
        Fernando?
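
The word-by-word encoding in that header can be reproduced with a short
sketch, assuming Python 3.7 or later. The helpers encode_word_q and
encode_subject are hypothetical names, the choice to escape everything
except ASCII letters and digits is a conservative assumption, and line
folding is left out.

```python
def encode_word_q(word, charset="iso-8859-1"):
    """Wrap one word as =?charset?Q?text?= (illustrative helper)."""
    pieces = []
    for byte in word.encode(charset):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            pieces.append(ch)              # plain ASCII letter or digit
        else:
            pieces.append("=%02X" % byte)  # anything else becomes =XX
    return "=?%s?Q?%s?=" % (charset, "".join(pieces))

def encode_subject(subject, charset="iso-8859-1"):
    """Encode only the words that contain non-ASCII characters."""
    words = [w if w.isascii() else encode_word_q(w, charset)
             for w in subject.split(" ")]
    return "Subject: " + " ".join(words)
```

Called on the subject above, encode_subject yields the same two encoded
words followed by "los tambores, Fernando?" on a single unfolded line.
One caveat: RFC 2047 tells decoders to ignore whitespace between two
adjacent encoded words, so a stricter encoder would carry the space
inside one of them (as "_") instead of leaving it between them.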

But if all the software on both ends is functioning properly, humans never
see that. :)

-Mark
