Re: The mysterious substitution of question marks (was Re: Elves and Ill Bethisad)
From: Mark J. Reed <markjreed@...>
Date: Tuesday, October 21, 2003, 13:53
On Mon, Oct 20, 2003 at 11:20:15PM -0400, Tristan McLeay wrote:
> Are headers allowed to be in non-ASCII?
No.
> If they are, how can you tell what charset they're in before you've read them?
Exactly the reason they're not allowed to be non-ASCII. (And you can't
go by the Content-Type header, because the spec allows headers to appear
in any order, so you may not have seen the Content-Type header by the
time you're processing one of the others.) However, there
is a special form that lets you encode non-ASCII text in ASCII for use
in headers. It looks like this:
    =?charset?encoding?text?=
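Where "charset" names the character set of the original text, and
"encoding" is an abbreviated form of the name of one of the two common
Content-Transfer-Encoding values: Q for quoted-printable, B for base64.
And "text" is not allowed to contain literal spaces (the Q form can
write a space as "_" or "=20" instead), so many mailers just emit the
whole =?...?= bit separately for every word that needs encoding.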
For instance, when I sent out a message with the subject of
"¿Puedes oír los tambores, Fernando?", what actually got transmitted was this:
    Subject: =?iso-8859-1?Q?=BFPuedes?= =?iso-8859-1?Q?o=EDr?= los tambores,
     Fernando?
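
To make the per-word encoding concrete, here is a minimal Python sketch that reproduces the transmitted form above. The helper names (q_encode_word, encode_subject) are made up for illustration, and this is not a full encoder: it assumes iso-8859-1, ignores header line-length limits and folding, and does not deal with the rules about whitespace between adjacent encoded words.

```python
# Bytes that may appear literally in Q-encoded text in this sketch;
# everything else becomes =XX (two uppercase hex digits).
SAFE = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/")


def q_encode_word(word: str, charset: str = "iso-8859-1") -> str:
    """Wrap a single word as =?charset?Q?text?= (hypothetical helper)."""
    pieces = []
    for byte in word.encode(charset):
        if byte in SAFE:
            pieces.append(chr(byte))
        else:
            pieces.append(f"={byte:02X}")
    return f"=?{charset}?Q?{''.join(pieces)}?="


def encode_subject(text: str, charset: str = "iso-8859-1") -> str:
    """Encode only the words containing non-ASCII characters (hypothetical helper)."""
    words = text.split(" ")
    return " ".join(w if w.isascii() else q_encode_word(w, charset) for w in words)


print("Subject: " + encode_subject("¿Puedes oír los tambores, Fernando?"))
# Subject: =?iso-8859-1?Q?=BFPuedes?= =?iso-8859-1?Q?o=EDr?= los tambores, Fernando?
```

Words that are already plain ASCII pass through untouched, which is why "los tambores, Fernando?" appears as-is in the header.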
But if all the software on both ends is functioning properly, humans never
see that. :)
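
For the receiving side, here is a small sketch of what "functioning properly" amounts to, using decode_header from Python's standard email.header module. The wrapper name decode_word is made up for illustration, and it is only exercised here on one encoded word at a time.

```python
from email.header import decode_header


def decode_word(encoded: str) -> str:
    """Turn one =?charset?encoding?text?= word back into text (hypothetical helper)."""
    parts = []
    for data, charset in decode_header(encoded):
        if isinstance(data, bytes):
            # Encoded parts come back as raw bytes plus the declared charset.
            parts.append(data.decode(charset or "ascii"))
        else:
            # Headers with no encoded words come back as plain str.
            parts.append(data)
    return "".join(parts)


print(decode_word("=?iso-8859-1?Q?=BFPuedes?="))  # ¿Puedes
print(decode_word("=?iso-8859-1?Q?o=EDr?="))      # oír
```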
-Mark