Re: The mysterious substitution of question marks
From: Mark J. Reed <markjreed@...>
Date: Wednesday, October 22, 2003, 14:06
On Wed, Oct 22, 2003 at 02:15:26AM -0400, Estel Telcontar wrote:
> Regarding accented characters etc., I've often had minor problems with
> them, but since I started getting messages in digest form, it's been
> MUCH worse (though from some people they still seem to come through
> fine). And in messages from Christophe, I now find a whole lot of
> equals signs followed by numbers which make it very hard to read, as in
> the following:
>
> >En r=E9ponse =E0 Paul Bennett :
Hm. It could be that the digest is labeled with the Content-Transfer-Encoding
of, say, the first message it includes, rather than with an encoding that
properly covers all of the concatenated messages.
What you're seeing is the "quoted-printable" content transfer encoding,
which is used to send 8-bit text (Latin-1, in this case) safely through
channels that only handle 7-bit ASCII. The two hex digits after each
equals sign give the value of the byte that was originally there; in
Latin-1, that value is also the character's Unicode code point. The
receiving mail program is supposed to turn these escapes back into the
original characters, but it only does so if the headers tell it the body
is quoted-printable.
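For the curious, here's roughly what the mail program is supposed to be
doing, as a minimal Python sketch. The function name decode_qp is just
something I made up for illustration; it assumes the text is Latin-1 and
skips some corner cases in the spec (such as trailing whitespace before a
soft line break). Python's standard quopri module is shown alongside for
comparison.

```python
import quopri
import re

def decode_qp(text: str, charset: str = "latin-1") -> str:
    """Minimal quoted-printable decoder (a sketch, not full RFC 2045)."""
    data = text.encode("ascii")  # quoted-printable text is plain 7-bit ASCII
    # An '=' at the very end of a line is a "soft" line break: join the lines.
    data = re.sub(rb"=\r?\n", b"", data)
    # Each '=XX' escape stands for one byte whose value is the hex number XX.
    data = re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)
    # Interpret the recovered bytes in the (assumed) Latin-1 character set.
    return data.decode(charset)

line = "En r=E9ponse =E0 Paul Bennett :"
print(decode_qp(line))
# The standard library's quopri module does the same byte-level decoding.
print(quopri.decodestring(line.encode("ascii")).decode("latin-1"))
# both lines print: En réponse à Paul Bennett :
```

The point is that the escapes are only undone when something knows to
undo them, which is exactly what goes missing when the headers are wrong.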
It's tedious to decode manually, but if context isn't enough to
tell you what character is intended, you can find a visual table giving
the mappings here:
http://www.unicode.org/charts/PDF/U0080.pdf
For instance, to find =E9, go down the 00E column to the 9 row; that
tells you that =E9 is a lowercase e with an acute accent (é). Likewise,
=E0 is a lowercase a with a grave accent (à), =C8 is a capital E with a
grave accent (È), and so on.
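If you'd rather not squint at the chart, Python's standard unicodedata
module can do the same lookup. This is a small sketch; describe_qp_escape
is a hypothetical helper name, and it assumes the escapes are Latin-1,
where each byte value equals the Unicode code point.

```python
import unicodedata

def describe_qp_escape(escape: str) -> str:
    """Describe a Latin-1 quoted-printable escape such as '=E9'."""
    # Drop the leading '=' and read the remaining two characters as hex.
    code = int(escape.lstrip("="), 16)
    # Latin-1 byte values map directly onto Unicode code points U+0000..U+00FF.
    ch = bytes([code]).decode("latin-1")
    return f"{escape} -> U+{code:04X} {ch} {unicodedata.name(ch)}"

for esc in ("=E9", "=E0", "=C8"):
    print(describe_qp_escape(esc))
```

That's the same column-and-row lookup as the chart: the first hex digit
picks the column (00E) and the second picks the row (9).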
-Mark