Re: TECH: Charset encodings (was Re: Conlang book - please include Vya:a:h)

From:	David Starner <dstarner98@...>
Date:	Monday, May 21, 2001, 9:01


At 11:20 PM 05/20/2001 -0500, you wrote:
>Funny you should mention mutt -- I was just about to post something about
>it. I use mutt all the time, and it usually works fine, but *sometimes* it
>shows the 8-bit characters as they should be in an original message, but as
>? in messages quoting that message, or vice versa. I've set my LC_CTYPE to
>en_US.ISO-8859-1 and LANG to C, and I've noticed the problem occurs on the
>Linux console and in Konsole (KDE pre2.2). Anyone know how to make it show
>them right all the time?
I don't. From a quick search through the manual, I don't think you can.
(If you want to get pedantic, in theory you can't. I get as many unmarked
messages in Asian character sets as I do in Latin-1. And if you happened
to be a Romanian in Germany, you might get equal amounts of unmarked,
valuable mail in Latin-1 and Latin-2.)
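
To make the Latin-1 vs. Latin-2 point concrete, here is a small Python sketch. The function name candidate_readings is made up for illustration; the only real API used is bytes.decode with Python's standard codec names. It shows one and the same unmarked byte giving two equally valid readings:

```python
def candidate_readings(raw, charsets=("iso-8859-1", "iso-8859-2")):
    """Decode the same unmarked bytes under each candidate charset.

    Hypothetical helper for illustration only; a real mailer has no
    reliable way to pick between the results.
    """
    return {name: raw.decode(name) for name in charsets}


# One byte, no charset label attached.
readings = candidate_readings(b"\xe3")

# Latin-1 reads 0xE3 as U+00E3 (a with tilde);
# Latin-2 reads it as U+0103 (a with breve, used in Romanian).
for name, text in readings.items():
    print(name, "U+%04X" % ord(text))
```

Both decodes succeed without error, which is the problem: nothing in the bytes themselves says which one the sender meant.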

>Is it a problem with people's mailers not marking
>encoding correctly?
In the examples I've seen, that has been the problem. Mutt can either do
something like what you're seeing (the version I normally use, Debian's
1.3.17, prints them as escapes like \235), or try dumping them to the
terminal. But every terminal has characters that must not be sent to it raw
(many characters used by CP1252 must not be dumped to the Linux console), so
then you have to try to guess which ones those are . . . all in all, it's
easier and arguably more correct to do what Mutt does.
(Especially since, if and when you start using LC_CTYPE=en_US.UTF-8 and a
Unicode console/terminal, mutt would either have to recode from an assumed
Latin-1 to UTF-8 or just go back to turning the characters into question
marks.)
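
The three options above can be sketched roughly in Python. This is not mutt's actual code; the function names are invented for illustration, and the sketch handles a single line of text (it would escape newlines and tabs too):

```python
def show_octal(raw):
    """Keep printable ASCII, show every other byte as a \\ooo octal escape."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\%03o" % b for b in raw
    )


def show_question_marks(raw):
    """Keep printable ASCII, replace every other byte with '?'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


def recode_assumed_latin1(raw):
    """Guess that the unmarked bytes are Latin-1 and recode them to UTF-8.

    The guess is an assumption: the decode never fails, even when the
    sender actually used some other 8-bit charset.
    """
    return raw.decode("iso-8859-1").encode("utf-8")


unmarked = b"caf\x9d"                  # 0x9D is octal 235
print(show_octal(unmarked))            # caf\235
print(show_question_marks(unmarked))   # caf?
print(recode_assumed_latin1(b"\xe9"))  # b'\xc3\xa9'
```

The first two never put a raw 8-bit byte on the terminal, which is why they are the safe choice; the third always produces valid UTF-8, but only produces the right text if the Latin-1 guess happens to be correct.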

--
David Starner
