
Re: TECH: Charset encodings (was Re: Conlang book - please include Vya:a:h)

From: David Starner <dstarner98@...>
Date: Monday, May 21, 2001, 9:01
At 11:20 PM 05/20/2001 -0500, you wrote:
> Funny you should mention mutt -- I was just about to post something about it. I use mutt all the time, and it usually works fine, but *sometimes* it shows the 8-bit characters as they should be in an original message, but as ? in messages quoting that message, or vice versa. I've set my LC_CTYPE to en_US.ISO-8859-1 and LANG to C, and I've noticed the problem occurs on the Linux console and in Konsole (KDE pre2.2). Anyone know how to make it show them right all the time?
I don't. From a quick search through the manual, I don't think you can. (If you were to get pedantic, in theory you can't: I get as many unmarked messages in Asian character sets as I do in Latin-1. And if you happened to be a Romanian in Germany, you might get equal amounts of unmarked, valuable mail in Latin-1 and Latin-2.)
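To make the Latin-1/Latin-2 point concrete, here is a minimal Python sketch. The sample bytes are made up for illustration: the Romanian words "şi" ("and") and "să" ("that") as a Latin-2 mailer would send them without a charset label. Both decodes succeed, so nothing in the bytes themselves says which reading is right.

```python
# Hypothetical sample: Romanian "şi să" written by a Latin-2 mailer
# that did not label its charset.
raw = b"\xbai s\xe3"

def readings(data: bytes, charsets=("iso-8859-1", "iso-8859-2")) -> dict:
    """Decode the same bytes under each candidate charset."""
    return {cs: data.decode(cs) for cs in charsets}

for charset, text in readings(raw).items():
    print(f"{charset}: {text}")
# iso-8859-1: ºi sã
# iso-8859-2: şi să
```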
> Is it a problem with people's mailers not marking encoding correctly?
In the examples I've seen, that has been the problem. Mutt can either do something like what you're seeing (the version I normally use, Debian's 1.3.17, prints them as \235), or dump them straight to the terminal. But every terminal has characters that must not be sent to it raw (many of the characters used by CP1252 must not be dumped to the Linux console), so then you have to try and guess which ones those are . . . All in all, it's easier, and arguably more correct, to do what Mutt does. (Especially if and when you start using LC_CTYPE=en_US.UTF-8 and a Unicode console/terminal: mutt would then either have to recode from an assumed Latin-1 to UTF-8, or just go back to turning the characters into question marks.)

-- David Starner
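A rough Python sketch of the general approach described above, not mutt's actual code: the function names are hypothetical, and the "safe" set assumes a Latin-1 terminal. It reads unlabeled 8-bit text as Latin-1 and prints unsafe bytes as backslash-octal, the way 0x9D shows up as \235. The bytes 0x80-0x9F are the risky ones: CP1252 puts printable characters such as curly quotes there, but in ISO-8859-1 they are C1 control codes, and a terminal may act on them (0x9B, for instance, is CSI).

```python
def is_safe(byte: int) -> bool:
    """True for bytes a Latin-1 terminal can display as-is (assumption).

    Printable ASCII (0x20-0x7E) and the printable upper half of
    ISO-8859-1 (0xA0-0xFF) are treated as safe. C0 controls
    (0x00-0x1F, 0x7F) and C1 controls (0x80-0x9F) are not.
    """
    return 0x20 <= byte <= 0x7E or byte >= 0xA0


def escape_unsafe(raw: bytes) -> str:
    """Read unlabeled 8-bit text as Latin-1, escaping unsafe bytes.

    Tab and newline pass through; any other unsafe byte becomes a
    backslash followed by its three-digit octal value (0x9D -> \\235).
    """
    out = []
    for b in raw:
        if b in (0x09, 0x0A) or is_safe(b):
            out.append(chr(b))  # chr(b) is exactly the Latin-1 decoding of b
        else:
            out.append("\\%03o" % b)
    return "".join(out)


# CP1252 curly quotes around a word containing a Latin-1 e-acute.
shown = escape_unsafe(b"\x93caf\xe9\x94")
print(shown)                  # \223café\224
# On a UTF-8 terminal, the assumed-Latin-1 text must be recoded first:
print(shown.encode("utf-8"))  # b'\\223caf\xc3\xa9\\224'
```

Because the result is a decoded string, the same output can be encoded for either a Latin-1 or a UTF-8 terminal, which is the "recode from an assumed Latin-1" option mentioned above.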