
Re: TECH: Testing again, no new on-topic content (was Re: "Language Creation" in your conlang)

From: John Cowan <cowan@...>
Date: Monday, November 17, 2003, 1:16

Paul Bennett scripsit:

> >[B]e aware that many people
> >who would otherwise receive Latin-1 characters fine won't even see
> >those if they're in a UTF-8 message.
>
> Er, are you sure?  I thought that virtually every 128-255 character came
> through unscathed between Latin-1 and UTF-8.  I bow to John as final
> arbiter, obviously, but that has always been my understanding.

You're confusing characters and their representations. It's true that the first 256 characters of Unicode are identical to the 256 characters of Latin-1. But the UTF-8 *representation* of the last 128 characters of Latin-1 is quite different from the Latin-1 representation. To say no more, UTF-8 represents each of them with two bytes, whereas Latin-1 uses a single byte for each.
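
The difference between a character and its representations can be checked directly. The following is a minimal Python sketch, not part of the original message; the helper name `representations` and the sample characters are chosen here purely for illustration.

```python
# Compare the Latin-1 and UTF-8 byte representations of the same characters.
# Sample characters (illustrative): U+0041 'A', U+00E9 e-acute, U+00FF y-diaeresis.
samples = ["A", "\u00e9", "\u00ff"]


def representations(ch):
    """Return (code point, Latin-1 bytes, UTF-8 bytes) for one character."""
    return ord(ch), ch.encode("latin-1"), ch.encode("utf-8")


for ch in samples:
    code_point, latin1, utf8 = representations(ch)
    # The code point is the same in both character sets; only the bytes differ.
    print(f"U+{code_point:04X}  latin-1={latin1.hex()}  utf-8={utf8.hex()}")
```

For characters below 128 the two byte sequences coincide; for 128-255 the Latin-1 form is one byte and the UTF-8 form is two.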

> What happens if I set my mail client to default encoding of "Latin-1" and
> paste some non-Latin-1 characters into the email?  Is there an RFC that
> defines a suitable way of coping?

If your mail client understands Unicode at all, then it depends on the conversion libraries that it uses: the usual convention is to map unrepresentable characters into a question mark. If the mail client is Unicode-blind, it probably sends out the bytes of the UTF-8 representation, ignoring the Latin-1 encoding tag, which produces gibberish. If this second process is iterated, then the gibberish doubles in length each time, as each byte is reinterpreted as a Latin-1 character and then re-encoded as UTF-8 again.

--
John Cowan  jcowan@reutershealth.com
http://www.ccil.org/~cowan  http://www.reutershealth.com
They do not preach that their God will rouse them
A little before the nuts work loose.
They do not teach that His Pity allows them
to drop their job when they damn-well choose.
        --Rudyard Kipling, "The Sons of Martha"
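
Both behaviours described in the reply above can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the function names `to_latin1_lossy` and `mislabel_once` are hypothetical, and Python's codecs stand in for whatever conversion library a real mail client would use.

```python
def to_latin1_lossy(text):
    """Unicode-aware case: unrepresentable characters become '?'."""
    return text.encode("latin-1", errors="replace")


def mislabel_once(text):
    """Unicode-blind case: UTF-8 bytes are sent, then read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# A character outside Latin-1 (U+4E2D) is replaced by a question mark.
print(to_latin1_lossy("caf\u00e9 \u4e2d"))

# Iterating the mislabelling step: the garbled part grows on every round.
garbled = "caf\u00e9"
for round_number in range(1, 4):
    garbled = mislabel_once(garbled)
    print(round_number, len(garbled), ascii(garbled))
```

The ASCII letters survive every round unchanged; only the part that started as a single non-ASCII character keeps doubling.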
