Re: TECH: Testing again, no new on-topic content (was Re: "Language Creation" in your conlang)
From: John Cowan <cowan@...>
Date: Monday, November 17, 2003, 1:16
Paul Bennett scripsit:
> >[B]e aware that many people
> >who would otherwise receive Latin-1 characters fine won't even see
> >those if they're in a UTF-8 message.
>
> Er, are you sure? I thought that virtually every 128-255 character came
> through unscathed between Latin-1 and UTF-8. I bow to John as final
> arbiter, obviously, but that has always been my understanding.
You're confusing characters and their representations. It's true that
the first 256 characters of Unicode are identical to the 256 characters
of Latin-1. But the UTF-8 *representation* of the last 128 characters
of Latin-1 is quite different from the Latin-1 representation. To
say no more, UTF-8 represents each of them with two bytes, whereas Latin-1
uses a single byte for each.
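
To make the difference concrete, here is a small sketch using Python's
built-in codecs (my choice of illustration, not something from the
original exchange). The character U+00E9 is an arbitrary pick from the
upper half of Latin-1, and the variable names are made up for the example.

```python
# U+00E9 (e with acute accent) is one of the upper 128 Latin-1 characters.
ch = "\u00e9"

# Same character, same code point, two different byte representations.
latin1_bytes = ch.encode("latin-1")   # a single byte: 0xE9
utf8_bytes = ch.encode("utf-8")       # two bytes: 0xC3 0xA9

print(hex(ord(ch)), latin1_bytes.hex(), utf8_bytes.hex())

# Check the whole range U+0080..U+00FF: one byte each in Latin-1,
# two bytes each in UTF-8.
upper = [chr(cp) for cp in range(0x80, 0x100)]
all_one_byte = all(len(c.encode("latin-1")) == 1 for c in upper)
all_two_bytes = all(len(c.encode("utf-8")) == 2 for c in upper)
print(all_one_byte, all_two_bytes)
```

The code point is identical in both cases; only the bytes on the wire
differ, which is why a reader expecting Latin-1 bytes does not see the
character in a UTF-8 message.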
> What happens if I set my mail client to default encoding of "Latin-1" and
> paste some non-Latin-1 characters into the email? Is there an RFC that
> defines a suitable way of coping?
If your mail client understands Unicode at all, then it depends on the
conversion libraries that it uses: the usual convention is to map
unrepresentable characters into a question mark. If the mail client
is Unicode-blind, it probably sends out the bytes of the UTF-8
representation, ignoring the Latin-1 encoding tag, which produces
gibberish. If this second process is iterated, then the gibberish
doubles each time, as each byte is reinterpreted as Latin-1 and
then re-encoded as UTF-8 again.
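
Both cases can be imitated with Python's codecs. This is only a sketch
of the two behaviours described above, not the code of any actual mail
client; the sample text and the helper misread_once are hypothetical
names chosen for the illustration.

```python
# e-acute (U+00E9) exists in Latin-1; l-with-stroke (U+0142) does not.
text = "caf\u00e9 \u0142"

# Case 1: a Unicode-aware conversion maps the unrepresentable
# character to a question mark.
replaced = text.encode("latin-1", errors="replace")
print(replaced)

# Case 2: UTF-8 bytes are mislabelled as Latin-1, then re-encoded
# as UTF-8 by the next hop.
def misread_once(s: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return s.encode("utf-8").decode("latin-1")

s = "\u00e9"
sizes = []
for _ in range(4):
    # Record the UTF-8 size before each further round of damage.
    sizes.append(len(s.encode("utf-8")))
    s = misread_once(s)
print(sizes)
```

For a non-ASCII character the byte count doubles on every pass, because
each byte of the UTF-8 form is 0x80 or above and so becomes a two-byte
character of its own. Plain ASCII text in the same message is unaffected.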
--
They do not preach                             John Cowan
that their God will rouse them                 jcowan@reutershealth.com
A little before the nuts work loose.           http://www.ccil.org/~cowan
They do not teach                              http://www.reutershealth.com
that His Pity allows them                      --Rudyard Kipling,
to drop their job when they damn-well choose.  "The Sons of Martha"