Re: TECH: Testing again, no new on-topic content (was Re: "Language Creation" in your conlang)

From:	John Cowan <cowan@...>
Date:	Monday, November 17, 2003, 1:16


Paul Bennett scripsit:

> >[B]e aware that many people
> >who would otherwise receive Latin-1 characters fine won't even see
> >those if they're in a UTF-8 message.
>
> Er, are you sure? I thought that virtually every 128-255 character came
> through unscathed between Latin-1 and UTF-8. I bow to John as final
> arbiter, obviously, but that has always been my understanding.

You're confusing characters and their representations.  It's true that
the first 256 characters of Unicode are identical to the 256 characters
of Latin-1.  But the UTF-8 *representation* of the last 128 characters
of Latin-1 is quite different from the Latin-1 representation.  To
say no more, UTF-8 represents each of them with two bytes, whereas Latin-1
uses a single byte for each.
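
To make the difference concrete, here is a small Python sketch. The
sample character (U+00E9, e with acute) and the variable names are
arbitrary choices for illustration, not anything from the original
discussion.

```python
# Illustration only: U+00E9 is an arbitrary character from the upper
# half of Latin-1 (code points 0x80 through 0xFF).
ch = "\u00e9"

latin1_bytes = ch.encode("latin-1")  # one byte: 0xE9
utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xA9

print(latin1_bytes.hex(), utf8_bytes.hex())

# The same holds across the whole upper half: one byte each in Latin-1,
# two bytes each in UTF-8.
upper_half = [chr(cp) for cp in range(0x80, 0x100)]
all_one_byte_latin1 = all(len(c.encode("latin-1")) == 1 for c in upper_half)
all_two_bytes_utf8 = all(len(c.encode("utf-8")) == 2 for c in upper_half)
```

The character is the same in both cases; only the bytes on the wire
differ, which is why a reader expecting Latin-1 bytes sees two wrong
characters instead of one right one.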

> What happens if I set my mail client to default encoding of "Latin-1" and
> paste some non-Latin-1 characters into the email? Is there an RFC that
> defines a suitable way of coping?

If your mail client understands Unicode at all, then it depends on the
conversion libraries that it uses: the usual convention is to map
unrepresentable characters into a question mark.  If the mail client
is Unicode-blind, it probably sends out the bytes of the UTF-8
representation, ignoring the Latin-1 encoding tag, which produces
gibberish.  If this second process is iterated, then the gibberish
doubles in length each time, as each byte is reinterpreted as Latin-1
and then re-encoded as UTF-8 again.
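
Both behaviours can be imitated in a few lines of Python. This is only
a sketch: the sample strings and the helper name misread_round are
made up for illustration, and a real mail client's conversion library
may behave differently from Python's codecs.

```python
# Case 1: a Unicode-aware converter asked for Latin-1 output.
# U+0142 (l with stroke) has no Latin-1 representation, so with
# errors="replace" Python's codec substitutes a question mark.
sample = "caf\u00e9 \u0142"
replaced = sample.encode("latin-1", errors="replace")  # b'caf\xe9 ?'


# Case 2: UTF-8 bytes sent out but read back as Latin-1.
def misread_round(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


garbled = "\u00e9"  # one character to start with
lengths = []
for _ in range(3):
    garbled = misread_round(garbled)
    lengths.append(len(garbled))

print(replaced, lengths)  # lengths is [2, 4, 8]
```

Every non-ASCII character becomes two characters on each pass, so one
accented letter turns into 2, then 4, then 8 characters of junk.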

--
They do not preach                              John Cowan
  that their God will rouse them                jcowan@reutershealth.com
    A little before the nuts work loose.        http://www.ccil.org/~cowan
They do not teach                               http://www.reutershealth.com
  that His Pity allows them                         --Rudyard Kipling,
    to drop their job when they damn-well choose.   "The Sons of Martha"
