Re: Tech: Unicode (was...)
From: Mark J. Reed <markjreed@...>
Date: Monday, May 3, 2004, 12:29
On Mon, May 03, 2004 at 02:37:10AM -0700, Philippe Caquant wrote:
> By the way, I was wondering about the following:
>
> When somebody feels like sending some Unicode on this
> list, wouldn't it be possible he just sends either the
> hexadecimal or the decimal codes, separated by a
> blank or a comma, and then somebody would write a nice
> little macro allowing, once you pasted these codes
> into a Word document, to translate them automatically
> into Unicode in this Word document?
Of course you could do that, but it'd be a pretty silly thing to do.
If you're going to have to go through a copy-paste-decode step, you
might as well write a Word macro that understands UTF-7, since it's much
more efficient, many mail user agents can understand it without
requiring the copy-paste-decode, and some mail user agents can even
generate it automatically. You'd want to write an encoding version of
the Word macro in any case to avoid the tedium on the composition side,
so the actual intermediate form shouldn't matter much. Though why on Earth
you would ever use the *decimal* code points is beyond me.
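To put rough numbers on the efficiency claim, here is a small sketch that renders the same text in the three candidate intermediate forms: comma-separated decimal code points, hex code points, and UTF-7. It relies only on Python's built-in "utf-7" codec (RFC 2152); the helper names are hypothetical, made up for this illustration.

```python
# Render one string in three possible "mail-safe" intermediate forms.
# Helper names are hypothetical; only Python's built-in "utf-7" codec is used.


def decimal_codes(text: str) -> str:
    """Comma-separated decimal code points, as in the quoted proposal."""
    return ",".join(str(ord(ch)) for ch in text)


def hex_codes(text: str) -> str:
    """Space-separated hexadecimal code points, zero-padded to four digits."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def utf7(text: str) -> str:
    """UTF-7 encoding; the result is pure 7-bit ASCII."""
    return text.encode("utf-7").decode("ascii")


sample = "АБВГ"
for name, form in (("decimal", decimal_codes(sample)),
                   ("hex", hex_codes(sample)),
                   ("utf-7", utf7(sample))):
    # Print the form's name, its length in characters, and the form itself.
    print(f"{name:8}{len(form):3}  {form}")
```

For these four letters both code-point lists come to 19 characters, while UTF-7 needs 13 (`+BBAEEQQSBBM-`). Inside a base64 run each 16-bit code unit costs 16/6, or about 2.7 characters, against 5 for a four-digit decimal code plus its comma, and ordinary ASCII letters, digits and spaces pass through UTF-7 unchanged, so no extra convention is needed to mark which parts of a message are codes.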
If I want to type Cyrillic, I hit control-C and type the Roman
transliteration: "Kuda idyot Ivan Ivanovich" => «Куда идёт Иван
Иванович». Having to go through Character Map or look-up/type in the
hex codes for every letter would drive me banana nuts.
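The post doesn't say which transliterating input method sits behind that keystroke, so the following is only a sketch of the general idea: a greedy longest-match lookup over a small Latin-to-Cyrillic table. The table is hypothetical and deliberately incomplete (just enough for the example sentence plus a few common digraphs); it is not any standard romanization scheme.

```python
# Hypothetical, deliberately small Latin -> Cyrillic table. Multi-letter keys
# must be tried before single letters, which the longest-match loop ensures.
TABLE = {
    "shch": "щ", "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
    "yo": "ё", "yu": "ю", "ya": "я",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "y": "ы",
}
MAX_KEY = max(len(key) for key in TABLE)


def transliterate(latin: str) -> str:
    out = []
    i = 0
    while i < len(latin):
        # Try the longest possible key first, so "ch" wins over "c" + "h"
        # and "yo" wins over "y" + "o".
        for size in range(MAX_KEY, 0, -1):
            chunk = latin[i:i + size]
            if len(chunk) == size and chunk.lower() in TABLE:
                cyr = TABLE[chunk.lower()]
                # Keep an initial capital: "Ivan" -> "Иван".
                out.append(cyr.upper() if chunk[0].isupper() else cyr)
                i += size
                break
        else:
            # No mapping (spaces, punctuation, digits): copy it through.
            out.append(latin[i])
            i += 1
    return "".join(out)


print(transliterate("Kuda idyot Ivan Ivanovich"))
```

With this table the example sentence comes out as Куда идёт Иван Иванович; a real input method would also need the hard and soft signs, э, and some way to type a digraph's letters separately.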
> just now), you could send:
> 1040,1041,1042,1043
I can't even begin to turn those into characters without converting into
hex first (see above about decimal code points). But yes, those are
U+0410, U+0411, U+0412, and U+0413, the uppercase versions of the first
four letters of the Cyrillic alphabet:
АБВГ
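For what it's worth, the decode step of the quoted proposal (the "little macro") takes only a few lines. This sketch is in Python rather than a Word macro, and the function names are hypothetical; it accepts codes separated by commas and/or blanks, in decimal or hex, and also shows the decimal-to-U+XXXX conversion done above.

```python
import re


def decode_codes(codes: str, base: int = 10) -> str:
    """Turn code points separated by commas and/or blanks into text.

    Use base=10 for "1040,1041,1042,1043" and base=16 for "0410 0411 0412 0413".
    """
    tokens = [t for t in re.split(r"[,\s]+", codes.strip()) if t]
    return "".join(chr(int(t, base)) for t in tokens)


def to_u_notation(codes: str) -> list[str]:
    """Rewrite decimal code points in the conventional U+XXXX hex notation."""
    return [f"U+{int(t):04X}" for t in re.split(r"[,\s]+", codes.strip()) if t]


print(to_u_notation("1040,1041,1042,1043"))  # ['U+0410', 'U+0411', 'U+0412', 'U+0413']
print(decode_codes("1040,1041,1042,1043"))  # АБВГ
print(decode_codes("0410 0411 0412 0413", base=16))  # АБВГ
```

Stripping `U+` prefixes or handling codes mixed into ordinary text would be the obvious next steps; neither is attempted here.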
I'm sending this message in UTF-7; if your mailer understands UTF-7, then
you'll see all the Cyrillic stuff automatically. What it actually looks
like underneath, and what those whose mailers don't understand UTF-7
see, is ugly but relatively compact. If your mailer groks UTF-7,
the following is what the above four-letter sequence looks like to those
whose mailer doesn't. If your mailer doesn't understand UTF-7, it
should look almost the same as the above except for an extra minus sign,
which makes the difference between a literal plus sign and the start of
an encoded sequence.
+-BBAEEQQSBBM
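The escaping rule described above can be checked with Python's built-in "utf-7" codec. In this sketch (the helper name is hypothetical), the four capitals become one base64 run between `+` and `-`, while a literal plus sign, including the one in "U+0410", goes out as `+-`. That is why quoting the encoded form itself needs the extra minus sign.

```python
def utf7_ascii(text: str) -> str:
    """Encode text as UTF-7 and return the 7-bit ASCII result as a str."""
    return text.encode("utf-7").decode("ascii")


# The four Cyrillic capitals become a single base64 run between '+' and '-'.
print(utf7_ascii("АБВГ"))          # +BBAEEQQSBBM-
# A literal plus sign is written as "+-" so it isn't taken as a shift.
print(utf7_ascii("U+0410"))        # U+-0410
# Quoting the encoded form itself therefore picks up the extra minus sign...
print(utf7_ascii("+BBAEEQQSBBM"))  # +-BBAEEQQSBBM
# ...which a UTF-7-aware reader turns back into the plain ASCII string.
print(b"+-BBAEEQQSBBM".decode("utf-7"))  # +BBAEEQQSBBM
```
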
-Mark