Re: TECH: RFC 1345 (was Re: TECH: Testing again, no new on-topic content (was Re: "Language Creation" in your conlang))
From: John Cowan <jcowan@...>
Date: Tuesday, November 18, 2003, 18:34
Paul Bennett scripsit:
> In that page, reference is made to RFC 1345, which defines one-or-more-byte
> sequences -- using bytes 33-126 only -- to cover a wide portion of Unicode
> (Latin, Geek,
As in Code of the Geeks? :-)
Time' haquarios et dona ferentes.
> Cyrillic, Hebrew, Arabic, Japanese, Chinese, plus some other
> symbols). I.e., it provides a lookup table between printable ASCII
> multibyte sequences and Unicode code points. Also, it shows a format for
> defining 8-bit character set mappings to those multibyte sets. Notably, it
> includes a mechanism for dealing with sequences containing 8-bit combining
> characters (letter plus diacritic sequences).
The more modern approach is to use a simple table mapping
a 2-digit hex number representing a byte to a 4- or 5-digit
hex number representing a Unicode code point. For lots of
examples, see http://www.unicode.org/Public/MAPPINGS/ and
http://crl.nmsu.edu/~mleisher/csets.html .
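Here is a minimal Python sketch of the table-driven approach. It assumes the three-column layout used by the files under http://www.unicode.org/Public/MAPPINGS/: the byte in hex, the Unicode value in hex, and then a '#' comment. The sample table describes a hypothetical 8-bit charset, and the names parse_mapping and decode are illustrative only.

```python
# Hypothetical 8-bit charset in the "byte <TAB> code point <TAB> # name" layout.
SAMPLE = """\
# Hypothetical conlang 8-bit charset (illustrative only)
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xF0\t0x00F0\t# LATIN SMALL LETTER ETH
0xFE\t0x00FE\t# LATIN SMALL LETTER THORN
0xFF\t\t# unassigned in this charset
"""


def parse_mapping(lines):
    """Build a {byte: code point} dict from mapping-file lines."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode value: leave unmapped
        table[int(fields[0], 16)] = int(fields[1], 16)  # int() accepts "0x"
    return table


def decode(data: bytes, table) -> str:
    """Decode bytes via the table; unmapped bytes become U+FFFD."""
    return "".join(chr(table.get(b, 0xFFFD)) for b in data)


table = parse_mapping(SAMPLE.splitlines())
print(decode(bytes([0x41, 0xF0, 0xFE, 0xFF]), table))
```

Mapping unknown bytes to U+FFFD REPLACEMENT CHARACTER is one design choice among several. A stricter decoder could raise an error instead.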
> My newest wish is that applications would openly and easily import and
> export character sets in RFC 1345 format, so that I could create my own 8-
> bit encoding (or we could create a pan-Conlang-L 8-bit encoding?) that
> could be imported into the email clients and browsers of interested parties.
It's hard to get these things to interoperate, because there is such a
variety of mail clients in use. Better to push for clean UTF-8 paths
through listservs and UTF-8-aware clients. Most people are using closed
clients and have two choices: wait until the vendor fixes things (if
ever), or move to a more reasonable client such as Mozilla Thunderbird.
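The next sketch shows in Python what goes wrong when a message does not travel over a clean UTF-8 path. It assumes a single hop that misreads UTF-8 bytes as ISO-8859-1 (Latin-1), which is one common failure among several. The function names are hypothetical.

```python
def latin1_hop(text: str) -> str:
    """Simulate a hop that treats UTF-8 bytes as Latin-1 text."""
    return text.encode("utf-8").decode("latin-1")


def undo_latin1_hop(garbled: str) -> str:
    """Reverse that one specific mistake (only works if you know it happened)."""
    return garbled.encode("latin-1").decode("utf-8")


print(latin1_hop("café"))
```

The "é" turns into "Ã©". The damage can be undone here only because we know exactly which wrong charset was applied. A clean UTF-8 path avoids the guesswork entirely.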
--
John Cowan jcowan@reutershealth.com www.ccil.org/~cowan www.reutershealth.com
I must confess that I have very little notion of what [s. 4 of the British
Trade Marks Act, 1938] is intended to convey, and particularly the sentence
of 253 words, as I make them, which constitutes sub-section 1. I doubt if
the entire statute book could be successfully searched for a sentence of
equal length which is of more fuliginous obscurity. --MacKinnon LJ, 1940