Re: Unicode 3.0
From: taliesin the storyteller <taliesin@...>
Date: Friday, October 1, 1999, 19:01
* Don Blaheta (dpb@cs.brown.edu) [991001 20:12]:
/snippage, only replying to what Don has to say/
>
> To be exact, ASCII is a 7-bit set; it has 128 possible values, of which
> several (33) are taken up with "control" values, like "null",
> "backspace", "end of line", and so forth. The printable ASCII
> characters are exactly those which appear on a standard US keyboard.
> There were in the 80s a number of "national" sets which replaced
> characters such as {} with their own forms like n-tilde, a-umlaut, and
> so on. The ISO approved a series of 8-bit character sets (iso-8859) in
> the late 80s (?), each of which had 256 potential characters. But the
> first 128 of each set were identical to ASCII, and 32 of the remaining
> 128 were taken for more control characters (which have never really been
> used...).
Uhm, those first 32 positions of the upper half (0x80-0x9F) aren't to be
used because if an 8-bit character is converted to 7-bit (by chopping off
the eighth bit, like quite a few gateways do...), you'd end up with the
ASCII control characters... Imagine an end-of-file marker in the middle
of a text... Incidentally, Microsoft has used these 32 "dangerous"
positions for things like smartquotes etc., yet another reason that
company is a pox on humanity. :)
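
For the curious, here is a rough Python sketch of what such a gateway does
to those positions. The function name strip_high_bit is made up for the
illustration, and "cp1252" is Python's name for the Windows code page that
puts smartquotes at 0x93 and 0x94; a real gateway may of course behave
differently.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth (high) bit of every byte, as a 7-bit gateway might."""
    return bytes(b & 0x7F for b in data)


# Curly double quotes sit at 0x93 and 0x94 in Windows code page 1252.
smart = "\u201cquoted\u201d".encode("cp1252")

# After stripping, 0x93 and 0x94 become 0x13 and 0x14 (DC3 and DC4).
stripped = strip_high_bit(smart)

# 0x9A (s-caron in cp1252) becomes 0x1A, Ctrl-Z, the old DOS end-of-file mark.
eof_like = strip_high_bit("\u0161".encode("cp1252"))

# Every byte in 0x80-0x9F lands in the control range 0x00-0x1F.
upper_controls = strip_high_bit(bytes(range(0x80, 0xA0)))

if __name__ == "__main__":
    print(smart, stripped, eof_like)
```

So a smartquote that survives the trip at all arrives as a control
character, which is exactly the failure described above.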
/snip/
>
> Enter Unicode. Rather than restrict itself to 8 bits, the Unicode
> consortium decided to make a 16-bit standard. This gave them 65,536
> character values to play with; finally, they could create one character
> set to include every character in every script currently in use, and
> several that aren't.
Anyone know if the ISO-8859-x sets are still copied as-is in Unicode?
tal.
--
"Better living through conlanging"