Re: Unicode 3.0

From: taliesin the storyteller <taliesin@...>
Date: Friday, October 1, 1999, 19:01
* Don Blaheta (dpb@cs.brown.edu) [991001 20:12]:
/snippage, only replying to what Don has to say/
>
> To be exact, ASCII is a 7-bit set; it has 128 possible values, of which
> several (33) are taken up with "control" values, like "null",
> "backspace", "end of line", and so forth. The printable ASCII
> characters are exactly those which appear on a standard US keyboard.
> There were in the 80s a number of "national" sets which replaced
> characters such as {} with their own forms like n-tilde, a-umlaut, and
> so on. The ISO approved a series of 8-bit character sets (iso-8859) in
> the late 80s (?), each of which had 256 potential characters. But the
> first 128 of each set were identical to ASCII, and 32 of the remaining
> 128 were taken for more control characters (which have never really been
> used...).
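The counts in the quoted paragraph are easy to check. Below is a minimal Python sketch, assuming the standard library codecs are a fair stand-in for the ISO 8859 tables. The helper name `is_control` and the choice of four sample parts are made up for illustration.

```python
# Control characters in 7-bit ASCII: 0x00-0x1F plus DEL (0x7F).
def is_control(code: int) -> bool:
    return code < 0x20 or code == 0x7F

ascii_controls = [c for c in range(128) if is_control(c)]
ascii_printable = [c for c in range(128) if not is_control(c)]

# The lower half (0x00-0x7F) of each ISO 8859 part should decode like ASCII.
# Only a few sample parts are checked here, not the whole series.
lower_half = bytes(range(128))
iso_parts = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-15"]
lower_half_matches = all(
    lower_half.decode(name) == lower_half.decode("ascii") for name in iso_parts
)

# In ISO 8859-1, bytes 0x80-0x9F decode to the 32 extra control code points.
c1_block = bytes(range(0x80, 0xA0)).decode("iso-8859-1")

print(len(ascii_controls), len(ascii_printable), lower_half_matches, len(c1_block))
```

The printable count includes the space character, which is why it comes to 95 rather than 94.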
Uhm, those 32 positions (the first 32 of the upper half, 0x80-0x9F) aren't to be used because if an 8-bit character is converted to 7-bit (by chopping off the eighth bit, like quite a few gateways do...), you'd end up with control characters... Imagine an end-of-file marker in the middle of a text... Incidentally, Microsoft has used these 32 "dangerous" positions for things like smartquotes etc., yet another reason that company is a pox on humanity. :)

/snip/
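To make the bit-chopping problem concrete, here is a small Python sketch. The function name `strip_high_bit` is hypothetical and only imitates what a 7-bit gateway would do. The end-of-file example assumes the DOS-era convention of treating 0x1A (Ctrl-Z) as an EOF marker.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only gateway might."""
    return bytes(b & 0x7F for b in data)

# Every byte in 0x80-0x9F lands on an ASCII control code (0x00-0x1F).
upper_controls = bytes(range(0x80, 0xA0))
stripped = strip_high_bit(upper_controls)

# 0x9A becomes 0x1A (SUB, Ctrl-Z), which DOS-era tools treated as end-of-file.
eof_like = strip_high_bit(b"\x9a")

# Windows-1252 puts a left "smart quote" at 0x93; with the high bit
# stripped it becomes 0x13 (DC3, also known as XOFF).
smart_quote = b"\x93".decode("cp1252")
stripped_quote = strip_high_bit(b"\x93")

print(stripped.hex(), eof_like.hex(), repr(smart_quote), stripped_quote.hex())
```

Characters at 0xA0 and above survive the same treatment as ordinary printable ASCII (wrong, but harmless), which is why only the 0x80-0x9F range is the risky one.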
>
> Enter Unicode. Rather than restrict itself to 8 bits, the Unicode
> consortium decided to make a 16-bit standard. This gave them 65,536
> character values to play with; finally, they could create one character
> set to include every character in every script currently in use, and
> several that aren't.
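The arithmetic behind the 16-bit figure, as a short Python sketch: a 16-bit unit has 2^16 = 65,536 distinct values, numbered 0 through 65,535. The variable names are illustrative only, and the `utf-16-be` codec is used here just as a convenient way to show a character occupying one 16-bit unit.

```python
# Number of distinct values a 16-bit unit can hold, and the highest one.
code_values = 2 ** 16
highest_value = code_values - 1

# A character such as e-acute fits in a single 16-bit unit (two bytes).
encoded = "\u00e9".encode("utf-16-be")

print(code_values, hex(highest_value), encoded.hex())
```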
Anyone know if the iso-8859-x sets are still copied as-is in Unicode?

tal.
--
"Better living through conlanging"