Re: Unicode 3.0

From: John Cowan <cowan@...>
Date: Friday, October 1, 1999, 19:26
Don Blaheta scripsit:

> Enter Unicode. Rather than restrict itself to 8 bits, the Unicode
> consortium decided to make a 16-bit standard. This gave them 65,535
> character values to play with; finally, they could create one character
> set to include every character in every script currently in use, and
> several that aren't.
>
> Of course, this isn't without its problems. One-byte codes are *very*
> entrenched in the computer world, and there is a lot of extant code that
> assumes that characters are only one byte long.

Enter UTF-8. This is a method for encoding Unicode, whereby the 128 ASCII values continue to be represented by the single byte values 0-127 only, and combinations of 2, 3, or 4 byte values in the range 128-253 (254 and 255 aren't used) are used to represent all the other Unicode characters. This means that programs that understand only ASCII still work, and the other characters can often just be "passed through" without understanding. Not perfect, but it helps.

-- 
John Cowan                                   cowan@ccil.org
I am a member of a civilization. --David Brin
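
To make the scheme above concrete, here is a minimal sketch in Python. It is not part of the original message: the language, the helper name utf8_bytes, and the sample strings are assumptions chosen for illustration. It relies on Python's built-in "utf-8" codec, which follows the current definition of UTF-8. It shows an ASCII character staying a single byte below 128, other characters becoming 2, 3, or 4 bytes that are all 128 or above, and an ASCII-only operation (splitting on a comma) passing the other bytes through untouched.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of ch as a list of byte values (0-255)."""
    return list(ch.encode("utf-8"))

# One ASCII character, then characters needing 2, 3, and 4 bytes.
for ch in ["a", "\u00e9", "\u20ac", "\U0001d11e"]:
    print(f"U+{ord(ch):04X}", utf8_bytes(ch))

# ASCII-only text is byte-for-byte identical under ASCII and UTF-8.
assert "plain text".encode("utf-8") == "plain text".encode("ascii")

# "Pass-through": code that only knows about the ASCII comma can split
# UTF-8 data safely, because every byte of a multi-byte sequence is >= 128
# and so can never be mistaken for an ASCII character.
data = "na\u00efve,caf\u00e9".encode("utf-8")
fields = data.split(b",")
decoded = [field.decode("utf-8") for field in fields]
print(decoded)
```

The split works without any knowledge of the non-ASCII characters, which is the property that lets older one-byte-oriented code keep working on UTF-8 data in many cases.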