Theiling Online    Sitemap    Conlang Mailing List HQ   

Re: ASCIIifying

From:taliesin the storyteller <taliesin@...>
Date:Wednesday, May 7, 2003, 8:54
* Mark J. Reed said on 2003-05-06 23:37:06 +0200
> On Tue, May 06, 2003 at 04:20:55PM -0400, Robert B Wilson wrote:
> > i guess i shouldn't trust microsoft at all (instead of trusting them
> > about as much as i trust bill clinton...)
>
> Technically, ASCII only defines 128 characters, numbered 0
> through 127, of which only 95 are "printable" - letters, numbers,
> punctuation, and such. [..] The rest are called "control
> characters", and mostly have metatextual functions originally geared
> toward issuing mechanical controls to teletypewriter terminals,
> separating records on magnetic tape, etc.
>
> Looking at your Windows Character Map accessory, printable ASCII
> starts with the space character (position 32), followed by !, ",
> #, $, etc., all the way up to ~ (position 126). The last position,
> 127, is another control character, DELETE.
>
> "Latin-1" is a nickname for the ISO-8859-1 character set
> (International Organization for Standardization publication
> number 8859, part 1). Each of the character sets within ISO-8859
> defines 256 positions, exactly twice as many as ASCII, and they have
> in common that the first 128 are the same as ASCII. They have another
> feature in common, which is that the first 32 characters of the
> second half - that is, positions 128 through 159 - are designated
> as more control characters.
The *reason* why there aren't any printable characters from 128 to 159 is that if the text is passed through something that only handles 7-bit ASCII, like some old email servers, news servers and other internet pillars, the eighth bit gets stripped and these characters are mapped down onto the control characters from 0 to 31. In some programs, these have an important meaning, like: end of file, stop program, make noise. Feed such converted text to these programs and happy, happy, fun things happen!

The control characters in positions 10 and 13 are used for marking end of line (10 for unix, 13 for mac, 13 followed by 10 for dos/windows), so having a few extra of these only inserts extra lines. The others might very well corrupt the data completely. Fun, yes?
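To make the mapping concrete, here is a minimal sketch in Python of what a 7-bit-only channel effectively does to 8-bit text. The helper name strip_high_bit is made up for illustration, and real mail or news software may mangle such bytes differently; the sketch only assumes the simplest case, where the top bit is dropped.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte (hypothetical helper,
    modelling a channel that only carries 7 bits per character)."""
    return bytes(b & 0x7F for b in data)


# Every position from 128 to 159 lands on a control character (0 to 31).
high_block = bytes(range(128, 160))
print(strip_high_bit(high_block) == bytes(range(32)))

# 0x87 (135) turns into 0x07, the BEL control character ("make noise").
print(strip_high_bit(b"\x87"))

# 0x8A (138) turns into 0x0A (10), so a stray line break appears in the text.
print(strip_high_bit(b"ab\x8acd"))
```

Printable characters from 160 up would merely turn into other printable ASCII under the same treatment (160 becomes 32, a space), which is annoying but not dangerous in the same way.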
> The character set used by Windows 9x is neither ASCII nor Latin-1.
> It's a nonstandard variant of Latin-1 called Windows-1252. The
> ASCII half is the same, as are all of the characters from 160 up.
> But in between, instead of more control characters, it puts extra
> printable characters such as the O-E ligature, which do not appear
> in Latin-1.
Which is why we all so looove win-1252, and yet another reason to looove Microsoft, because they make life so interesting!

t.
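For anyone who wants to see the Latin-1 versus Windows-1252 difference described in the quoted text, the following is a rough sketch using Python's built-in "latin-1" and "cp1252" codecs. The byte 0x8C is picked as the example because Windows-1252 assigns it the O-E ligature; the variable names are only illustrative.

```python
import unicodedata

raw = bytes([0x8C])  # one byte from the disputed 128-159 range

as_cp1252 = raw.decode("cp1252")   # Windows-1252: a printable character
as_latin1 = raw.decode("latin-1")  # ISO-8859-1: a control character

print(unicodedata.name(as_cp1252))      # LATIN CAPITAL LIGATURE OE
print(unicodedata.category(as_latin1))  # Cc (control character)

# The ASCII half and everything from 160 up decode identically in both.
low_half = bytes(range(0, 128))
high_part = bytes(range(160, 256))
print(low_half.decode("cp1252") == low_half.decode("latin-1"))
print(high_part.decode("cp1252") == high_part.decode("latin-1"))
```

So text written on Windows and labelled as Latin-1 looks fine until it contains something from that in-between block, at which point the receiving end sees control characters instead.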


Tristan McLeay <kesuari@...>