Re: ASCIIifying
From: taliesin the storyteller <taliesin@...>
Date: Wednesday, May 7, 2003, 8:54
* Mark J. Reed said on 2003-05-06 23:37:06 +0200
> On Tue, May 06, 2003 at 04:20:55PM -0400, Robert B Wilson wrote:
> > i guess i shouldn't trust microsoft at all (instead of trusting them
> > about as much as i trust bill clinton...)
>
> Technically, ASCII only defines 128 characters, numbered 0
> through 127, of which only 95 are "printable" - letters, numbers,
> punctuation, and such. [..] The rest are called "control
> characters", and mostly have metatextual functions originally geared
> toward issuing mechanical controls to teletypewriter terminals,
> separating records on magnetic tape, etc.
>
> Looking at your Windows Character Map accessory, printable ASCII
> starts with the space character (position 32), followed by !, ",
> #, $, etc., all the way up to ~ (position 126). The last position,
> 127, is another control character, DELETE.
>
> "Latin-1" is a nickname for the ISO-8859-1 character set
> (International Organization for Standardization, standard 8859,
> part 1). Each of the character sets within ISO-8859 defines 256
> characters, exactly twice as many as ASCII, and they have in common
> that the first 128 are the same as ASCII. They have another feature in
> common, which is that the first 32 characters of the second half -
> that is, positions 128 through 159 - are designated as more control
> characters.
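For anyone who wants to check those ranges rather than take them on faith, here is a minimal sketch in Python. It assumes that Unicode's "Cc" (control) category can stand in for the ASCII and ISO-8859 notion of a control character, which works because the first 256 Unicode code points mirror Latin-1. The names printable_ascii and c1_controls are mine, not anything official.

```python
import unicodedata

def is_control(code: int) -> bool:
    """True if the code point is in Unicode's 'Cc' (control) category."""
    return unicodedata.category(chr(code)) == "Cc"

# The 7-bit half: everything that is not a control character.
printable_ascii = [c for c in range(128) if not is_control(c)]

# The 8-bit half: the extra control characters ISO-8859 reserves.
c1_controls = [c for c in range(128, 256) if is_control(c)]

print(len(printable_ascii), printable_ascii[0], printable_ascii[-1])
print(c1_controls[0], c1_controls[-1])
```

The first list should run from space (32) to ~ (126), and the second should cover exactly 128 through 159.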
The *reason* why positions 128 to 159 hold no printable characters is
that if the text is passed through something that only handles 7-bit
ASCII, like some old email servers, news servers and other internet
pillars, the eighth bit of each character is stripped, which maps these
positions down onto the control characters from 0 to 31. In some
programs, these have an important meaning, like: end of file, stop
program, make noise. Feed such converted text to these programs and
happy, happy, fun things happen! The control characters in positions 10
and 13 are used for marking end of line (10 for unix, 13 for mac, 13
followed by 10 for dos/windows), so having a few extra of these only
inserts extra lines. The others might very well corrupt the data
completely. Fun, yes?
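To make the mapping concrete, here is a rough sketch in Python of what a 7-bit-only relay would do. The function strip_high_bit is a hypothetical stand-in for such a relay (real ones varied), and the sample bytes assume the text was written in Windows-1252.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only channel would."""
    return bytes(b & 0x7F for b in data)

# Positions 128..159 land exactly on the control characters 0..31.
high_block = bytes(range(128, 160))
mapped = strip_high_bit(high_block)

# Windows-1252 text using two of those positions:
# 0x8C is the OE ligature, 0x87 is the double dagger.
sample = "\u0152uvre\u2021".encode("cp1252")
mangled = strip_high_bit(sample)

print(list(mapped))
print(sample, mangled)
```

After stripping, 0x8C has become 0x0C (form feed) and 0x87 has become 0x07 (BEL, the "make noise" one). In the same way 0x8A and 0x8D turn into 10 and 13, which is where the extra line breaks come from.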
> The character set used by Windows 9x is neither ASCII nor Latin-1.
> It's a nonstandard variant of Latin-1 called Windows-1252. The
> ASCII half is the same, as are all of the characters from 160 up.
> But in between, instead of more control characters, it puts extra
> printable characters such as the O-E ligature, which do not appear
> in Latin-1.
Which is why we all so looove win-1252, and yet another reason to
looove Microsoft, because they make life so interesting!
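A small sketch of the difference, using Python's built-in "cp1252" and "latin-1" codecs as stand-ins for the two character sets (assuming they match what Windows 9x actually shipped, which I have not checked byte for byte):

```python
import unicodedata

raw = bytes([0x8C])  # one byte from the disputed 128..159 range

as_cp1252 = raw.decode("cp1252")    # Windows-1252: a printable character
as_latin1 = raw.decode("latin-1")   # Latin-1: a control character

print(unicodedata.name(as_cp1252))
print(unicodedata.category(as_latin1))

# From 160 up, the two character sets agree on every position.
same_above_160 = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in range(160, 256)
)
print(same_above_160)
```

So the same byte is the O-E ligature to one reader and an invisible control character to another, while everything from 160 up comes out identical.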
t.