Conlang Mailing List HQ

Re: Types of numerals; bases in natlangs.

From: Herman Miller <hmiller@...>
Date: Tuesday, January 17, 2006, 5:03
Tristan McLeay wrote:

> Yes, such as this one. "Byte" mightn't be an SI unit, but "kilo-" and
> "mega-" and so forth are common prefixes, and everyone knows they mean
> 1000 and 1000*1000. The matter is confused not by "*!$(#" hard drive
> manufacturers (and I think that language is inappropriate here) who
> simply use the prefixes in their commonly accepted ways, but by computer
> scientists & programmers who take the prefixes and used them to mean
> something they'd never meant before.
I understand the problem, but language doesn't always make sense. If you're making up your own language, you have the luxury of getting it right in the first place, but everyone else just has to learn the language the way it's used, with all of its idiosyncrasies. Saying that a kilobyte really always means 1,000 bytes and nothing else sounds like a foreigner criticizing the illogical aspects of your native language. They may be right in some sense, but their own language most likely has things that are just as bad. And I shouldn't have got upset at the hard disk manufacturers, but I guess I felt like a foreigner was criticizing my native language and I got emotional about it.
> Powers of two are important in a binary system---1 048 576 is a much
> more useful and natural number in computing than is 1 000 000. If we
> want to create a word that refers to 1 048 576 bytes, the reasonable
> thing to do is to create a new word to avoid any possible confusion.
> It doesn't hurt anyone to speak of kibibytes or mebibytes, and is at
> least unambiguous.
It would have been nice if the original coiners of the words "kilobyte" and "megabyte" had used something different, but they didn't, and this terminology has been standard for as long as I can remember. There's a lot of inertia behind those words ("computer-geek group-think" as Paul Bennett put it). The point that I thought was interesting (and relevant to the discussion) is that "computer-speak" (as Thomas put it) uses two different bases for representing numerals. In the past, it wouldn't have been uncommon to hear references to "three thousand hex" meaning 0x3000 (12,288 in decimal). To some extent, programmers naturally absorb some aspects of hex (or octal in the old days) and use it as an independent base, although not as fluently as decimal. But it's trivial to look at a number like 0x3000 and mentally convert it to "12K" (where "K" doesn't even mentally stand for "kilobytes", it's just the way it's said). In the back of my mind I know that "K" (always capitalized) stands for "kilobytes", but I don't think "kilobytes" any more than I think "big thousand" for "million". That's just the etymology of the word.
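The mental shorthand described above (seeing 0x3000 and reading it as "12K") can be sketched in Python; the function name is my own illustration, not anything from the thread:

```python
def to_k(byte_count):
    """Render a byte count in the informal 'K' notation, where 1K = 1024 bytes."""
    assert byte_count % 1024 == 0, "only whole multiples of 1024"
    return f"{byte_count // 1024}K"

# 0x3000 hex = 12,288 decimal = 12 x 1024, i.e. "12K"
print(to_k(0x3000))  # 12K
```

The conversion is trivial precisely because 0x400 is 1024: dropping the last two and a half hex digits is the base-16 analogue of dropping three zeros in decimal.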
> (BTW: When is "very recently"? It's been over six years since the IEC
> created the binary prefixes, and I doubt that the confusion started on
> that day; at any rate, six years is a long time, especially in
> computers.
I don't know; six years seems like "very recently" to me, when you consider how long "kilobyte" has meant 1,024 bytes.
> Wikipedia notes (albeit without sourcing) that using kilobyte
> and megabyte to mean 1000 bytes and 1 000 000 bytes "has a long
> engineering tradition, predating consumer complaints about the apparent
> discrepancy, which began to surface in the mid-1990s". An ambiguous
> sentence which could mean either the complaints started in the
> mid-1990s, or the tradition did. The next sentences clarify as the
> former: "The decimal-based capacity in hard disk drives follows the
> method used for serially accessed storage media which predate direct
> access storage media like hard disk drives. Paper punch cards could only
> be used in a serial fashion, like the magnetic tapes that followed. When
> a stream of data is stored, it's more logical to indicate how many
> thousands, millions or billions of bytes have been stored versus how
> many multiples of 1024, 1 048 576 or 1 073 741 824 bytes have been. When
> the first hard disk drives were being developed, the decimal measurement
> was only natural since the hard disk drive served essentially the same
> function as punch cards and tapes". It would seem therefore that it's
> certainly *not* a recent phenomenon, and I do not understand how
> correctness can be defined only by *one* usage, when there are *two*
> uses, both quite old, and one with etymology on its side.)
What I'm objecting to is the statement that "unlike what you have heard, a kilobyte and megabyte are really exactly 1000 and 1000000 units." As I've said, when you're dealing with RAM, a kilobyte is always 1,024 bytes and a megabyte is always 1,048,576 bytes. I wasn't aware of the paper punch card tradition, since that was before my time, but it's wrong to say that a kilobyte is "really exactly" 1,000 bytes when that's only true in some contexts and not others. It would be more accurate to say that it "should really" be 1,000 bytes, but in common usage it isn't when you're dealing with blocks of memory. If the standardizers had come up with something less silly than "kibibyte" and "mebibyte" it might have caught on (why not "kilibyte" and "megibyte"?), but as it is, it hasn't, and artificial attempts to change the language aren't always successful (think of all the attempts to add gender-neutral third person pronouns to English).
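For what it's worth, the two conventions diverge more as the prefixes get larger, which is part of why the dispute flared up over hard disk sizes rather than kilobytes. A quick sketch of the arithmetic (the constants are standard; the variable names are my own):

```python
SI = 1000    # decimal prefix step (kilo, mega, giga, ...)
BIN = 1024   # binary prefix step (K, M, G as programmers use them)

# The gap grows with each prefix level:
# about 2.4% at "kilo", 4.9% at "mega", 7.4% at "giga".
for power, name in [(1, "kilo"), (2, "mega"), (3, "giga")]:
    gap = (BIN**power - SI**power) / SI**power * 100
    print(f"{name}: binary is {gap:.1f}% larger than decimal")
```

So a drive labelled "500 GB" (decimal) shows up as roughly 465 "GB" (binary) in an operating system that counts by 1024s, which is the discrepancy consumers started noticing.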

Replies

Tristan McLeay <conlang@...>
Andreas Johansson <andjo@...>