Date: Thursday, August 31, 2006, 21:30
Suppose you have a language composed of a discrete, finite set of syllables. I was
considering the ideal way to construct vocabulary for that language. My idea
was to divide all concepts into separate categories, one for each syllable.
Then each category would be equally subdivided into subcategories, then
subsubcategories, and so forth. Identifying any word in this language would
only take a search on the order of O(k * log_k(n)), where k is the number of
syllables, n is the number of concepts, and log_k is log base k. That is, you
have to know what each syllable means, then you automatically narrow down the
word lookup exponentially. It would be as if all the words beginning with 'a'
were related somehow, in a way that all other words are not.
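
Here is a rough sketch of that lookup in Python, just to make the arithmetic concrete. The function names, the tiny `TREE`, and its syllables are all made up for illustration; nothing here is a real vocabulary.

```python
def min_word_length(n_concepts, k_syllables):
    """Smallest number of syllables d such that k_syllables ** d >= n_concepts."""
    if k_syllables < 2:
        raise ValueError("need at least 2 syllables to subdivide anything")
    depth, capacity = 0, 1
    while capacity < n_concepts:
        capacity *= k_syllables  # each extra syllable multiplies the reachable concepts by k
        depth += 1
    return depth


def lookup(tree, syllables):
    """Walk the category tree one syllable at a time and return what is found there."""
    node = tree
    for syllable in syllables:
        node = node[syllable]  # each step discards every other branch at this level
    return node


# Hypothetical two-level vocabulary: first syllable picks the category,
# second syllable picks the concept inside it.
TREE = {
    "ba": {"ko": "horse", "ti": "zebra"},
    "du": {"ko": "desk lamp"},
}

print(min_word_length(1_000_000, 100))
print(lookup(TREE, ["ba", "ko"]))
```

With 100 syllables, a million concepts would fit in 3-syllable words, but only if every category really does split evenly.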
It sounds like a great strategy, but I've been having problems with the fact that
many concepts we think up are very specific. Take the horse, for instance.
It's a four-legged ungulate equid, a mammal that eats hay, carries people, has
a large bottom, has a coat referred to as hide, not fur, and a mane referred
to as hair, as in 'horsehair', and so on. Just to call a horse a living
organism that's an animal chordate mammal ungulate equid Equus caballus would
alone take 7 syllables. How would I differentiate the horse from the zebra,
from the weasel, from the sea squirt, if I tried to limit it to 4 syllables of
specification? Four syllables barely covers 'living organism animal chordate',
and a 4-syllable word is already pretty darn long compared to the 1-syllable
'horse'.
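
To see how long the words get, here is a small Python sketch that hands out one syllable per level of the taxonomy. The syllable inventory and the `build_words` helper are invented for the example, and the taxonomy paths are simplified the same way I simplified them above.

```python
SYLLABLES = ["ba", "ko", "ti", "du", "me"]  # hypothetical inventory, 2 letters each


def build_words(paths):
    """Give every concept a word with one syllable per taxonomy level.

    Siblings under the same parent get distinct syllables, in order of first
    appearance. Raises IndexError if a parent has more children than syllables.
    """
    children = {}  # parent path (tuple) -> {category name: syllable}
    words = {}
    for path in paths:
        word = []
        for depth, category in enumerate(path):
            siblings = children.setdefault(tuple(path[:depth]), {})
            if category not in siblings:
                siblings[category] = SYLLABLES[len(siblings)]
            word.append(siblings[category])
        words[path[-1]] = "".join(word)
    return words


PATHS = [
    ["organism", "animal", "chordate", "mammal", "ungulate", "equid", "horse"],
    ["organism", "animal", "chordate", "mammal", "ungulate", "equid", "zebra"],
    ["organism", "animal", "chordate", "mammal", "carnivore", "mustelid", "weasel"],
    ["organism", "animal", "chordate", "tunicate", "sea squirt"],
]

for concept, word in build_words(PATHS).items():
    print(concept, word, len(word) // 2, "syllables")
```

The horse and the zebra come out identical until the very last syllable, which is exactly the depth problem: all the distinguishing information sits at the end of a 7-syllable word.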
What I end up with is an extremely deep and sparse distribution, very frustrating
because a lot of concepts, like other non-horse members of the horse's species, do not
even exist! Certainly they're not found in common conversation. Should I just
randomly determine vocabulary? It'd be an even spread, but it would be a lot
harder to remember if xrbtsx is horse and xrblsx is desk lamp for instance.
I had one more idea: that instead of starting with general categories, I start
with specific terms, then generalize. So I could have 'to' mean horse, and
'tobu' be anything in Equus caballus, and 'tobuba' be anything in the equid
family, and so forth. Trouble with that is, which specific concepts get to be
the root of all language? Wouldn't they have to be generalized, by necessity?
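
A quick Python sketch of the specific-to-general idea, assuming a generalization is just a longer word that starts with the root. The `LEXICON` entries past 'to', 'tobu' and 'tobuba' are hypothetical ('ki' for zebra is something I made up for the test), and matching on raw letters only works if no syllable is a prefix of another.

```python
LEXICON = {
    "to": "horse",
    "tobu": "anything in the horse's species",
    "tobuba": "anything in the equid family",
    "ki": "zebra",  # hypothetical second root, not part of the original idea
}


def generalizations(root, lexicon):
    """Return the lexicon words that extend `root`, most specific (shortest) first."""
    longer = [word for word in lexicon if word.startswith(root) and word != root]
    return sorted(longer, key=len)


print(generalizations("to", LEXICON))
print(generalizations("ki", LEXICON))
```

The zebra gets no generalizations at all, even though it belongs in the equid family too: the family's name is built on the horse's root. That is the trouble in miniature.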
Pandora "Starling/Tasci/Antinomy/Figment/???" synx