Re: Taxonomic Vocabulary
From: And Rosta <and.rosta@...>
Date: Monday, September 4, 2006, 13:03
Tasci, On 31/08/2006 22:20:
> Suppose you have a language composed of a discrete, finite set of
> syllables. I was considering the ideal way to construct vocabulary
> for that language. My idea was to divide all concepts into separate
> categories, one for each syllable. Then subcategories would be
> equally subdivided, and sub-subcategories and so forth. Identifying
> any word in this language would then only take an O(k * log_k(n))
> search, where log_k is log base k. That is, once you know what each
> syllable means, each syllable automatically narrows down the word
> lookup exponentially. It would be as if all words beginning with 'a'
> were related somehow, in a way that all other words are not.
>
> It sounds like a great strategy, but I've been having problems with
> the fact that many concepts we think up are very specific. Horse for
> instance. It's a four-legged ungulate equid, an animal mammal that
> eats hay, carries people, and has a large bottom; its coat is referred
> to as hide, not fur, and it has a mane referred to as hair, as in
> 'horsehair', etc., etc. Just to call a horse a living organism that's
> an animal chordate mammal ungulate equid Equus caballus alone would take 7
> syllables. How would I differentiate the horse from the zebra, from
> the weasel, from the sea squirt, if I tried to limit it to 4
> syllables of specification? That is, a 4-syllable word for living
> organism animal chordate, which is already pretty darn long compared
> to the one-syllable 'horse'.
>
> What I end up with is an extremely deep and sparse distribution, which is
> very frustrating because a lot of the concepts, like other non-horse
> members of genus Equus, do not even exist! Certainly they're not found
> in common conversation. Should I just randomly determine vocabulary?
> It'd be an even spread, but it would be a lot harder to remember if
> 'xrbtsx' is horse and 'xrblsx' is desk lamp, for instance.
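
A minimal sketch of the syllable-by-syllable lookup described above, in Python, with an invented toy taxonomy stored as nested maps (every syllable and concept below is made up for illustration):

    # Each syllable selects one of k branches, so a word of m syllables is
    # resolved in m steps: O(log_k(n)) with hashed branches, or the quoted
    # O(k * log_k(n)) if each level's k branches are scanned linearly.

    taxonomy = {
        "ka": {                                   # invented: living things
            "to": {"mi": "horse", "su": "zebra"}, # invented: mammals
            "re": {"mi": "sea squirt"},           # invented: other chordates
        },
        "pe": {                                   # invented: artefacts
            "lu": {"na": "desk lamp"},
        },
    }

    def lookup(syllables):
        """Follow one branch per syllable and return the concept reached."""
        node = taxonomy
        for syl in syllables:
            node = node[syl]                      # one narrowing step per syllable
        return node

    print(lookup(["ka", "to", "mi"]))             # -> horse
    print(lookup(["pe", "lu", "na"]))             # -> desk lamp
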
You describe the standard problems with taxonomic vocabularies, but I am of the
minority opinion that these problems can be well circumvented if approached the
right way. Suppose you have 60 syllables. Other things being equal, this gives
you a taxonomic tree where each node supports 60 branches. Then you need to
slot your concepts into this taxonomy, following the principle that a concept
can be assigned to a form of n syllables only when all forms of n-1 syllables
have been assigned concepts. That solves the deep-and-sparse problem. But it
does mean that you can't work from standard language-independent
quasi-scientific taxonomies.
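Here is a minimal sketch of that "fill all shorter forms first" principle, in Python, with an invented five-syllable stand-in for the 60; it models only the fullness constraint, not how concepts would be grouped taxonomically within each length:

    from itertools import count, product

    SYLLABLES = ["ba", "te", "so", "ka", "mi"]    # stand-in for the full 60

    def forms_shortest_first():
        """Yield every possible form in order of length: all 1-syllable
        forms, then all 2-syllable forms, and so on."""
        for n in count(1):
            for combo in product(SYLLABLES, repeat=n):
                yield "".join(combo)

    # Invented concept list, ordered by priority for the shortest forms.
    concepts = ["horse", "water", "person", "desk lamp", "zebra", "sea squirt"]

    lexicon = dict(zip(forms_shortest_first(), concepts))
    print(lexicon)
    # With five syllables and six concepts, the sixth concept only receives
    # the first 2-syllable form once all five 1-syllable forms are used up.
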
If I were implementing such a scheme, I'd divide the syllables into two classes:
one whose members are always word-nonfinal and have taxonomic import, and one
whose members are always word-final and have no taxonomic import. That would give you
self-segmentation and allow words of any number of syllables. Reserve one
syllable from the set of finals to mark genericity. E.g. if 'baso' is 'green'
and 'bate' is 'red' and 'ka' is the genericity marker, then 'baka' would be
'colour'.
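
A minimal sketch of that self-segmentation, in Python; everything beyond the 'ba', 'so', 'te', 'ka' of the example is invented, and this is just one way the scheme might be realised:

    FINAL = {"so", "te", "ka", "na"}  # no taxonomic import, always end a word
                                      # (all other syllables are word-nonfinal
                                      #  and carry the taxonomic meaning)
    GENERIC = "ka"                    # the reserved final marking genericity

    def segment(syllables):
        """Split a syllable stream into words: a word ends at the first
        final-class syllable encountered, so no boundary marker is needed."""
        words, current = [], []
        for syl in syllables:
            current.append(syl)
            if syl in FINAL:
                words.append("".join(current))
                current = []
        return words

    # 'baso' (green), 'bate' (red) and 'baka' (generic: colour) run together
    # still segment unambiguously:
    for word in segment(["ba", "so", "ba", "te", "ba", "ka"]):
        print(word, "(generic)" if word.endswith(GENERIC) else "")
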
--And.