Conlang: Re: Taxonomic Vocabulary (H. S. Teoh, Sep 1 '06, 0:58)

Re: Taxonomic Vocabulary

From:	H. S. Teoh <hsteoh@...>
Date:	Friday, September 1, 2006, 0:58

On Thu, Aug 31, 2006 at 03:48:13PM -0700, Tasci wrote:

> Thank you for your prompt reply! You are very wise, so anything I omit
> you can assume I agree with.
>
> On Thu, 31 Aug 2006 15:23:56 -0700
> "H. S. Teoh" <hsteoh@...> wrote:
>
> > Now, a computer would have absolutely no trouble picking out the
> > right words, but the human brain doesn't work that way.
>
> Depending on how you program the computer of course. If it strictly
> matches words yes, but in computer terms what you're describing is
> called a hash collision, as far as I know.

Well, that assumes the brain uses a hash function for word recognition. :-) AFAIK, our brain is more like a pattern-matcher whereas a computer deals with precise things---that's the difference I was driving at. Now of course, it's possible to write a program that operates based on pattern matching instead of exact comparison, but that's beside the point I was trying to make.
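A minimal Python sketch of the contrast between exact comparison and pattern matching. The lexicon, the glosses, and the misheard form are invented for illustration, and `difflib` similarity scoring is only a convenient stand-in for "pattern matching", not a claim about how the brain does it.

```python
import difflib

# Hypothetical lexicon, reusing made-up word forms from this thread.
LEXICON = {
    "habababu": "hydroponics technique 1",
    "hababami": "hydroponics technique 2",
    "hamumamupo": "some unrelated concept",
}

def exact_lookup(heard):
    """Exact comparison: the form either matches a key or the lookup fails."""
    return LEXICON.get(heard)

def fuzzy_lookup(heard, cutoff=0.6):
    """Similarity-based lookup: return the gloss of the closest known word."""
    matches = difflib.get_close_matches(heard, list(LEXICON), n=1, cutoff=cutoff)
    return LEXICON[matches[0]] if matches else None

misheard = "habababo"  # "habababu" with the final vowel misheard
print(exact_lookup(misheard))  # no exact entry for this form
print(fuzzy_lookup(misheard))  # falls back to the most similar entry
```

The fuzzy lookup only recovers the intended word when no other real word sits closer to what was heard, which is exactly where closely packed word forms become a problem.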

> > You're assuming that roots must necessarily be complete general.
>
> How about my idea to make the roots complete specific, and
> generalizing from there? That's got problems of its own, but it might
> be something to consider during the designing of the real vocabulary.

I have to admit I've not experimented that much in that area before, so you'll have to explore it and find out.

> > 1) Frequency of usage is more important than beauty of internal
> > structure. You should cater to the fact that the most frequently used
> > words should be most economical, even if the concepts themselves are
> > very complicated and require a lot of specification in a taxonomic
> > system.
>
> Perhaps it would be interesting to try a vocabulary that starts with
> the most frequent words, and then has specifiers for related words,
> much in the way our brain works by taking familiar concepts and
> following connections to related concepts.

That sounds workable. [...]
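As a rough sketch of the frequency principle in point 1, the following Python assigns the shortest available forms to the most frequent concepts. The concepts, the frequency counts, and the two-syllable inventory are all made up for illustration, and the sketch deliberately ignores how distinct the resulting words are from each other.

```python
from itertools import product

def assign_forms(freqs, syllables):
    """Give the shortest available forms to the most frequent concepts."""
    def forms():
        # Enumerate candidate forms by length: 1 syllable, then 2, and so on.
        length = 1
        while True:
            for combo in product(syllables, repeat=length):
                yield "".join(combo)
            length += 1

    # Most frequent concept first; zip stops when the concepts run out.
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return dict(zip(ranked, forms()))

# Hypothetical usage counts for one hypothetical speaker community.
freqs = {"water": 900, "eat": 700, "plough": 40, "hypothesis": 5}
vocabulary = assign_forms(freqs, ["ba", "mi"])
print(vocabulary)
```

With a different target audience the counts change, and so does which concepts get the short forms.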

> > What exactly constitutes 'frequent' depends on what your target
> > audience is. The language of a farmer is very different from the
> > language of an academic researcher, even though they share a
> > similar subset that they can mutually understand each other
> > (e.g., they both speak English---even though the kind of English
> > the researcher speaks uses a lot of words that the farmer doesn't
> > use from day to day, and vice versa).
>
> Now see, that's just the very reason that a taxonomic vocabulary would
> make total sense. The scientist words would have a certain prefix (or
> suffix) and beneath that would use the same combinations as the
> farmer, who would have a different 'farmer words' prefix. Maybe it
> wouldn't be subdivided by scientist and farmer, but it would allow us
> to remember less, once we know the context of our vocabulary. Maybe
> it would even work to establish the context of the vocabulary, then
> omit the assumed prefix until you're done with that context. Not sure
> how to conceptualize it, but something like...
>
> habababu hababami hababaoh habababee
> would become something like...
>
> hababa bu mi oh bee

See, the thing is, you still have the problem that the listener must correctly discern the first word from a whole bunch of other very similar ones in order to derive the correct context. One syllable off, and he gets a totally different meaning from the following words. My point is, a single syllable (or morpheme) is too small to bear the weight of determining a category in the taxonomy. It works in theory, but it just doesn't work in practice. Human communication is inherently lossy; that's why natural languages have built-in redundancy. The redundancy is there for a reason! Now I'm not saying you have to make your lang redundant, but the point is that you need more than just a single syllable to carry a particular meaning. One syllable may be enough to make a distinction in isolation, but when everything else around it also makes such distinctions, the brain just overloads.
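To put a number on the "one syllable off" problem, here is a small Python sketch. It assumes a hypothetical four-syllable inventory and a three-level taxonomy with one syllable per level. The checksum syllable in the second lexicon is an artificial stand-in for redundancy, not something proposed in this thread and not how natural languages actually do it.

```python
from itertools import product

SYLLABLES = ["ba", "bu", "mi", "oh"]  # hypothetical syllable inventory

def neighbours(word):
    """All forms reachable by mishearing exactly one syllable of `word`."""
    for i, syl in enumerate(word):
        for other in SYLLABLES:
            if other != syl:
                yield word[:i] + (other,) + word[i + 1:]

def confusable_fraction(lexicon):
    """Fraction of single-syllable mishearings that land on another valid word."""
    hits = total = 0
    for word in lexicon:
        for heard in neighbours(word):
            total += 1
            hits += heard in lexicon
    return hits / total

# Dense taxonomic lexicon: every 3-syllable combination is a word,
# one syllable per taxonomy level.
dense = set(product(SYLLABLES, repeat=3))

def with_check(word):
    """Append a syllable fully determined by the others (assumed scheme)."""
    check = SYLLABLES[sum(SYLLABLES.index(s) for s in word) % len(SYLLABLES)]
    return word + (check,)

# Same 64 meanings, but each word carries one redundant syllable.
redundant = {with_check(w) for w in dense}

print(confusable_fraction(dense))      # every mishearing is another real word
print(confusable_fraction(redundant))  # mishearings fall outside the lexicon
```

In the dense lexicon a listener cannot even notice the error, because the misheard form is a perfectly good word with a different meaning. In the redundant one the error is at least detectable, at the cost of longer words.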

> Maybe if hababa means artificially grown plants, bu mi and oh would
> refer to 3 hydroponics techniques, but if hababa means classical music
> scores, then bu mi and oh might be dynamic variations.

That's possible, though I'm not sure if you want to push it that far. There's a limit beyond which things will start to break down.

> > 2) Words that refer to similar things in the same context preferably
> > should be as different as possible.
>
> ...in the same context though. Where does the same context stop and
> the different words start?

I'm not talking about lexical context, BTW. I'm talking about the context that exists in the mind of the speaker and the listener. Obviously, these two aren't the same thing, but they should share a common subset, otherwise communication wouldn't be possible to begin with.

> I'm proposing to do it at the syllable level, though that might not be
> workable. I think ultimately what you're saying is I need a
> (relatively) random distribution to a certain threshold of syllables,
> after which I should use a taxonomic hierarchy.

You're still thinking in terms of a direct mapping between taxonomical categories and syllables (morphemes). It doesn't have to be so.

> So like the last 2 syllables in a category of words is random so they
> don't resemble each other, but the first 2 are taxonomic?

The problem isn't whether the last n syllables are "random" or not; the problem is that mapping taxonomy to syllables directly overloads the lexical structure of the word. You want to encode the taxonomy in a way that doesn't overload every single syllable in the word; otherwise mishearing a single vowel causes the conversation to fly off into outer space. One characteristic of natlangs is that they degrade gracefully, and so should a conlang aspiring to be practical.

> > 3) Words that refer to different things in different contexts don't
> > have to be very different from each other. Our brains can easily
> > tell from context which meaning is intended, so there's no need
> > to split hairs in this area.
>
> Exactly, so hamumamupo could mean something totally different than
> bamumamupo, as long as we don't talk about both wildly diverse
> concepts at the same time.

You'll still have to be careful, though, that the distinction between them doesn't become the basis for establishing context, otherwise you run into the same problem. E.g., if they are used at the beginning of a conversation, and the meaning of the rest of the conversation depends on which word is used, then you can completely misunderstand the conversation if you mishear a single syllable at the beginning. Not good, considering most real-life conversations involve less important filler at the beginning so that the listener has time to "tune in" before the fine distinctions come.
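The failure mode described here can be sketched in a few lines of Python. The context word "hamaba" and all the glosses are invented for illustration; only "hababa", "bu", "mi" and "oh" come from the example quoted earlier in this message.

```python
# Hypothetical context words and the meanings they give to short forms.
# "hamaba" differs from "hababa" by a single syllable.
CONTEXTS = {
    "hababa": {  # artificially grown plants
        "bu": "hydroponics technique 1",
        "mi": "hydroponics technique 2",
        "oh": "hydroponics technique 3",
    },
    "hamaba": {  # classical music scores
        "bu": "soft",
        "mi": "loud",
        "oh": "growing louder",
    },
}

def interpret(utterance):
    """Decode an utterance whose first word sets the context for the rest."""
    context, *rest = utterance.split()
    meanings = CONTEXTS[context]
    return [meanings[word] for word in rest]

said = interpret("hababa bu mi oh")
heard = interpret("hamaba bu mi oh")  # one syllable of the context misheard
print(said)
print(heard)
```

A single misheard syllable in the context word changes the reading of every word that follows, and nothing in the rest of the utterance signals that anything went wrong.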

> > you just need to be creative about how exactly you represent the
> > taxonomic structure. My advice is, a simple mapping from taxonomic
> > structure to syllables is impractical.
>
> I haven't given up yet on the idea that one can construct a rational
> mapping of vocabulary.

A rational mapping does not require a direct mapping from taxonomy to syllable.

> "Be creative" is advise to be heuristic,

No, it's advice to search for ways of mapping taxonomy to lexical structure other than the per-syllable approach, which suffers from many flaws.

> and without a context to draw samples from I can't really tell
> heuristically whether a vocabulary word should be one way or another.
> Since I cannot follow your advice, I was trying to deduce a system
> that would make more sense and give me some footing from which to be
> creative. You can't make a painting with finished, dried paint, I
> agree, but you also can't make a painting with all the paint mixed
> together.

Try studying natural language and observe how it works, and think about why it's the way it is. Don't be too quick to write off a certain feature as a flaw; many perceived flaws in natlangs are actually only superficial.

For example, ambiguity is largely perceived as a bad thing, but if one considers why it got there in the first place, one gains the insight that the brain can deal with it via context. Or take the example of redundancy: one may think it's useless and should be optimized away, but if you examine natlangs carefully, you'll discover that they strike a very good balance between being so redundant as to be cumbersome, and being so austere that a little background noise would make them completely incomprehensible.

Or take the fact that the most frequent words are the most irregular, and that our brains have no problem with it (after we first master the language, of course). Irregular words tend to be more memorable once you've learned them, for the simple reason that, being inherently a pattern matcher, our brain works best at telling the difference when the difference is overt (such as an unexpected verb form in an irregular verb, or a different word where a more regular one would be expected). Per-syllable differences are too minute and can easily become confusing, unless accompanied by other differences that indicate the same change in semantics (redundancy), in which case they can serve to reinforce the difference.

Of course, the language cannot be completely irregular, since then it would be impractical to learn (and doing so also makes irregularity regular, thus defeating the overtness). So you have to draw the line between "frequent" and "infrequent", or "regular" and "irregular" somewhere---and natlangs do this quite well, for the simple reason that thousands or millions of people have explored the possibilities (consciously or not), and have settled on the ones that best meet their needs. Take, for example, irregular verbs and regular verbs in your Standard Average European language---they are mostly quite well distributed in terms of which verbs are used more often. Not perfect, mind you, but a very good balance nevertheless---this is something a constructed language could improve on.


T

--
Those who have not appreciated the beauty of language are not qualified to bemoan its flaws.