Re: Taxonomic Vocabulary
From: H. S. Teoh <hsteoh@...>
Date: Friday, September 1, 2006, 0:58
On Thu, Aug 31, 2006 at 03:48:13PM -0700, Tasci wrote:
> Thank you for your prompt reply! You are very wise, so anything I omit
> you can assume I agree with.
>
> On Thu, 31 Aug 2006 15:23:56 -0700
> "H. S. Teoh" <hsteoh@...> wrote:
>
> > Now, a computer would have absolutely no trouble picking out the
> > right words, but the human brain doesn't work that way.
>
> Depending on how you program the computer of course. If it strictly
> matches words yes, but in computer terms what you're describing is
> called a hash collision, as far as I know.
Well, that assumes the brain uses a hash function for word recognition.
:-) AFAIK, our brain is more like a pattern-matcher whereas a computer
deals with precise things---that's the difference I was driving at. Now
of course, it's possible to write a program that operates based on
pattern matching instead of exact comparison, but that's beside the
point I was trying to make.
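To make the exact-versus-pattern distinction concrete, here is a minimal
sketch in Python. The lexicon and its glosses are made up for
illustration, and the standard library's difflib is only a crude
stand-in for whatever the brain actually does.

```python
import difflib

# Hypothetical toy lexicon; the glosses are placeholders, not real vocabulary.
LEXICON = {
    "habababu": "technique A",
    "hababami": "technique B",
    "bamumamupo": "unrelated concept",
}

def exact_lookup(word):
    """Exact comparison: any deviation from a stored form fails outright."""
    return LEXICON.get(word)

def fuzzy_lookup(word, cutoff=0.6):
    """Approximate matching: return the closest stored form, or None."""
    matches = difflib.get_close_matches(word, list(LEXICON), n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this lexicon, exact_lookup("habababo") finds nothing, while
fuzzy_lookup("habababo") settles on "habababu". The fuzzy matcher only
helps as long as the stored forms are not themselves near-identical.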
> > You're assuming that roots must necessarily be complete general.
>
> How about my idea to make the roots complete specific, and
> generalizing from there? That's got problems of its own, but it might
> be something to consider during the designing of the real vocabulary.
I have to admit I've not experimented that much in that area before, so
you'll have to explore it and find out.
> > 1) Frequency of usage is more important than beauty of internal
> > structure. You should cater to the fact that the most frequently used
> > words should be most economical, even if the concepts themselves are
> > very complicated and require a lot of specification in a taxonomic
> > system.
>
> Perhaps it would be interesting to try a vocabulary that starts with
> the most frequent words, and then has specifiers for related words,
> much in the way our brain works by taking familiar concepts and
> following connections to related concepts.
That sounds workable.
[...]
> > What exactly constitutes 'frequent' depends on what your target
> > audience is. The language of a farmer is very different from the
> > language of an academic researcher, even though they share a
> > similar subset that they can mutually understand each other
> > (e.g., they both speak English---even though the kind of English
> > the researcher speaks uses a lot of words that the farmer doesn't
> > use from day to day, and vice versa).
>
> Now see, that's just the very reason that a taxonomic vocabulary would
> make total sense. The scientist words would have a certain prefix (or
> suffix) and beneath that would use the same combinations as the
> farmer, who would have a different 'farmer words' prefix. Maybe it
> wouldn't be subdivided by scientist and farmer, but it would allow us
> to remember less, once we know the context of our vocabulary. Maybe
> it would even work to establish the context of the vocabulary, then
> omit the assumed prefix until you're done with that context. Not sure
> how to conceptualize it, but something like...
>
> habababu hababami hababaoh habababee
> would become something like...
>
> hababa bu mi oh bee
See, the thing is, you still have the problem that the listener must
correctly discern the first word from a whole bunch of other very
similar ones in order to derive the correct context. One syllable off,
and he gets a totally different meaning from the following words.
My point is, a single syllable (or morpheme) is too small to bear the
weight of determining a category in the taxonomy. It works in theory,
but it just doesn't work in practice. Human communication is inherently
lossy; that's why natural languages have built-in redundancy. The
redundancy is there for a reason!
Now I'm not saying you have to make your lang redundant, but the point
is that you need more than just a single syllable to carry a particular
meaning. One syllable may be enough to make a distinction in isolation,
but when everything else around it also makes such distinctions, the
brain just overloads.
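A rough way to see the problem, sketched in Python under loud
assumptions: the six-syllable inventory, the three-syllable word length
and the "check syllable" scheme are all hypothetical, and a check
syllable is not how natlangs achieve redundancy. It is just the simplest
form of redundancy to compute. The function counts how often a
one-syllable mishearing lands on another valid word, i.e. goes
unnoticed.

```python
from itertools import product

# Hypothetical syllable inventory, for illustration only.
SYLLABLES = ["ha", "ba", "bu", "mi", "oh", "bee"]

def one_syllable_errors(word):
    """Yield every form that differs from `word` in exactly one syllable."""
    for i, syl in enumerate(word):
        for other in SYLLABLES:
            if other != syl:
                yield word[:i] + (other,) + word[i + 1:]

def silent_error_rate(lexicon):
    """Fraction of one-syllable mishearings that are also valid words."""
    total = hits = 0
    for word in lexicon:
        for misheard in one_syllable_errors(word):
            total += 1
            hits += misheard in lexicon
    return hits / total

def with_check_syllable(word):
    """Append one extra syllable determined by the rest of the word."""
    idx = sum(SYLLABLES.index(s) for s in word) % len(SYLLABLES)
    return word + (SYLLABLES[idx],)

# Dense taxonomy: every three-syllable combination is a word.
dense = set(product(SYLLABLES, repeat=3))
# Same words, each carrying one redundant syllable.
redundant = {with_check_syllable(w) for w in dense}
```

Here silent_error_rate(dense) is 1.0: every slip produces some other
valid word. silent_error_rate(redundant) is 0.0: every single slip
produces a non-word, so the listener at least knows something went
wrong.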
> Maybe if hababa means artificially grown plants, bu mi and oh would
> refer to 3 hydroponics techniques, but if hababa means classical music
> scores, then bu mi and oh might be dynamic variations.
That's possible, though I'm not sure if you want to push it that far.
There's a limit beyond which things will start to break down.
> > 2) Words that refer to similar things in the same context preferably
> > should be as different as possible.
>
> ...in the same context though. Where does the same context stop and
> the different words start?
I'm not talking about lexical context, BTW. I'm talking about the
context that exists in the mind of the speaker and the listener.
Obviously, these two aren't the same thing, but they should share a
common subset, otherwise communication wouldn't be possible to begin
with.
> I'm proposing to do it at the syllable level, though that might not be
> workable. I think ultimately what you're saying is I need a
> (relatively) random distribution to a certain threshold of syllables,
> after which I should use a taxonomic hierarchy.
You're still thinking in terms of a direct mapping between taxonomical
categories and syllables (morphemes). It doesn't have to be so.
> So like the last 2 syllables in a category of words is random so they
> don't resemble each other, but the first 2 are taxonomic?
The problem isn't whether the last n syllables are "random" or not; the
problem is that mapping taxonomy to syllables directly overloads the
lexical structure of the word. You want to encode the taxonomy in a way
that doesn't overload every single syllable in the word; otherwise
mishearing a single vowel causes the conversation to fly off into outer
space. One characteristic of natlangs is that they degrade gracefully,
and so should a conlang aspiring to be practical.
> > 3) Words that refer to different things in different contexts don't
> > have to be very different from each other. Our brains can easily
> > tell from context which meaning is intended, so there's no need
> > to split hairs in this area.
>
> Exactly, so hamumamupo could mean something totally different than
> bamumamupo, as long as we don't talk about both wildly diverse
> concepts at the same time.
You'll still have to be careful, though, that the distinction between
them doesn't become the basis for establishing context, otherwise you
run into the same problem. E.g., if they are used at the beginning of a
conversation, and the meaning of the rest of the conversation depends on
which word is used, then you can completely misunderstand the
conversation if you mishear a single syllable at the beginning. Not
good, considering most real-life conversations involve less important
filler at the beginning so that the listener has time to "tune in"
before the fine distinctions come.
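The failure mode can be sketched in a few lines of Python. The two
context words and all the glosses below are hypothetical, loosely
modeled on the "hababa bu mi oh" example quoted earlier; "habama" is an
invented neighbor that differs from "hababa" by one syllable.

```python
# Hypothetical context words and glosses, for illustration only.
CONTEXTS = {
    "hababa": {  # artificially grown plants
        "bu": "hydroponics technique 1",
        "mi": "hydroponics technique 2",
        "oh": "hydroponics technique 3",
    },
    "habama": {  # classical music scores
        "bu": "dynamic variation 1",
        "mi": "dynamic variation 2",
        "oh": "dynamic variation 3",
    },
}

def interpret(utterance):
    """Treat the first word as the context and gloss the rest within it."""
    context, *rest = utterance.split()
    table = CONTEXTS[context]
    return [table[word] for word in rest]
```

Comparing interpret("hababa bu mi oh") with interpret("habama bu mi oh")
shows the issue: one misheard syllable in the first word changes the
gloss of every word that follows, and nothing in the later words signals
the mistake.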
> > you just need to be creative about how exactly you represent the
> > taxonomic structure. My advice is, a simple mapping from taxonomic
> > structure to syllables is impractical.
>
> I haven't given up yet on the idea that one can construct a rational
> mapping of vocabulary.
A rational mapping does not require a direct mapping from taxonomy to
syllable.
> "Be creative" is advise to be heuristic,
No, it's advice to search for ways of mapping taxonomy to lexical
structure other than the per-syllable approach, which suffers from
many flaws.
> and without a context to draw samples from I can't really tell
> heuristically whether a vocabulary word should be one way or another.
> Since I cannot follow your advice, I was trying to deduce a system
> that would make more sense and give me some footing from which to be
> creative. You can't make a painting with finished, dried paint, I
> agree, but you also can't make a painting with all the paint mixed
> together.
Try studying natural language and observe how it works, and think about
why it's the way it is. Don't be too quick to write off a certain
feature as a flaw; many perceived flaws in natlangs are actually only
superficial. For example, ambiguity is largely perceived as a bad thing,
but if one would only consider why it got there in the first place, one
would gain the insight that the brain can deal with it via
context.
Or take the example of redundancy: one may think it's useless and one
should optimize it away, but if you examine natlangs carefully, you'll
discover that they have a very good balance between being overly
redundant and therefore cumbersome, and being so austere that a little
background noise would make it completely incomprehensible.
Or the fact that the most frequent words are the most irregular; and why
our brains have no problem with it (after we first master the language,
of course). Irregular words tend to be more memorable once you've
learned them, for the simple reason that being inherently a pattern
matcher, our brain works best at telling the difference when the
difference is overt (such as an unexpected verb form in an irregular
verb, or a different word where a more regular one would be expected).
Per-syllable differences are too minute and can easily become confusing,
unless accompanied by other differences that indicate the same change
in semantics (redundancy), in which case it can serve to reinforce the
difference. Of course, the language cannot be completely irregular,
since then it would be impractical to learn (and doing so also makes
irregularity regular, thus defeating the overtness). So you have to draw
the line between "frequent" and "infrequent", or "regular" and
"irregular" somewhere---and natlangs do this quite well, for the simple
reason that thousands or millions of people have explored the
possibilities (consciously or not), and have settled on the best ones
that meet their needs. Take, for example, irregular verbs and regular
verbs in your Standard Average European language---they are mostly quite
well distributed in terms of which verbs are used more often. Not
perfect, mind you, but a very good balance nevertheless---and this is
an area where a constructed language could improve on them.
T
--
Those who have not appreciated the beauty of language are not qualified
to bemoan its flaws.