Re: Dublex (was: Washing-machine words (was: Futurese, Chinese,
From: Jeffrey Henning <jeffrey@...>
Date: Monday, May 20, 2002, 3:12
And Rosta <a-rosta@...> comunu:
> How does the Dublex programme go about doing this? For example, are there
> different candidate sets of 400? -- E.g. you might start with 1000 and
> choose the 400 best. Or you might start with no upper number, but take
> words from English and replace them by compounds, recursively replacing
> the constituents of compounds by further compounds, until you end up
> with 400 'atomic' morphemes. And how is the usefulness of a word
> measured?
Good questions! Basically, I first developed the 400-root word list by
studying the Universal Language Dictionary (the most comprehensive short
wordlist around, IMO), the Lojban gismu, Basic English and Esperanto. I
added a few words that I wanted to make sure were included so that I could
describe the language in the language (e.g., 'nomin' and 'verb' for "noun"
and "verb"). The initial 400 was my *subjective* take on the 400 roots that
would be most productive.
While I have locked in the idea of using 400 roots,* I want the morpheme
list to evolve and improve over time. So I apply the concept of survival of
the fittest to the 400 morphemes. The weakest morphemes of the herd can be
killed off by new, stronger morphemes. The strength of a morpheme equals
the number of two-root compound words that can be formed from it. The
strongest possible morpheme would form 399 words with it as the modifying
morpheme and 399 words with it as the base morpheme, for a strength of 798.
In practice the average root currently has a strength of 24, meaning each
root forms 24 two-root compounds on average (but the median strength is 14).
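A rough sketch of how the strength figures above could be tallied, assuming
the lexicon is available as a list of (modifier, base) pairs. The function
names and the placeholder roots "a" to "d" are made up for illustration and
are not real Dublex data or the actual Dublex tooling.

```python
from statistics import mean, median


def max_strength(root_count):
    """Upper bound on strength: a root can pair with every other root
    once as modifier and once as base (400 roots gives 2 * 399 = 798)."""
    return 2 * (root_count - 1)


def strengths(roots, compounds):
    """Count, for each root, the two-root compounds it takes part in.

    `compounds` is an iterable of (modifier, base) pairs. Assumption:
    a root is never compounded with itself, matching the 399 + 399 bound.
    """
    counts = {root: 0 for root in roots}
    for modifier, base in compounds:
        counts[modifier] += 1  # used as the modifying morpheme
        counts[base] += 1      # used as the base morpheme
    return counts


# Placeholder data only; these are not actual Dublex roots.
roots = ["a", "b", "c", "d"]
compounds = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "d")]

table = strengths(roots, compounds)
average = mean(table.values())
middle = median(table.values())
```

A mean well above the median, as with 24 vs. 14, would suggest that a small
number of very productive roots pull the average up.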
Here's how this evolution works in practice:
Suhvoclete Repfaba Sist
[Root Revision System.]
1. Choose 40 roots at random (10% of the roots).
2. Coin compound words by combining these with the root you want to
deprecate and, separately, with the root you want to add.
3. Multiply the number of compounds coined for each by 10 to estimate its
productivity with all 400 roots.
4. If the new root wins, add all of its coined words. Re-coin all compounds
that use the old root.
I just did this for "door" vs. "noun", and came up with an estimated
productivity of 60 compounds for "door" vs. 10 for "noun". So "noun" will
be culled from the herd (the only compounds from it were for "grammar" and
"pronoun").
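The sampling-and-scaling arithmetic in the steps above can be sketched as
follows. This is only an illustration: the function names are hypothetical,
and the coining itself (step 2) is a human judgment that the code takes as
an input count.

```python
import random


def sample_roots(roots, fraction=0.10, seed=None):
    """Step 1: pick a random subset of the roots (10% by default)."""
    rng = random.Random(seed)
    sample_size = round(len(roots) * fraction)
    return rng.sample(roots, sample_size)


def estimate_productivity(coined_in_sample, sample_size, total_roots):
    """Step 3: scale the count from the sample up to the full root list.
    With 40 of 400 roots sampled, this multiplies the count by 10."""
    return coined_in_sample * total_roots / sample_size


def new_root_wins(old_coined, new_coined, sample_size=40, total_roots=400):
    """Step 4 (decision only): compare the two estimates."""
    old_estimate = estimate_productivity(old_coined, sample_size, total_roots)
    new_estimate = estimate_productivity(new_coined, sample_size, total_roots)
    return new_estimate > old_estimate, old_estimate, new_estimate


# Assumed counts: estimates of 60 and 10 would correspond to 6 and 1
# compounds coined against the 40 sampled roots.
wins, noun_estimate, door_estimate = new_root_wins(old_coined=1, new_coined=6)
```

Because only 10% of the roots are sampled, each estimate is coarse: one
extra coinage in the sample moves it by 10.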
(*The number 400 itself was subjectively chosen, and one of the points of
the Dublex experiment is to generate some statistics on the effect of
morpheme count on word length.)
Best regards,
Jeffrey
http://jeffrey.henning.com
http://www.langmaker.com