Re: Efficiency/Spatial Compactness
From: <morphemeaddict@...>
Date: Saturday, July 21, 2007, 6:13
In a message dated 7/19/2007 3:26:43 PM Central Daylight Time,
joerg_rhiemeier@WEB.DE writes:
> The basic idea, as far as I understand it, is to build a language,
> translate texts into it, measure the token frequencies of the
> morphemes, and then relex it using the shortest morphs for the
> most frequent morphemes.
>
I started a project like this just a few days ago. The source language is
Esperanto. The text is "La Karavano", an Esperanto translation of the German
original by Wilhelm Hauff (found at Project Gutenberg), over 35,000 words long.
I'm in the process of splitting all the words into their morphemes right now.
Then I'll make a frequency list of the morphemes, and, finally, I'll map each
Esperanto morpheme to a new morph according to its frequency (and probably its
morpheme type, too).
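A minimal sketch of the counting and assignment steps, not the actual tooling used here. It assumes the words are already split into morphemes, that endings have been set aside, and that a pool of new morphs is supplied by hand. The name relex_by_frequency and the sample data are hypothetical, and morpheme type is ignored.

```python
from collections import Counter

def relex_by_frequency(segmented_words, new_morphs):
    """Pair the most frequent old morphemes with the shortest new morphs.

    segmented_words: list of words, each a list of morpheme strings.
    new_morphs: candidate replacement morphs (hypothetical pool).
    """
    # Token frequency of every morpheme across the whole text.
    counts = Counter(m for word in segmented_words for m in word)
    # Most frequent first.
    ranked = [m for m, _ in counts.most_common()]
    # Shortest replacement first; sorted() is stable, so equal-length
    # morphs keep the order in which they were supplied.
    shortest_first = sorted(new_morphs, key=len)
    if len(shortest_first) < len(ranked):
        raise ValueError("not enough new morphs for all morphemes")
    return dict(zip(ranked, shortest_first))

# Toy data, invented for illustration (endings already stripped).
text = [["la"], ["karavan"], ["la"], ["grand"], ["karavan"], ["la"]]
mapping = relex_by_frequency(text, ["aba", "a", "ba"])
print(mapping)
```

In this toy run "la" occurs three times and gets the one-letter morph, "karavan" occurs twice and gets the two-letter one, and "grand" gets the longest.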
I've already done the endings, which are all consonants. The other morphemes
(prefixes, roots, suffixes) will all end in vowels: V, CV, VCV, etc., with each
longer shape adding a C or a V, alternating, at the beginning. I expect a
morpheme like "administraci-" to shrink to a three- or at most four-letter
morpheme.
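The vowel-final shapes can be enumerated shortest-first with a small generator. This is only a sketch: the two-vowel, two-consonant inventory below is a made-up placeholder, not the inventory of the actual project, and the function names are hypothetical.

```python
from itertools import islice, product

def shape(length):
    """Return the C/V pattern of a given length that ends in a vowel.

    The pattern is built from the final V backwards, alternating C and V:
    1 -> V, 2 -> CV, 3 -> VCV, 4 -> CVCV, ...
    """
    backwards = "".join("V" if i % 2 == 0 else "C" for i in range(length))
    return backwards[::-1]

def morphs(vowels, consonants):
    """Yield every vowel-final morph, shortest shapes first (endless)."""
    length = 1
    while True:
        # One letter set per slot in the pattern.
        slots = [vowels if s == "V" else consonants for s in shape(length)]
        for combo in product(*slots):
            yield "".join(combo)
        length += 1

# Placeholder inventory, assumed for illustration only.
first_eight = list(islice(morphs("ae", "bk"), 8))
print(first_eight)
```

As a rough check on the size estimate, an assumed inventory of 5 vowels and 20 consonants would give 5 + 100 + 500 + 10,000 = 10,605 forms of up to four letters.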
stevo