
Re: Efficiency/Spatial Compactness

From: <morphemeaddict@...>
Date: Saturday, July 21, 2007, 19:06
In a message dated 7/21/2007 1:28:29 PM Central Daylight Time,
jimhenry1973@GMAIL.COM writes:


> On 7/21/07, MorphemeAddict@wmconnect.com <MorphemeAddict@...> wrote:
> > In a message dated 7/19/2007 3:26:43 PM Central Daylight Time,
> > joerg_rhiemeier@WEB.DE writes:
> > > The basic idea, as far as I understand it, is to build a language,
> > > translate texts into it, measure the token frequencies of the
> > > morphemes, and then relex it using the shortest morphs for the
> > > most frequent morphemes.
> >
> > I started a project like this just a few days ago. The original language is
> > Esperanto (translated from German). The text is "La Karavano" by Wilhelm
> > Hauff (found at the Gutenberg Project), over 35,000 words long.
> > I'm in the process of splitting all the words into their morphemes right now.
> > Then I'll make a frequency list of the morphemes, and, finally, I'll assign
> > the Esperanto morphemes to new ones by their frequency (and probably morpheme
> > type, too).
>
> Cool. What program are you using to measure the frequency of
> tokens?

I do everything with MS Excel, occasionally also using MS Word.

> Does it measure frequency of phrases as well?

Maybe. I hadn't thought about that. That would be harder to do with what I normally use. Maybe I could simply count the number of times a phrase occurs using the Find function in Word. That's about the only way I can see to do it.

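For comparison, here is a minimal sketch of how phrase counts could be scripted instead of tallied by hand with Find. It is not the Perl script mentioned in this thread; the function names (tokenize, phrase_counts, gapped_pair_counts) and the sample sentence are made up for illustration. The second counting function covers pairs of words with a fixed number of arbitrary words between them, the kind of wildcard phrase raised later in this message.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (runs of letters, accented ones included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def phrase_counts(tokens, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def gapped_pair_counts(tokens, gap):
    """Count pairs of words that have exactly `gap` arbitrary words between them."""
    step = gap + 1
    return Counter((tokens[i], tokens[i + step]) for i in range(len(tokens) - step))


if __name__ == "__main__":
    # Hypothetical sample text; a real run would read the whole story from a file.
    sample = "La viro kaj la virino kaj la infano."
    words = tokenize(sample)
    print(phrase_counts(words, 2).most_common(3))
    print(gapped_pair_counts(words, 1).most_common(3))
```

Each function walks the word list once, so a text of 35,000 words should be no problem. Word boundaries are decided by the regular expression alone, so hyphenated or apostrophized forms would need their own handling.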
> You can get such a script (in Perl) from my site:
>
> http://www.pobox.com/~jimhenry/conlang/frequencies.pl

My programming skills are a couple of decades out of date. I have no idea how to use that program.
> (I have a newer, better version than what is on my website,
> but I can't FTP-upload it from the hospital wireless network.
> I'll do that sometime after I get out. Meanwhile I could email
> it to you if you want it.)
>
> If you have something that will measure the frequency of
> wildcard phrases (e.g. how often two words occur with
> any word between them, or with any two words, or...)
> let me know.

I'm not sure how to do this either.

I've parsed over half of the words so far. Actually splitting the words apart will be done in Word. Then the count will be done with Excel.

stevo
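
The count-and-reassign steps could also be scripted. The following is only a rough sketch under stated assumptions, not the Word/Excel workflow described above: it assumes the words have already been split by hand with a hyphen between morphemes (e.g. "hund-o-j"), and the function names and the toy alphabet are hypothetical. It counts morpheme tokens, then hands out new forms from shortest to longest in order of falling frequency. It ignores morpheme type and any phonotactic rules.

```python
from collections import Counter
from itertools import count, product


def morpheme_frequencies(split_words):
    """Count morpheme tokens in words written with '-' between morphemes."""
    counts = Counter()
    for word in split_words:
        counts.update(m for m in word.split("-") if m)
    return counts


def candidate_forms(alphabet):
    """Yield every string over the alphabet, shortest first, without end."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def relex(counts, alphabet="aeioptk"):
    """Map each morpheme to a new form, the most frequent getting the shortest."""
    # Ties in frequency are broken alphabetically so the result is repeatable.
    ranked = sorted(counts, key=lambda m: (-counts[m], m))
    return dict(zip(ranked, candidate_forms(alphabet)))


if __name__ == "__main__":
    # Hypothetical hand-split sample; the real input would be the parsed text.
    sample = ["la", "hund-o", "kat-o", "la", "hund-o-j"]
    freqs = morpheme_frequencies(sample)
    for morpheme, form in relex(freqs).items():
        print(morpheme, freqs[morpheme], form)
```

A real relexification would probably replace candidate_forms with a generator that produces only pronounceable syllables, and would keep separate pools of forms for roots and affixes.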
