Conlang Mailing List HQ

Re: Efficiency/Spatial Compactness

From: And Rosta <and.rosta@...>
Date: Saturday, July 21, 2007, 19:49
Jim Henry, On 21/07/2007 19:25:
> Cool. What program are you using to measure the frequency of
> tokens? Does it measure frequency of phrases as well?
> You can get such a script (in Perl) from my site:
>
> http://www.pobox.com/~jimhenry/conlang/frequencies.pl
>
> (I have a newer, better version than what is on my website,
> but I can't FTP-upload it from the hospital wireless network.
> I'll do that sometime after I get out. Meanwhile I could email
> it to you if you want it.)
>
> If you have something that will measure the frequency of
> wildcard phrases (e.g. how often two words occur with
> any word between them, or with any two words, or...)
> let me know.
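The wildcard counts asked about above amount to gapped word-pair counting (sometimes called skip-grams): count each ordered pair of words separated by exactly N intervening words. What follows is a minimal Python sketch of that idea. It is not Jim's frequencies.pl, and the naive tokenizer and the function names (tokenize, gapped_pair_counts) are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption, not frequencies.pl's rules):
    # lowercase the text and keep runs of word characters.
    return re.findall(r"\w+", text.lower())

def gapped_pair_counts(tokens, gap):
    """Count ordered pairs (w1, w2) with exactly `gap` words between them.

    gap=0 gives ordinary bigrams, gap=1 is "two words with any word
    between them", gap=2 is "with any two words between them".
    """
    counts = Counter()
    step = gap + 1
    for i in range(len(tokens) - step):
        counts[(tokens[i], tokens[i + step])] += 1
    return counts

tokens = tokenize("Give the dog food and give the cat food")
print(Counter(tokens).most_common(3))               # plain token frequencies
print(gapped_pair_counts(tokens, 2)[("give", "food")])  # "give _ _ food"
```

Running gapped_pair_counts for several values of gap and summing the results gives "any number of words up to N between them"; a fixed gap, though, cannot capture a phrase whose middle varies in length.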
Ideally you'd derive your statistics not from strings of wordforms but from semanticosyntactic trees. Or both. E.g. you'd want to find the frequency of "give X food" (which might warrant a compressed form meaning "feed X"), regardless of the length of X. I say "ideally" because it'd mean an awful lot of work, for results that would be very interesting yet surely still distressingly distant from perfection.

--And.
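Counting "give X food" over trees rather than strings can be sketched as follows, assuming a hypothetical dependency-tree representation in which each node stores a lemma and a list of dependents labelled with relations. The labels iobj and obj are borrowed from Universal Dependencies conventions as an assumption; the Node class and count_give_x_food are illustrative names only. Because X is matched as a whole subtree, its length does not matter.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical dependency-tree node: a lemma plus labelled dependents.
    word: str
    deps: list = field(default_factory=list)  # list of (relation, Node) pairs

def walk(node):
    # Yield every node in the tree, depth-first, so embedded clauses count too.
    yield node
    for _, child in node.deps:
        yield from walk(child)

def count_give_x_food(trees):
    """Count clauses shaped like give(iobj=X, obj=food), whatever X contains."""
    total = 0
    for tree in trees:
        for node in walk(tree):
            if node.word != "give":
                continue
            # Map each relation to the head lemma of its dependent
            # (assumes at most one dependent per relation, for simplicity).
            rels = {rel: child.word for rel, child in node.deps}
            if rels.get("obj") == "food" and "iobj" in rels:
                total += 1
    return total

# "give (the) dog food" and "give the big hungry dog food" both match,
# although their X parts differ in length; "give dog (a) bone" does not.
short_x = Node("give", [("iobj", Node("dog")), ("obj", Node("food"))])
long_x = Node("give", [
    ("iobj", Node("dog", [("det", Node("the")), ("amod", Node("big")),
                          ("amod", Node("hungry"))])),
    ("obj", Node("food")),
])
other = Node("give", [("iobj", Node("dog")), ("obj", Node("bone"))])
print(count_give_x_food([short_x, long_x, other]))
```

A fuller version would count every (head, relation frame) combination rather than one hard-coded pattern, but the hard part And points to, producing reliable trees for a corpus in the first place, is not addressed here.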