> I thought I had read a web page addressing word length as a function of
> word frequency before, but after a half-hour of searching Google I gave
> up and did a quick analysis of this English corpus in Excel:
>
> http://www.comp.lancs.ac.uk/ucrel/bncfreq/lists/2_3_writtenspoken.txt
>
> Length of word - Average frequency of words with this length
> 1 - 1835.5
> 2 - 1790.7
> 3 - 900.2
> 4 - 211.3
> 5 - 110.7
> 6 - 78.6
> 7 - 71.9
> 8 - 63.1
> 9 - 59.5
> 10 - 53.6
> 11 - 49.9
> 12 - 47.1
> 13 - 48.7
> 14 - 36.4
> 15 - 33.0
> 16 - 30.0
>
> I haven't scrubbed the corpus (and it looks like it could use it), but
> this quick and dirty analysis was all I needed for my conlanging
> activities of the moment, and it supported my hypothesis: the more
> frequent words in my conlang should be shorter than the less frequent
> words. However, frequency declines more gradually than I anticipated
> for words of 7 or more letters.
>
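For anyone who wants to repeat the tally outside Excel, here is a minimal sketch of the averaging step in Python. It assumes the corpus file has already been parsed into (word, frequency) pairs; the function name and the toy input are mine, invented purely for illustration, and are not taken from the BNC list.

```python
from collections import defaultdict


def average_frequency_by_length(pairs):
    """Map word length -> mean frequency of the words with that length."""
    totals = defaultdict(float)  # summed frequency per word length
    counts = defaultdict(int)    # number of words per word length
    for word, freq in pairs:
        totals[len(word)] += freq
        counts[len(word)] += 1
    return {n: totals[n] / counts[n] for n in sorted(totals)}


# Toy input, made up for illustration only (NOT real corpus figures).
sample = [("a", 20.0), ("of", 30.0), ("to", 10.0), ("the", 60.0)]
print(average_frequency_by_length(sample))
```

Parsing the real file is left out on purpose, since I have not checked its column layout.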
> Has anyone seen a more rigorous analysis?
>
> I had toyed with converting the words to phonetic representations but
> decided it wasn't worth my time. Obviously, the number of phonemes in a
> word tracks word frequency more closely than the length of its English
> spelling does, but I didn't feel like using SOUNDEX or Zompist.com's
> English spelling algorithm (56 rules! -- http://www.zompist.com/spell.html)
> to come up with approximations of the phonetic length.
>
> Anyone inspired to do a more statistically thorough analysis?
>
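One small step toward something more statistical, offered only as a sketch: a Spearman rank correlation between word length and the average frequencies in the table above. It works on the sixteen aggregated rows rather than on individual words, so it says nothing about the spread within each length, and the helper names are my own. The simple rank formula used here assumes there are no tied values, which holds for this table.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Word lengths and average frequencies copied from the table in the post.
lengths = list(range(1, 17))
avg_freq = [1835.5, 1790.7, 900.2, 211.3, 110.7, 78.6, 71.9, 63.1,
            59.5, 53.6, 49.9, 47.1, 48.7, 36.4, 33.0, 30.0]

rho = spearman_rho(lengths, avg_freq)
print(round(rho, 3))
```

The result comes out very close to -1, because the only break in the monotonic decline is the small rise from 12 to 13 letters.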
> Best regards,
>
> Jeffrey
>
>
--
Dirk Elzinga
Dirk_Elzinga@byu.edu
"I believe that phonology is superior to music. It is more variable and
its pecuniary possibilities are far greater." - Erik Satie