Letter stats

From: H. S. Teoh <hsteoh@...>
Date: Wednesday, July 31, 2002, 19:20
0so' Ta'l3n 3jomiu'.

So, I've been building my own set of tools for working with Ebisedian,
esp. in dealing with the ugly but unfortunately necessary ASCII
orthography, and in helping with various lexicon maintenance stuff.  In
particular, I've built a tool called 'lextool', which does things like
check the Ebisedian lexicon for proper alphabetical order, compute stats,
etc. (Alphabetic ordering is actually quite tricky in Ebisedian, in spite
of the fact it is straightforward on the surface. And when lexicon entries
reside in 39 OrTeX files, it is no trivial task to verify something as
fundamental as this.)
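
A minimal sketch of what such an ordering check might look like, in Python. This is not lextool's actual code. The collation order in ALPHABET is hypothetical (the real Ebisedian alphabet order is not given here), the sample strings are made up, and the names sort_key and out_of_order are invented for illustration. The sketch only shows why plain ASCII comparison is not enough: digraphs such as _dh_ count as one letter, and _k_ and _K_ are distinct letters with their own positions.

```python
# Hypothetical collation order; the real Ebisedian alphabet order is not
# given in this post. The digraph "dh" counts as a single letter.
ALPHABET = ["i", "e", "3", "w", "k", "K", "g", "t", "dh", "C", "m", "r"]
RANK = {letter: n for n, letter in enumerate(ALPHABET)}


def sort_key(word):
    """Map a word to a tuple of letter ranks, matching longest letters first."""
    ordered = sorted(ALPHABET, key=len, reverse=True)
    key = []
    pos = 0
    while pos < len(word):
        for letter in ordered:
            if word.startswith(letter, pos):
                key.append(RANK[letter])
                pos += len(letter)
                break
        else:
            # No letter of the assumed alphabet matches at this position.
            raise ValueError(f"unknown character {word[pos]!r} in {word!r}")
    return tuple(key)


def out_of_order(headwords):
    """Return adjacent pairs of headwords that violate the collation order."""
    return [
        (a, b)
        for a, b in zip(headwords, headwords[1:])
        if sort_key(a) > sort_key(b)
    ]


# Made-up strings, not real Ebisedian words.
print(out_of_order(["kiri", "Kiri", "tir", "dhir", "mire", "kem"]))
```

Under this assumed order only the last pair is reported, whereas a plain ASCII sort would also want "Kiri" and "dhir" ahead of "kiri".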

Anyway, I've just finished a feature that analyses the lexicon for the
frequency of letter occurrences. Here are some interesting results:

The most common initial letter is _k_ [k], with 31 entries under it. A
close runner-up is _m_ [m], at 28 entries. It is interesting to note that
_m_ is the only labial to make it that high in the list; 3rd and 4th place
on the list are _K_ [k_h] with 20 entries and _g_ [g] with 18 entries,
respectively. (Strictly speaking, 2nd place overall is _i_ [?i], but that's
a bit of a fluke since most of it comes from the nominal form of numerals.)

So it appears that Ebisedian is quite a velar language.

However, the next analysis shows otherwise. The most common consonant,
both word-initial and intra-word, is _r_ [r`], with a grand total of 108
occurrences, although only 13 words begin with _r_. In 2nd place, we have,
again, _m_, at 65 occurrences. Third place is taken by _t_ [t], with 56
occurrences, and then in 4th place we finally have _k_, with a mere 52
occurrences.

So apparently, Ebisedian is rather velar-initial, but quite rhotic-flappy
inside. :-P
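
The difference between the two analyses (entries per initial letter versus total consonant occurrences) can be sketched roughly as follows. This is an illustration in Python, not lextool itself. The letter inventory is limited to the letters mentioned in this post, the headwords are made-up strings rather than real lexicon entries, and the function names are hypothetical.

```python
from collections import Counter

# Assumed inventory: only the letters mentioned in this post.
# "dh" is a digraph, so longer letters must be matched first.
LETTERS = ["dh", "k", "K", "g", "m", "r", "t", "C", "w", "i", "e", "3"]
CONSONANTS = {"dh", "k", "K", "g", "m", "r", "t", "C"}


def tokenize(word, letters=LETTERS):
    """Split an ASCII-orthography word into letters, longest match first."""
    ordered = sorted(letters, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(word):
        for letter in ordered:
            if word.startswith(letter, pos):
                tokens.append(letter)
                pos += len(letter)
                break
        else:
            pos += 1  # skip characters outside the assumed inventory
    return tokens


def letter_stats(headwords):
    """Count initial letters per entry and consonant occurrences overall."""
    initial = Counter()
    overall = Counter()
    for word in headwords:
        tokens = tokenize(word)
        if not tokens:
            continue
        initial[tokens[0]] += 1  # first recognised letter of the entry
        overall.update(t for t in tokens if t in CONSONANTS)
    return initial, overall


# Made-up strings, not real Ebisedian words.
sample = ["kiri", "ker3", "mire", "dhir", "Kitir"]
initial, overall = letter_stats(sample)
print(initial.most_common())
print(overall.most_common())
```

In this toy sample _k_ leads the initial-letter count while _r_ leads the overall count without starting a single entry, which is the same kind of split described above.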

Now for the bottom of the list: the prize for least common initial letter
goes to _w_ [u"], with 0 entries to its name. Penultimate is _dh_ [D],
with only 1 entry, followed by antepenultimate _e_ [?&] with 2 entries.
(This last is not quite true though, because [?&] occurs very frequently
as the masculine singular proper name prefix; but since the lexicon does
not list proper names, _e_ lost most of its clout.)

The least common consonant is _dh_. Apparently that 1 entry is the only
occurrence of _dh_ in the whole language, so far. :-/ The next least
common is _C_ [tS_h], with 6 occurrences in total.

I haven't counted intra-word vowels in this analysis, but gut feeling says
_w_ will win the Most Neglected Award, and _i_ or _3_ [@\] will probably
be near the top. (_i_ has a free ride here: almost all nouns are listed by
their locative forms in the lexicon, and _i_ is the characteristic vowel
of the locative case.)

And thus are the statistics of the Ebisedian lexicon, which is 343 entries
strong as of this writing.


In theory, there is no difference between theory and practice.