Shorthandization of gjâ-zym-byn script

Jim Henry
Wednesday, March 28, 2007

Recently I've transcribed some pages from my handwritten journal
(using a random number generator to pick which pages, to avoid
unconscious bias toward more interesting or less private entries),
tripling the size of the gzb electronic corpus, and done a frequency
analysis of the most common words and morphemes.  Using the results,
I've started crafting simple logograms for the most common morphemes
and words -- so far, the active and stative verb suffixes, the most
common "and" conjunction {kinq}, and two of the most common
postpositions, {mi-i} (topic) and {hy-i}
(patient).  My plan is to go on introducing new logograms or
abbreviation symbols about once a week, working my way down the
frequency table and probably continuing to transcribe more pages from
the journal to ensure the corpus is large and representative enough.

I recently modified my frequency analysis by adding a step that
calculates the length of a word, multiplies it by the number of times
it occurs in the corpus, and sorts by the product.  That brings some
longish words higher up in the table, but doesn't make anything jump
out as something I should have made a logogram for already.  (The
length algorithm is just Perl's length() function on a string that's
had ASCII digraphs converted into single letters.  I need to write
something better that takes the complexity of the script's written
letters into account -- for instance, it could parse a word letter by
letter and look up the number of penstrokes each one requires in a
hash table.)
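
A minimal sketch of the ranking described above, written in Python
rather than the Perl of the actual script, so none of it should be
read as the real code.  The digraph table, the penstroke counts, the
default stroke value, and the sample tokens are all placeholders
invented for illustration; they are not real gzb romanization rules,
letter shapes, or words.  The function names (collapse_digraphs,
letter_length, stroke_cost, rank) are likewise hypothetical.

```python
from collections import Counter

# Placeholder digraph table: ASCII digraph -> single stand-in character.
# Illustrative only; not the real gzb romanization.
DIGRAPHS = {"sh": "S", "zh": "Z", "ng": "N"}

# Placeholder penstroke counts per collapsed letter (assumed values).
STROKES = {"a": 1, "i": 1, "k": 2, "m": 3, "S": 2}
DEFAULT_STROKES = 2  # assumed cost for any letter missing from STROKES


def collapse_digraphs(word):
    """Replace each known ASCII digraph with its one-character stand-in."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def letter_length(word):
    """Length in letters after digraphs are collapsed."""
    return len(collapse_digraphs(word))


def stroke_cost(word):
    """Sum of per-letter penstroke counts, looked up in a hash table."""
    return sum(STROKES.get(ch, DEFAULT_STROKES)
               for ch in collapse_digraphs(word))


def rank(words, cost=letter_length):
    """Return (word, count, cost * count) tuples, largest product first."""
    counts = Counter(words)
    scored = [(w, n, cost(w) * n) for w, n in counts.items()]
    # Ties on the product are broken alphabetically so output is stable.
    return sorted(scored, key=lambda t: (-t[2], t[0]))


if __name__ == "__main__":
    # Made-up tokens standing in for a tokenized corpus.
    corpus = ["ka"] * 5 + ["shami"] * 2 + ["mi"] * 4
    for row in rank(corpus):
        print("letters:", row)
    for row in rank(corpus, cost=stroke_cost):
        print("strokes:", row)
```

Passing the cost function as a parameter keeps the ranking step
unchanged when the plain letter count is swapped for the penstroke
lookup, so the two orderings can be compared on the same corpus.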

Whether and to what extent I will apply this kind of frequency-based
transformation to the language itself, as opposed to its written
representation, is another question.  As I posted here before, I'm
going to do exactly that with my other engelang, säb zjed'a (relexing
it several times based on a frequency analysis as its corpus grows),
but I've been using gzb for long enough and have become fluent enough
in it that I don't think I'll be making fundamental changes in its
grammar and lexicon for the sake of greater conciseness.  Adding a few
monosyllabic allomorphs for common root words of two or more syllables is one
thing; I've done that before based on my intuitive sense of which
words are more common, and I may readily do so in the future now that
I have a quantifiable sense of how common they really are.  (Still at
a slow, measured rate, giving myself time to thoroughly learn one new
allomorph of an old word before adding another one.)  But I'm not
going to add suppletive forms to substitute for the most common roots
plus their most commonly associated suffixes, which would, it seems to
me, alter the grammar in a way that might set me back in my
acquisition of the language.

Jim Henry