Tool: Meta-Lexicon in Lisp (a bit lengthy)

From:	Henrik Theiling <theiling@...>
Date:	Sunday, April 22, 2001, 21:16
Hi!

I'm currently writing a neat tool for making lexicons for conlangs.
If anyone is interested, I'm going to document the library
appropriately and eventually put it on my homepage for download.

This is what it is:

My problem was that I wanted to change the phonology when I already
had a lexicon, and of course it is a lot of work to change everything
by hand.  Therefore, I wrote a meta-lexicon in Common Lisp.  It works
as follows:

Each stem of the language is defined by numbers only.  Instead of
defining a stem by its phonemes, I use a sequence of numbers between 0
and 999, which I call meta-phonemes.  Short stems have few
meta-phonemes, long stems have many.

The idea is to partition the range of 0..999 into as many pieces as
there are real phonemes.  Each position in the sequence may use a
different mapping.

Additionally, each stem has a category (for my language, p for
particles and n for normal stems).  And to allow some variation,
another number, also from 0..999, selects the stem variant.

Some examples: I want to define a CVC stem, and decide that the first
meta-phoneme is the initial consonant, the second is the vowel, and the
third is the final consonant; quite a trivial mapping, so to speak.  A
mapping between meta-phonemes and real phonemes is given by a list.

For example, the first consonant may be defined like this:

(defvar *cons-a*
  '(("t" 1/8)            ; each weight is a share of the range 0..999
    ("s" 1/8)
    ("n" 1/8)
    ("l" 1/8)
    ("k" 1/8)
    ("x" 1/8)
    ("h" 1/8)
    ("")))               ; no initial consonant

The 1/8 is the share of the range 0..999 that the phoneme gets.  In
this case, "t" would be from 0..124, "s" from 125..249, etc.
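To make the weight-to-range idea concrete, here is a minimal sketch of the lookup in Common Lisp. It is not the library's code: the function name `meta-phoneme->phoneme` is made up, and it assumes each weight is a share of 1000 numbers and that an entry without a weight, like the final `("")`, takes the rest of the range.

```lisp
;; Hypothetical sketch, not the library's code: map a meta-phoneme
;; (0..999) to a real phoneme using a weighted table like *cons-a*.
;; Assumes each weight is a share of 1000 numbers and that an entry
;; without a weight, such as ("") here, takes the rest of the range.
(defparameter *cons-a*
  '(("t" 1/8) ("s" 1/8) ("n" 1/8) ("l" 1/8)
    ("k" 1/8) ("x" 1/8) ("h" 1/8) ("")))

(defun meta-phoneme->phoneme (table n)
  "Return the phoneme in TABLE whose range contains N, or NIL if none does."
  (let ((upper 0))                      ; exclusive upper bound so far
    (dolist (entry table)
      (let ((weight (second entry)))
        (setf upper (if weight
                        (+ upper (* weight 1000))
                        1000))          ; unweighted entry: up to 999
        (when (< n upper)
          (return (first entry)))))))
```

With this table, `(meta-phoneme->phoneme *cons-a* 517)` returns "k" and `(meta-phoneme->phoneme *cons-a* 130)` returns "s". Rational weights such as 1/8 keep the boundaries exact, so there is no rounding involved.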

I do the same for *vowel* (the vowels) and *cons-z* (the final
consonants).

Of course, you may also define a sequence of phonemes as a single
meta-phoneme if you have constraints on phoneme sequences.

Now I define that for three phonemes there is only one variant, with
the following sequence:

      ....
            ( ; 3 phonemes, only this variant:
                ((,*cons-a* ,*vowel* ,*cons-z*))
            )
      ....

Doing this for all variants, I get a description of what stems may
look like.
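How the variant number picks one of several variants isn't spelled out above. A minimal sketch, assuming all variants of a given length get equal shares of 0..999; `select-variant` and the two-variant description in the example are hypothetical.

```lisp
;; Hypothetical sketch: choose one of several stem variants with a
;; variant number from 0..999.  Assumes all variants get equal shares.
(defun select-variant (variants number)
  "Return the element of VARIANTS that NUMBER (0..999) selects."
  (nth (floor (* number (length variants)) 1000) variants))

;; A made-up description: two variants for three-phoneme stems.
(select-variant '((cons-a vowel cons-z) (vowel cons-a vowel)) 0)   ; => (CONS-A VOWEL CONS-Z)
(select-variant '((cons-a vowel cons-z) (vowel cons-a vowel)) 700) ; => (VOWEL CONS-A VOWEL)
```

With only one variant, as in the CVC case above, every number from 0 to 999 selects it.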

For existing stems, the meta-lexicon library decomposes words given as
strings into meta-phonemes.  This way, you can translate an existing
lexicon into meta-lexicon form.  For example, I write

    (decompose-word *tyl-sjok* "kul")

Here, *tyl-sjok* is the current phonology description, and the word
"kul" is decomposed according to it.  All possible decompositions are
returned; in this case there is only one:

    ((N 0 499 152 426))

For explanation:
    N means it's a normal stem.
    0 is the stem variant.  We only have one variant, so any number
      selects it; without randomization, the decomposer returns 0.

All other numbers are meta-phonemes.
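Since a meta-stem is just a list, its parts can be pulled apart with `destructuring-bind`. This hypothetical helper only illustrates the layout described above; the number of meta-phonemes gives the approximate stem length.

```lisp
;; Hypothetical helper: split a meta-stem (category, variant number,
;; meta-phonemes...) into its parts.
(defun meta-stem-parts (meta-stem)
  "Return the category, the variant number and the list of meta-phonemes."
  (destructuring-bind (category variant &rest meta-phonemes) meta-stem
    (values category variant meta-phonemes)))

(meta-stem-parts '(n 0 499 152 426))
;; => N, 0, (499 152 426)
```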

Because I may later want to split a phoneme into two, I can tell the
decomposer to add a bit of randomness within the ranges that the
description currently allows (e.g. in this case it may add 0..124 to
the first consonant):

    (decompose-word *tyl-sjok* "kul" :add-random 1)

One answer might be:

    ((N 152 517 164 553))
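The core of the decomposition, finding the range a phoneme occupies and picking a number in it, can be sketched like this. It is a guess at the idea, not the library's code: `phoneme-range` and `phoneme->meta-phoneme` are hypothetical, the real decomposer also segments the whole word and returns every analysis, and `:add-random` here is a plain on/off flag (the post passes 1 without saying what the value controls).

```lisp
;; Hypothetical sketch of one decomposition step: encode a single phoneme
;; as a meta-phoneme.  Word segmentation and multiple analyses are left out.
;; Assumes weights are shares of 1000 and an unweighted entry takes the rest.
(defparameter *cons-a*
  '(("t" 1/8) ("s" 1/8) ("n" 1/8) ("l" 1/8)
    ("k" 1/8) ("x" 1/8) ("h" 1/8) ("")))

(defun phoneme-range (table phoneme)
  "Return LOW and HIGH (exclusive) of PHONEME's range in TABLE, or NIL."
  (let ((low 0))
    (dolist (entry table)
      (let ((high (if (second entry)
                      (+ low (* (second entry) 1000))
                      1000)))
        (when (string= (first entry) phoneme)
          (return (values low high)))
        (setf low high)))))

(defun phoneme->meta-phoneme (table phoneme &key add-random)
  "Encode PHONEME as the start of its range, or, with ADD-RANDOM,
as a random integer inside the range.  Returns NIL for unknown phonemes."
  (multiple-value-bind (low high) (phoneme-range table phoneme)
    (when low
      (let ((start (ceiling low))
            (end (ceiling high)))
        (if (and add-random (> end start))
            (+ start (random (- end start)))
            start)))))
```

Here `(phoneme->meta-phoneme *cons-a* "k")` returns 500, and with `:add-random t` it returns some integer from 500 to 624, which still decodes to "k" under the current table.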

To get the string representation of a meta-stem, I write:

    (compose-word *tyl-sjok* '(N 152 517 164 553))

And the answer is (if I don't change the description):

    "kul"
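Composing is the forward direction: look each meta-phoneme up in the table for its position and concatenate the results. A minimal sketch; `compose-stem` is hypothetical and skips the category and variant that the real `compose-word` handles, and the `*vowel*` and `*cons-z*` tables are made up because the post doesn't list them.

```lisp
;; Hypothetical sketch of composing a stem string from meta-phonemes.
;; *VOWEL* and *CONS-Z* below are made-up tables; the post does not list them.
(defparameter *cons-a*
  '(("t" 1/8) ("s" 1/8) ("n" 1/8) ("l" 1/8)
    ("k" 1/8) ("x" 1/8) ("h" 1/8) ("")))
(defparameter *vowel* '(("a" 1/4) ("i" 1/4) ("u" 1/4) ("o")))
(defparameter *cons-z* '(("n" 1/2) ("l")))

(defun lookup (table n)
  "Return the phoneme of TABLE whose range (scaled to 1000) contains N."
  (let ((upper 0))
    (dolist (entry table)
      (setf upper (if (second entry) (+ upper (* (second entry) 1000)) 1000))
      (when (< n upper)
        (return (first entry))))))

(defun compose-stem (tables meta-phonemes)
  "Concatenate the phonemes that META-PHONEMES select, position by position."
  (apply #'concatenate 'string (mapcar #'lookup tables meta-phonemes)))

;; The CVC variant: initial consonant, vowel, final consonant.
(compose-stem (list *cons-a* *vowel* *cons-z*) '(517 600 553)) ; => "kul"
```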

Now I can change the phonology description, e.g. to split the former k
into g and k.  Some lexicon entries change, some don't, and I can
adjust the amount of change.  I can change the vowels, or anything
else.  Still, because of the structure of the meta-stems, I keep some
information about each stem: its approximate length, variant, and stem
class.  This makes adjusting things very flexible.
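Here is a sketch of why only some entries change when k is split, using the *cons-a* table from above with the k share halved into g and k. The tables and the `lookup` function are illustrative, not part of the library.

```lisp
;; Hypothetical sketch: split the old "k" range (500..624) into "g" and "k"
;; and check which stored meta-phonemes now decode differently.
(defparameter *cons-a-old*
  '(("t" 1/8) ("s" 1/8) ("n" 1/8) ("l" 1/8)
    ("k" 1/8) ("x" 1/8) ("h" 1/8) ("")))
(defparameter *cons-a-new*              ; old "k" share split in two halves
  '(("t" 1/8) ("s" 1/8) ("n" 1/8) ("l" 1/8)
    ("g" 1/16) ("k" 1/16) ("x" 1/8) ("h" 1/8) ("")))

(defun lookup (table n)
  "Return the phoneme of TABLE whose range (scaled to 1000) contains N."
  (let ((upper 0))
    (dolist (entry table)
      (setf upper (if (second entry) (+ upper (* (second entry) 1000)) 1000))
      (when (< n upper)
        (return (first entry))))))

;; Print each stored number with its old and new phoneme.
(dolist (n '(517 600 300))
  (format t "~D: ~A -> ~A~%" n (lookup *cons-a-old* n) (lookup *cons-a-new* n)))
```

This prints:

    517: k -> g
    600: k -> k
    300: n -> n

Numbers in the lower half of the old k range now give g, the rest still give k, and all other ranges are untouched. How many stems change depends on where their stored numbers fall, which is why spreading them out while decomposing helps.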

The whole lexicon works with numbers only, too.  It can handle stems
and composites (but because my language is isolating, I do not have a
mechanism for complex changes at morpheme boundaries, so that would
have to be programmed).  Each stem is defined by such a meta-stem
description and gets a unique number as its identifier.  Composites
refer to stems by these identifiers.
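The post doesn't show how the lexicon stores entries, so the following is only a guess at a layout matching the description: stems keep a meta-stem, composites keep a list of IDs, and each entry gets a fresh integer ID. `add-entry`, `expand-entry`, and the particle `(p 0 10 20)` are hypothetical.

```lisp
;; Hypothetical sketch of a numbers-only lexicon.  Stems store a meta-stem,
;; composites store a list of entry IDs.  Nothing here handles changes at
;; the boundaries; a composite is just the sequence of its parts.
(defvar *lexicon* (make-hash-table))
(defvar *next-id* 0)

(defun add-entry (kind data)
  "Store DATA under a fresh ID and return the ID.
KIND is :STEM (DATA is a meta-stem) or :COMPOSITE (DATA is a list of IDs)."
  (let ((id (incf *next-id*)))
    (setf (gethash id *lexicon*) (cons kind data))
    id))

(defun expand-entry (id)
  "Return the list of meta-stems that entry ID consists of."
  (destructuring-bind (kind . data) (gethash id *lexicon*)
    (ecase kind
      (:stem (list data))
      (:composite (loop for part in data append (expand-entry part))))))

(let* ((kul (add-entry :stem '(n 152 517 164 553)))
       (par (add-entry :stem '(p 0 10 20)))        ; made-up particle
       (comp (add-entry :composite (list kul par))))
  (expand-entry comp))
;; => ((N 152 517 164 553) (P 0 10 20))
```

Each resulting meta-stem would then be turned into a string by `compose-word`. Because composites only hold IDs, a change to the phonology description reaches every composite that contains an affected stem without extra bookkeeping.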

If anyone wants more details, let me know.  It's not finished yet, but
will be publishable soon.

*Henrik