Re: Phonologically redundant vocabulary
From: Jim Henry <jimhenry1973@...>
Date: Thursday, April 13, 2006, 14:22
On 4/13/06, Henrik Theiling <theiling@...> wrote:
> Jim Henry writes:
> > A while ago there was a thread about using phonologically
> > redundant vocabulary (no minimal pairs). I've been working
> > (intermittently) on methods and scripts to generate lists
> > of such words. I started writing something which turned
> > out to be a bit long for a listgroup post, so here it is as
> > an article on my website:
> >
> > http://www.pobox.com/~jimhenry/conlang/redundancy.htm
> Nice stuff. :-)
>
> At the time of the thread, I was also thinking of an engelang with a
> self-segregating morphology plus redundant word building. These
> are nice tools for implementing a beast like that.
Cool. I'm sure our languages will
> Thanks for sharing!
I'll be updating the article, and most likely the scripts, again in
a few days -- there are points that I forgot to cover re: the
format of the input files, an algorithm I haven't yet implemented
for efficiently searching for strings with three or more
points of difference, the need for stricter redundancy criteria for
longer words, etc. (For instance, the basic requirement to
have no minimal pairs can result in sets of 3-syllable CV(n)CV(n)CV(n)
words that include very similar subsets like:
nokunpun
jakunpun
kikunpun
tukunpun
lunkunpun
sonkunpun
sipunpun
nanpunpun
jenpunpun
unpunpun
linpunpun
Every pair differs by at least two phonemes, but within each group
(-kunpun and -punpun) the words share two whole syllables.)
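A rough Python sketch of how a list like this could be checked. It assumes each word splits cleanly into CV(n) syllables and that phoneme differences are counted slot by slot (onset, vowel, coda), with an empty slot counting as a phoneme; that alignment and the names `syllables`, `phoneme_distance` and `shared_syllables` are illustrative assumptions, not necessarily what the scripts in the article do.

```python
import re
from itertools import combinations

WORDS = ["nokunpun", "jakunpun", "kikunpun", "tukunpun", "lunkunpun",
         "sonkunpun", "sipunpun", "nanpunpun", "jenpunpun", "unpunpun",
         "linpunpun"]

# Optional onset consonant, one vowel, optional coda "n".
# An "n" directly before a vowel is treated as the next onset, not a coda.
SYLLABLE = re.compile(r"([^aeiou]?)([aeiou])(n(?![aeiou]))?")

def syllables(word):
    """Split a word into (onset, vowel, coda) triples; empty slots are ''."""
    return SYLLABLE.findall(word)

def phoneme_distance(a, b):
    """Count differing slots between two words with equal syllable counts."""
    return sum(x != y
               for sa, sb in zip(syllables(a), syllables(b))
               for x, y in zip(sa, sb))

def shared_syllables(a, b):
    """Count positions where both words have exactly the same syllable."""
    return sum(sa == sb for sa, sb in zip(syllables(a), syllables(b)))

pairs = list(combinations(WORDS, 2))
min_distance = min(phoneme_distance(a, b) for a, b in pairs)
max_shared = max(shared_syllables(a, b) for a, b in pairs)
print("smallest phoneme difference:", min_distance)
print("most syllables shared by any pair:", max_shared)
```

For the eleven words above this gives a smallest difference of 2 slots and a maximum of 2 shared syllables, so the list passes the no-minimal-pairs test while still containing pairs that are two-thirds identical.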
John E Clifford, in offlist correspondence, has suggested that
maybe one should require words to have no entire syllables
in common. I'm not yet sure how to restate that in terms
of a minimum number of characters different; maybe for strings
of 9 characters, a minimum of 6 characters different would
be equivalent to a minimum of 2 characters different for strings
only 3 characters long. I have a vague idea how to more
efficiently search for strings with 3 or more characters different,
but I suspect it will be an exponential slowdown from the
2-character search script. Basically, if for a 2-character redundancy
search you block off all the cells in the same row, column and stack
as the cell representing a word you've picked, and then move diagonally
in the same plane to look for another open space -- for a 3-character
redundancy search you would block off all cells in all the _planes_
that intersect at the chosen cell, and then move diagonally
(meta-diagonally?) into another plane in the same cube...
Writing the code for that search in an arbitrary number of dimensions
will be hairy, and writing code for searching for an arbitrary minimum
number of characters different will be even worse.
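For comparison, a brute-force baseline that handles any number of positions and any minimum difference can be sketched in a few lines of Python. This is not the diagonal-stepping search described above: it simply walks every candidate string in order and keeps one only if it differs from every word already kept in at least `min_diff` positions, which amounts to the same "blocking off" but is done by comparison rather than by marking cells. The function names and the per-position alphabets are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_redundant_words(alphabets, min_diff):
    """Greedily pick strings that pairwise differ in >= min_diff positions.

    alphabets: one string of allowed characters per position.
    The cost grows with (candidates x kept words), so this is only
    practical for small search spaces.
    """
    chosen = []
    for cand in product(*alphabets):
        if all(hamming(cand, w) >= min_diff for w in chosen):
            chosen.append(cand)
    return ["".join(w) for w in chosen]

# Three positions, three characters each, no minimal pairs allowed.
two_diff = greedy_redundant_words(["abc", "abc", "abc"], 2)
# Same idea with at least three characters different.
three_diff = greedy_redundant_words(["01", "01", "01"], 3)
print(two_diff)
print(three_diff)
```

Because the selection is greedy, the result depends on the order in which candidates are visited and is not guaranteed to be the largest possible set.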
--
Jim Henry
http://www.pobox.com/~jimhenry