
Re: Avoiding near-collisions in vocabulary coinage

From:Jim Henry <jimhenry1973@...>
Date:Wednesday, August 6, 2008, 19:14
On Wed, Aug 6, 2008 at 2:43 PM, Benct Philip Jonsson <bpj@...> wrote:

> I've had thoughts on creating a script for identifying
> minimal pairs in an existing vocabulary but not come up
> with anything better than successively replacing every grapheme
> of every word with \w and comparing it with every other word
> in the dictionary. Any ideas?
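A minimal sketch of the wildcard approach from the quoted message, written in Python rather than Perl: for each word, every grapheme in turn is replaced by \w, and the resulting pattern is matched against every other word, anchored at both ends. It assumes one character per grapheme (digraphs would need a tokenizer first). The names wildcard_patterns and substitution_pairs are hypothetical and not part of any existing script.

```python
import re
from itertools import combinations


def wildcard_patterns(word):
    """Yield one pattern per position, with that grapheme replaced by a
    one-character word wildcard (assumes one character per grapheme)."""
    for i in range(len(word)):
        yield re.compile(re.escape(word[:i]) + r"\w" + re.escape(word[i + 1:]))


def substitution_pairs(lexicon):
    """Return sorted pairs of distinct words that differ in exactly one grapheme."""
    pairs = []
    for a, b in combinations(sorted(set(lexicon)), 2):
        # fullmatch anchors the pattern at both ends, like /^...$/ in Perl,
        # so only words of the same length can match.
        if any(p.fullmatch(b) for p in wildcard_patterns(a)):
            pairs.append((a, b))
    return pairs


lexicon = ["kaf", "gaf", "kif", "kap", "an", "tan", "pikokafitex"]
print(substitution_pairs(lexicon))
# [('gaf', 'kaf'), ('kaf', 'kap'), ('kaf', 'kif')]
```

As the method implies, /an/ vs. /tan/ is not reported: \w stands for exactly one grapheme, so pairs that differ by an added or dropped phoneme slip through.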
An adaptation of my findsimilar.pl script, so that it compares every word of the lexicon with every other word instead of comparing its command-line argument to every word of the lexicon, would be better than nothing. It has a couple of flaws, though. It fails to find minimal pairs where one morpheme is one phoneme longer or shorter than another (e.g. /ka/ vs. /kap/, /an/ vs. /tan/, /pef/ vs. /pwef/, etc.), and it turns up too many false positives: words of the same general pattern but where almost every individual phoneme is different. That's OK when I'm looking for one word at a time, but would be overwhelming when comparing every word to every other.

A script to do this properly would need to know not only the orthography of the language involved, but enough about its phonotactics to identify slots where an optional phoneme is missing but could be added, or where a phoneme is present but could be omitted (to catch those pairs I mentioned above). Also, instead of generating one regex to find all similar words, it should probably identify each slot in the word and generate a regex for each slot. E.g., for input /kaf/ where the phonotactic rule is C(S)V(S)(C), you would use regexes like

    [kgx]af
    k[jrw]af
    k[aiu]f
    ka[jrw]f
    ka[fvp]

I think that would identify all minimal pairs, assuming /k g x f v p b j r w a i u/ is our phoneme inventory. For a broader definition of "minimal pair" you would use

    [kgxfvpbjrw]af
    k[jrw]af
    k[aiu]f
    ka[jrw]f
    ka[kgxfvpbjrw]

(Also, all those regexes should be wrapped in /^ .... $/, else you would get false-positive substring matches like /pikokafitex/.)

-- 
Jim Henry
http://www.pobox.com/~jimhenry/
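A minimal Python sketch of the per-slot idea, assuming the C(S)V(S)(C) template and the /k g x f v p b j r w a i u/ inventory from the example, with one character per phoneme. The CLASSES table, the narrow "similar phoneme" groups in SIMILAR (which only reproduce [kgx] and [fvp] from the example; any other phoneme falls back to its slot's whole class), and all function names are hypothetical. Besides the substitution and insertion patterns listed above, it also emits a deletion pattern for each filled optional slot (^ka$ for /kaf/), which covers the "present but could be omitted" case.

```python
import re

TEMPLATE = "C(S)V(S)(C)"
# Hypothetical slot classes for the example inventory.
CLASSES = {"C": "kgxfvpbjrw", "S": "jrw", "V": "aiu"}
# Hypothetical narrow-mode groups; unlisted phonemes use the slot's class.
SIMILAR = {"k": "kgx", "g": "kgx", "x": "kgx", "f": "fvp", "v": "fvp", "p": "fvp"}


def parse_template(template):
    """Turn 'C(S)V(S)(C)' into [('C', False), ('S', True), ...]."""
    return [(m.group(2) or m.group(1), bool(m.group(2)))
            for m in re.finditer(r"(\w)|\((\w)\)", template)]


def parse_word(word, slots):
    """Assign phonemes to slots by backtracking: one phoneme or None per
    slot, or None if the word does not fit the template."""
    if not slots:
        return [] if not word else None
    (cls, optional), rest = slots[0], slots[1:]
    if word and word[0] in CLASSES[cls]:
        tail = parse_word(word[1:], rest)
        if tail is not None:
            return [word[0]] + tail
    if optional:  # leave this optional slot empty
        tail = parse_word(word, rest)
        if tail is not None:
            return [None] + tail
    return None


def slot_regexes(word, template, narrow=True):
    """One anchored regex per change: substitute a filled slot, insert into
    an empty optional slot, or delete a filled optional slot."""
    slots = parse_template(template)
    fill = parse_word(word, slots)
    if fill is None:
        raise ValueError(f"{word!r} does not fit {template}")
    patterns = []
    for i, ((cls, optional), ph) in enumerate(zip(slots, fill)):
        before = "".join(p for p in fill[:i] if p)
        after = "".join(p for p in fill[i + 1:] if p)
        if ph is None:
            alts = [CLASSES[cls]]  # insertion
        else:
            group = SIMILAR.get(ph, CLASSES[cls]) if narrow else CLASSES[cls]
            alts = [group] + ([""] if optional else [])  # substitution, deletion
        for alt in alts:
            middle = f"[{alt}]" if alt else ""
            patterns.append(f"^{before}{middle}{after}$")
    return patterns


def minimal_pairs(lexicon, template, narrow=True):
    """Compare every word with every other word using its slot regexes."""
    pairs = set()
    for word in lexicon:
        regexes = [re.compile(p) for p in slot_regexes(word, template, narrow)]
        for other in lexicon:
            if other != word and any(r.search(other) for r in regexes):
                pairs.add(tuple(sorted((word, other))))
    return sorted(pairs)


print(slot_regexes("kaf", TEMPLATE))
# ['^[kgx]af$', '^k[jrw]af$', '^k[aiu]f$', '^ka[jrw]f$', '^ka[fvp]$', '^ka$']
lexicon = ["kaf", "gaf", "kwaf", "kif", "ka", "paf", "pwaf", "bav"]
print(minimal_pairs(lexicon, TEMPLATE))
# [('gaf', 'kaf'), ('ka', 'kaf'), ('kaf', 'kif'), ('kaf', 'kwaf'), ('paf', 'pwaf')]
```

Because each pattern allows a difference in only one slot, words that differ in several phonemes never match, unlike a single whole-word pattern. Passing narrow=False switches the consonant slots to the broad [kgxfvpbjrw] class. The comparison is still all-pairs, so its cost grows with the square of the lexicon size.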