
Different Words with Large Common Substrings

From:Eldin Raigmore <eldin_raigmore@...>
Date:Monday, October 13, 2008, 19:41
Inspired by this:

|3 Alternatives
|And Rosta's Livagian uses another method which, though not a self-
|segregating morphology in the strict sense, partly serves the same
|purpose with less restriction in the phonological shape of words. It
|requires a full knowledge of the lexicon to parse unambiguously, however.
|The key is that no actual morpheme must look like a prefix or suffix
|substring of another actual morpheme. So, for instance, if in a
|string "kesumalipe" you recognize "kesu" and "pe" as familiar morphemes,
|you know that this must be "kesu" followed by "ma li" or "mali" followed
|by "pe"; the fact that "kesu" is a real morpheme in a language meeting
|this criterion means that there cannot be another morpheme "kesuma"
|or "kesumali", and there can't be any morpheme like "lipe" or "malipe".
|But if you have only learned the phonology of the language and don't know
|much vocabulary yet, you can't deduce the morpheme boundaries from the
|phonotactics of the word; you would have to start by
|looking up "k" in the lexicon, then "ke", then "kes", until you
|find "kesu"; then start looking for "m", "ma", etc.
|Retrieved from "
|segregating_morphology_methods"

I have been considering rules roughly similar to:

"No two distinct words can have a common initial substring which is longer than half the length of one of them and longer than one-third the length of the other; nor can any two distinct words have a common final substring which is longer than half as long as one and longer than one-third as long as the other."

("Length" may be measured in number of segments, or in number of morae, or in number of syllables, or in number of feet, depending.)

The thing is, of course, this means that sets of words like:

turn
return
turned
returned

cause problems.

It also makes sets like

blackbird
blackboard
blackguard
blackhead
blacktop
redbird
redboard
redcap
redhead

problematic.
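As an illustration of how the half/one-third rule above flags those word sets, here is a minimal Python sketch. It is not part of the original post. It assumes "length" is counted in letters (a stand-in for segments), and the function names `common_prefix_len`, `common_suffix_len`, `too_similar`, and `conflicts` are hypothetical, chosen only for this example.

```python
from itertools import combinations


def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a, b):
    """Number of trailing characters that a and b share."""
    return common_prefix_len(a[::-1], b[::-1])


def too_similar(shared, len_a, len_b):
    """True if the shared part is longer than half of one word
    and longer than one-third of the other (in either order)."""
    return (2 * shared > len_a and 3 * shared > len_b) or (
        2 * shared > len_b and 3 * shared > len_a
    )


def conflicts(words):
    """List (word_a, word_b, 'initial' | 'final') for each pair breaking the rule.

    Assumption: length is measured in letters, standing in for segments.
    """
    found = []
    for a, b in combinations(words, 2):
        if a == b:
            continue  # the rule only concerns distinct words
        if too_similar(common_prefix_len(a, b), len(a), len(b)):
            found.append((a, b, "initial"))
        if too_similar(common_suffix_len(a, b), len(a), len(b)):
            found.append((a, b, "final"))
    return found


if __name__ == "__main__":
    for hit in conflicts(["turn", "return", "turned", "returned"]):
        print(hit)
```

Under these assumptions, four of the six pairs in the turn/return set are flagged: "turn"/"turned" and "return"/"returned" by their shared beginnings, "turn"/"return" and "turned"/"returned" by their shared endings. Measuring in morae, syllables, or feet would need a different length function.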
Maybe it should be that "no 'finite' or 'surface' or 'fully-inflected' word should be an initial or final substring of any (other) _morpheme_" or "... of any (other) _root_"? (I include "other" because maybe the root form of the word can occur as a surface form.)

That's a less widely-applicable, hence more permissive rule. For one thing, the substring has to be _all_ of one of the comparands.

..........................................................................

I was also thinking: what if the substring had to be initial or final in only one of the comparands, but could be medial in the other?

Something like "If any two words share a substring which is an initial or final substring of at least one of them, and also is (over) half as long as at least one of them and also (over) one-third as long as the other, then either one of the words is inflected or derived from the other, or both words are inflected or derived from the same baseword/wordbase/stem/root."

This modification could still cause difficulties with some sets of compounds; say, "whiteboard" and "blackboard" and "blackguard".

What sort of rule, similar to one or more of those I've mentioned so far, actually works for some natlang?

_________________________________________________________________
In theory, there's no difference between theory and practice. In practice, there is.


Gary Shannon <fiziwig@...>