
Re: THEORY: Unknown Language Confuser!

From: J Matthew Pearson <pearson@...>
Date: Tuesday, January 30, 2001, 4:23
The Gray Wizard wrote:

> > From: Lars Henrik Mathiesen
> >
> > People, will you PLEASE read just a little of the explanatory text on that
> > web site?
> [snipped]
>
> Lars, chill out! I don't believe any of us were taking this exercise as
> seriously as your post implies. We are all well aware that the results were
> meaningless, but none the less fun to play with.
Hear, hear, David! I suppose I should say something at this point, since I'm the one who's guilty of diverting this thread into serious territory by putting on my linguist's hat.

I happen to find the website extremely interesting. The only part of it that I was criticizing was the following claim (paraphrased, emphasis added):

"If the sample size of the input language is large enough, and if the text is typical, and if the highest score is low (say, below 0.1) and if the next highest score is significantly lower, then the language which got the highest score IS LIKELY TO BE RELATED TO THE INPUT LANGUAGE."

I don't pretend to understand how the program works, but based on Dirk's experiments, it appears to base its results on orthographic similarities (frequency and combination of letters, etc.) between the input text and the various comparison texts. Since orthographic similarities are a poor indicator of genetic relatedness, I fail to see how the above claim can be supported.

In short, Lars, I'm not criticizing the fact that the program produces false positives; I'm criticizing how the author of the website chooses to interpret non-false positives.

That objection aside, I agree with John Cowan that this program is a potentially useful tool for identifying unknown languages, given a sufficiently large sample of comparison texts. And, as David says, it's definitely fun to play with!

Matt.
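The post says outright that the workings of the program are unknown, so what follows is only an illustration of what "orthographic similarity" might mean in practice, not the website's method. It assumes that each text is reduced to a profile of letter-bigram frequencies (two-letter sequences inside words) and that profiles are compared with half the L1 distance, where 0.0 means identical profiles and 1.0 means no bigrams in common. The function names, the distance measure, and the toy texts are all hypothetical, and the resulting numbers are not on the website's scoring scale.

```python
import re
from collections import Counter

# ASSUMPTION: letter-bigram profiles stand in for the tool's (unknown) notion of
# "orthographic similarity". Nothing here reflects the website's real algorithm.

def bigram_profile(text: str) -> dict[str, float]:
    """Relative frequency of each two-letter sequence occurring inside a word."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    pairs = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(pairs.values())
    return {bg: n / total for bg, n in pairs.items()} if total else {}

def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance: 0.0 = identical profiles, 1.0 = no bigrams in common."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def rank_languages(sample: str, corpora: dict[str, str]) -> list[tuple[str, float]]:
    """Score the sample against every comparison text, closest match first."""
    target = bigram_profile(sample)
    scores = [(name, profile_distance(target, bigram_profile(text)))
              for name, text in corpora.items()]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical toy "languages": the scores reflect shared spelling and nothing else.
corpora = {"toy_a": "mana tana", "toy_b": "xyz qxz"}
for name, dist in rank_languages("tana mana kana", corpora):
    print(f"{name}: {dist:.3f}")
```

Run as written, this prints `toy_a: 0.111` and then `toy_b: 1.000`. Nothing in the computation knows anything about ancestry: respelling the sample under different conventions changes the bigrams it contains and so can change its ranking even though the underlying language is the same. A score of this kind measures spelling overlap, which is exactly why a close match on its own says little about genetic relatedness.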