Re: THEORY: Unknown Language Confuser!
From: J Matthew Pearson <pearson@...>
Date: Tuesday, January 30, 2001, 4:23
The Gray Wizard wrote:
> > From: Lars Henrik Mathiesen
> >
> > People, will you PLEASE read just a little of the explanatory text on
> > that web site?
> [snipped]
>
> Lars, chill out! I don't believe any of us were taking this exercise as
> seriously as your post implies. We are all well aware that the results were
> meaningless, but none the less fun to play with.
Hear hear, David!
I suppose I should say something at this point, since I'm the one who's guilty
of diverting this thread into serious territory by putting on my linguist's
hat. I happen to find the website extremely interesting. The only part of it
that I was criticizing was the following claim (paraphrased, emphasis added):
"If the sample size of the input language is large enough, and if the text is
typical, and if the highest score is low (say, below 0.1) and if the next
highest score is significantly lower, then the language which got the highest
score IS LIKELY TO BE RELATED TO THE INPUT LANGUAGE."
I don't pretend to understand how the program works, but based on Dirk's
experiments, it appears that it bases its results on orthographic similarities
(frequency and combination of letters, etc.) between the input text and the
various comparison texts. Since orthographic similarities are a poor indicator
of genetic relatedness, I fail to see how the above claim can be supported. In
short, Lars, I'm not criticizing the fact that the program produces false
positives; I'm criticizing how the author of the website chooses to interpret
non-false positives.
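
To make the point concrete, here is a rough sketch of what a purely orthographic comparison might look like. I stress that this is a guess and not the website's actual method: the use of letter bigrams, the cosine similarity measure, and all the function names are my own assumptions, and the scores it produces are not on the same scale as the site's.

```python
from collections import Counter
import math


def ngram_profile(text, n=2):
    """Relative frequencies of letter n-grams, ignoring case and non-letters."""
    # Replace anything that is not a letter with a space, then split into words.
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    counts = Counter()
    for word in words:
        padded = f" {word} "  # pad so word-initial and word-final letters count
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency profiles (0.0 if either is empty)."""
    dot = sum(v * q.get(gram, 0.0) for gram, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_languages(sample, references, n=2):
    """Rank hypothetical reference texts by orthographic similarity to the sample."""
    sample_profile = ngram_profile(sample, n)
    scores = {
        name: cosine_similarity(sample_profile, ngram_profile(text, n))
        for name, text in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Nothing in a comparison like this looks at anything but spelling. Two closely related languages written in different scripts or spelling conventions would share few letter sequences, while two unrelated languages with similar orthographies could share many.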
That objection aside, I agree with John Cowan that this program is a potentially
useful tool for identifying unknown languages, given a sufficiently large sample
of comparison texts. And, as David says, it's definitely fun to play with!
Matt.