Re: Unknown Language Identifier!

From:	dirk elzinga <dirk.elzinga@...>
Date:	Monday, January 29, 2001, 16:23
On Sun, 28 Jan 2001, Padraic Brown wrote:

> Try this out on your conlangs:
>
> http://epsilon3.georgetown.edu/~cball/languageid/

Hmmm. Seems orthography is very important. I used a short Tepa
text and got:

Oromo:       0.0450
Czech:       0.0381
AngloSaxon:  0.0314
Somali:      0.0306

I fiddled a bit with the orthography, substituting <q> with <g>
(velar nasal), <y> with <j> (palatal glide), and <e> with <y>
(high central unrounded vowel). I ran it through again and got:

Oromo:    0.0480
Somali:   0.0384
Hausa:    0.0314
Klingon:  0.0278

Oromo is still on top, but of the remaining three only Somali
showed up again.
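
For anyone who wants to try the respelling experiment offline, here is a
minimal sketch. Two things in it are assumptions, not facts from this
post: the substitutions are read as "replace q with g, y with j, e with
y", and the scoring is a plain character-trigram overlap standing in for
whatever the web tool actually computes (its method isn't described
here). The sample string is made up and is not real Tepa.

```python
from collections import Counter

# Assumed reading of the respelling: q -> g, y -> j, e -> y.
# str.translate applies all three at once, so an original <e> becomes <y>
# and is not then turned into <j> by the y -> j rule.
RESPELL = str.maketrans({"q": "g", "y": "j", "e": "y"})


def respell(text):
    """Apply the three substitutions simultaneously."""
    return text.translate(RESPELL)


def trigram_profile(text):
    """Relative frequencies of character trigrams, word boundaries included."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def overlap(p, q):
    """Shared frequency mass of two profiles (hypothetical stand-in score)."""
    return sum(min(weight, q[gram]) for gram, weight in p.items() if gram in q)


sample = "yeqe teqa yeye"  # made-up string, not real Tepa
changed = respell(sample)
print(changed)
print(overlap(trigram_profile(sample), trigram_profile(changed)))
```

Chained calls to str.replace would get this wrong: doing e -> y first and
y -> j second turns every original <e> into <j>. A one-pass mapping
avoids that, which matters whenever a replacement letter is also one of
the letters being replaced.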

I reverted to the original orthography, and used a larger sample
and got:

Czech:          0.0425
Swahili:        0.0418
Lithuanian:     0.0341
SerboCroatian:  0.0303

Comparing with the original short sample, the longer sample
maintains some similarity to Czech (according to the software,
anyway), but the other three languages are replaced.

I also ran Shemspreg through (Shemspreg is my PIE take-off) and
got:

Hungarian:  0.0541
French:     0.0437
Manx:       0.0362
Latin:      0.0333

Three out of four are IE lgs, but Hungarian gets top billing. Nice.

Then, just out of curiosity, I ran through a Shoshoni story. My
first impression of Shoshoni written in the official orthography
was of Finnish--both have a limited consonantal inventory and
lots of geminates. This is what I got:

Hausa:    0.0810
Swahili:  0.0717
Finnish:  0.0683
Kutchin:  0.0673

Finnish is in there, but not on top.

I have to agree with Matt: this doesn't seem very useful to
someone who is genuinely interested in the proper identification
of a text (unless that text happens to be in a language included
in the database).

Also the fact that the orthography can muck things up so easily
is disappointing (though not surprising).

Dirk

--
Dirk Elzinga                          dirk.elzinga@m.cc.utah.edu

"The strong craving for a simple formula
has been the undoing of linguists."               - Edward Sapir