
Re: Unknown Language Identifier!

From: dirk elzinga <dirk.elzinga@...>
Date: Monday, January 29, 2001, 16:23
On Sun, 28 Jan 2001, Padraic Brown wrote:

> Try this out on your conlangs:
>
> http://epsilon3.georgetown.edu/~cball/languageid/

Hmmm. Seems orthography is very important. I used a short Tepa text and got:

    Oromo:          0.0450
    Czech:          0.0381
    AngloSaxon:     0.0314
    Somali:         0.0306

I fiddled a bit with the orthography, substituting <q> with <g> (velar nasal), <y> with <j> (palatal glide), and <e> with <y> (high central unrounded vowel). I ran it through again and got:

    Oromo:          0.0480
    Somali:         0.0384
    Hausa:          0.0314
    Klingon:        0.0278

Oromo is still on top, but of the remaining three only Somali showed up again. I reverted to the original orthography, and used a larger sample and got:

    Czech:          0.0425
    Swahili:        0.0418
    Lithuanian:     0.0341
    SerboCroatian:  0.0303

Comparing with the original short sample, the longer sample maintains some similarity to Czech (according to the software, anyway), but the other three languages are replaced.

I also ran Shemspreg through (Shemspreg is my PIE take-off) and got:

    Hungarian:      0.0541
    French:         0.0437
    Manx:           0.0362
    Latin:          0.0333

Three out of four IE lgs, but Hungarian gets top billing. Nice.

Then, just out of curiosity, I ran through a Shoshoni story. My first impression of Shoshoni written in the official orthography was of Finnish--both have a limited consonantal inventory and lots of geminates. This is what I got:

    Hausa:          0.0810
    Swahili:        0.0717
    Finnish:        0.0683
    Kutchin:        0.0673

Finnish is in there, but not on top.

I have to agree with Matt: this doesn't seem very useful to someone who is genuinely interested in the proper identification of a text (unless that text happens to be in a language included in the database). Also the fact that the orthography can muck things up so easily is disappointing (though not surprising).

Dirk

--
Dirk Elzinga
dirk.elzinga@m.cc.utah.edu

"The strong craving for a simple formula has been the undoing of linguists." - Edward Sapir
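
The orthography sensitivity described above is what one would expect if the identifier compares character n-gram statistics. That is only an assumption here: the post does not say how the Georgetown tool works. Under that assumption, a minimal Python sketch of the effect might look like the following. The function names and the sample string are hypothetical (the string is invented filler, not real Tepa); only the three letter substitutions come from the message.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Relative frequencies of character trigrams (whitespace normalised)."""
    cleaned = " ".join(text.lower().split())
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse trigram profiles."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def respell(text, mapping):
    """Apply all single-letter substitutions at once (no chained rewrites)."""
    return text.translate(str.maketrans(mapping))


# Substitutions from the message: <q> -> <g>, <y> -> <j>, <e> -> <y>.
RESPELLING = {"q": "g", "y": "j", "e": "y"}

# Invented placeholder text, NOT real Tepa.
sample = "qeya teqa yeqe"
respelled = respell(sample, RESPELLING)

similarity = cosine_similarity(trigram_profile(sample),
                               trigram_profile(respelled))
print(respelled, round(similarity, 4))
```

Because every trigram containing a respelled letter changes, the two profiles of the "same" text overlap very little, which would be enough to reshuffle a ranking built on such profiles.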