Re: Unknown Language Identifier!

From:Dan Sulani <dnsulani@...>
Date:Tuesday, January 30, 2001, 7:23
On 29 Jan, Dirk Elzinga wrote:


>I have to agree with Matt: this doesn't seem very useful to
>someone who is genuinely interested in the proper identification
>of a text (unless that text happens to be in a language included
>in the database).
>
>Also the fact that the orthography can muck things up so easily
>is disappointing (though not surprising).

Not taking the whole thing too seriously, but wanting to follow up
on the idea of orthography, just for the fun of it I put some rap
lyrics through their paces. My "best" result was with the following:

[Snoop] Yeah, ha ha, Snoop Dogg
[W.C.] Dub C.. heh, yeah
[Snoop] All up in here, bay-bay.. yeah
[W.C.] Uh-huh
[Snoop] Straight G thang, yeah

That got me:

AngloSaxon 0.0266

followed by:

Choctaw 0.0166
Klingon 0.0163 (Klingon?! ;-) )
Sotho 0.0150

When I tried the complete lyrics of songs (including this one) by a
number of rappers, I kept getting English at around the 31 percent
level, always followed by Scots at around 24 percent, AngloSaxon at
around 12 percent and Icelandic at around 1 percent.

Dan Sulani
--------------------------------------------------------------------
likehsna rtem zuv tikuhnuh auag inuvuz vaka'a.

A word is an awesome thing.