Re: Unknown Language Identifier!

From:	Dan Sulani <dnsulani@...>
Date:	Tuesday, January 30, 2001, 7:23

|< < Post > >| << List/Tree >> January 2001 Index

On 29 Jan, Dirk Elzinga wrote:

>I have to agree with Matt: this doesn't seem very useful to
>someone who is genuinely interested in the proper identification
>of a text (unless that text happens to be in a language included
>in the database).
>
>Also the fact that the orthography can muck things up so easily
>is disappointing (though not surprising).

I'm not taking the whole thing too seriously, but to follow up
on the idea of orthography, I ran some rap lyrics through the
identifier, just for the fun of it.
My "best" result was with the following:

[Snoop] Yeah, ha ha, Snoop Dogg
[W.C.]  Dub C.. heh, yeah
[Snoop] All up in here, bay-bay.. yeah
[W.C.]  Uh-huh
[Snoop] Straight G thang, yeah

That got me:  AngloSaxon  0.0266
followed by:  Choctaw     0.0166
              Klingon     0.0163   (Klingon?!  ;-) )
              Sotho       0.0150
When I tried the complete lyrics of songs (including this one)
by a number of rappers, I kept getting
English at around the 31 percent level, always followed by
Scots at around 24 percent, AngloSaxon at around 12 percent
and Icelandic at around 1 percent.

Dan Sulani
--------------------------------------------------------------------
likehsna rtem zuv tikuhnuh auag inuvuz vaka'a.

A word is an awesome thing.
