Re: Online Language Identifier
From: Jörg Rhiemeier <joerg_rhiemeier@...>
Date: Tuesday, August 30, 2005, 19:06
Hello!
"David J. Peterson" wrote:
> Radiohead will be pleased to know that Xerox is at it again! For
> those of you who don't check Langmaker.com every two hours,
> a resource was just posted about an online language identifier.
> It can be found here:
>
> http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser-ISO-8859-1.en.html
>
> Basically it identifies the language that you put into the text
> field (a sentence of five words or more). It was reviewed on the
> blog Tenser Said the Tensor. The author put in Klingon, Quenya
> and Sindarin. Klingon apparently was fairly consistently identified
> as Maltese. I tried a couple of mine. The results:
>
> [results snup]
>
> Anyway, try it out! It's great fun! Plus, this might help out the
> "What language is this song/text in?" threads. I hear for real
> languages it's pretty accurate.
A nice toy to play with! I fed it with three samples of Old Albic
(the Babel Text and the relay texts from Relay #10 and #11), all of
which it identified as Romanian (though some shorter snippets from
the Babel Text were identified as Spanish), and with the Germanech
text from Relay 10R, which it identified as Spanish. Spanish for
Germanech doesn't surprise me that much, but Romanian for Old Albic?
I also fed the Brithenig Pater Noster into it, which was identified
as - Welsh.
> (Oh, a side-note: It has a fixed number of languages [46] it's guessing
> from, and it lists them for you. This list includes Esperanto, but
> does not include any Austronesian language, I think... [What is
> Malay?]
Austronesian.
Greetings,
Jörg.