Re: Unknown Language Identifier!
From: John Cowan <jcowan@...>
Date: Monday, January 29, 2001, 16:37
dirk elzinga wrote:
> I have to agree with Matt: this doesn't seem very useful to
> someone who is genuinely interested in the proper identification
> of a text (unless that text happens to be in a language included
> in the database).
But if your unknown text isn't in the database, knowing that it
has (at best) an 8% similarity to some other language really
doesn't say much, except that it probably isn't in any of the
languages the database covers.
> Also the fact that the orthography can muck things up so easily
> is disappointing (though not surprising).
An orthography is a standard part of a written language. When
we want to identify text in English, we expect it to be written
using English orthography, not some random orthography.
--
There is / one art || John Cowan <jcowan@...>
no more / no less || http://www.reutershealth.com
to do / all things || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein