Re: TECH: Language Detection

From: Gary Shannon <fiziwig@...>
Date: Thursday, July 6, 2006, 7:19
Oops. Forgot to mention "stop words". Those are the
very most common words in any language. Check

And Google for stop word lists in various languages.
These are the very words you want to do language
detection on.

This site:

has stopwords for the following languages:

Catalan, Czech, Danish, Dutch, French, German,
Hungarian, Italian, Norwegian, Polish, Portuguese,
Spanish, Turkish

Some English stopwords: of on or that the this to was
what when where

Some Danish stopwords: begge da de den denne der deres
det dette dig din dog du ej eller en end ene
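
To make the idea concrete, here is a rough Python sketch of stop-word
based detection. It only uses the two short samples above, so treat it
as an illustration rather than a finished detector: the names STOPWORDS
and detect_language are made up for this example, and real lists would
be much longer and cover more languages.

```python
import re

# Sample stop words copied from the lists above. These are only the
# partial samples from this post; full lists would be far longer.
STOPWORDS = {
    "english": {"of", "on", "or", "that", "the", "this", "to", "was",
                "what", "when", "where"},
    "danish": {"begge", "da", "de", "den", "denne", "der", "deres",
               "det", "dette", "dig", "din", "dog", "du", "ej",
               "eller", "en", "end", "ene"},
}


def detect_language(text, stopwords=STOPWORDS):
    """Return the language whose stop words occur most often in text.

    Returns None when no stop word from any list is found.
    (Hypothetical helper, not taken from any library.)
    """
    # Lowercase the text and pull out runs of letters (Unicode-aware).
    words = re.findall(r"[^\W\d_]+", text.lower())

    # Count how many tokens fall in each language's stop word set.
    scores = {
        lang: sum(1 for word in words if word in vocab)
        for lang, vocab in stopwords.items()
    }

    # Ties go to whichever language comes first in the dictionary.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(detect_language("What was the name of this place?"))
    print(detect_language("Det er den eller denne, du skal se"))
```

With lists this short, a text that happens to contain none of the
sample words gives no answer at all, which is one reason to use the
full stop word lists for each language.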


--- Arthaey Angosii <arthaey@...> wrote:

> Since we have a fair number of programmers here, I figured it was a
> good place to ask my question. :)