TECH: Language Detection

From: Arthaey Angosii <arthaey@...>
Date: Wednesday, July 5, 2006, 19:47
Since we have a fair number of programmers here, I figured it was a
good place to ask my question. :)

I'm working on a project to automatically detect what language some
text is in. The text arrives roughly a phrase at a time, and a high
percentage of it is proper nouns.

Do any of you have any experience with programmatic language
detection? I'll probably be using character and n-gram frequencies,
perhaps supplemented by a custom dictionary (so that proper nouns that
recur can be used to improve accuracy over time).
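For anyone who hasn't tried it, here is a minimal sketch of the character n-gram idea plus a dictionary boost. It is only one way to do it, under several assumptions: trigrams padded with spaces so word starts and ends count as features, naive Bayes scoring with add-one smoothing, and a plain word-to-language dict standing in for the custom dictionary. The class name `NGramDetector`, the `dict_weight` bonus and the two training sentences are all hypothetical. Real profiles would need far more training text per language, especially for short, proper-noun-heavy phrases.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams, padded with spaces so word
    boundaries (prefixes and suffixes) also become features."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramDetector:
    """Hypothetical detector: per-language n-gram counts + word dictionary."""

    def __init__(self, n: int = 3):
        self.n = n
        self.counts: dict[str, Counter] = {}
        self.totals: dict[str, int] = {}
        self.vocab: set[str] = set()
        # Assumed custom dictionary: word -> language it was confirmed in
        # (e.g. recurring proper nouns seen in earlier phrases).
        self.dictionary: dict[str, str] = {}

    def train(self, lang: str, text: str) -> None:
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(lang, Counter()).update(grams)
        self.totals[lang] = self.totals.get(lang, 0) + len(grams)
        self.vocab.update(grams)

    def remember(self, word: str, lang: str) -> None:
        self.dictionary[word.lower()] = lang

    def scores(self, text: str, dict_weight: float = 5.0) -> dict[str, float]:
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        result = {}
        for lang, counts in self.counts.items():
            total = self.totals[lang]
            # Naive Bayes log-likelihood with add-one (Laplace) smoothing;
            # Counter returns 0 for n-grams never seen in this language.
            s = sum(math.log((counts[g] + 1) / (total + v)) for g in grams)
            # Fixed bonus for each word the dictionary already ties to this language.
            s += dict_weight * sum(
                1 for w in text.lower().split() if self.dictionary.get(w) == lang
            )
            result[lang] = s
        return result

    def detect(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


det = NGramDetector()
det.train("en", "the quick brown fox jumps over the lazy dog and then the")
det.train("es", "el rápido zorro marrón salta sobre el perro perezoso y luego el")
print(det.detect("the dog and the fox"))  # en
print(det.detect("Angossa"))              # en (only " an" matches anything)
det.remember("Angossa", "es")             # hypothetical proper noun
print(det.detect("Angossa"))              # es (dictionary bonus wins)
```

The last two lines show why the dictionary helps with proper nouns. An unfamiliar name gives the n-gram model almost nothing to go on, so a weak coincidental match ("an" from English "and") decides the result until the name has been confirmed once.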

Any other techniques I should consider, or common pitfalls I should avoid?
