TECH: Language Detection
From: Arthaey Angosii <arthaey@...>
Date: Wednesday, July 5, 2006, 19:47
Since we have a fair number of programmers here, I figured it was a
good place to ask my question. :)
I'm working on a project to automatically detect what language a piece
of text is in. The text arrives in short chunks, usually about a phrase
at a time, and contains a high percentage of proper nouns.
Do any of you have experience with programmatic language detection?
I'll probably use character and n-gram frequencies, perhaps
supplemented by a custom dictionary (so that proper nouns that recur
can be used to improve accuracy over time).
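For anyone wondering what "n-gram frequencies plus a custom dictionary" might look like in practice, here is a minimal sketch of one common way to combine them. It is a naive Bayes scorer over character trigrams with add-one smoothing, plus a fixed score bonus for words found in a small dictionary that stands in for the proper-noun list. The class name `NgramLanguageDetector`, the `dictionary_bonus` value of 2.0, and the two toy training sentences are assumptions made up for illustration. A real detector would need much more training text per language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding with spaces so word edges count."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageDetector:
    """Naive Bayes over character n-grams with add-one smoothing.

    `known_words` is a hypothetical stand-in for a custom proper-noun
    dictionary: each word that matches adds a fixed bonus to one language.
    """

    def __init__(self, n=3, dictionary_bonus=2.0):
        self.n = n
        self.dictionary_bonus = dictionary_bonus  # assumed value, tune on real data
        self.counts = {}       # language -> Counter of n-grams
        self.totals = {}       # language -> total number of n-grams seen
        self.vocab = set()     # every n-gram seen in any language
        self.known_words = {}  # lowercased word -> language

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        counter = self.counts.setdefault(language, Counter())
        counter.update(grams)
        self.totals[language] = sum(counter.values())
        self.vocab.update(grams)

    def add_known_word(self, word, language):
        self.known_words[word.lower()] = language

    def score(self, text):
        """Return {language: log-score}; a higher score means more likely."""
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        scores = {}
        for lang, counter in self.counts.items():
            denom = self.totals[lang] + vocab_size
            # Add-one smoothing: unseen n-grams get a small nonzero probability.
            scores[lang] = sum(math.log((counter[g] + 1) / denom) for g in grams)
        # Dictionary hits (split on whitespace only) nudge the score upward.
        for word in text.lower().split():
            lang = self.known_words.get(word)
            if lang in scores:
                scores[lang] += self.dictionary_bonus
        return scores

    def detect(self, text):
        scores = self.score(text)
        return max(scores, key=scores.get)


# Toy training data (hypothetical); real profiles need far more text.
detector = NgramLanguageDetector()
detector.train("en", "the quick brown fox jumps over the lazy dog and then the cat")
detector.train("es", "el rápido zorro marrón salta sobre el perro perezoso y luego el gato")

print(detector.detect("the dog and the cat"))  # → en
print(detector.detect("el perro y el gato"))   # → es
```

With only a phrase at a time, each decision rests on a handful of n-grams, so the smoothing choice and the size of the dictionary bonus will matter much more than they would for full paragraphs.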
Any other techniques I should consider, or common pitfalls I should avoid?