TECH: Language Detection
From: Arthaey Angosii <arthaey@...>
Date: Wednesday, July 5, 2006, 19:47
Since we have a fair number of programmers here, I figured it was a
good place to ask my question. :)
I'm working on a project to automatically detect what language some
text is in. The input is short, usually just a phrase at a time, with a
high percentage of proper nouns.
Do any of you have any experience with programmatic language
detection? I'll probably be using character and n-gram frequencies,
perhaps supplemented by a custom dictionary (so that recurring proper
nouns can be used to increase accuracy in the future).
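
To make the idea concrete, here is a rough Python sketch of what character n-gram profiles plus a dictionary override might look like. It is only an illustration under assumptions: the trigram size, the cosine-similarity scoring, and the names `Detector`, `train`, `remember`, and `detect` are all hypothetical choices, not an existing implementation, and real training text would need to be far larger than the toy sentences shown.

```python
from collections import Counter
import math


def ngrams(text, n=3):
    """Count character n-grams, with single spaces marking word edges."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Detector:
    """Hypothetical detector: n-gram profiles plus a phrase dictionary."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {}  # language -> Counter of n-grams
        self.known = {}     # lowercased phrase -> language (assumed override)

    def train(self, lang, text):
        """Add sample text to a language's n-gram profile."""
        self.profiles.setdefault(lang, Counter()).update(ngrams(text, self.n))

    def remember(self, phrase, lang):
        """Record a phrase (e.g. a proper noun) whose language is known."""
        self.known[phrase.lower().strip()] = lang

    def detect(self, text):
        """Return the best-matching language, or None if nothing is trained."""
        key = text.lower().strip()
        if key in self.known:
            return self.known[key]
        sample = ngrams(text, self.n)
        scores = {lang: cosine(sample, prof)
                  for lang, prof in self.profiles.items()}
        return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    d = Detector()
    d.train("en", "the quick brown fox jumps over the lazy dog and the king of the house")
    d.train("es", "el perro perezoso duerme en la casa del rey")
    d.remember("Quetzal", "es")  # made-up dictionary entry for illustration
    for phrase in ("the house", "la casa", "quetzal"):
        print(phrase, "->", d.detect(phrase))
```

The dictionary lookup runs before any scoring, so a phrase that has been confirmed once is never re-guessed. With phrases this short, the n-gram scores alone are likely to be noisy.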
Any other techniques I should consider, or common pitfalls I should avoid?
Thanks!
--
AA
http://conlang.arthaey.com