
Re: About making a translator

From: H. S. Teoh <hsteoh@...>
Date: Tuesday, October 26, 2004, 23:35
On Wed, Oct 27, 2004 at 02:30:28AM +0400, Alexander Savenkov wrote:
> Hello,
>
> 2004-10-26T16:06:04+03:00 Ray Brown <ray.brown@...> wrote:
>
> > But, as Richard has written & I have discovered from experience, it
> > is a highly non-trivial task.
>
> According to what I've read, this is an impossible task for now.
> Machine translation will be possible with the invention of AI.
[...]

Impossible to be 100% correct, yes. But it may be possible to do an
approximation.

The essence of the problem is that natural language is inherently
ambiguous, and requires (usually implicit) context to interpret
correctly. Take for example the following quote, which I got from
somebody on this list:

   Time flies like an arrow.
   Fruit flies like a banana.

The second sentence is particularly pathological, in that it has two
possible parses, both of which have sensible semantics:

1) Fruit-NP flies-V like-ADV (a banana)-NP
2) (Fruit flies)-NP like-V (a banana)-NP

The problem with this kind of ambiguity is that it is inherent in
English grammar. (And it's not just English alone; I believe most, if
not all, natlangs are inherently ambiguous.) We don't have good
algorithms for dealing with ambiguous grammars: determining all
possible parses of such sentences can take exponential time, and once
that has been done, you still need some way to decide which of the
parses is the correct one.

Worse yet, context is required to properly interpret natlang
sentences, and we don't have good algorithms for dealing with
context-sensitive grammars. In fact, I doubt anyone even knows how to
write a context-sensitive grammar that expresses basic
context-sensitivity, such as whether a referent has occurred earlier
in the given text. Even if such a grammar were written, it would be
extremely complex and difficult to understand. And we still don't
have a feasible algorithm for parsing text according to
context-sensitive grammars.

Now, existing compilers for computer languages do deal with
context-sensitivity, but only to a limited extent. The programming
language's context-sensitive grammar is reduced to an unambiguous,
context-free grammar, and the context-sensitivity is implemented as
ad-hoc rules applied after the parse.
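To make the ambiguity concrete, here is a minimal sketch in Python of
a brute-force CFG parser that enumerates every parse of the banana
sentence. The grammar and lexicon are invented just for this one
example, not taken from any real system:

```python
# Toy lexicon: each word maps to the set of categories it can fill.
LEXICON = {
    "fruit":  {"N"},
    "flies":  {"N", "V"},   # noun (insects) or verb (to fly)
    "like":   {"P", "V"},   # preposition or verb
    "a":      {"Det"},
    "banana": {"N"},
}

# Toy grammar: unary and binary context-free rules.
RULES = [
    ("S",  ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("N", "N")),     # compound noun: "fruit flies"
    ("NP", ("Det", "N")),   # "a banana"
    ("VP", ("V", "PP")),    # "flies [like a banana]"
    ("VP", ("V", "NP")),    # "like [a banana]"
    ("PP", ("P", "NP")),
]

def parse(sym, words, i, j):
    """Return every parse tree for words[i:j] rooted at sym."""
    trees = []
    if j - i == 1 and sym in LEXICON.get(words[i], ()):
        trees.append((sym, words[i]))
    for lhs, rhs in RULES:
        if lhs != sym:
            continue
        if len(rhs) == 1:
            trees += [(sym, t) for t in parse(rhs[0], words, i, j)]
        else:
            # Binary rule: try every split point of the span.
            for k in range(i + 1, j):
                for left in parse(rhs[0], words, i, k):
                    for right in parse(rhs[1], words, k, j):
                        trees.append((sym, left, right))
    return trees

sentence = "fruit flies like a banana".split()
for tree in parse("S", sentence, 0, len(sentence)):
    print(tree)
```

Even this five-word sentence yields two complete trees, one per
reading; longer sentences fan out much faster, which is why blind
enumeration doesn't scale.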
A programming language is designed and refined precisely so that such
post-parse rules are relatively straightforward to implement. When it
comes to natural language, however, we don't have this option, and
the context-sensitivity rules aren't as well-defined as in the
computer language case; they are usually mere heuristics.

Add on top of this the fact that most natlang texts leave out a lot
of the context required to interpret them properly. For example, some
technical jargon uses common English words, but with meanings that
differ from common usage. In technical journals, however, such words
usually go unexplained because they are well known among the
audience. From a computer's standpoint, this means that the context
required to interpret such a text is not available, and so the
ambiguity cannot be resolved. Sometimes the necessary context is
*never* defined anywhere, simply because it is culturally understood.
Writing an algorithm to interpret such texts would require encoding
cultural conventions, which I doubt we even know how to represent
digitally in a form usable by a parsing algorithm. And this doesn't
even begin to account for regional, dialectal, and personal
differences, which can sometimes play a big role in properly
interpreting a given piece of text.

And even if you can somehow surmount all these barriers, you still
have to solve the problem of mapping the highly idiosyncratic parse
you've just constructed onto the grammar, context, and conventions of
the target language. Most of the time, that mapping is very
non-trivial. What usually ends up happening is that you take the
common denominator between the two languages and throw out everything
else. Unfortunately, most of what gets thrown out is what carries the
most important information.

Having said all this, not all hope is lost; it is still possible to
write approximate algorithms that can more-or-less parse natural
language and produce approximate translations.
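To give a flavour of the jargon problem: a translator can only guess
at a word's intended sense from whatever context it can see. Here is
a toy Python sketch of such a guess, picking the sense whose "cue
words" co-occur most often in the surrounding text. The senses and
cue lists are entirely invented for illustration:

```python
# Invented sense inventory: each sense of an ambiguous word is paired
# with a hand-picked set of cue words that suggest it.
SENSES = {
    "driver": {
        "software component": {"kernel", "install", "device", "windows"},
        "vehicle operator":   {"car", "road", "license", "truck"},
    },
}

def disambiguate(word, context_words, senses=SENSES):
    """Pick the sense whose cue words overlap the context the most."""
    best, best_hits = None, -1
    for sense, cues in senses[word].items():
        hits = len(cues & set(context_words))
        if hits > best_hits:
            best, best_hits = sense, hits
    return best

text = "install the device driver for the kernel".split()
print(disambiguate("driver", text))   # -> software component
```

Note that when the cues never appear in the text at all, because the
audience is assumed to know them, the heuristic has nothing to count
and the guess is no better than chance.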
Heuristics can be applied to make guesses that are right 90% of the
time. "Approximate" is the keyword here, however, because currently
existing translators fall woefully short of the quality needed for
general use. As someone once said, "Heuristics are buggy by
definition, because if they weren't buggy, they'd be algorithms."
Perhaps AI might help improve this, but given the current state of
AI, I'm not holding my breath.

T

--
Creativity is not an excuse for sloppiness.

