Re: About making a translator

From: Ray Brown <ray.brown@...>
Date: Wednesday, October 27, 2004, 19:15
On Wednesday, October 27, 2004, at 12:34, H. S. Teoh wrote:

> On Wed, Oct 27, 2004 at 02:30:28AM +0400, Alexander Savenkov wrote:
>> Hello,
>>
>> 2004-10-26T16:06:04+03:00 Ray Brown <ray.brown@...> wrote:
>>
>>> But, as Richard has written & I have discovered from experience, it
>>> is a highly non-trivial task.
>>
>> According to what I've read, this is an impossible task for now.
>> Machine translation will be possible with the invention of AI.
> [...]
>
> Impossible to be 100% correct, yes. But may be possible to do an
> approximation.

Yes, especially if the translation is of a text in some well-defined knowledge domain such as nuclear physics, pop music, the Harry Potter stories, or whatever. But a general-purpose translator is not possible at present.

> The essence of the problem is that natural language is inherently
> ambiguous, and requires (usually implicit) context to interpret
> correctly.

In fact it requires 'real world' knowledge - in other words a truly immense knowledge base.

> Take for example the following quote, which I got from
> somebody on this list:
>
>     Time flies like an arrow.
>     Fruit flies like a banana.

It's a couplet I often quote. I forget the details, but "Time flies like an arrow" was deliberately devised to test an early natural language parser at, I believe, MIT (almost certainly written in LISP). From what I remember, some people rather liked the machine's suggestion of a species of fly known as "time flies" all going crazy over an arrow; hence the second sentence :)

> The second sentence is particularly pathological, in that it has two
> possible parses, both of which have sensible semantics:
> 1) Fruit-NP flies-V like-ADV (a banana)-NP
> 2) (Fruit flies)-NP like-V (a banana)-NP

The first sentence is actually even more ambiguous if one relies on syntactic analysis alone, depending on whether we take "time", "flies" or "like" as the main verb.

Taking "time" as the imperative of the verb "to time" and "flies" as a plural noun, the direct object of the verb "time", we have:

1. Time flies in a similar way to that in which you would time an arrow.
2. Time flies in a similar way to that in which an arrow would time flies.

OK - the second meaning is particularly stupid and no human would interpret it that way. But it requires _semantic_ knowledge to realize that; on syntax alone the meaning is possible.

Both those parsings assume "like" is a conjunction meaning "in a similar way to that in which" (i.e. it means "as"). But "like" could be an adjective qualifying those darn "flies", thus:

3. Time only those flies which have a similar shape to an arrow.

If we take "time" as a noun, the subject of the verb "flies" (3rd pers. singular of the present tense of "to fly"), we get:

4. Time flies in an analogous way to the way an arrow flies.

So far we have been taking "like" either as a conjunction meaning "in a similar way to that in which" or "in an analogous way to", or as an adjective meaning "similar (to)"; but we could take "like" as a verb, the third person plural with subject "flies". "Time" is then an epithet noun, that is, we have our species of fly known as "time flies", thus:

5. Time-flies are just crazy about eating an arrow.
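To make the purely syntactic point concrete, here is a minimal Python sketch of a CKY chart parser that counts parse trees for both lines of the couplet. The lexicon and rules are hypothetical, a toy grammar invented just for these two sentences (any real grammar of English is vastly larger), and the names `count_parses`, `LEXICON`, `BINARY` and `UNARY` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: each word's possible parts of speech.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "fruit": {"N"},
    "an": {"Det"},
    "a": {"Det"},
    "arrow": {"N"},
    "banana": {"N"},
}

# Hypothetical binary rules, written as (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),   # declarative: subject + predicate
    ("VP", "V", "NP"),   # transitive verb + object
    ("VP", "V", "PP"),   # verb + prepositional phrase
    ("VP", "VP", "PP"),  # PP attached to a verb phrase
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "N", "N"),    # noun compound: "time flies", "fruit flies"
    ("NP", "NP", "PP"),  # PP attached to a noun phrase
]

# Hypothetical unary rules as (child, parent), applied once, in this order.
UNARY = [
    ("N", "NP"),  # a bare noun can be a noun phrase
    ("VP", "S"),  # imperative: a verb phrase alone is a sentence
]


def apply_unary(cell: Counter) -> None:
    """Each unary parent inherits every derivation of its child."""
    for child, parent in UNARY:
        if cell[child]:
            cell[parent] += cell[child]


def count_parses(sentence: str, start: str = "S") -> int:
    """CKY chart parser that counts distinct trees rooted at `start`."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)] maps each nonterminal to its number of derivations over words[i:j]
    chart = defaultdict(Counter)
    for i, w in enumerate(words):
        for tag in LEXICON[w]:
            chart[(i, i + 1)][tag] += 1
        apply_unary(chart[(i, i + 1)])
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # every split point
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, lchild, rchild in BINARY:
                    if left[lchild] and right[rchild]:
                        cell[parent] += left[lchild] * right[rchild]
            apply_unary(cell)
    return chart[(0, n)][start]


if __name__ == "__main__":
    print(count_parses("Time flies like an arrow."))   # 4
    print(count_parses("Fruit flies like a banana."))  # 2
```

With this toy grammar the second line gets exactly the two parses quoted above, and the first line gets four trees, corresponding to readings 1/2, 3, 4 and 5. Readings 1 and 2 share a single tree (the verb phrase "time flies" plus "like an arrow"), because they differ only in what the comparison is understood to mean, which is precisely the kind of distinction syntax alone cannot draw.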

> The problem with this kind of ambiguity is that it is an inherent
> ambiguity in English grammar. (And it's not just English alone; I
> believe most, if not all, natlangs are inherently ambiguous.)

This type of ambiguity is typical of languages with little morphology and great reliance on syntax, like English or modern Chinese. But ambiguity is inherent in all natural languages, if only because on the semantic level words often have wide ranges of meaning.

[snip - but all very true]

> applied to make guesses that are right 90% of the time. Approximate
> being the keyword here, however, because currently existing
> translators fall woefully short of the quality needed for general use.
>
> As someone once said, "Heuristics are buggy by definition, because if
> they weren't buggy, they'd be algorithms."

Exactly so.
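A minimal Python sketch of one such heuristic, the "most frequent tag" baseline, which gives each word its commonest part of speech regardless of context. The counts in `TAG_COUNTS` are invented for illustration, not taken from any corpus, and all names here are hypothetical. The point it shows does not depend on those numbers: "flies" must be a verb in the first line of the couplet and a noun in the second, so any tagger that ignores context gets at least one of the two lines wrong.

```python
from itertools import product

# Hypothetical per-word tag counts, invented for illustration (NOT real corpus data).
TAG_COUNTS = {
    "time": {"N": 90, "V": 10},
    "flies": {"N": 40, "V": 60},
    "like": {"P": 70, "V": 30},
    "fruit": {"N": 100},
    "an": {"Det": 100},
    "a": {"Det": 100},
    "arrow": {"N": 100},
    "banana": {"N": 100},
}

# The reading a human would normally pick for each line of the couplet.
INTENDED = {
    "Time flies like an arrow.": ["N", "V", "P", "Det", "N"],
    "Fruit flies like a banana.": ["N", "N", "V", "Det", "N"],
}


def tag(sentence, choice):
    """Context-free tagging: every occurrence of a word gets the same tag from `choice`."""
    return [choice[w] for w in sentence.lower().rstrip(".").split()]


def most_frequent_choice():
    """The heuristic: pick each word's commonest tag, ignoring its neighbours."""
    return {w: max(counts, key=counts.get) for w, counts in TAG_COUNTS.items()}


def n_correct(choice):
    """How many lines of the couplet get exactly the intended tag sequence."""
    return sum(tag(s, choice) == gold for s, gold in INTENDED.items())


if __name__ == "__main__":
    print(n_correct(most_frequent_choice()))  # 1 (with the invented counts above)
    # Try every tag for "flies" and "like": no context-free choice gets both lines.
    best = max(
        n_correct({**most_frequent_choice(), "flies": f, "like": l})
        for f, l in product(TAG_COUNTS["flies"], TAG_COUNTS["like"])
    )
    print(best)  # 1
```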

> Perhaps AI might help in
> improving this, but with the current state of AI, I'm not holding my
> breath.

Nor I - my lungs ain't big enough :)