Conlang: Re: CHAT: Machine translation (was Re: translation) (Yahya Abdal-Aziz, Jun 20 '06, 14:06)

From:	Yahya Abdal-Aziz <yahya@...>
Date:	Tuesday, June 20, 2006, 14:06

Hi all,

> Yahya Abdal-Aziz wrote:
> > Sometimes a translation really cracks me up. Rather than looking up a couple of words on the original page in Spanish, I took the easy way out - googled for the topic, then clicked "Translate this page". Here's what I got:
> > http://tinyurl.com/m3pfw
> >
> > Test: See how much of this Spanglish you can understand WITHOUT clicking on "View original page".

> Date: Mon, 19 Jun 2006
> From: Roger Mills
> Subject: Re: translation

> That's funny, haha, but also cringe-making. I could almost completely reconstruct the Spanish text as I read through; unnecessary even to peek at the original. ...

Ditto, based on a fairly rudimentary knowledge of Spanish in my case.

> ... Really unforgivably silly and stupid errors, one would think, but Machine Translation....well what did we expect anyway?

I expected much better than this! The Google translation is virtually a case of word-by-word substitution across lexicons. Several free translators available on the web do a *much* better job. One of the better ones is imTranslator, at:
http://translation.paralink.com/default.asp
I've used this quite often to help me "fill in the gaps" when reading webpages in other languages, or when writing to my efriends in Spain and Latin America. Although it in no way approaches human translation, it makes far fewer of the ludicrous mistakes I noted with Google. And it's far quicker than looking words up in my so-called "Pocket" Oxford Spanish dictionary.

> Still and all, it wasn't total gibberish.

But it took a bit of thought to untangle ...

> ------------------------------
>
> Date: Mon, 19 Jun 2006
> From: "Mark J. Reed"
>
> On 6/19/06, Roger Mills wrote:
> > Machine Translation....well what did we expect anyway?
>
> Not much. MT has gotten cheaper and more widely available, but the quality hasn't really improved much in the last 30 years.
>
> Sometime in the 80s or thereabouts the AI community mostly stopped working on what we might think of as "real" or "pure" AI - by which I mean, the attempt to produce software that can think (or, as some would say, give the impression of thinking, but I don't want to get into the fallacy of the philosophical zombie on here) like a human. After decades of funding, that sort of research had consistently yielded not much in the way of profitable applications, and consequently said funding dried up. As a result the AI community shifted its focus to narrowly-scoped, rules-based "expert systems" instead.
>
> Such systems led to commercial success. The ubiquitous "driving directions" software provided by mapping sites and GPS devices is a good example. In more specialized domains expert systems are everywhere; there is software that can do a preliminary medical diagnosis, or help police and military modify an attack plan on the fly based on enemy activity. Google's indexing engine is essentially an expert system for classifying documents - although it's also another example where software that could actually *understand* what it was reading would do a better job.
>
> Unfortunately, translation doesn't lend itself well to the expert-system approach. My personal belief is that reliable MT is tantamount to real AI - that the two problems are, in fact, actually the same problem. There is a widespread belief (held by many in the AI community itself, not just by laymen) that MT is somehow not only more tractable than real AI, but orders of magnitude simpler. But IMHO, that's just another example of humans underestimating the difficulty of something that we happen to be optimized for.

I guess I think the truth lies between these two extremes. Some of our intelligent behaviour involves mental processing of a kind that we can't yet fully explicate, let alone describe in common speech. I think there will always be dimensions of intelligent behaviour that are inchoate, incapable of articulation.

> ------------------------------
>
> Date: Mon, 19 Jun 2006
> From: Gary Shannon
>
> --- "Mark J. Reed" wrote:
>
> > <snip>
> > Unfortunately, translation doesn't lend itself well to the expert-system approach. My personal belief is that reliable MT is tantamount to real AI - that the two problems are, in fact, actually the same problem.
>
> My own opinion is a bit different. I have played aroujnd with some machine translation programming and some chat-bot programming and after researching all the various parsing engines out there I think the real problem is the parsers. They parse for structure instead of parsing for meaning. Once a parser is built that will yeild the IDENTICAL parse tree for the two sentences below, then the machine translation problem will be mostly solved:
>
> 1. "Old Mother Hubbard went to the cupboard."
> 2. "It was to the wall-mounted cabinet that the eldery

[sic; "elderly"?]

> woman named Mother Hubbard did go."

I see the first sentence as one possible distinct verbal realisation (or "utterance"), in the English language, of relationships between concepts. Other utterances may capture much the same content, but rarely if ever exactly the same. The content here is semantic, and refers both to entities, such as persons and inanimate objects, and to relationships between those entities. (Those entities may include relationships.) Relationships may be expressed by words, or by arrangements of words (which includes all of syntax).

Between the meanings (semantic content) of the two sentences, there is a great deal of overlap, but I don't take them to be distinct utterances expressing the same meaning. Do you? To say, instead, that their meanings are similar is to pose the question: in what does that similarity consist? Are we perhaps talking of mathematical mappings between sets, or is there a more fruitful way of describing such similarities? Is it useful to think of the possible denotations of a word as a "cloud" of other words around it, those nearest in meaning being nearest in physical space? (At least one "visual thesaurus" available on the Internet uses this model.) If so, how many dimensions should such a model have - perhaps one for each realm of application? (That might lead to an unwieldy model with a dozen or more dimensions - look up, for example, the definition of "work" on Wiktionary.)

Realistically, I think we need usable measures of semantic distance or overlap. Rather than have a translator (human or machine) aim at producing identity between the structures of two utterances, we should have the translator aim at producing a minimal semantic distance between the pair. None of the above really handles the "connotation" aspect of meaning very well, I think.
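
To make the idea of "minimal semantic distance" a little more concrete, here is a rough sketch in Python. It assumes, purely for illustration, that each utterance has already been annotated by hand with a set of concept labels, and it uses Jaccard distance between those sets as the measure of overlap. The labels, the candidate sentences and the function names are all hypothetical; a real measure would need far richer representations, and this one says nothing about connotation.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b|: 0.0 for identical sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical, hand-made concept annotations (an assumption, not real parser output).
SOURCE = frozenset(
    {"mother_hubbard", "old", "go", "past", "cupboard", "destination"}
)

CANDIDATES = {
    "plain": frozenset(
        {"mother_hubbard", "old", "go", "past", "cupboard", "destination"}
    ),
    "cleft": frozenset(
        {
            "mother_hubbard", "old", "woman", "go", "past", "cupboard",
            "wall_mounted", "destination", "focus_on_destination",
        }
    ),
    "unrelated": frozenset({"mother_hubbard", "own", "past", "dog"}),
}


def closest(source: frozenset, candidates: dict) -> str:
    """Pick the candidate label whose concept set is nearest to the source."""
    return min(candidates, key=lambda label: jaccard_distance(source, candidates[label]))


if __name__ == "__main__":
    for label, concepts in CANDIDATES.items():
        print(label, round(jaccard_distance(SOURCE, concepts), 3))
    print("closest:", closest(SOURCE, CANDIDATES))
```

Under these made-up annotations the cleft sentence ("It was to the wall-mounted cabinet that ...") comes out close to the plain one but not identical to it, which is the point: the translator would be scored on how small the distance is, not on whether two structures match exactly.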

> My own (incomplete) chat-bot parser project aims to do just that. ( http://fiziwig.com -- Under the heading "Computerized Linguistics and Machine Translation" and "Artificial Intelligence")

Gary, I wish you well in your project! Please keep me posted, on- or off-list, with your progress.

> ------------------------------
> Date: Mon, 19 Jun 2006
> From: taliesin the storyteller
>
> * Gary Shannon said
> > --- "Mark J. Reed" wrote:
> >
> > <snip>
> > > Unfortunately, translation doesn't lend itself well to the expert-system approach. My personal belief is that reliable MT is tantamount to real AI - that the two problems are, in fact, actually the same problem.
>
> This is my belief also - and I now have a master in MT to back me up :)

S'pose we oughta listen to ya, then! ;-)

My intuition is that, fascinating and complex though language is, a complete attainment of Artificial Language Intelligence would still fall a long way short of complete Artificial Intelligence.

> > [..] the real problem is the parsers. They parse for structure instead of parsing for meaning. Once a parser is built that will yeild the IDENTICAL parse tree for the two sentences below, then the machine translation problem will be mostly solved:
> >
> > 1. "Old Mother Hubbard went to the cupboard."
> > 2. "It was to the wall-mounted cabinet that the eldery woman named Mother Hubbard did go."
>
> But these two sentences doesn't mean quite the same thing, and would be rendered differently in target languages as well.

Yes, of course they must, since we don't even have the same set of concepts to draw on in two cultures. Remember our discussion of colour terms some time back? You simply can't say: "Yellow as the sun" in a language that doesn't have a word for yellow, or "Tasty as walrus blubber" in any Australian language.

> ------------------------------
> Date: Mon, 19 Jun 2006
> From: Gary Shannon
>
> --- taliesin the storyteller wrote:
>
> > * Gary Shannon said
> > <snip>
> > > [..] the real problem is the parsers. They parse for structure instead of parsing for meaning. Once a parser is built that will yeild the IDENTICAL parse tree for the two sentences below, then the machine translation problem will be mostly solved:
> > >
> > > 1. "Old Mother Hubbard went to the cupboard."
> > > 2. "It was to the wall-mounted cabinet that the eldery woman named Mother Hubbard did go."
> >
> > But these two sentences doesn't mean quite the same thing, and would be rendered differently in target languages as well.
> >
> > t.
>
> There is a world of difference between "literary translation and "utilitarian translation." I believe that literary translation would require full-blown AI, while compentant utilitarian translation could be accomplished with a good deal less.

I believe "competent utilitarian translation" to be a worthy goal, and possibly very much more *utilitarian* than literary translation. ;-) Though, arguably, much of literature is "lost in translation", there does exist a market for translating the greatest (most acclaimed) foreign authors into many languages. That market is, regrettably, so small that in a city of over two million people, I have to place special orders for most of the works by Gabriel Garcia Marquez, Rimbaud, Baudelaire, Flaubert, Hesse, etc - whether in the original or in translation.

OT: And to get a work using the Tifinagh Berber alphabet, I importuned a friend who was recently travelling in Morocco. On his last day of a several-week stay, he finally found me ONE copy of a school dictionary in a bookshop in a small town on the far side of the Atlas Mountains. The shop owner had acquired two copies, but kept one for his own use ... It is "Amawal a(gh)ubiz", by Brahim Barouch (I think), subtitled "Lexique Scolaire - Student Dictionary" (and the equivalent in Arabic and Tamazight (or Amazigh)) - Français-Tamazight - English-Tamazight - {Arabiyyat-Tamazight}. My friend John just got back, via Spain and Vietnam (!), and he handed it to me when we met last night.

> I think that the first goal of utilitarian translation should be good translation of the basic meaning. Only after that is mastered would nuances of meaning be tackled. For all practical purposes (e.g. translating bicycle assembly instructions from Japanese to English) such nuance is unnecessary and the two sentences can be thought of as describing essentially identical events.

However, there is a nuance that the translators of manuals for bicycle assembly need to consider very carefully: how to (politely) instruct in the target language. Another nuance is the familiarity one should assume the assembler has with the parts of the bicycle and the tools it is necessary to use. Cultural issues all, and not easily subsumed under "basic meaning". Despite this quibble, I agree that a translator should be able to express objective and concrete things, as well as some simple actions that most of us have in our repertoire, as stepping stones to performing more complex or subtle translations. I also agree that much of the perceived value in translation lies in this more concrete, utilitarian realm. Still, much of the fun lies outside it!

> I think the initial focus needs to be on conveying what event is being described by the sentence, and not on capturing literary style or nuance. If I am reading a scientific paper for its content I don't care if the author was a literary giant, or if he's a middling hack barely capable of putting together a grammatically correct sentence, as long as the essential meaning is correct.

Yes, well ... I read a paper the other day, which I hesitated to recommend to others, simply because the author has an inadequate grasp of English grammar (or doesn't proof-read at all). In the end, I thought the material so pertinent to our discussion that I passed it on anyway, as it was the only reference I'd found online to the probable route(s) by which West Asian lutes had found their way to Malaysia. Many of his statements were, simply, nonsense, and the fault was not in his research, his facts, or his reasoning, but purely in his grammar. That this person has a university post was, I thought, quite incredible.

> ------------------------------
>
> Date: Mon, 19 Jun 2006
> From: Roger Mills
>
> I won't give a word-by-word critique on the English/Spanish article Yahya posted, but will point out some of the more egregious errors:

"egregious" - I *love* that word! Do only North Americans use it?

> Right away in the title heading: "it finishes receiving..." for "acaba de recibir" = 'he has just received...'
>
> As with many of the 3d pers. verbs like the above with subject "it", the frequent mistrans. of possessive su(s) as 'its', when it almost always, very clearly, should be "his".
>
> "The School of Beautiful Arts"-- oh, come on.
>
> I can see how the "al + infinitive" construction could cause confusion, but it's something one learns in Year 1 Spanish, and is very common. The entire sentence beginning "Al estallar la guerra..."= "When exploding the war..." is such a hash that I doubt someone unfamiliar with Span. would make any sense out of it. "pain that was exchanged" for "sentence that was commuted..." etc. etc.
> "in he himself year..." for "en el mismo año"-- come on, again, groan.
>
> One of the more amusing boo-boos is where he goes to foreign cities "to give lectures, _to char them_..." EH??? At first I thought it was a misprint for "to chair them", but of course it's "dar conferencias, _charlas_..." (lit. CHATS) i.e. TALKS, dammit. Worthy of one of our relays!!!!

Yep, I agree it was all pretty thoroughly bad. The errors were mostly caused by the inability of the translator to proceed beyond the word-by-word substitution model. But my point was not so much that it was a bad translation, but that structures that, say, an English speaker would never think of *can* be used to express certain meanings. On reflection, I hope that English speaker might wonder why he does *not* express those meanings in that fashion. This in turn may lead to his being more open to considering, even inventing, alternative structures for his own conlangs.

> Oh well, basta... At least it got me interested in him-- sorry to say I'm not up on 20th Cent. Spanish dramatists.

Nor am I - a gap I intend to fill, as I'm sure they must have had something noteworthy to say about the extraordinary events of the century, both in Spain and in the whole Hispanophone world.

Regards,
Yahya
