Re: CHAT: Machine translation (was Re: translation)
From: Yahya Abdal-Aziz <yahya@...>
Date: Tuesday, June 20, 2006, 14:06
Hi all,
> Yahya Abdal-Aziz wrote:
>
> > Sometimes a translation really cracks me up.
> > Rather than looking up a couple of words on
> > the original page in Spanish, I took the easy
> > way out - googled for the topic, then clicked
> > "Translate this page". Here's what I got:
> >
> > http://tinyurl.com/m3pfw
> >
> > Test: See how much of this Spanglish you can
> > understand WITHOUT clicking on "View
> > original page".
> >
> Date: Mon, 19 Jun 2006
> From: Roger Mills
> Subject: Re: translation
> That's funny, haha, but also cringe-making. I could
> almost completely reconstruct the Spanish text as
> I read through; unnecessary even to peek at the
> original. ...
Ditto, based on a fairly rudimentary knowledge
of Spanish in my case.
> ... Really unforgivably silly and stupid errors,
> one would think, but Machine Translation....well
> what did we expect anyway?
I expected much better than this! There are
several free translators available on the web which
do a *much* better job than Google did. I'd like to
point out that the Google translation is virtually a
case of word-by-word substitution across lexicons.
One of the better ones is imTranslator, at:
http://translation.paralink.com/default.asp
I've used this quite often to help me "fill in the gaps"
when reading webpages in other languages, or when
writing to my efriends in Spain and Latin America.
Although it in no way approaches human translation,
it makes far fewer of the ludicrous mistakes I noted
with Google. And it's far quicker than looking up my
so-called "Pocket" Oxford Spanish dictionary.
> Still and all, it wasn't total gibberish.
But took a bit of thought to untangle ...
> ------------------------------
>
> Date: Mon, 19 Jun 2006
> From: "Mark J. Reed"
>
> On 6/19/06, Roger Mills wrote:
> > Machine Translation....well what did we expect anyway?
>
> Not much. MT has gotten cheaper and more widely available, but the
> quality hasn't really improved much in the last 30 years.
>
> Sometime in the 80s or thereabouts the AI community mostly stopped
> working on what we might think of as "real" or "pure" AI - by which I
> mean, the attempt to produce software that can think (or, as some
> would say, give the impression of thinking, but I don't want to get
> into the fallacy of the philosophical zombie on here) like a human.
> After decades of funding, that sort of research had consistently
> yielded not much in the way of profitable applications, and
> consequently said funding dried up. As a result the AI community
> shifted its focus to narrowly-scoped, rules-based "expert systems"
> instead.
>
> Such systems led to commercial success. The ubiquitous "driving
> directions" software provided by mapping sites and GPS devices is a
> good example. In more specialized domains expert systems are
> everywhere; there is software that can do a preliminary medical
> diagnosis, or help police and military modify an attack plan on the
> fly based on enemy activity. Google's indexing engine is essentially
> an expert system for classifying documents - although it's also
> another example where software that could actually *understand* what
> it was reading would do a better job.
>
> Unfortunately, translation doesn't lend itself well to the
> expert-system approach. My personal belief is that reliable MT is
> tantamount to real AI - that the two problems are, in fact, actually
> the same problem. There is a widespread belief (held by many in the
> AI community itself, not just by laymen) that MT is somehow not only
> more tractable than real AI, but orders of magnitude simpler. But
> IMHO, that's just another example of humans underestimating the
> difficulty of something that we happen to be optimized for.
I guess I think the truth lies between these
two extremes. Some of our intelligent behaviour
involves mental processing of a kind that we can't
yet fully explicate, let alone describe in common
speech. I think there will always be dimensions of
intelligent behaviour that are inchoate, incapable
of articulation.
> ------------------------------
>
> Date: Mon, 19 Jun 2006
> From: Gary Shannon
>
> --- "Mark J. Reed" wrote:
>
> <snip>
> > Unfortunately, translation doesn't lend itself well to the
> > expert-system approach. My personal belief is that reliable MT is
> > tantamount to real AI - that the two problems are, in fact, actually
> > the same problem.
>
> My own opinion is a bit different. I have played
> around with some machine translation programming and
> some chat-bot programming and after researching all
> the various parsing engines out there I think the real
> problem is the parsers. They parse for structure
> instead of parsing for meaning. Once a parser is built
> that will yield the IDENTICAL parse tree for the two
> sentences below, then the machine translation problem
> will be mostly solved:
>
> 1. "Old Mother Hubbard went to the cupboard."
> 2. "It was to the wall-mounted cabinet that the eldery
[sic; "elderly"?]
> woman named Mother Hubbard did go."
I see the first sentence as one possible distinct
verbal realisation (or "utterance") (in the English
Language) of relationships between concepts.
Other utterances may capture much the same
content, but rarely if ever exactly the same.
The content here is semantic, and refers both
to entities, such as persons and inanimate objects,
and to relationships between those entities. (Those
entities may include relationships.) Relationships
may be expressed by words, or by arrangements
of words (which includes all of syntax.)
Between the meanings (semantic content) of the
two sentences, there is a great deal of overlap,
but I don't take them to be distinct utterances
expressing the same meaning. Do you?
To say, instead, that their meanings are similar is
to pose the question: in what does that similarity
consist? Are we perhaps talking of mathematical
mappings between sets, or is there a more fruitful
way of describing such similarities? Is it useful to
think of the possible denotations of a word as a
"cloud" of other words around it, those nearest in
meaning being nearest in physical space? (At least
one "visual thesaurus" available on the Internet
uses this model.) If so, how many dimensions should
such a model have - perhaps one for each realm of
application? (That might lead to an unwieldy model
with a dozen or more dimensions - look up, for
example, the definition of "work" on Wiktionary.)
Realistically, I think we need usable measures of
semantic distance or overlap, and rather than have
a translator (human or machine) aim at producing
identity between the structures of two utterances,
have the translator aim at producing a minimal
semantic distance between the pair.
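
As a minimal sketch of that idea (not a real measure of meaning), one
could describe each utterance by a hand-picked set of concept labels and
score a candidate by its Jaccard distance from the source. The labels,
the concept sets assigned to each sentence, and the helper names
`jaccard_distance` and `closest` are all assumptions made up for
illustration; a usable measure would need far richer representations.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|: 0.0 for identical sets, 1.0 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty descriptions are treated as identical
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical concept labels for: "Old Mother Hubbard went to the cupboard."
SOURCE = {"old", "woman", "mother-hubbard", "go", "past", "cupboard"}

# Hypothetical concept labels for two candidate renderings.
CANDIDATES = {
    "It was to the wall-mounted cabinet that the elderly woman "
    "named Mother Hubbard did go.": {
        "old", "woman", "mother-hubbard", "go", "past", "cupboard",
        "wall-mounted", "focus-on-destination",
    },
    "Mother Hubbard is old.": {"old", "woman", "mother-hubbard"},
}


def closest(source_concepts, candidates):
    """Pick the candidate text whose concept set is nearest to the source."""
    return min(
        candidates,
        key=lambda text: jaccard_distance(source_concepts, candidates[text]),
    )


if __name__ == "__main__":
    for text, concepts in CANDIDATES.items():
        print(round(jaccard_distance(SOURCE, concepts), 2), text)
    print("closest:", closest(SOURCE, CANDIDATES))
```

With these toy labels the cleft sentence scores 0.25 and the shorter one
0.5, so the cleft sentence is chosen as the nearer of the two without
being treated as identical to the source.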
None of the above really handles the "connotation"
aspect of meaning very well, I think.
> My own (incomplete) chat-bot parser project aims to do
> just that. (http://fiziwig.com -- Under the heading
> "Computerized Linguistics and Machine Translation" and
> "Artificial Intelligence")
Gary, I wish you well in your project! Please keep
me posted, on- or off-list, with your progress.
> ------------------------------
> Date: Mon, 19 Jun 2006
> From: taliesin the storyteller
>
> * Gary Shannon said
> > --- "Mark J. Reed" wrote:
> >
> > <snip>
> > > Unfortunately, translation doesn't lend itself well to the
> > > expert-system approach. My personal belief is that reliable MT is
> > > tantamount to real AI - that the two problems are, in fact, actually
> > > the same problem.
>
> This is my belief also - and I now have a master in MT to back me up :)
S'pose we oughta listen to ya, then ! ;-)
My intuition is that, fascinating and complex
though language is, a complete attainment of
Artificial Language Intelligence would still
fall a long way short of complete Artificial
Intelligence.
> [..] the real problem is the parsers. They parse for structure instead
> > of parsing for meaning. Once a parser is built that will yield the
> > IDENTICAL parse tree for the two sentences below, then the machine
> > translation problem will be mostly solved:
> >
> > 1. "Old Mother Hubbard went to the cupboard."
> > 2. "It was to the wall-mounted cabinet that the eldery woman named
> > Mother Hubbard did go."
>
> But these two sentences don't mean quite the same thing, and would be
> rendered differently in target languages as well.
Yes, of course they must, since we don't even
have the same set of concepts to draw on in two
cultures. Remember our discussion of colour
terms some time back? You simply can't say:
"Yellow as the sun" in a language that doesn't
have a word for yellow, or "Tasty as walrus blubber"
in any Australian language.
> ------------------------------
> Date: Mon, 19 Jun 2006
> From: Gary Shannon
>
> --- taliesin the storyteller wrote:
>
> > * Gary Shannon said
>
> <snip>
> > [..] the real problem is the parsers. They parse for structure instead
> > > of parsing for meaning. Once a parser is built that will yield the
> > > IDENTICAL parse tree for the two sentences below, then the machine
> > > translation problem will be mostly solved:
> > >
> > > 1. "Old Mother Hubbard went to the cupboard."
> > > 2. "It was to the wall-mounted cabinet that the eldery woman named
> > > Mother Hubbard did go."
> >
> > But these two sentences don't mean quite the same thing, and would be
> > rendered differently in target languages as well.
> >
> >
> > t.
>
> There is a world of difference between "literary
> translation" and "utilitarian translation." I believe
> that literary translation would require full-blown AI,
> while competent utilitarian translation could be
> accomplished with a good deal less.
I believe "competent utilitarian translation"
to be a worthy goal, and possibly very much
more *utilitarian* than literary translation. ;-)
Though, arguably, much of literature is "lost in
translation", there does exist a market for
translating the greatest (most acclaimed)
foreign authors into many languages. That
market is, regrettably, so small that in a city of
over two million people, I have to place special
orders for most of the works by Gabriel Garcia
Marquez, Rimbaud, Baudelaire, Flaubert, Hesse,
etc - whether in the original or in translation.
OT: And to get a work using the Tifinagh Berber
alphabet, I importuned a friend who was recently
travelling in Morocco. On his last day of a several-
week stay, he finally found me ONE copy of a
school dictionary in a bookshop in a small town on
the far side of the Atlas Mountains. The shop
owner had acquired two copies, but kept one for
his own use ... It is "Amawal a(gh)ubiz", by Brahim
Barouch (I think), subtitled "Lexique Scolaire -
Student Dictionary" (and the equivalent in Arabic
and Tamazight (or Amazigh)) - Français-Tamazight
- English-Tamazight - {Arabiyyat-Tamazight}. My
friend John just got back, via Spain and Vietnam
(!), and he handed it to me when we met last night.
> I think that the first goal of utilitarian translation
> should be good translation of the basic meaning. Only
> after that is mastered would nuances of meaning be
> tackled. For all practical purposes (e.g. translating
> bicycle assembly instructions from Japanese to
> English) such nuance is unnecessary and the two
> sentences can be thought of as describing essentially
> identical events.
However, there is a nuance that the translators
of manuals for bicycle assembly need to consider
very carefully: how to (politely) instruct in the
target language. Another nuance is the familiarity
one should assume the assembler has with the parts
of the bicycle and the tools it is necessary to use.
Cultural issues all, and not easily subsumed under
"basic meaning".
Despite this quibble, I agree that a translator
should be able to express objective and concrete
things, as well as some simple actions that most of
us have in our repertoire, as stepping stones to
performing more complex or subtle translations.
I also agree that much of the perceived value in
translation lies in this more concrete, utilitarian
realm. Still, much of the fun lies outside it!
> I think the initial focus needs to be on conveying
> what event is being described by the sentence, and not
> on capturing literary style or nuance. If I am reading
> a scientific paper for its content I don't care if the
> author was a literary giant, or if he's a middling
> hack barely capable of putting together a
> grammatically correct sentence, as long as the
> essential meaning is correct.
Yes, well ... I read a paper the other day, which
I hesitated to recommend to others, simply
because the author has an inadequate grasp of
English grammar (or doesn't proof-read at all).
In the end, I thought the material so pertinent
to our discussion that I passed it on anyway, as
it was the only reference I'd found online to
the probable route(s) by which West Asian lutes
had found their way to Malaysia. Many of his
statements were, simply, nonsense, and the
fault was not in his research, his facts, or his
reasoning, but purely in his grammar. That this
person has a university post was, I thought,
quite incredible.
> ------------------------------
>
> Date: Mon, 19 Jun 2006
> From: Roger Mills
>
> I won't give a word-by-word critique on the English/Spanish article Yahya
> posted, but will point out some of the more egregious errors:
"egregious" - I *love* that word! Do only
North Americans use it?
> Right away in the title heading: "it finishes receiving..." for "acaba de
> recibir" = 'he has just received...'
>
> As with many of the 3d pers. verbs like the above with subject "it", the
> frequent mistrans. of possessive su(s) as 'its', when it almost always, very
> clearly, should be "his".
>
> "The School of Beautiful Arts"-- oh, come on.
>
> I can see how the "al + infinitive" construction could cause confusion, but
> it's something one learns in Year 1 Spanish, and is very common. The entire
> sentence beginning "Al estallar la guerra..."= "When exploding the war..."
> is such a hash that I doubt someone unfamiliar with Span. would make any
> sense out of it. "pain that was exchanged" for "sentence that was
> commuted..." etc. etc.
> "in he himself year..." for "en el mismo año"-- come on, again, groan.
>
> One of the more amusing boo-boos is where he goes to foreign cities "to give
> lectures, _to char them_..." EH??? At first I thought it was a misprint for
> "to chair them", but of course it's "dar conferencias, _charlas_..." (lit.
> CHATS) i.e. TALKS, dammit. Worthy of one of our relays!!!!
Yep, I agree it was all pretty thoroughly bad.
The errors were mostly caused by the inability
of the translator to proceed beyond the word-
by-word substitution model.
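
The word-by-word model is easy to caricature in a few lines of Python.
The sketch below is an illustration only: the tiny lexicon and phrase
table are invented, and nothing here is claimed to reflect how Google's
translator actually works. It does reproduce the "in he himself year"
rendering of "en el mismo año" that Roger quotes, and shows how a single
phrase-level entry repairs it.

```python
# Toy lexicon: every entry is an assumption chosen for illustration.
TOY_LEXICON = {
    "en": "in",
    "el": "he",          # presumably the article "el" mistaken for "él"
    "mismo": "himself",
    "año": "year",
}

# Toy phrase table: two-word sequences that must be translated as a unit.
TOY_PHRASES = {("el", "mismo"): "the same"}


def word_by_word(sentence, lexicon):
    """Substitute each word independently; unknown words pass through."""
    return " ".join(lexicon.get(w.lower(), w) for w in sentence.split())


def phrase_aware(sentence, lexicon, phrases):
    """Try a two-word phrase first, then fall back to single words."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in phrases:
            out.append(phrases[pair])
            i += 2
        else:
            out.append(lexicon.get(words[i], words[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(word_by_word("en el mismo año", TOY_LEXICON))
    # prints: in he himself year
    print(phrase_aware("en el mismo año", TOY_LEXICON, TOY_PHRASES))
    # prints: in the same year
```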
But my point was not so much that it was a bad
translation, but that structures that, say, an
English speaker would never think of *can* be
used to express certain meanings. On reflection,
I hope that English speaker might wonder why he
does *not* express those meanings in that fashion.
This in turn may lead to his being more open to
considering, even inventing, alternative structures
for his own conlangs.
> Oh well, basta... At least it got me interested in him-- sorry to say I'm
> not up on 20th Cent. Spanish dramatists.
Nor am I - a gap I intend to fill, as I'm sure they
must have had something noteworthy to say about
the extraordinary events of the century, both in
Spain and in the whole Hispanophone world.
Regards,
Yahya