Re: CHAT: Machine translation (was Re: translation)
From: Mark J. Reed <markjreed@...>
Date: Tuesday, June 20, 2006, 14:30
On 6/20/06, Yahya Abdal-Aziz <yahya@...> wrote:
> "egregious" - I *love* that word! Do only North Americans use it?
Not as far as I know. I mean, as a North American, my opinion in this
matter is necessarily suspect, but "egregious" seems like the sort of
word you'd hear *more* often on the other side of the pond, not less.
You know, 'cause those hoity-toity Brits are so posh and fancy with
their high-falutin' words. ;-)
> > "in he himself year..." for "en el mismo año"-- come on, again, groan.
That's really inexcusable. I guess it's just doing "longest match",
and "él mismo" is longer than either "el" or "mismo" by itself . . .
but still, I wonder why it would reach for a version that requires a
spelling change (introducing an accent mark) before considering the
shorter matches??
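
A rough sketch of that guess, purely for illustration: a greedy
longest-match lookup that folds accents away before comparing, so the
two-word entry "él mismo" beats "el" + "mismo". The lexicon, the
accent folding and the function names are all assumptions made up
here; nothing is known about how the actual translator works.

```python
import unicodedata

def strip_accents(text):
    """Drop combining marks, so 'él' and 'el' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical toy lexicon, not taken from any real system.
LEXICON = {
    "en": "in",
    "el": "the",
    "mismo": "same",
    "año": "year",
    "él mismo": "he himself",
}

# Keys are stored accent-folded: the assumed source of the bad match.
FOLDED = {strip_accents(key): value for key, value in LEXICON.items()}
MAX_WORDS = max(len(key.split()) for key in FOLDED)

def translate_longest_match(sentence):
    """Greedy left-to-right lookup, always trying the longest phrase first."""
    words = strip_accents(sentence.lower()).split()
    out = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in FOLDED:
                out.append(FOLDED[phrase])
                i += n
                break
        else:
            # Nothing matched at this position: pass the word through.
            out.append(words[i])
            i += 1
    return " ".join(out)

print(translate_longest_match("en el mismo año"))
```

With this toy lexicon the call prints "in he himself year". Trying
exact spellings before accent-folded ones would presumably avoid it.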
> > One of the more amusing boo-boos is where he goes to foreign
> > cities "to give lectures, _to char them_..." EH??? At first I thought it was a
> > misprint for "to chair them", but of course it's "dar conferencias,
> > _charlas_..." (lit. CHATS) i.e. TALKS, dammit. Worthy of one of our relays!!!!
That's its generic algorithm for dealing with unrecognized words. It
leaves them untranslated, but attempts to infer the appropriate part
of speech and inflection from typical morphology. So it sees
"charlas", infers some verb "char" with direct object "las", and then
renders it in English as "*to* char" because Spanish *char is clearly
the infinitive form...
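
Something like the following would produce that result. It is only a
guess at the kind of rule involved: the pattern, the word lists and
the function name are hypothetical, and a real system would have far
more morphology than this.

```python
import re

# Hypothetical word lists for illustration only.
KNOWN = {"dar": "to give", "conferencias": "lectures"}
CLITICS = {"lo": "it", "la": "it", "los": "them", "las": "them"}

# Anything shaped like <infinitive in -ar/-er/-ir> + <object pronoun>.
INFINITIVE_PLUS_CLITIC = re.compile(r"^(\w+[aei]r)(los|las|lo|la)$")

def guess_unknown(word):
    """Translate known words; otherwise guess from the word's shape."""
    if word in KNOWN:
        return KNOWN[word]
    match = INFINITIVE_PLUS_CLITIC.match(word)
    if match:
        stem, clitic = match.groups()
        # The stem is left untranslated but dressed up as an infinitive.
        return f"to {stem} {CLITICS[clitic]}"
    # No pattern fits: leave the word as it is.
    return word

print(guess_unknown("charlas"))
```

Since "charlas" is not in the word list and fits the pattern as
"char" + "las", the sketch prints "to char them".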
Back when I was taking Spanish, I used to conjugate random words that
happened to end in -ar - Spanish or otherwise(*), so I have some
sympathy for the approach, but it's still a silly result.
(*) One such posited verb is *tashayar, coined circa 1987-1988, which
means "to act unconvincingly butch". tashayo, tashayas, tashaya,
tashayamos, tashayáis, tashayan.
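
For anyone who wants to play the same game, the regular present-tense
pattern for -ar verbs is mechanical enough to write down. A small
sketch (the function name is made up here; the endings are the
standard ones, with the accent on the vosotros form -áis):

```python
# Present indicative endings for regular -ar verbs:
# yo, tú, él/ella, nosotros, vosotros, ellos/ellas.
PRESENT_AR_ENDINGS = ["o", "as", "a", "amos", "áis", "an"]

def conjugate_ar(infinitive):
    """Return the six present-tense forms of a regular -ar verb."""
    if not infinitive.endswith("ar"):
        raise ValueError("expected an infinitive ending in -ar")
    stem = infinitive[:-2]
    return [stem + ending for ending in PRESENT_AR_ENDINGS]

print(", ".join(conjugate_ar("tashayar")))
```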
--
Mark J. Reed <markjreed@...>