Re: CHAT: Machine translation (was Re: translation)

From:Mark J. Reed <markjreed@...>
Date:Tuesday, June 20, 2006, 14:30
On 6/20/06, Yahya Abdal-Aziz <yahya@...> wrote:
> "egregious" - I *love* that word! Do only North Americans use it?
Not as far as I know. I mean, as a North American, my opinion in this matter is necessarily suspect, but "egregious" seems like the sort of word you'd hear *more* often on the other side of the pond, not less. You know, 'cause those hoity-toity Brits are so posh and fancy with their high-falutin' words. ;-)
> > "in he himself year..." for "en el mismo año"-- come on, again, groan.
That's really inexcusable. I guess it's just doing "longest match", and "él mismo" is longer than either "el" or "mismo" by itself . . . but still, I wonder why it would reach for a version that requires a spelling change (introducing an accent mark) before considering the shorter matches??
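The "longest match" guess can be illustrated with a small sketch. It assumes, hypothetically (the post does not say how the translator actually works), a phrase table that is looked up with accents ignored, so that the entry for "él mismo" ("he himself") also matches "el mismo", plus a greedy scan that tries the longest phrase at each position before shorter ones. The table contents and the names `strip_accents`, `PHRASE_TABLE` and `translate` are made up for illustration.

```python
import unicodedata


def strip_accents(word: str) -> str:
    # Decompose (NFD) and drop combining marks, so "él" -> "el", "año" -> "ano".
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy dictionary of Spanish phrases and English renderings.
RAW_ENTRIES = {
    "en": "in",
    "el": "the",
    "mismo": "same",
    "año": "year",
    "él mismo": "he himself",
}

# Keys are accent-stripped word tuples, so "él mismo" collides with "el mismo".
PHRASE_TABLE = {
    tuple(strip_accents(w) for w in src.split()): tgt
    for src, tgt in RAW_ENTRIES.items()
}
MAX_LEN = max(len(key) for key in PHRASE_TABLE)


def translate(sentence: str) -> str:
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Greedy longest match: try the longest candidate phrase first.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(strip_accents(w) for w in words[i:i + n])
            if key in PHRASE_TABLE:
                out.append(PHRASE_TABLE[key])
                i += n
                break
        else:
            # Unknown word: pass it through untranslated.
            out.append(words[i])
            i += 1
    return " ".join(out)


print(translate("en el mismo año"))  # in he himself year
```

Under these assumptions the two-word entry beats "el" + "mismo" simply because it is longer, which reproduces the "in he himself year" output quoted above.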
> > One of the more amusing boo-boos is where he goes to foreign cities "to give lectures, _to char them_..." EH??? At first I thought it was a misprint for "to chair them", but of course it's "dar conferencias, _charlas_..." (lit. CHATS) i.e. TALKS, dammit. Worthy of one of our relays!!!!
That's its generic algorithm for dealing with unrecognized words. It leaves them untranslated, but attempts to infer the appropriate part of speech and inflection from typical morphology. So it sees "charlas", infers some verb "char" with direct object "las", and then renders it in English as "*to* char" because Spanish *char is clearly the infinitive form...

Back when I was taking Spanish, I used to conjugate random words that happened to end in -ar - Spanish or otherwise(*), so I have some sympathy for the approach, but it's still a silly result.

(*) One such posited verb is *tashayar, coined circa 1987-1988, which means "to act unconvincingly butch": tashayo, tashayas, tashaya, tashayamos, tashayais, tashayan.

-- 
Mark J. Reed <markjreed@...>
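The unknown-word behavior described above can be sketched the same way. This is a hypothetical reconstruction, not the translator's real code: it peels a Spanish object clitic such as "las" off an unrecognized word and, if what remains looks like an -ar/-er/-ir infinitive, renders it as "to <verb> <object>". The clitic list and the function names `guess_unknown` and `conjugate_ar` are illustrative assumptions. The second function plays the same game as *tashayar, applying the regular present-tense -ar endings to any infinitive; note that standard Spanish spelling puts an accent on the vosotros ending (-áis).

```python
# Hypothetical subset of Spanish object clitics with rough English glosses.
CLITICS = {
    "los": "them", "las": "them", "les": "them", "nos": "us",
    "lo": "it", "la": "it", "le": "him/her", "me": "me", "te": "you",
}

# Regular present-indicative endings for -ar verbs
# (yo, tú, él/ella, nosotros, vosotros, ellos).
AR_ENDINGS = ["o", "as", "a", "amos", "áis", "an"]


def guess_unknown(word: str) -> str:
    """Guess a rendering for an unrecognized word (hypothetical heuristic)."""
    for clitic, gloss in CLITICS.items():
        if word.endswith(clitic):
            stem = word[: -len(clitic)]
            # Treat the remainder as an infinitive if it has a verb ending.
            if len(stem) > 2 and stem.endswith(("ar", "er", "ir")):
                return f"to {stem} {gloss}"
    # Otherwise leave the word untranslated.
    return word


def conjugate_ar(infinitive: str) -> list[str]:
    """Conjugate any -ar infinitive as a regular verb in the present tense."""
    if not infinitive.endswith("ar"):
        raise ValueError(f"not an -ar infinitive: {infinitive!r}")
    stem = infinitive[:-2]
    return [stem + ending for ending in AR_ENDINGS]


print(guess_unknown("charlas"))  # to char them
print(conjugate_ar("tashayar"))
# ['tashayo', 'tashayas', 'tashaya', 'tashayamos', 'tashayáis', 'tashayan']
```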