
Re: THEORY: Parsing for meaning.

From:Yahya Abdal-Aziz <yahya@...>
Date:Monday, June 26, 2006, 13:07
Hi all!

On Sun, 25 Jun 2006, Gary Shannon wrote:
> Looking back over an old conlang project called SOALOA ( http://fiziwig.com/soaloa/soaloa.html ) it occurred to me that the biggest obstacle to proper machine translation is extracting the real meaning of a sentence to be translated. As far as I know machine translation programs don't try to deal with "meaning", only with structure and dictionary replacements. If the meaning of a target sentence could be properly extracted and encoded then writing a decent sentence generator for any given language, based on basic standardized sentence patterns, would be relatively easy.
>
> But how to encode the information conveyed by a sentence? Taking a hint from SOALOA I tried to reduce any sentence, regardless of complexity, to a sequence of simple SVO sentences, each optionally beginning with a "linking word", which taken together encode the complete literal meaning (if not the literary nuances) of a sentence. Combining that idea with another of my old projects to build an automated parser ( http://www.fiziwig.com/parser/parse1.html ) I thought it might be possible to iteratively deconstruct a sentence into a paraphrase in the form of a sequence of [L]SVO sentences by simple pattern matching.
>
> At each step a portion of the sentence is matched, replaced by the [L]SVO output sentence, and then removed from the original sentence, leaving a simpler sentence to be further decomposed by the next iteration.
>
> Thus: We are watching the antics of this funny little monkey.
>
> Is paraphrased:
>
> We watch this: (SVO)
> That monkey performs antics. (LSVO)
> Same monkey is funny. (LSVO)
> Same monkey is little. (LSVO)
>
> These four sentences capture and encode in a standard format the complete meaning of the sentence.
>
> The pattern-matching steps would be (roughly):
>
> We are watching the antics of this funny --little monkey--. (Pattern adj+noun)
> => Same monkey is little.
> We are watching the antics of this --funny monkey--. (Pattern adj+noun)
> => Same monkey is funny.
> We are watching the --antics of this monkey--. (Idiomatic pattern)
> => That monkey performs antics.
> --We are watching--
> => We watch this:
>
> Another example:
>
> Mercury bound his winged sandals to his feet, and took his wand in his hand.
>
> Mercury caused this: (SVO)
> That sandals are_bound_to feet. (LSVO)
> Same feet belong_to Mercury. (LSVO)
> Same sandals belong_to Mercury. (LSVO)
> Same sandals have wings. (LSVO)
> Then Mercury caused this: (LSVO)
> That wand be_in hand. (LSVO)
> Same hand belongs_to Mercury. (LSVO)
> Same wand belongs_to Mercury. (LSVO)
>
> The pattern-matching steps are (roughly):
>
> Mercury bound his --winged sandals-- to his feet, and took his wand in his hand.
> Same sandals have wings.
> Mercury bound --his sandals-- to his feet, and took his wand in his hand.
> Same sandals belong_to Mercury.
> --Mercury bound sandals-- to his feet, and took his wand in his hand.
> Mercury caused this:
> sandals bound to --his feet--, and took his wand in his hand.
> Feet belong_to Mercury.
> --sandals bound to feet--, and took his wand in his hand.
> That sandals be_bound_to feet.
> and took --his wand-- in --his hand--.
> Wand belongs_to Mercury.
> Hand belongs_to Mercury.
> --and took wand in hand--.
> Then Mercury caused this:
> --wand in hand--.
> That wand be_in hand.
>
> Thoughts?
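Gary's iterative procedure can be sketched as a loop over an ordered list of patterns. On each pass, the first pattern that matches emits an [L]SVO sentence, and the matched span is replaced by a simpler remnant. This is a minimal sketch, not the fiziwig.com parser. The adjective list, the three regular-expression rules and the function name `decompose` are hypothetical, and the rules are hand-tuned to reproduce the monkey example only.

```python
import re

# Hypothetical toy lexicon (an assumption for this illustration only).
ADJECTIVES = ("funny", "little")
ADJ = "|".join(ADJECTIVES)

# Each rule: (name, regex, template for the emitted [L]SVO sentence,
#             template for the text that replaces the matched span).
RULES = [
    # adj+noun where the following word is not itself an adjective,
    # so the adjective nearest the noun is peeled off first.
    ("adj+noun", re.compile(rf"\b({ADJ}) (?!(?:{ADJ})\b)(\w+)\b"),
     r"Same \2 is \1.", r"\2"),
    # Idiomatic pattern: "the antics of this X" -> "That X performs antics."
    ("idiom", re.compile(r"\bthe antics of (?:this|that|the) (\w+)"),
     r"That \1 performs antics.", "this"),
    # Progressive "S are V-ing this" -> "S V this:" (introduces a clause).
    ("svo", re.compile(r"^(\w+) are (\w+)ing this$"),
     r"\1 \2 this:", ""),
]

def decompose(sentence: str) -> list[str]:
    """Peel off one pattern per iteration until no rule applies."""
    text = sentence.strip().rstrip(".")
    derived = []
    while text:
        for _name, pattern, emit, replace in RULES:
            m = pattern.search(text)
            if m:
                derived.append(m.expand(emit))
                # Splice in the simplified remnant, leaving a shorter sentence.
                text = (text[:m.start()] + m.expand(replace) + text[m.end():]).strip()
                break
        else:
            break  # no rule matched; stop even if text remains
    # Details are peeled off first, so reverse to put the main clause first.
    return derived[::-1]

for line in decompose("We are watching the antics of this funny little monkey."):
    print(line)
# We watch this:
# That monkey performs antics.
# Same monkey is funny.
# Same monkey is little.
```

Rule order does the work that Yahya's first question asks about. Here the answer is "hand-written rules over a hand-written category list", which is only one of the options he lists.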
Gary, the technique seems promising. Some questions for you:

1. How do you know how to decompose, e.g.,

> --little monkey--. (Pattern adj+noun)
> => Same monkey is little

   Is it by rules, categories or sheer statistical "occurrence in context", i.e. of one word in a pattern of other words?
2. Does your pattern matching system *need* the categories of "noun" and "adjective"? Or is naming these categories just a shorthand by which humans can express and recognise a pattern or range of patterns?
3. What makes you think the decompositions will be unique?
4. Should they be?
5. Do you have a closed set of primitive relations in mind, such as "belong_to", "be_in", etc.? (Probably not, if you need such things as "be_bound_to".)
5a. If so, what are they?
5b. If not, how do you propose to create the relations necessary to parse an utterance?
> ------------------------------
Eugene Oh replied to Gary:
> This reply might not be exactly what you were looking for, but I was wondering whether you've read the June 16th issue of the Economist, in which is an article about machine translation. What the article said was basically that scientists realised that the more efficient method of machine translation was not by keying in numerous grammar rules and vocabulary replacements, but through statistical analysis: let the computer parse a text unguided and it will figure out which word is of what function, and arrive at a set of rules and glosses itself, replete with exceptions. I don't know how it works, but it sure sounds fascinating.
Eugene,

Thanks for the information. Have a look at these links:

New Tech speaks many languages at once: http://tinyurl.com/nchhc
How to build a Babel fish: http://tinyurl.com/oz9bj

They don't fully answer your question, but I guess they give enough to get started on your own super-Dolmetsch project ...
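The idea of letting the computer "figure out which word is of what function" can be illustrated by distributional grouping. Words that keep turning up between the same neighbours tend to fall into the same category. This is a minimal sketch under stated assumptions:

- the four-sentence `CORPUS` is hypothetical;
- the features are simple left/right neighbour counts compared by cosine similarity.

It is not a translation system, and it is not the method the Economist article describes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, unannotated: no part-of-speech labels anywhere.
CORPUS = [
    "the monkey eats the fruit",
    "the child eats the bread",
    "a monkey sees a child",
    "a child sees a monkey",
]

def context_features(sentences):
    """Count, for every word, which words occur immediately left and right of it."""
    feats = defaultdict(Counter)
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            feats[words[i]]["L:" + words[i - 1]] += 1
            feats[words[i]]["R:" + words[i + 1]] += 1
    return feats

def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

feats = context_features(CORPUS)
print(round(cosine(feats["monkey"], feats["child"]), 2))  # 1.0  (identical slots)
print(round(cosine(feats["eats"], feats["sees"]), 2))     # 0.33 (shared left neighbours)
print(round(cosine(feats["monkey"], feats["eats"]), 2))   # 0.0  (no shared slots)
```

Nothing in the code knows the words "noun" or "verb". The grouping of "monkey" with "child", and of "eats" with "sees", comes purely from occurrence in context.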
> ------------------------------
>
> --- Paul Bennett wrote, in reply to Eugene:
>
> > I actually started thinking about this principle around a year or so ago. I gave up when I couldn't figure out what the minimal atomic units of linguistic knowledge should be[*], but I did also envisage extending the system to allow it to attempt to determine cognates, and plausibly build a tree of relatedness given semantically identical [**] corpora in a set of languages. They seem to be based on a generalization of the same problem.
> >
> > [*] Heck, if I knew *that*, I could put Chomsky out of business... ;-)
;-)

[**] I have a problem with that. How do you determine that two segments in different languages are "semantically identical"?
> ------------------------------
Gary replied:
> And therein lies the difference between a scientist and an engineer. The scientist begins by trying to _understand_ the problem and the engineer begins by trying to _solve_ the problem. I think many problems ultimately get solved by engineers who don't fully understand the problem to begin with, but manage to find something that works anyway.
>
> I hate to confess my ignorance, but I really don't understand what your reply was saying. I'm probably all wet, but I have a hunch that the less I know about formal linguistics the better chance I'll have to solve the problem. After all the average 3-year-old manages to speak and understand the language with no formal linguistics knowledge at all. My pattern matching is an attempt to simulate by computer what I think the human child does; match the incoming data stream to patterns previously encountered.
I think I probably agree ... it's a brute-force method of probable statistical inference. Yep, you've got to hand it to 'em, those three-year-old naked apes are pretty darned sophisticated!
> For example, no amount of purely linguistic knowledge will help you to parse "Join me in a song." Is "song" an enterable object? How do I go "into" a song? The fact is, it's an idiomatic phrase that should NOT be "parsed" at all, but merely recognized by simply looking it up in a list of patterns.
Love it! When you work it out, join me in a drink. ;-)
> The same goes for a huge number of (possibly parameterized, and certainly nestable) fixed-format chunks that should never be "parsed", but simply looked up in a table of patterns.
>
> For example, consider these non-parsable fragments: "fix you up with...", "as far as I know...", "Tell you what.", "hard to come by", "What makes you think [I need your help?]", "His head shot up.", "How would you like to...", "threw his arms around...", "You're out of luck.", "at the top of his lungs", "I'll walk you through it.", "spend the day...", "chewing the fat", "nosing around", "check things out", "call it a day", "ears perked up", "accent so thick...", "came back to my senses.", "what's eating you?", "I turned her down", "take it easy", "shelling out", "hang around", "one of those days.", "Who cares [what he thinks?]", "slow on the uptake", "what's going on?", "he hung up on me", etc. etc.
Nice list. Of course, many of these *are* parsable once certain elided elements are restored - if only one knew how. Others are simply metaphors, whose content is transparent enough in that they match the usage of similar words in other patterns, e.g. "spend the day" is analogous to "spend your pay" - and it could never be "buy the day", could it?
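Gary's proposal to look up fixed chunks in a table rather than parse them can be sketched as a left-to-right scan. At each position it tries the longest table entry first. A `*` in an entry stands for a one-word slot, which is one simple reading of his "parameterized" chunks. This is a minimal sketch: the table entries, their glosses and the function `gloss_idioms` are hypothetical, and nesting and inflected variants (hang/hung) are not handled.

```python
# Hypothetical idiom table: keys are token patterns, "*" is a one-word slot.
IDIOMS = {
    ("join", "*", "in", "a", "song"): "sing a song with {0}",
    ("call", "it", "a", "day"): "stop working",
    ("hung", "up", "on", "*"): "ended the phone call with {0}",
    ("take", "it", "easy"): "relax",
    ("what's", "eating", "*"): "what is bothering {0}",
}

def match_at(tokens, i, pattern):
    """Return the slot fillers if pattern matches tokens starting at i, else None."""
    if i + len(pattern) > len(tokens):
        return None
    slots = []
    for tok, pat in zip(tokens[i:], pattern):
        if pat == "*":
            slots.append(tok)
        elif tok != pat:
            return None
    return slots

def gloss_idioms(sentence):
    """Replace every recognised chunk with its bracketed gloss; leave other words alone."""
    tokens = sentence.lower().rstrip(".?!").split()
    patterns = sorted(IDIOMS, key=len, reverse=True)  # longest match wins
    out, i = [], 0
    while i < len(tokens):
        for pat in patterns:
            slots = match_at(tokens, i, pat)
            if slots is not None:
                out.append("[" + IDIOMS[pat].format(*slots) + "]")
                i += len(pat)
                break
        else:
            out.append(tokens[i])  # not part of any known chunk
            i += 1
    return " ".join(out)

print(gloss_idioms("Join me in a song."))    # [sing a song with me]
print(gloss_idioms("Let's call it a day."))  # let's [stop working]
```

The words left unbracketed would then go on to ordinary pattern matching. The bracketed chunks are never "parsed" at all, which is the point of Gary's suggestion.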
> I honestly believe that language acquisition consists of one hundred percent pattern learning and pattern matching and zero-point-zero percent rules. "Rules" are just a method of abstracting and cataloging patterns. Enumerate all the patterns and you don't need any rules.
The only rules we really need are like this one: "Nice kids don't say f&*k! (Slap)"

They are rules that ban specific utterances or regularised derivations. An example of the latter is when a parent or other older person suggests a better usage:

"He hitted me in the head!"
"Aw, he hit you, did he?"

Interestingly, little kids can produce these "over-regularised" utterances by some kind of inductive reasoning; and anyone familiar with the regular pattern will understand them.

Regards,
Yahya


Gary Shannon <fiziwig@...>