
Re: Hyperlinking a dictionary to a corpus

From: Arthaey Angosii <arthaey@...>
Date: Saturday, December 3, 2005, 8:45
Emaelivpeith Carsten Becker:
> How does the program know, though, how to assemble example sentences?
> I.e. AFAIU your Elomi is isolating and rather simplistic
> (at least it seems so at first sight), but what about
> more complex agglutinating or even inflecting languages?
>
> To link words in example sentences, you'd need a script that
> would split at *morpheme* boundaries and when the respective
> morpheme already exists in the dictionary, a link to this
> one is provided. Well, but I quite don't know how to tell a
> program how to split on morpheme boundaries the way the
> sentence is meant
Asha'ille is agglutinating, so I had a similar problem to what you describe. I decided not to solve it programmatically, but rather to add some minimal markup to my source text.

For example, the "word" |riyëvjosöte| (which means "Can you understand?") consists of 4 morphemes and 2 ablauts (dunno if those count as their own morphemes). To make a computer-generated interlinear out of that agglutination (hehe, I like that word :P ), I mark up the text I feed into the program:

Riy[e]{ë}v|["]|[-]j[-]|[-]o|[-]s[ó]{ö}te|["]|

Now, that looks pretty nasty, but then, a program that could easily sort it all out 100% of the time would also look pretty nasty. ;)

I use brackets "[]" to write how the morpheme exists in the dictionary, ignoring any surface changes. I use braces "{}" to write surface changes that do not show up as such in the dictionary. I use pipes "|" to mark morpheme boundaries.

So, the above breaks down into:

Riy[e]{ë}v|    -- looked up as "riyev", displayed in the interlinear as "riyëv"
["]|           -- looked up as " (a double quote), not displayed in the interlinear
[-]j[-]|       -- looked up as "-j-", displayed as just "j"
[-]o|          -- looked up as "-o", displayed as "o"
[-]s[ó]{ö}te|  -- looked up as "-sóte", displayed as "söte"
["]|           -- same as previous ["]

I think that Asha'ille is just simple enough, even though agglutinating, that I could have written a program to figure it out without the manual markup. But I had thought at the time that others might want to use my script, or that I might come up with another, more complicated language. So I stay with my manual markup. :)

-- 
AA
http://conlang.arthaey.com/
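
For anyone who wants to play with the bracket, brace, and pipe markup described above, here is a rough Python sketch of how such a line could be split into (dictionary form, displayed form) pairs. This is not the actual script; the names `TOKEN`, `parse_morpheme`, and `parse_word` are made up for illustration. It assumes that pipes never occur inside brackets or braces, that dictionary lookups are lowercased, and that the displayed form keeps whatever capitalization was typed.

```python
import re

# One token is either [dictionary-only], {surface-only}, or plain text.
# Plain text counts toward both the dictionary form and the displayed form.
TOKEN = re.compile(r"\[([^\]]*)\]|\{([^}]*)\}|([^\[\]{}]+)")


def parse_morpheme(segment):
    """Return (lookup_form, display_form) for one pipe-delimited segment."""
    lookup, display = [], []
    for dict_only, surface_only, plain in TOKEN.findall(segment):
        lookup.append(dict_only or plain)      # brackets and plain text
        display.append(surface_only or plain)  # braces and plain text
    # Assumption: dictionary keys are lowercase; display keeps typed case.
    return "".join(lookup).lower(), "".join(display)


def parse_word(marked_up):
    """Split a marked-up word on pipes and parse each non-empty segment."""
    return [parse_morpheme(seg) for seg in marked_up.split("|") if seg]


if __name__ == "__main__":
    word = 'Riy[e]{ë}v|["]|[-]j[-]|[-]o|[-]s[ó]{ö}te|["]|'
    for lookup, display in parse_word(word):
        print(f"{lookup!r:10} {display!r}")
```

Joining the displayed forms gives back the surface word, and each lookup form is what a dictionary link would point at. Stray unbalanced brackets are silently skipped in this sketch; a real tool would probably want to report them.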