Re: Hyperlinking a dictionary to a corpus
From: Arthaey Angosii <arthaey@...>
Date: Saturday, December 3, 2005, 8:45
Emaelivpeith Carsten Becker:
> How does the
> program know, though, how to assemble example sentences?
> I.e. AFAIU your Elomi is isolating and rather simplistic
> (at least it seems so at first sight), but what about
> more complex agglutinating or even inflecting languages?
>
> To link words in example sentences, you'd need a script that
> would split at *morpheme* boundaries and when the respective
> morpheme already exists in the dictionary, a link to this
> one is provided. Well, but I quite don't know how to tell a
> program how to split on morpheme boundaries the way the
> sentence is meant
Asha'ille is agglutinating, so I had a similar problem to what you
describe. I decided not to solve it programmatically, but rather to
add some minimal markup to my source text.
For example, the "word" |riyëvjosöte| (which means "Can you
understand?") consists of 4 morphemes and 2 ablauts (dunno if those
count as their own morphemes). To make a computer-generated
interlinear out of that agglutination (hehe, I like that word :P), I
mark up the text I feed into the program:
Riy[e]{ë}v|["]|[-]j[-]|[-]o|[-]s[ó]{ö}te|["]|
Now, that looks pretty nasty, but then, a program that could easily
sort it all out 100% of the time would also look pretty nasty. ;)
I use brackets "[]" for text that exists only in the dictionary form
of a morpheme: it is used for the lookup but not displayed. I use
braces "{}" for surface changes that do not show up as such in the
dictionary: they are displayed but not looked up. Anything outside
brackets and braces counts for both. I use pipes "|" to mark morpheme
boundaries. So, the above breaks down into:
Riy[e]{ë}v| -- looked up as "riyev", displayed in the interlinear as "riyëv"
["]| -- looked up as a bare double-quote mark ("), not displayed in the interlinear
[-]j[-]| -- looked up as "-j-", displayed as just "j"
[-]o| -- looked up as "-o", displayed as "o"
[-]s[ó]{ö}te| -- looked up as "-sóte", displayed as "söte"
["]| -- same as previous ["]
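
For illustration, here is a minimal Python sketch of how a program could read this markup. It is not the actual script: the names parse_morpheme and parse_word are hypothetical, and lowercasing both forms is an assumption based on "Riy..." being looked up as "riyev" above.

```python
import re

# One token per match: [lookup-only], {display-only}, or a run of plain text.
TOKEN = re.compile(r"\[([^\]]*)\]|\{([^}]*)\}|([^\[\]{}]+)")


def parse_morpheme(chunk):
    """Return (lookup_form, display_form) for one marked-up morpheme."""
    lookup, display = [], []
    for bracketed, braced, plain in TOKEN.findall(chunk):
        if bracketed:
            # Dictionary-only text: part of the lookup key, never shown.
            lookup.append(bracketed)
        elif braced:
            # Surface-only text: shown, never part of the lookup key.
            display.append(braced)
        else:
            # Plain text counts for both forms.
            lookup.append(plain)
            display.append(plain)
    # Assumption: forms are lowercased before lookup and display.
    return "".join(lookup).lower(), "".join(display).lower()


def parse_word(marked_up):
    """Split on '|' morpheme boundaries and parse each non-empty chunk."""
    return [parse_morpheme(c) for c in marked_up.split("|") if c]


if __name__ == "__main__":
    word = 'Riy[e]{ë}v|["]|[-]j[-]|[-]o|[-]s[ó]{ö}te|["]|'
    for lookup, display in parse_word(word):
        print(f"{lookup!r:10} {display!r}")
```

Joining the display forms gives back the surface word, and each lookup form is what a linking script would search for in the dictionary. A markup with nested brackets or a literal pipe inside a morpheme would need a real parser rather than this regular expression.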
I think that Asha'ille is just simple enough, even though
agglutinating, that I could have written a program to figure it out
without the manual markup. But I thought at the time that others
might want to use my script, or that I might come up with another,
more complicated language. So I'm sticking with my manual markup. :)
--
AA
http://conlang.arthaey.com/