Re: General Purpose Dictionary Generator

Gary Shannon
Tuesday, October 31, 2006
Henrik Theiling

> Hi!
>
> Gary Shannon writes:
> > I have a preliminary design page up for the dictionary generator.
> >
> > Here's what I have so far:
>
> Although this is preliminary, of course, it looks quite generic without special focus on conlang lexicons.
> It would be great to demonstrate how you will handle the special needs for lexicons. Many people have given their ideas about sub-entries, searching, etc.
I would love to see some specific examples, some "real" data I could work with.
> I am quite generally concerned that the given mechanism only seems to work for regular isolating or agglutinative languages. If you wanted to have a lexicon for an inflecting or irregular language, you'd probably need too many columns to store the forms -- e.g. Þrjótrunn verbs need ~100 morphological forms stored.
If I look at dictionaries for inflecting languages like Latin and Sanskrit I don't see anything that couldn't be handled by a tabular database. Sanskrit's ten classes of verb could be put into ten different columns, but it would make way more sense to put the verb in a single column and put its class number in a second column. There might be hundreds of conjugational forms, but would you find those in a dictionary, or would those belong in a reference grammar?
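A minimal sketch of the "verb in one column, class number in a second column" layout described above, assuming SQLite through Python's standard sqlite3 module. The table name, the column names, and the sample rows are assumptions made up for illustration; they are not part of the design page.

```python
import sqlite3

def make_lexicon(rows):
    """Create an in-memory lexicon table (hypothetical schema) and load rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon ("
        " headword TEXT NOT NULL,"
        " pos TEXT NOT NULL,"
        " verb_class INTEGER,"   # one column holds the class number (1-10)
        " gloss TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?, ?)", rows)
    return conn

def verbs_in_class(conn, verb_class):
    """Return the headwords of all verbs in the given conjugation class."""
    cur = conn.execute(
        "SELECT headword FROM lexicon"
        " WHERE pos = 'verb' AND verb_class = ?"
        " ORDER BY headword",
        (verb_class,),
    )
    return [row[0] for row in cur]

# Illustrative rows only; nouns simply leave verb_class empty (NULL).
SAMPLE_ROWS = [
    ("bhū", "verb", 1, "to be, to become"),
    ("div", "verb", 4, "to play"),
    ("cur", "verb", 10, "to steal"),
    ("deva", "noun", None, "god"),
]

conn = make_lexicon(SAMPLE_ROWS)
print(verbs_in_class(conn, 4))
```

The point of the sketch is only that the class is data in a row rather than structure in the table: adding an eleventh class would need no schema change.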
> And of course, nouns require a different set of forms stored, so it might even be infeasible to store the information in the same table as the main entry -- you probably would not want 100 columns for verbs and another 12 or so for nouns in the same table.
You probably would also not want to list 100 verb forms in a given single dictionary entry. If I could see some concrete examples of both the input data you'd like to provide and the form of the dictionary you'd like to see from that, then I would have something concrete to look at and design around.
> I think that a conlang lexicon should store all information to automatically generate any given form for each entry, so all the irregularities and morphological forms must be stored in a computer readable way. Will you support inflected or irregular languages in a way that makes this possible? Or do you think this will be outside the scope of your application?
I don't see this initial application automatically generating any DATA. What I will try to accomplish to begin with is to take existing lexicon data and format it into a dictionary, or a pair of dictionaries, one for each language, or into a pair of dictionaries and a thesaurus, if the data is there to support a thesaurus. But it won't create any data. As soon as I start generating data then I'm into specifics that might be fine for my conlang, but lousy for everybody else. The tool I have in mind is much too flexible and generalized to be able to include any language-specific functions.
However, after that part of the project is completed, and working backwards from the finished form of the dictionary data, I can start to take a closer look at where the data comes from and how it is created and edited. That part of the system would, of course, be more language-specific.
--gary
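To illustrate the "pair of dictionaries, one for each language" idea described above, here is a rough sketch that only reformats existing entries and creates no new data. The function names, the (word, gloss) input shape, and the sample words are all invented for illustration and are not taken from the actual tool.

```python
from collections import defaultdict

def build_dictionaries(entries):
    """Index (word, gloss) pairs in both directions without adding any data."""
    forward = defaultdict(list)   # conlang word -> glosses
    reverse = defaultdict(list)   # gloss -> conlang words
    for word, gloss in entries:
        forward[word].append(gloss)
        reverse[gloss].append(word)
    return dict(forward), dict(reverse)

def format_dictionary(index):
    """Render one index as alphabetized 'headword: a, b' lines."""
    lines = []
    for headword in sorted(index):
        lines.append(f"{headword}: {', '.join(sorted(index[headword]))}")
    return "\n".join(lines)

# Made-up sample lexicon; the words belong to no real conlang.
ENTRIES = [
    ("kalu", "water"),
    ("kalu", "river"),
    ("mira", "river"),
    ("tosi", "stone"),
]

forward, reverse = build_dictionaries(ENTRIES)
print(format_dictionary(forward))
print()
print(format_dictionary(reverse))
```

Nothing here is language-specific: the same two functions work for any lexicon that can be flattened to word and gloss pairs, which is the kind of generality the formatting stage is aiming for.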
