Re: General Purpose Dictionary Generator
From: Gary Shannon <fiziwig@...>
Date: Tuesday, October 31, 2006, 1:55
--- Henrik Theiling <theiling@...> wrote:
> Gary Shannon writes:
> > I have a preliminary design page up for the dictionary generator.
> > Here's what I have so far: http://fiziwig.com/dictionary/dictionary1.html
> Although this is preliminary, of course, it looks quite generic
> without special focus on conlang lexicons.
> It would be great to demonstrate how you will handle the special needs
> for lexicons. Many people have given their ideas about sub-entries,
> searching, etc.
I would love to see some specific examples, some "real" data I could work with.
> I am quite generally concerned that the given
> mechanism only seems to work for regular isolating or agglutinative
> languages. If you wanted to have a lexicon for an inflecting or
> irregular language, you'd probably need too many columns to store the
> forms -- e.g. Þrjótrunn verbs need ~100 morphological forms stored.
If I look at dictionaries for inflecting languages like Latin and Sanskrit I
don't see anything that couldn't be handled by a tabular database. Sanskrit's
ten classes of verb could be put into ten different columns, but it would make
way more sense to put the verb in a single column and put its class number in a
second column. There might be hundreds of conjugational forms, but would you
find those in a dictionary, or would those belong in a reference grammar?
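To make the "verb in one column, class number in a second column" idea concrete,
here is a minimal sketch using Python's built-in sqlite3 module. The table name
`lexicon`, the column names, and the helper functions are all assumptions made
up for illustration, not part of the design page, and the three sample roots
are written without diacritics.

```python
import sqlite3

# Illustrative rows: (headword, verb class number, gloss).
# Roots are transliterated without diacritics; the layout is hypothetical.
SAMPLE_ROWS = [
    ("bhu", 1, "to be, to become"),
    ("ad", 2, "to eat"),
    ("cur", 10, "to steal"),
]


def build_lexicon(rows):
    """Create an in-memory table with one column for the verb
    and a second column for its class number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon (headword TEXT, verb_class INTEGER, gloss TEXT)"
    )
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", rows)
    return conn


def format_entries(conn):
    """Return one formatted dictionary line per row, sorted by headword."""
    cursor = conn.execute(
        "SELECT headword, verb_class, gloss FROM lexicon ORDER BY headword"
    )
    return [f"{hw} (class {cls}) {gloss}" for hw, cls, gloss in cursor]


if __name__ == "__main__":
    for line in format_entries(build_lexicon(SAMPLE_ROWS)):
        print(line)
```

The class number is just another field on the entry, so the table needs three
columns rather than ten, and the conjugated forms themselves stay out of it.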
> And of course, nouns require a different set of forms stored, so it
> might even be infeasible to store the information in the same table as
> the main entry -- you probably would not want 100 columns for verbs
> and another 12 or so for nouns in the same table.
You probably would also not want to list 100 verb forms in a single
dictionary entry. If I could see some examples of both the input data
you'd like to provide and the form of the dictionary you'd like to see from
that, then I would have something concrete to look at and design around.
> I think that a conlang lexicon should store all information to
> automatically generate any given form for each entry, so all the
> irregularities and morphological forms must be stored in a computer
> readable way. Will you support inflected or irregular languages in a
> way that makes this possible? Or do you think this will be outside
> the scope of your application?
I don't see this initial application automatically generating any DATA. What I
will try to accomplish to begin with is to take existing lexicon data and
format it into a dictionary, or a pair of dictionaries, one for each language,
or into a pair of dictionaries and a thesaurus, if the data is there to support
a thesaurus. But it won't create any data. As soon as I start generating data
then I'm into specifics that might be fine for my conlang, but lousy for
everybody else. The tool I have in mind is much too flexible and generalized to
be able to include any language-specific functions.
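As a rough sketch of what "take existing lexicon data and format it into a pair
of dictionaries" could look like, the Python below builds both directions from
one list of (word, gloss) pairs. The function names, the output format, and the
sample words (which are invented placeholders, not from any real conlang) are
all assumptions for illustration. Nothing here generates new data; it only
regroups and sorts what is already in the lexicon.

```python
from collections import defaultdict

# Invented placeholder lexicon: (conlang word, English gloss) pairs.
LEXICON = [
    ("tamu", "house"),
    ("tamu", "home"),
    ("kesi", "water"),
]


def build_dictionaries(lexicon):
    """Group the same pairs two ways: by conlang word and by English gloss."""
    forward = defaultdict(list)   # conlang word -> English glosses
    reverse = defaultdict(list)   # English gloss -> conlang words
    for word, gloss in lexicon:
        forward[word].append(gloss)
        reverse[gloss].append(word)

    def ordered(mapping):
        # Sort headwords and the targets listed under each headword.
        return {head: sorted(targets) for head, targets in sorted(mapping.items())}

    return ordered(forward), ordered(reverse)


def format_dictionary(dictionary):
    """Render each entry as 'headword: target, target, ...'."""
    return [f"{head}: {', '.join(targets)}" for head, targets in dictionary.items()]


if __name__ == "__main__":
    conlang_to_english, english_to_conlang = build_dictionaries(LEXICON)
    print("\n".join(format_dictionary(conlang_to_english)))
    print()
    print("\n".join(format_dictionary(english_to_conlang)))
```

Plain string sorting is used here only to keep the sketch short; a real
conlang would likely need its own collation order for headwords.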
However, after that part of the project is completed, and working backwards
from the finished form of the dictionary data, I can start to take a closer
look at where the data comes from and how it is created and edited. That part
of the system would, of course, be more language-specific.