Re: Programmers requested for dictionary
From: Boudewijn Rempt <bsarempt@...>
Date: Friday, October 27, 2000, 19:03
On Fri, 27 Oct 2000, Peter Clark wrote:
> This is a cry for help: as my language, Enamyn, grows, so do my
> problems with my dictionary. Paper and pencil just don't cut it. I have
> been looking on the web for a dictionary program, or more accurately,
> for a dictionary creator. So far, no luck.
I have done a very simple dictionary program (Python CGI/MySQL) for the
Valdyan dictionary Irina uses - it doesn't have everything you ask
for, although if you're reasonably adroit with SQL queries, you can get
everything out.
A more ambitious project, which Taliesin has mentioned already, is
Kura: Kura will be able to do most of what you want, and more. It can
already link between texts and lexicon, for instance. So you can click
on a word in the text and find its definition in the dictionary, _and_
see all lines where that word occurs in all texts in that language.
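
As a rough illustration of that kind of text-to-lexicon link (a sketch, not
Kura's actual code), the "all lines where that word occurs" part is a
concordance index. The function name build_concordance, the sample text names
and every word except "vyl" are made up for the example, and the tokenizer is
deliberately crude.

```python
import re
from collections import defaultdict

def build_concordance(texts):
    """Map each lowercased word to the (text name, line number, line) triples where it occurs."""
    index = defaultdict(list)
    for name, body in texts.items():
        for lineno, line in enumerate(body.splitlines(), start=1):
            # \w+ is a crude tokenizer; a real conlang may need its own rules.
            # set() keeps a line from being listed twice for a repeated word.
            for word in set(re.findall(r"\w+", line.lower())):
                index[word].append((name, lineno, line))
    return index

# Hypothetical sample texts; only "vyl" comes from the thread.
texts = {
    "sample-a": "vyl tam\nsor vyl",
    "sample-b": "tam sor",
}
concordance = build_concordance(texts)
for name, lineno, line in concordance["vyl"]:
    print(name, lineno, line)
```

A gui would then only need to look up the clicked word in this index and in
the lexicon.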
I'll take your requirements, and look at how they fit with the current
snapshot of Kura:
> I know there are several programmers on this list. (I sincerely
> wish I was one of them!) With all this talent, I don't think that it
> would take too long to create a cross-platform dictionary creator and
> reader. (I use Linux, but such a program should work on Windows and
> Macs.) Think of the benefit for the whole list! All we need is for
> several programmers to come forward and lead the project.
It's Linux only, for the moment: the gui needs KDE 1.1.2 (but I'm
converting to cross-platform Qt 2.2.1). You need a MySQL database. I
think it would be foolish to try and construct some multi-user data
storage by hand.
> It wouldn't need a gui at first, although later down the road
> that would be nice. Here's what I would like to see, feel free to add
> your own ideas:
> - Data entry with automatic sorting. Sorting should be by the
> "alphabet" used; for instance, English abcde, Russian abvgde, etc. So
> there would have to be some way to setup the program to understand what
> word is being entered in what language so that it would sort correctly.
> (It should also be able to handle things like sh, th, ch, dzh, etc.)
That's a presentation matter: simply rig the SELECT
so it returns the data in order. That's how Irina's
(http://www.valdyas.org/irina/valdyas/taal/dictionary/index.html) and my
(http://www.valdyas.org/andal/languages/denden/grammar/lexicon.html)
dictionaries are produced. This can be as fancy as your report-writing
abilities ;-).
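
If the database cannot express the ordering directly, one alternative is to
sort on the Python side after fetching the rows. The following is only a
sketch under that assumption: make_sort_key is a hypothetical helper, and the
alphabet is a shortened, romanized stand-in rather than a real Russian or
Enamyn letter order. It treats multigraphs such as "sh" or "dzh" as single
letters.

```python
import re

def make_sort_key(alphabet):
    """Return a key function that orders words by the given list of letters.

    Letters may be multigraphs such as "sh" or "dzh".
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    # Try longer letters first so "dzh" wins over "d" followed by "zh".
    longest_first = sorted(alphabet, key=len, reverse=True)
    pattern = re.compile(
        "|".join(re.escape(letter) for letter in longest_first) + "|.",
        re.DOTALL,
    )

    def key(word):
        # Characters outside the alphabet sort after every known letter.
        return [rank.get(tok, len(alphabet)) for tok in pattern.findall(word.lower())]

    return key

# Hypothetical, shortened alphabet for illustration only.
alphabet = ["a", "b", "v", "g", "d", "dzh", "e", "zh", "z", "s", "sh", "t"]
words = ["zha", "da", "sha", "dzha", "ta", "sa", "ba", "za"]
print(sorted(words, key=make_sort_key(alphabet)))
```

Each language in the dictionary would get its own alphabet list, which also
answers the question of telling the program which language a word is in.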
> - Cross-linked entries. This should be automatic. If I enter a
> word in Enamyn (let's say "vyl" /vl=/), then give its translation as
> "one, single, alone," then under the English section, "vyl" should be
> listed under those three words. There should also be some way to enter
> phrases, too.
More difficult. There are some provisions for this, but I'm working
on a better solution. Basically, the problem is that there is seldom a
one-to-one mapping.
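
One common way to model a mapping that is not one-to-one is a link table
between headwords and glosses. The sketch below is an assumption about how
that could look, not the schema Kura uses: the table and function names are
invented, sqlite3 stands in for MySQL so the example is self-contained, and
words from two languages are lumped into one table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lexeme (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
    CREATE TABLE gloss  (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    -- Link table: one lexeme can have many glosses and vice versa.
    CREATE TABLE lexeme_gloss (
        lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
        gloss_id  INTEGER NOT NULL REFERENCES gloss(id),
        PRIMARY KEY (lexeme_id, gloss_id)
    );
""")

def add_entry(form, glosses):
    """Store a headword and link it to each of its English glosses."""
    lexeme_id = conn.execute("INSERT INTO lexeme (form) VALUES (?)", (form,)).lastrowid
    for word in glosses:
        # Reuse the gloss row if another headword already introduced it.
        conn.execute("INSERT OR IGNORE INTO gloss (word) VALUES (?)", (word,))
        gloss_id = conn.execute("SELECT id FROM gloss WHERE word = ?", (word,)).fetchone()[0]
        conn.execute("INSERT INTO lexeme_gloss VALUES (?, ?)", (lexeme_id, gloss_id))
    return lexeme_id

def reverse_lookup(word):
    """Return every headword glossed with the given English word."""
    rows = conn.execute(
        """SELECT lexeme.form FROM lexeme
           JOIN lexeme_gloss ON lexeme_gloss.lexeme_id = lexeme.id
           JOIN gloss ON gloss.id = lexeme_gloss.gloss_id
           WHERE gloss.word = ? ORDER BY lexeme.form""",
        (word,),
    )
    return [form for (form,) in rows]

# Example data taken from the thread.
add_entry("vyl", ["one", "single", "alone"])
add_entry("gorjachij", ["hot"])
add_entry("zharkij", ["hot"])
print(reverse_lookup("single"))
print(reverse_lookup("hot"))
```

Entering "vyl" once is enough for it to show up under all three English
words; what this does not capture is the nuance between glosses, which is the
hard part.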
> - Search
> - Entries would be able to display conjugations, mutations,
> endings, etc. I will lapse into Russian here, since I haven't developed
> Enamyn far enough. Let's say I remember that the genitive plural of
> "djengi" (money) is irregular, but forgot what it was exactly. I should
> be able to type in "money" and learn that it is "deneg."
That's in it, but it can't get out - I haven't done the data-entry and
presentation code, but the logic is there.
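
To make the idea concrete, here is a minimal sketch of one possible shape for
such an entry (an assumption for illustration, not how Kura stores it): each
entry carries a small table of its irregular forms, keyed by a free-form
label. The names Entry, irregular_forms and forms_for_gloss are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    headword: str
    gloss: str
    # Only irregular forms are stored; the keys are free-form labels.
    irregular_forms: dict = field(default_factory=dict)

# Example data taken from the thread.
lexicon = [
    Entry("djengi", "money", {"genitive plural": "deneg"}),
    Entry("zharkij", "hot (air)"),
]

def forms_for_gloss(lexicon, gloss):
    """Return (headword, irregular forms) for every entry with exactly this gloss."""
    return [(e.headword, e.irregular_forms) for e in lexicon if e.gloss == gloss]

print(forms_for_gloss(lexicon, "money"))
```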
> - Ability to show words in "native" fonts (would probably have
> to wait for the gui).
This is more difficult than you might think - and not only because X font
handling isn't as modern as you might like. But it is certainly doable, and
will be in Kura once the conversion to Unicode-aware Python 2 and Qt 2.2.1
is done.
> - For the gui, it would be nice to click on a word in the entry
> and be taken to its definition. This would be handy for such cases like
> "hot," which has three Russian words listed in my dictionary:
> "gorjachij," "zharkij," and "ostrij." By clicking on them, I would learn
> that they mean "hot (solids and liquids)," "hot (air)," and
> "spicy."
Yes, that's basic, that's already available.
> Again, I wish I was a programmer--I've been teaching myself C,
> but haven't gotten very far. (I'm still working on my calendar program
> for the Enamyn calendar.) Next on the list is either perl or python, but
> that won't be for a long while.
I'd skip C and go for Python immediately - at least, if you're going
for results. Very little beats Python for pure productivity - certainly
not C, nor Java, nor C++ nor any of the other languages I have done
things in. (On the other hand, C is not as bad as Befunge 98...)
> However, if our motives are just, and our hearts are pure, I am
> sure that this list can create a decent dictionary program.
A basic dictionary program, with a PyQt gui, that does what you ask _now_
(and nothing more) is not more than a few days' work with Python. If you take
it as a language-learning exercise, it would be a bit more challenging. Doing
it in a language that requires hand-crafted memory management and doesn't play
fair with strings will take weeks.
Boudewijn Rempt | http://www.valdyas.org