Re: Dictionary formats
From: <kam@...>
Date: Sunday, March 31, 2002, 14:24
On Sat, Mar 30, 2002 at 10:55:07PM -0600, Peter Clark wrote:
> If I may hijack this thread (sorry), I would like to make a general appeal
> to all the programmers on this list: please, someone, anyone, write a
> dictionary-making program. It can even be text-based (heck, I run Linux, the
> command line does not frighten the likes of me), just so long as it allows
> for multiple semantic ranges that work both ways and automatically handles
> word entry and sorting. Or tell me where I can find such a program. Please!
> I can't imagine how I'm going to handle a dictionary once I get over a
> couple hundred words.
> :Peter
At a very basic level you can do a fair bit with grep, sort and cut;
their man pages are worth reading.
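For anyone who would rather script it, here is a rough Python sketch of the
same filter, sort and extract-columns idea. The three-column, tab-separated
layout (word, part of speech, gloss) and the sample entries are invented
purely for illustration; the helper names grep and cut are just labels for
the analogous steps, not wrappers around the real commands.

```python
# Hypothetical lexicon: one entry per line, tab-separated columns
# (word, part of speech, gloss). The layout and entries are made up.
LEXICON = """\
tavi\tv\tto see
kelo\tn\twater
tavun\tn\tsight
mira\tv\tto drink
"""

def parse(text):
    """Split each non-empty line into its tab-separated fields."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]

def grep(rows, column, value):
    """Keep rows whose given column equals value (a crude 'grep')."""
    return [row for row in rows if row[column] == value]

def cut(rows, *columns):
    """Keep only the chosen columns of each row (like 'cut -f')."""
    return [tuple(row[c] for c in columns) for row in rows]

rows = parse(LEXICON)
verbs = sorted(grep(rows, 1, "v"))  # filter to verbs, then sort by word
for word, gloss in cut(verbs, 0, 2):
    print(f"{word}: {gloss}")
```

With a real lexicon the text would be read from a file instead of a string,
and the column numbers would change to match whatever layout is in use.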
I'm writing bits of software to produce concordances that are sort of
dictionaries, but mainly concerned with how a given word appears in the
various texts. The nice thing is that with HTML you can put in links to
the actual lines in the texts. At a later stage I'll probably add an option
for producing something more like a normal dictionary. At present words
are grouped by "headwords" (i.e. all the parts of "to see" appear together),
but the headwords themselves don't yet appear. The idea is to have a fairly
simple database that can, if need be, be edited in a text editor, and then
write the software to automatically produce plain text, HTML and LaTeX
output. I've just been writing code to allow for notes in the text, and
need to figure out the best way to add notes on dictionary entries. One
advantage at present is that database entries for a particular text,
word, grammatical form etc. can be extracted with grep. This is all
work in progress (and I tend to work in fits and starts) and the code is
fairly spaghettoid at this point. For some sample output see
http://home.clara.net/carrot/kernmss/tgk
(Sorry about the Christian content; the historical Cornish texts I'm
working on are about 95% religious, such was the stranglehold the church
had on thought and literacy.)
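To give a feel for the general approach described above (a flat,
hand-editable database, words grouped under headwords, and more than one
output format), here is a minimal Python sketch. It is not the code behind
the sample output; the pipe-separated record layout, the invented words, the
text names and the "#l<number>" anchor scheme are all assumptions made for
the example.

```python
import html
from collections import defaultdict

# Hypothetical flat database, editable in any text editor: one attestation
# per line as form|headword|text|line number. The words are invented.
DATABASE = """\
tavi|tavi|hymn1|3
tavas|tavi|hymn1|7
tavas|tavi|hymn2|2
kelo|kelo|hymn2|5
"""

def load(text):
    """Group attestations under their headword."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        form, headword, source, lineno = line.split("|")
        entries[headword].append((form, source, int(lineno)))
    return entries

def render_text(entries):
    """Plain-text output: headword, then each form with its references."""
    out = []
    for headword in sorted(entries):
        out.append(headword)
        for form, source, lineno in sorted(entries[headword]):
            out.append(f"  {form}  {source}:{lineno}")
    return "\n".join(out)

def render_html(entries):
    """HTML output: each reference links to an anchor in the text's page."""
    out = ["<dl>"]
    for headword in sorted(entries):
        out.append(f"<dt>{html.escape(headword)}</dt>")
        for form, source, lineno in sorted(entries[headword]):
            # Assumed anchor scheme: <source>.html#l<line number>
            href = f"{source}.html#l{lineno}"
            out.append(f'<dd>{html.escape(form)} '
                       f'<a href="{href}">{source}:{lineno}</a></dd>')
    out.append("</dl>")
    return "\n".join(out)

entries = load(DATABASE)
print(render_text(entries))
print(render_html(entries))
```

Because every record sits on one line, grep on the raw file still works for
pulling out, say, everything attested in a single text. A LaTeX renderer
would be a third function along the same lines as the two shown.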
I'd be interested to know more clearly what it is you need, in particular
what sort of input you'll have: texts, word lists?
Keith Mylchreest
NB: written Cornish has a fairly clear idea about what a 'word' is; this
is by no means the case for all languages.