
Re: Shareable/centralizable dictionary server software? (WAS: Size of your dictionary)

From: Sai Emrys <saizai@...>
Date: Saturday, April 4, 2009, 3:55
Some other ideas (modeling it so as to be able to replace Wiktionary):

* somehow parse Wiktionary pages? (seems like a very hard task given
that style isn't uniform...)
* give multi-user edit ability with particular permissions (e.g. fully
editable; moderated edits; canon vs noncanon; single-source only)
* give sources of contributions (either a user or a standardized cite)
* add versioning at some point
* automagically find homonyms (by phonological distance metric of the
xsampa / ipa) and syn/antonyms (by some sort of cooccurrence /
similarity metric on definitions plus explicitly defined relations)
* definitions have properties (e.g. vulgar, archaic, obsolete, jargon of
a particular category, dialectal of a particular category)
* definitions (or words? not sure which level is more accurate) can have
cross-language links for translations
* some support for compound words (this gets at the whole 'how do you
support morphology' question though...)
* automagically scan a corpus text and link all the forms of the word
present? (seems Hard)
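
For the homonym item, here is a rough sketch of what a "phonological distance" pass might look like. It is only an illustration: it uses plain Levenshtein distance over the transcription string and treats every character as one segment, which is an assumption (real X-SAMPA has multi-character symbols, and a proper metric would weight by phonetic features). The names `edit_distance`, `near_homonyms`, and the toy lexicon are all made up for the example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two transcription strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def near_homonyms(lexicon: dict, max_distance: int = 1) -> list:
    """Return pairs of headwords whose transcriptions are within max_distance.

    `lexicon` maps headword -> transcription (hypothetical layout).
    """
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if edit_distance(p1, p2) <= max_distance:
            pairs.append((w1, w2))
    return pairs


# Toy data, invented for the example.
lexicon = {"kata": "kata", "gata": "gata", "kato": "kato", "miru": "miru"}
print(near_homonyms(lexicon))
```

With `max_distance=0` this would find exact homophones only; anything above that gives near-homophones, which is probably what you'd want to flag for review rather than link automatically.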

(... obviously most of these are a version 4 spec...)

On Fri, Apr 3, 2009 at 8:47 PM, Alex Fink <000024@...> wrote:
> This is a good start. But what I was getting at is that there's more to an
> etymology than a source word. For one I may want to track the precise
> proto-form, which is more information than just a single lemma in the
> lexicon: maybe my word came from a _particular_ derivational or inflectional
> form of a proto-word, maybe it came from a coalescence of two proto-words,
> etc. For two I may want to remark on particularities of the development
> itself, like irregular sound changes.
Well, one can always have a random text field, but that seems like a kludge. Supporting morphology is hard, but if we did that, then there'd be no problem marking some word as being derived from a particular morphological form of a particular word (rather than merely the word generically).
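
To make that concrete, a minimal sketch of one possible data model, assuming forms are stored as first-class records next to their lemmas. Everything here (`Form`, `Etymology`, `Entry`, `describe`, and the example words) is hypothetical and only meant to show the shape: an etymology points at one or more specific forms, so a coalescence of two proto-words is just a two-element list, and a notes field covers remarks like irregular sound changes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Form:
    """One particular derived or inflected form of a lemma."""
    lemma: str                 # the stem/lemma this form belongs to
    features: Tuple[str, ...]  # e.g. ("genitive", "plural")
    surface: str               # the actual shape of the form


@dataclass
class Etymology:
    """Links an entry to the specific source form(s) it descends from."""
    sources: List[Form] = field(default_factory=list)
    notes: str = ""            # free text, e.g. irregular sound changes


@dataclass
class Entry:
    lemma: str
    etymology: Optional[Etymology] = None


def describe(entry: Entry) -> str:
    """Render an entry's etymology as a one-line summary."""
    ety = entry.etymology
    if ety is None:
        return f"{entry.lemma}: no etymology recorded"
    sources = " + ".join(
        f"*{form.surface} ({form.lemma}: {' '.join(form.features)})"
        for form in ety.sources
    )
    text = f"{entry.lemma} < {sources}"
    return f"{text}; {ety.notes}" if ety.notes else text


# Invented example: a word descended from the genitive plural of a proto-word.
proto_form = Form(lemma="tar", features=("genitive", "plural"), surface="tarena")
word = Entry("taren", Etymology([proto_form], notes="irregular loss of final -a"))
print(describe(word))
```

The free-text field is still there, but only for the commentary; the link to the source form itself is structured.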
> Two is reasonably addressed by just adjoining another text field. One
> depends on how your model treats forms as opposed to the stems they come
> from -- how does it? (This bears on the irregular forms thing too.)
I don't know. Suggest something.
>>So this would be just <font face="yourspecialfont">foobár</font>
>>really. Simplest way I could think of supporting it.
>
> Oh fine. That is of course Wrong from the Unicode purists' perspective, though!
Give me a better method. :-P

- Sai