Re: Shareable/centralizable dictionary server software? (WAS: Size of your dictionary)
From: Sai Emrys <saizai@...>
Date: Saturday, April 4, 2009, 3:55
Some other ideas (modeling it so as to be able to replace Wiktionary):
* somehow parse Wiktionary pages? (seems like a very hard task given
that style isn't uniform...)
* give multi-user edit ability with particular permissions (e.g. fully
editable; moderated edits; canon vs noncanon; single-source only)
* give sources of contributions (either a user or a standardized cite)
* add versioning at some point
* automagically find homonyms (by phonological distance metric of the
xsampa / ipa) and syn/antonyms (by some sort of cooccurrence /
similarity metric on definitions plus explicitly defined relations)
* definitions have properties (e.g. vulgar, archaic, obsolete, jargon of
a particular category, dialectal of a particular category)
* definitions (or words? not sure which level is more accurate) can have
cross-language links for translations
* some support for compound words (this gets at the whole 'how do you
support morphology' question though...)
* automagically scan a corpus text and link all the forms of the word
present? (seems Hard)
(... obviously most of these are a version 4 spec...)
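To make the homonym idea a bit more concrete, here is a minimal sketch in Python. It assumes each entry stores an X-SAMPA transcription and uses plain character-level edit distance as a stand-in; a real phonological metric would presumably weight substitutions by feature similarity. The function names and the toy lexicon are made up for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two transcription strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def homonym_candidates(entries, max_distance=0):
    """Return (word1, word2, distance) for pairs whose transcriptions
    are within max_distance of each other.

    entries: dict mapping headword -> X-SAMPA string (assumed layout).
    """
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(entries.items()), 2):
        d = edit_distance(p1, p2)
        if d <= max_distance:
            pairs.append((w1, w2, d))
    return pairs


# Toy lexicon, purely illustrative.
lexicon = {"knight": "naIt", "night": "naIt", "note": "noUt", "cat": "k{t"}
print(homonym_candidates(lexicon))
```

Raising max_distance would turn this into a near-homonym finder, at the cost of comparing every pair of entries, which is quadratic in the lexicon size.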
On Fri, Apr 3, 2009 at 8:47 PM, Alex Fink <000024@...> wrote:
> This is a good start. But what I was getting at is that there's more to an
> etymology than a source word. For one I may want to track the precise
> proto-form, which is more information than just a single lemma in the
> lexicon: maybe my word came from a _particular_ derivational or inflectional
> form of a proto-word, maybe it came from a coalescence of two proto-words,
> etc. For two I may want to remark on particularities of the development
> itself, like irregular sound changes.
Well, one can always have a random text field, but that seems like a kludge.
Supporting morphology is hard, but if we did that, then there'd be no
problem marking some word as being derived from a particular
morphological form of a particular word (rather than merely the word
generically).
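A rough sketch of what that could look like, assuming forms are stored as a lemma plus a set of morphological features. Every class and field name here is hypothetical, and the example words are invented. An etymology points at one or more specific forms (so a coalescence of two proto-words is just two sources), and the free-text note is still there for remarks like irregular sound changes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Form:
    """One particular form of a lemma, e.g. its plural."""
    lemma: str
    language: str
    features: Tuple[str, ...] = ()  # empty tuple = citation form
    spelling: str = ""


@dataclass
class Etymology:
    sources: List[Form]  # one form, or several for a coalescence
    note: str = ""       # free text, e.g. irregular developments


@dataclass
class Entry:
    lemma: str
    language: str
    etymology: Optional[Etymology] = None


def describe(entry: Entry) -> str:
    """Render an entry's etymology as a one-line summary."""
    if entry.etymology is None:
        return f"{entry.lemma}: no etymology recorded"
    parts = []
    for form in entry.etymology.sources:
        feats = ", ".join(form.features) or "citation form"
        parts.append(f"*{form.spelling} ({form.lemma}: {feats})")
    text = f"{entry.lemma} < " + " + ".join(parts)
    if entry.etymology.note:
        text += f" [{entry.etymology.note}]"
    return text


# Invented data: a word descended from the plural of a proto-word.
plural = Form("kat", "Proto-X", ("plural",), "kati")
kec = Entry("kec", "X", Etymology([plural], "irregular vowel change"))
print(describe(kec))
```

This does not answer how forms are generated from stems in the first place; it only shows that once forms are addressable, pointing an etymology at one of them is cheap.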
> Two is reasonably addressed by just adjoining another text field. One
> depends on how your model treats forms as opposed to the stems they come
> from -- how does it? (This bears on the irregular forms thing too.)
I don't know. Suggest something.
>>So this would be just <font face="yourspecialfont">foobár</font>
>>really. Simplest way I could think of supporting it.
>
> Oh fine. That is of course Wrong from the Unicode purists' perspective, though!
Give me a better method. :-P
- Sai