Re: Universal Language Dictionary revision
From: rick@harrison.net <rick@...>
Date: Sunday, December 10, 2006, 17:08
Herman, thanks for your comments. In my future "Hdict" project I'm
going to have each item explicitly marked with its antonyms or other
words that might be regularly derived from it. So "like" (the
preposition) would be tagged with a link to "similar" (the adjective),
and so forth. Concepts that can be expressed as compounds of other
concepts will be tagged also; "volcano" will have a note indicating
that some natlangs express it as a compound of "fire" plus "mountain."
These links will make automatic vocabulary generation easier, so
everybody can have a conlang - or a thousand conlangs.
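A rough sketch of how such links could drive automatic vocabulary generation, in Python. Everything here is an assumption for illustration only: the `Entry` fields, the sample entries, the CVCV root shape, and the two derivation rules (a made-up suffix `-i` for derived words and an antonym prefix `mal-`, borrowed from Esperanto) are not part of ULD or of the planned Hdict format.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One hypothetical Hdict item with links to related concepts."""
    gloss: str
    pos: str
    derived_from: Optional[str] = None   # e.g. "similar" derived from "like"
    antonym_of: Optional[str] = None     # e.g. "small" as the opposite of "big"
    compound_of: List[str] = field(default_factory=list)  # e.g. fire + mountain


# Hypothetical sample data, not taken from ULD.
HDICT: Dict[str, Entry] = {
    "like": Entry("like", "preposition"),
    "similar": Entry("similar", "adjective", derived_from="like"),
    "big": Entry("big", "adjective"),
    "small": Entry("small", "adjective", antonym_of="big"),
    "fire": Entry("fire", "noun"),
    "mountain": Entry("mountain", "noun"),
    "volcano": Entry("volcano", "noun", compound_of=["fire", "mountain"]),
}

CONSONANTS = "ptkmnslr"
VOWELS = "aeiou"


def random_root(rng: random.Random) -> str:
    """Invent a CVCV root for a concept that has no links."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(2))


def generate_lexicon(dictionary: Dict[str, Entry], seed: int = 0) -> Dict[str, str]:
    """Assign a word to every gloss, following the links where present."""
    rng = random.Random(seed)
    lexicon: Dict[str, str] = {}

    def word_for(gloss: str) -> str:
        if gloss in lexicon:  # each concept gets exactly one word
            return lexicon[gloss]
        entry = dictionary[gloss]
        if entry.compound_of:
            word = "".join(word_for(part) for part in entry.compound_of)
        elif entry.antonym_of:
            word = "mal" + word_for(entry.antonym_of)  # hypothetical antonym prefix
        elif entry.derived_from:
            word = word_for(entry.derived_from) + "i"  # hypothetical derivational suffix
        else:
            word = random_root(rng)
        lexicon[gloss] = word
        return word

    for gloss in dictionary:
        word_for(gloss)
    return lexicon


lexicon = generate_lexicon(HDICT, seed=42)
for gloss, word in lexicon.items():
    print(f"{gloss:10} {word}")
```

Changing the seed gives a different conlang with the same internal structure ("volcano" is always the "fire" word plus the "mountain" word), which is the thousand-conlangs idea in miniature.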
I guess some pronouns can be added to ULD. Some natlangs don't have
free-standing pronouns but I think all of the languages currently
included in ULD do have them. Not sure about Tsolyani; don't have any
literature about that language in my collection.
Hello, goodbye, thank you and so forth -- these would be tricky in
some languages because you might have to choose from many options
depending on your gender, social status, time of day and other factors.
There aren't enough comment fields or room for annotations in the
current ULD data structure, but I will try to think of an elegant way
to add this kind of material to the future Hdict project.
(I've been told that in Japanese factories which operate 24 hours a
day, workers greet each other with "ohayou" [good morning] at all hours
of the day and night because "konnichiwa" [good afternoon] and
"kombanwa" [good evening] are felt to be too formal to use with
co-workers.)
Personally, when I look at basic vocabulary wordlists, one gap I see is
the lack of interjections and discursive flavor words. On Wikipedia there's
a list of the 1000 most frequent words in TV shows and movies,
apparently made by analyzing the text taken from closed captioning data
embedded in the video. This is interesting because it approximates the
way English is actually spoken by real people. The list includes: oh,
yeah, okay, uh, huh, hey, hell, um, hmm, ah, damn, ha, whoa, wow,
alright, mm, sh_t, f_ck, ooh, y'know, ow, and mmm. I was surprised that
"oops" did not make the list.
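A frequency list like that can be approximated by tokenizing caption text and counting. The Python sketch below is an illustration only: it is not how the Wikipedia list was actually produced, and the sample captions are invented. The regular expression keeps internal apostrophes so that contractions such as "y'know" count as single words.

```python
import re
from collections import Counter
from typing import Iterable, List

# Lowercase letters, optionally joined by apostrophes ("y'know", "don't").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def word_frequencies(caption_lines: Iterable[str]) -> Counter:
    """Count how often each word occurs across all caption lines."""
    counts: Counter = Counter()
    for line in caption_lines:
        counts.update(TOKEN.findall(line.lower()))
    return counts


def top_words(caption_lines: Iterable[str], n: int) -> List[str]:
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in word_frequencies(caption_lines).most_common(n)]


# Invented sample captions, for illustration only.
captions = ["Oh, yeah. Okay.", "Hey, wow! Oh yeah?", "Um, y'know, okay.", "Oops."]
print(top_words(captions, 3))
```

With a real caption corpus, checking whether "oops" makes the cut is just a membership test on the result of `top_words(..., 1000)`.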
How about the ULD's transition to XML and UTF-8? Pretty exciting,
isn't it? I was determined to resist XML because it's so damn ugly, but
then I thought, what the hell, I'll do something trendy for a change.
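For anyone wondering what an XML-plus-UTF-8 entry might look like, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`dictionary`, `entry`, `translation`, `gloss`, `pos`, `lang`) are hypothetical and do not reflect the actual ULD schema; the point is only that UTF-8 lets non-Latin forms such as Japanese 火山 ("volcano", written with the characters for fire and mountain) sit directly in the file.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def entry_to_xml(gloss: str, pos: str, translations: Dict[str, str]) -> ET.Element:
    """Build one hypothetical <entry> element with a <translation> per language."""
    entry = ET.Element("entry", {"gloss": gloss, "pos": pos})
    for lang, word in translations.items():
        translation = ET.SubElement(entry, "translation", {"lang": lang})
        translation.text = word
    return entry


root = ET.Element("dictionary")
root.append(entry_to_xml("volcano", "noun", {"eo": "vulkano", "ja": "火山"}))

# Serialize as UTF-8 bytes with an XML declaration (xml_declaration needs Python 3.8+).
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```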