
On the design of an ideal language - finally, my (long!) response

From:Sai Emrys <sai@...>
Date:Thursday, June 8, 2006, 8:45
So, this was first posted by And Rosta a month ago. That was in the
middle of finals.

Well, finals are over, and I've had some time to recuperate and such,
so now I can respond. Looks like I've missed a really interesting (and
very-many-page-long) discussion.

I'll try to respond to everything here - writing as I backread the
thread, and (hopefully) condensing it somewhat to be more coherent.

Before I do, here's a URL I had in a draft response:
- as another LJ thread that should be of interest.

But first, I will make a digression to talk a bit meta...

I am glad that ODIL has gotten some interest again. It's probably the
first major thing I wrote, back in '02. As I remember it, I just sat
down one night and wrote most of it at once; I've added to it since,
it's not nearly up to date (the last revision was years ago). As And
and others have pointed out, it is missing some important things, and
I failed to state certain axioms.

To set this up - this'll be important if you're trying to figure out
what I want or why, or what my likely opinion is for any particular
situation - I should explain that, of course, this is only intended to
be a draft of how to create *an* Ideal language... for some purpose.
We all differ in what that purpose is.

My purpose is somewhat abstract. I would like to create the best
language, /qua/ language, for the purposes of human-to-human
communication primarily, and human-to-other secondarily. I am only
marginally concerned with its ease of learning for non-native
speakers. This is because I feel that being concerned with this causes
a very large amount of crippling of the potential of language, through
trying to find the least common denominator of existing languages, and
through trying to make it easy to learn as a second language. (Viz.
the horrors that are [insert target pseudo-international-auxlang
here].)

In this sense, I am trying to have what I call a "technically"
optimized language. I think I use the term in a somewhat nonstandard
way, but I can't think of a better one (suggestions welcomed). What I mean
is inclusive of aesthetics and artistry. The point is, simply, that I
am imagining it as if it were my only language, accepting at least
weak Sapir-Whorf as true, and trying to answer the question, "What
would be the best language for me to know exclusively, assuming
everyone else knows it too and that it's my native language, to be
able to think, communicate, and operate in as [Sapir-Whorf]
unrestricted, elegant, beautiful, intuitive, low-effort, and
ultimately sublime a way as possible?"

That is what I mean by "technical" optimization - judging a language
on the basis of how well it works /as a language/, rather than on such
things as existing-language-based aesthetic, or interoperability, or
etc. I realize that this, even here on CONLANG, is a somewhat taboo
perspective to take when trying to evaluate a language. It is my
opinion that properly answering the question will be difficult, and
will require at minimum a very strong cooperation between art,
cognitive psychology, introspection, incorporation of the variety in
thought-qualia (i.e., what it feels like to think - it varies!), etc.
I think it's worth the effort, though, even without necessarily
believing that the goal is reachable within one lifetime, because it
ought to yield ["technically"] better conlangs in the meantime, and a
lot of fascinating results about how we think and how language really
works - or better, how it *could* work.

One tangential point: this sort of language 'ex nihilo' would try NOT
to assume anything about the existing language, corpus, vocabulary, or
etc. However, it's a very difficult issue when it comes to what you
assume about cultural context - it becomes somewhat recursive.  If
you'll pardon a kludge of mathematical terms, try to assume the
eigenvector of a culture that all speaks this language. :-)
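To stretch the math kludge a little further: the "eigenvector of a
culture" reads as a fixed point of mutual influence - keep applying
the culture-to-culture influence map until the distribution of traits
stops changing. A toy power-iteration sketch (every number here is
invented purely for illustration):

```python
# Toy "eigenvector of a culture": power iteration to a fixed point.
# The influence weights below are made up for illustration only.

def power_iteration(matrix, vec, steps=100):
    """Repeatedly apply `matrix` to `vec`, renormalizing each step,
    so it settles toward the dominant eigenvector (the fixed point)."""
    for _ in range(steps):
        vec = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(abs(x) for x in vec)
        vec = [x / norm for x in vec]
    return vec

# Hypothetical mutual-influence weights among three cultural traits;
# each row sums to 1, so a stable mix exists.
influence = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.2, 0.2, 0.6],
]

# Start from an arbitrary culture; iterate until it stops changing.
stable_culture = power_iteration(influence, [1.0, 0.0, 0.0])
```

Applying the influence map once more to `stable_culture` leaves it
(numerically) unchanged - which is all the joke was asking for.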

As a brief note on that auxlang statement:
It is my opinion that no traditional international auxiliary language
will ever be successful *as* a language, that is, it will not rise to
ascendancy by virtue of its technical virtuosity. These will only
emerge on a major scale by the traditional means: cultural and
military conquest. At present, English wins.

However, I do feel that there are three loopholes to this.

First, that of an ODIL-style language that would seek to be
*technically* optimized, i.e. ideal for my purposes as outlined here.
These, I am willing to say, would be useful by doing something that
many linguists would abhor the thought of - namely, by being a better
language. This is one of my two main conlanging interests.

Second, there is room for memes. Chunks of language, grammatical
structures, or other modular pieces that can be fitted into any (or -
less elegant - some particular) language. These stand a decent chance
of working, because they do not depend on the world domination
(through exclusion or otherwise) that a traditional auxlang would need
- if the meme is good enough, infectious enough, and useful enough, it
can survive.
However, I have not seen any real effort made at conlanging in this
manner, and it is not something I have yet tried to turn my attentions
to. I would imagine that its criteria for success (i.e. what would let
it work) would be significantly different than a normal language, and
would be very interested to see an ODIL-type essay describing what
those would be. Any volunteers?

Third, you have room for full languages that are meta to existing ones
- that is, ones that don't directly compete. This rules out, more or
less, all spoken language (and by extension, all forms of language
that are merely encoding shifts thereof - i.e. nearly all existing
writing systems). An example is, of course, my other great interest: a
NLF2DWS (as discussed at length elsewhere). I feel that it offers the
only option for a complete language, rather than a modular chunk, that
has any real potential of becoming widely used.

The reasons, to put it simply:
a) it lacks competition;
b) it is compatible with existing languages (i.e. it is
spoken-language-agnostic, and I believe but am not certain that a
decent NLF2DWS could have a parsing algorithm to and from any
arbitrary natlang to within a reasonable margin for peculiarities of
semantic encoding [e.g. not having a
position-in-relation-to-that-volcano datum grammaticalized]);
c) it has the rare opportunity to succeed on its technical merits,
i.e. you can make the case that it is worth learning because it has X
Y Z features (like software!), rather than having to overcome the
existing language - and let's face it, even for an extremist like me
it's bloody hard to figure a way to make a traditional spoken /
written-as-spoken language BETTER enough than generic natlangs to be
worth the 'upgrade'.
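A toy sketch of what the "spoken-language-agnostic" claim in point (b)
might mean: sentences from two hypothetical mini-languages parse into
one shared semantic frame, which can then be rendered back out in
either surface order. (Everything here - the frame shape, the
three-word grammar, the two word orders - is invented for
illustration, not a real parsing algorithm for natlangs.)

```python
# Toy "semantic pivot": two invented mini-languages, one shared frame.
# All vocabulary and grammar here are made up for illustration.

def parse_svo(sentence):
    """Parse a three-word agent-verb-patient sentence into a frame."""
    agent, action, patient = sentence.split()
    return {"agent": agent, "action": action, "patient": patient}

def render_svo(frame):
    """Render the frame in an invented subject-verb-object language."""
    return f'{frame["agent"]} {frame["action"]} {frame["patient"]}'

def render_sov(frame):
    """Render the same frame in an invented verb-final language."""
    return f'{frame["agent"]} {frame["patient"]} {frame["action"]}'

frame = parse_svo("dog bites man")
```

Here `render_sov(frame)` yields "dog man bites" - the same frame,
just a different surface encoding, which is the sense in which the
pivot itself is agnostic to either language.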

(Actually, I'm going to create that as a new term henceforth:
tradlangs. I.e. ones that have a more or less normal, natlang-type
structure, are dominated by the spoken language, etc. I'm not sure if
a sign language would be included here; an interesting question,
perhaps, for a spinoff thread.)

(One other coinage: an ODILlang [named after ODIL, of course] is a
language that aims at what I outlined above, namely technical
perfection in its own right.)

That, I think, gives a good overview of my philosophical backdrop to
this whole enterprise - and incidentally the first coherent account of
why I think a NLF2DWS is the only viable international auxiliary
language I can think of. [If you'd like to discuss that (I know, it's
somewhat dangerous territory), I'd be happy to, but please start a new
thread for it.]

Hopefully that'll make some of the following clearer. Hopefully also
it addresses any of the points raised (e.g. by Jim Henry re AUXLANG)
that I do not specifically respond to below.

I ask your pardon if I've waxed a bit overly philosophical or
idealistic (or dogmatic about what is admittedly my opinion on some,
ah, contentious questions of the nature of language, thought, and so
on).

Now on to the actual reading-and-replying....

"8. Principle of Concision.
The language should be as concise as possible *on average*. As a
benchmark, it should be able to achieve the average concision of the
concisest natlang, without compromising the Principle of Desired
Clarity. The rationale for this principle is twofold. (i) It is
generally utilitarian, saving time, space, effort. (ii) Without it,
the Principle of Desired Clarity is fatally undermined: the speaker
should not be forced to opt for vagueness because the desired level of
precision is not worth the effort of the degree of verbosity that
expressing it would entail."

I agree. I think this is a subset of the Principle of Least Effort to
a great extent.

I think it also could be expressed as the Principle of Default Simplicity.

I should point out though that this HUGELY invokes cultural context
(can I get a shout from my homie coglinguists?) - what is a 'complex
idea' is a famously horrific thing to define. (This comes up in
computer science, in AI, algorithm description, etc.)
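One crude, operational stand-in from that computer-science corner:
treat the compressed size of a description as a proxy for how
"complex" an idea's expression is - repetitive, redundant text
compresses away to almost nothing, while varied text does not. A
minimal sketch (the sample strings are arbitrary; this is only one of
many possible complexity measures):

```python
# Crude complexity proxy: compressed size of a description.
# zlib exploits repetition, so redundant text shrinks far more.
import zlib

def compressed_size(text):
    """Bytes needed for a zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8")))

redundant = "the dog bit the man and the dog bit the man " * 20
varied = "colorless green ideas sleep furiously beside quartz vixens"

# Bytes of compressed output per character of input: lower means
# the text carried less information per character.
ratio_redundant = compressed_size(redundant) / len(redundant)
ratio_varied = compressed_size(varied) / len(varied)
```

The redundant sample ends up needing far fewer compressed bytes per
character than the varied one - the same intuition behind judging a
language on how much real content each syllable carries.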

"9. Principle of Expressiveness.
Everything expressible in a natlang should be expressible in the ideal
lang, with (in the main) no significant loss of concision."


My one "but" is, essentially, the above cultural-context qualm - some
languages encode things that may not be desirable to encode in a
ground-up-technically-optimized language, because they are culturally
important to that body of speakers, or because they're just built into
the language over time. Viz., the volcano reference; gendered nouns;
puns; etc.

In principle, though, I agree. Perhaps this can be better phrased in
terms of the "power" of the language, that is, it should be possible
to express anything I want, with its difficulty being in direct
proportion to its distance from the assumed default. (Perhaps this
should be somehow bundled into the Principle of Default Simplicity.)

"10. Principle of Variegation
The language should be as textured, variegated and many-flavoured as a
natlang (benchmark: English)."

With this I do not agree. It is a stealth way to say that you want
something that is aesthetically pleasing - but I feel that this is not
the best way to go about it. Variety will come anyway - especially
with all the previous things at work. :-P

[Backediting after reading Jim Henry's reply to this: I DO agree with
wanting a very large degree of suppleness to the language; that is, it
should be able to distinguish among as many shades of meaning as are
culturally relevant to distinguish.]

"The principle of Semantic Density, I hold to but with certain caveats.
(i) Speech and writing is primary, and the Principle of Cross Modality
must be respected. Therefore the expressive resources of, say, graphic
and gestural mediums are underutilized."

To the contrary (if I understood you correctly).

My intention for the PCM is that it not be a least common denominator
at all. Perhaps better put, what I want is for each mode to be used to
its fullest within itself (and each combination of modes likewise, as
a separate entity - e.g. simultaneous speech and sign would have
markedly different characteristics when optimized than either alone).
They should certainly all be able to express each others' ideas, but
the complexity thereof may vary significantly with the mode. E.g.,
signing is inherently better at describing fine nuance of movement in
physical space than is speech; doing an equally detailed description
in speech may well take more space, and that is fine.

Jim Henry:
"> 0. Principle of Good Representation
> "All forms of language use should be as representative as possible
> of the actual thinking of the target population."
One corollary of this might be that an engelang or auxlang intended
for spoken use shouldn't violate known or strongly suspected
universals, even statistical universals. But of course an ideal
language for investigating the nature of the human language faculty
needn't be bound to this..."

Universals certainly need to be considered for an ODILlang, in each
applicable mode. However, they also have to be put to the question of
whether (how?) they help to fulfill the ODIL goals, and if not, they
must be discarded. Note, of course, that Principle 0 (Good
Representation) subsumes human psychology, so if a universal is
somehow truly inherent (cue Chomsky), it should be respected. But I
think that's a very hard test for a universal, frankly, when compared
against explanation through mere happenstance.

"> "Any medium used [...] should be used optimally."
'Optimally' means in accordance with the other principles and such
that "all available mediums are used to their fullest potential". Sai
seems to be saying (correct me if I'm misreading you, Sai) that the
language should, in its most concise mode, use almost every possible
word within its phonotactic limitations; and it should use this
concise mode in non-noisy conditions, and a less concise mode with
redundancy in noisier conditions. Actually, I suspect that even in
the least-noisy real-world conditions you would still need a lot more
redundancy than Sai seems to allow for (he appears to throw out a
ballpark figure of 1% of unused space)."

I think that's a fair reading. And yes, 1% is totally ballpark -
though that reserve is mainly meant for other purposes (future
vocabulary creation, for example), rather than noise-proofing.

One choice you get to make here is whether to somehow build a
noise-proofness-variable system (concise and easily lossy vs
nonconcise but robust, as desired at runtime), or build it on some
default assumption - presumably based on the 'average' real-world
factors for the culture in question. (E.g. do they live in a really
noisy place, or only in monasteries? Do people pay attention to each
other very carefully, or is that another source of "noise"?)

You are right in saying that, if opting for a variable system, its
concise end would be very non-robust. However, one would also want it
to fail gracefully; that is, for mistakes to either be obvious as
mistakes (by virtue of pragmatics), or to degrade into less subtly
nuanced but still acceptable versions. FWIW, these two coping
strategies would probably be polar opposites in terms of vocabulary
generation algorithm...

"> 5. Principle of Iconicity
> The form of the utterance should resemble the meaning.
Sai seems to be suggesting use of phonaesthemes in devising or
selecting vocabulary, if I understand correctly. Otherwise, I'm not
sure what this means at levels above the lexical."

Again, that's correct. And I know it's one of those things that's
been done many times before, nearly all of them amazingly awful.
(Viz., almost everyone described by Eco or Yaguello...) This applies
somewhat better when you are using modes that can more closely
resemble their actual real-world referents - e.g. I think the ASL
equivalent of "the guy stumbled around drunkenly and walked into a
tree and fell on his ass" is pretty damn iconic.

A side note for clarity here: What I am seeking is "translucent"
iconicity, not necessarily "transparent" (terms my own again). For
example - in ASL, "eat" is transparent; "tree" translucent; and
"home" opaque. Translucent examples are intuitive once you know what
they are, as it were. I do not insist that the language be
transparent to a naive observer; I'm ambivalent both on whether
that's possible and whether it's desirable.

"I'll note that neither of you mentioned one criterion: ease of
learning. Indeed, Sai's implied cluster of several languages (at
least one per mode plus variations with more or less redundancy for
more or less noisy conditions) would probably be as difficult as
learning, if not a whole family of natural languages, at least
several regular conlangs. And you both seem to imply a complex
phonology that would be difficult for adult learners of many native
languages, and (perhaps) a large root vocabulary that would take a
long while for anyone to learn."

As implied in the meta talk above, the only ease of learning I care
about here is that of the native speaker. I think the existence of
spelling bees is a HORRIBLE UNHOLY PERVERSENESS, and would not want
anything of the sort to exist in an ODILlang. Likewise, one should
consider how long it takes an average person to learn (from birth),
and ditto for non-average people - e.g. how much do various
impediments make things difficult? Deafness? Cleft palate? Lisp?
Missing limb? This is where you would get the cap, if any, on extreme
phonological diversity. E.g. (to pick on John Q) if Ithkuil were an
L1, how many of its speakers would have difficulty with it? Which
ones? For how long? How much would it, in a linguistically isolated
environment over many years / centuries, inevitably (??) degrade that
phonological complexity and thus need to come under the Principle of
Least Effort?

What I care relatively little about, and would ask to be carefully
distinguished, is the ease of learning for current speakers of
current tradlangs.

And again:

"This sounds like an additional principle, a Principle of Redundancy,
which might be split into two, a Principle of Noise Resistance, and a
Principle of Lacuna Resistance, the latter having to do with how much
can be unambiguously recovered from a fragmentary text."

I definitely should add that, as it is implied but not stated in ODIL
as is. However, I am not certain that redundancy is the only (or
optimum) means by which to provide noise resistance - as implied
above where I was talking about graceful degradation, I would like to
see a variety of means used (phonotactics, phonetics,
self-segregating structure a la RAM's LSMTI, pragmatic context, etc).

"By default, but in fact a lot of conlangers aim to have the smallest
possible vocabulary size, or at least the smallest possible inventory
of roots. The Principle of Oligosynthesis, you might call it, at a
stretch. If you adopt the Principle of Oligosynthesis, then the
Principle of Expressiveness is at minimum subordinated to it."

This, I do not see any a priori purpose to, though I could see it as
maybe coming about from the others. A small core vocabulary, though,
seems to make the mistake (to shift into computer-science terms
again) of optimizing for design rather than for outcome, which is
almost always a very bad idea.
I would like to keep the core Principles as directly relevant to the
*use* of the language, or its psychological reality, as possible, and
have the design itself (this, redundancy, etc) be a question of
implementation instead. That said, perhaps a small root inventory
would be useful? I'm not really sure, to be honest; the attempts I've
seen seem to bring about various undesirable side effects as a
result, though (e.g. ambiguity, arbitrary or false-sounding
compounds, etc).

"To me the principle has mainly to do with wordshapes, but it also
means that there should not be high frequency collocations like "of
the", or "in the event of", and so forth. And in syntax, either there
should be a rich array of constructions or the syntax should be so
minimal and flexible as to be unobtrusive and not affect the degree
of tonal homogeneity a text has."

Bingo. Again with the CS reference: I would like language to operate
on the Don't Repeat Yourself (DRY) principle. Or to put it in even
geekier terms, it should have as high an entropy as possible.
Entropy, after all, is probably the best measure of real content.

Actually, that's probably worth formulating into a principle:

Principle of Entropy, a.k.a. Principle of High Signal:Noise Ratio (SNR)
The language should have as high an entropy as possible, as a
weighted average over all likely contexts, conversations, and
soliloquies.

This would of course be in opposition to the Principle of Noise
Resistance (I'll take the lacuna version as a subversion thereof).
Note that it is better to formulate this opposition than one vs. a
hypothetical Principle of Redundancy - the latter is more directly
contrastive in a nonproductive way, whereas this PE vs PNR balancing
would probably result in something more to the point of what we
really want.

And, responding to Jim Henry:

"4. At least one incarnation of Rick Morneau's conlang (of
perpetually changing name) had a scheme in which stems are composed
of two morphemes, an initial morpheme (IM) and a final morpheme (FM)
[this is all a reconstruction from vague memory]. Once IM34+FM73 has
occurred in the discourse, IM34 when not followed by a FM is
equivalent to IM34+FM73. This is a neat idea, and I would have
appropriated it for Livagian, were it not that other principles of
Livagian's design demand that the phonological shape of stems should
be unconstrained."

This reminds me of something I wrote about elsewhere, for one likely
way to implement simultaneous PE and PNR optimization: Have some sort
of affix or morpheme that describes the subject area of the
conversation, and the rest of it would be different in meaning
depending on the subject area. If you're having a cross-subject
conversation, these can be specified, but if not they can be dropped
as redundant.

I think what this might be equivalent to is continually talking in
metaphors. This assumes, of course, that different subjects have a
significant overlap of metaphorically close-enough-to-use-the-same-root
words, but could lead to very interesting conversations, and indeed
to possibly (almost-explicit) very many-layered strata of meaning.

Yahya quoting Jackson Moore:

"> I am planning an essentially impractical limit-case language that
> incorporates the full range of grammatical meaning found in natural
> languages with as much phonological consistency and specialization
> as possible, but whose lexical meaning is entirely evacuated -
> thus, "the dog bit a man" and "the bear licked a boy" will be
> phonologically identical, and unambiguously denote no more than
> "before now, a specific animate/animal agent performed and
> completed a discrete action upon a non-specific animate/human/male
> patient" et cetera. Quite dysfunctional, maybe good for
> charades...except that I'm not designing the language for fluent
> speakers, but for non-fluent listeners, the idea being that
> grammatical paradigms and the relations between them will be as
> acoustically salient as possible - will be 'transduced' to sound
> with minimum interference. The only thing that will distinguish it
> from any conlang with generic roots is that in this case any given
> portion of phonological space will be monopolized by a single
> grammatical device (that and the fact that the 'channel' will be an
> orchestra, not a mouth, making the language purely
> textual/non-extemporaneous).
Oh, good! The much proclaimed, but never explicated, "language of
music" made actual! ;-) I can't wait to hear the *canonical* version
of "The Sleeping Beauty"!"

Me too! Orchestras (or better, operas), FWIW, seem an excellent
opportunity for doing some sort of multi-person-modal language. :-)

Or more practically, what if you used a conversation - namely, the
fact that two people are talking about one subject at the same time -
as a mode, and try to optimize on that? Turn-taking works, but it's
really rather crude as a way to run an exchange of ideas. I'd like to
figure out some way to optimize that better, so that the most
productive conversations can be most quickly had, rather than the
traditional sort of you say / I say thing. (Note here that I'm mainly
talking about real-time conversations, not so much [though maybe
also!] the call-and-response sort of conversation that we're having
here on the list, where there is no opportunity to weave into what is
being said by the other person AS it is being said.)

Now, on to the thread "Impossible Gibberish (was Re: On the design of
an ideal language)"...

I'll point to my response on that thread as being a good answer to
the topic itself, i.e. what I think of having everything mean
SOMEthing and not having "incorrect" productions.

Jim Henry:
"....I also note that neither Sai nor And includes self-segregating
morphology in their criteria for an ideal language. In And's case, he
has something else that's just as good in disambiguating a parse
string for a fluent speaker, though not as helpful for a learner as
self-segregating morphology would be. In Sai's case I suppose that
self-segregating morphology would be too great a constraint on
filling up the phonological space with real words, perhaps?"

I don't include it because, as stated above, I am trying to avoid
lifting implementation to the level of design spec. This is also why
I am against using redundancy as a spec; it's only *an*
implementation.
SSM is certainly a viable option for accomplishing some of them,
though.

... there. I think that covers everything. And of course, this
response is probably several times larger than the referent essay.
But that's normal I guess. :o)

Congratulations for finishing reading this email. ;-) Hopefully I
haven't made too many errors...

- Sai

