Re: Zipf's Law may be a statistical artifact
From: Yahya Abdal-Aziz <yahya@...>
Date: Sunday, September 24, 2006, 13:02
Hi, Conlang List members, and other friends,
On Sat, 23 Sep 2006, John Chalmers wrote:
Philip Newton replied:
Hi John and others,
Yes, that's right: the frequency of words with exponentially-
distributed lengths is *necessarily* distributed according to
a power law. This follows from the maths. Funnily, I thought
this was already understood by statisticians (if not by most
It is absolutely *not* necessary to assume that words show
"preferential attachment" - that is, that some pairs of words
occur more frequently than random association would predict.
Just statistics; not linguistics!
The exact post is easily found at the address:
Read the blog; you'll see the simple simulation experiment
("Monkey Language") that Chris Anderson ran to demonstrate
the plausibility of Wentian Li being right.
Choose letters randomly, with replacement, from an alphabet
that includes an end-of-word marker ("space"). Then the
count of words of each length (a word being a run of draws
between spaces) falls off exponentially with length, and the
frequency with which any word occurs falls off as a power of
its rank in the frequency table.
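For anyone who'd like to try this at home, here is a minimal Python
sketch of that kind of experiment. It is only an illustration, not the
code from the blog: the five-letter alphabet, the sample size, the seed
and the function names (monkey_words, rank_frequencies, loglog_slope)
are all assumptions made up for this example.

```python
import math
import random
from collections import Counter


def monkey_words(n_chars, alphabet="abcde", seed=0):
    """Draw n_chars symbols uniformly, with replacement, from the
    alphabet plus a space, then split the result into words."""
    rng = random.Random(seed)
    symbols = alphabet + " "
    text = "".join(rng.choices(symbols, k=n_chars))
    return text.split()  # split() drops the empty runs between adjacent spaces


def rank_frequencies(words):
    """Return the word counts sorted from most to least frequent."""
    return sorted(Counter(words).values(), reverse=True)


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


words = monkey_words(200_000)

# Word lengths: the counts should shrink by a roughly constant factor per letter.
length_counts = Counter(len(w) for w in words)
for length in sorted(length_counts)[:8]:
    print(length, length_counts[length])

# Rank versus frequency: a roughly straight line on log-log axes.
counts = rank_frequencies(words)
print("log-log slope:", round(loglog_slope(counts), 2))
```

The slope is only a rough fit over all ranks, including the long tail
of words seen once, so treat it as an indication rather than a
measurement. Changing the alphabet size or the seed lets you see how
stable the pattern is.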
BTW, John, thanks for introducing me to "The Long Tail",
of which I hadn't heard before. Looks like a useful weapon
for anyone wanting to market their creations on the Web.
It gives insights into ways in which the classical 80-20 rule,
or Pareto Principle, fails to apply in a wired world. Essentially,
the cost of keeping product "in stock" has become almost
negligible; Anderson claims that "the future of business is
selling less of more".
So if any conworld creator wants to sell stories about their
created places, it makes sense to sell instalments rather than
books. Come to think of it, this replicates the first wave of
mass publication, that enabled the rise of Charles Dickens as
a widely read story-teller, writing serialised novels.
And the graphic novelist may soon sell individual *frames*
at a penny a pop, rather than asking readers to plonk down
$10 for a whole story, or $30 for a collection!
Maybe I'll start selling my poems by the word! ;-)