
Re: Zipfs Law may be a statistical artifact

From: Yahya Abdal-Aziz <yahya@...>
Date: Sunday, September 24, 2006, 13:02

Hi, Conlang List members, and other friends,

On Sat, 23 Sep 2006, John Chalmers wrote:
> See the argument, rather far down the page, at http://longtail.com/
------------------------------

Philip Newton replied:
> I expect you're referring to this blog entry:
> http://www.longtail.com/the_long_tail/2006/09/is_zipfs_law_ju.html
------------------------------

Hi John and others,

Yes, that's right: the frequency of words with exponentially distributed lengths is *necessarily* distributed according to a power law. This follows from the maths. Funnily, I thought this was already understood by statisticians (if not by most linguists). It is absolutely *not* necessary to assume that words show "preferential attachment" - that is, that some pairs of words occur more frequently than random association would predict. Just statistics; not linguistics!

The exact post is easily found at the address:
http://tinyurl.com/o9jgj

Read the blog; you'll see the simple simulation experiment ("Monkey Language") that Chris Anderson ran to demonstrate the plausibility of Wentian Li being right. Choose letters randomly, with replacement, from an alphabet that includes an end-of-word marker ("space"). Then the lengths of words (that is, of runs of draws between spaces) fall off exponentially. The frequency with which any particular word occurs falls off exponentially with its length; but since the number of possible words grows exponentially with length, frequency falls off as a power of a word's *rank*.

---

BTW, John, thanks for introducing me to "The Long Tail", of which I hadn't heard before. Looks like a useful weapon for anyone wanting to market their creations on the Web. It gives insights into ways in which the classical 80-20 rule, or Pareto Principle, fails to apply in a wired world. Essentially, the cost of keeping product "in stock" has become almost negligible; Anderson claims that "the future of business is selling less of more".

So if any conworld creator wants to sell stories about their created places, it makes sense to sell instalments rather than books. Come to think of it, this replicates the first wave of mass publication, which enabled the rise of Charles Dickens as a widely read story-teller, writing serialised novels. And the graphic novelist may soon sell individual *frames* at a penny a pop, rather than asking readers to plonk down $10 for a whole story, or $30 for a collection!
Maybe I'll start selling my poems by the word! ;-)

Regards,
Yahya
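
P.S. For anyone who wants to try the "Monkey Language" experiment described above, here is a minimal sketch in Python. It is not Chris Anderson's code, which is not reproduced here; the function names (monkey_words, rank_frequency, loglog_slope), the four-letter alphabet, the fixed seed and the cut-off of 10 occurrences are all assumptions made for illustration. It draws symbols uniformly with replacement from the letters plus a space, splits the result into words, and then looks at two things: how word counts fall off with length, and the slope of log(frequency) against log(rank).

```python
import math
import random
from collections import Counter


def monkey_words(n_chars, alphabet="abcd", seed=0):
    """Draw n_chars symbols uniformly, with replacement, from alphabet + space,
    and return the runs of letters between spaces (empty runs are dropped)."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    symbols = alphabet + " "           # the space is the end-of-word marker
    text = "".join(rng.choices(symbols, k=n_chars))
    return text.split()


def rank_frequency(words):
    """Word counts sorted from most to least frequent (index 0 is rank 1)."""
    return [count for _, count in Counter(words).most_common()]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    words = monkey_words(200_000)

    # Number of word tokens of each length: should shrink roughly geometrically.
    by_length = Counter(len(w) for w in words)
    for length in range(1, 8):
        print(length, by_length[length])

    # Fit only reasonably common words (assumed cut-off), because the long
    # tail of words seen once or twice flattens a naive fit.
    freqs = rank_frequency(words)
    common = [f for f in freqs if f >= 10]
    print("log-log slope:", round(loglog_slope(common), 2))
```

A roughly straight, downward-sloping line on the log-log plot is the power-law signature; the exact slope depends on the alphabet size, the sample size and the cut-off, so no particular value is quoted here. With equally likely letters the rank-frequency curve is really a staircase (all words of one length share the same probability), so the fit is only a rough summary.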