Re: OT: Google Fight!
From: Dirk Elzinga <dirk.elzinga@...>
Date: Saturday, September 10, 2005, 13:45
Yes, doing this sort of thing with Google Fight is cute, but I've been
able to use the idea for serious research. I recently completed an
article on English adjective comparison which investigates the choice
between synthetic comparatives ('sillier') and analytical comparative
constructions ('more silly'). The usual rules do a fairly good job,
but there is variability that isn't (or can't be) explained in the
accounts I've seen. So I applied an explicit, computationally
implemented theory of analogy to the problem.

But first, I needed to find the approximate ratios of synthetic to
analytical comparatives in actual use. So I collected a list of almost
500 adjectives that were involved in a comparative construction of
some kind. I then made two lists, one containing items of the form
'ADJ-er' and the other containing items of the form 'more ADJ', and
submitted them to Google for head-to-head comparisons. I then used the
results to assign an outcome to each adjective. For example, the
analytical construction 'more quiet' received 114,000 hits, while the
synthetic comparative 'quieter' received 31,400 hits, so the adjective
'quiet' was assigned the analytical outcome.

I then used these outcomes in a software simulation and found that the
computationally implemented analogical mechanism agreed with the Google
results 92.1% of the time, which is slightly better than rule-based
accounts. What is interesting, though, is that the analogy algorithm
also assigns probabilities to the outcomes, and these mirror the actual
proportions seen in the Google searches. (Recall that while 'more
quiet' was more common, 'quieter' also received a sizeable number of
hits, so the choice of comparative construction is inherently
probabilistic.)
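For anyone who wants to try the head-to-head step themselves, here is a minimal sketch of how hit counts could be turned into an outcome plus an observed proportion, and how an agreement figure like the 92.1% above could be computed. The function names, the tie-break rule, and the data layout are my own assumptions for illustration; only the 'quiet' counts come from the post, and the analogical model that actually produced the predictions is not reproduced here.

```python
from typing import Dict, Tuple


def assign_outcome(synthetic_hits: int, analytical_hits: int) -> Tuple[str, float]:
    """Hypothetical helper: pick the construction with more hits and return
    it with the observed share of analytical forms."""
    total = synthetic_hits + analytical_hits
    if total == 0:
        raise ValueError("no hits for either construction")
    p_analytical = analytical_hits / total
    # Assumption: ties are resolved in favour of the synthetic form.
    outcome = "analytical" if analytical_hits > synthetic_hits else "synthetic"
    return outcome, p_analytical


def agreement_rate(predicted: Dict[str, str], observed: Dict[str, str]) -> float:
    """Hypothetical helper: percentage of adjectives whose predicted outcome
    matches the outcome assigned from the hit counts."""
    shared = [adj for adj in observed if adj in predicted]
    if not shared:
        raise ValueError("no adjectives in common")
    matches = sum(predicted[adj] == observed[adj] for adj in shared)
    return 100.0 * matches / len(shared)


# Counts reported above: 'quieter' (synthetic) vs. 'more quiet' (analytical).
outcome, p = assign_outcome(synthetic_hits=31_400, analytical_hits=114_000)
print(outcome, round(p, 3))  # prints: analytical 0.784
```

The proportion (about 78% analytical for 'quiet') is the kind of empirical probability that the analogy algorithm's outcome probabilities can be compared against, rather than just the winner-take-all label.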
The article is forthcoming in the journal Lingua (probably early next
year), so you'll all be able to see exactly what I found and why I
think it's significant.
On 9/10/05, David J. Peterson <dedalvs@...> wrote:
> Okay, so this is really stupid, but entertaining, nonetheless.
> There's a website out there called Google Fight (googlefight.com),
> and all it does is take two words or strings and see which
> one gets more hits on Google. So there are lots of fights you'd
> expect (tastes great vs. less filling; republicans vs. democrats;
> Coke vs. Pepsi, etc.) and also some odd ones (10.15 vs. 10.17).
> So I decided to pit conlangs and auxlangs against each other,
> for funsies. The results:
> conlang: 1,210,000 results
> auxlang: 67,000
> That's a huge margin. Anyway, some others:
> artlang: 122,000
> engelang: 523
> lostlang: 120
> model language: 122,000
> planned language: 16,600
> artificial language: 386
> created language: 18,800
> constructed language: 71,700
> conlanger: 11,600
> Also, of my languages, apparently Kamakawi gets the most
> mentions, almost twice as many as the runner-up. Epiq gets
> 52,300, but I think that's probably for some other reason.
> Sathir gets a bunch of hits because of EverQuest, apparently.
> Anyway, this barely warrants a message. I do think Google Fight
> is cute, though.
> "A male love inevivi i'ala'i oku i ue pokulu'ume o heki a."
> "No eternal reward will forgive us now for wasting the dawn."
> -Jim Morrison