Data mining // was Swearing
From: Adrian Morgan <morg0072@...>
Date: Monday, July 1, 2002, 1:04
John Cowan wrote, quoting Frank George Valoczy and myself respectively:
> > > I have a Data Mining And Knowledge Discovery exam on Tuesday.)
> > What the bloody hell is that?
> It's how the fuck you find out what you need to know when those
> fucking idiots at the datacenter have stored in it some fucking
> excuse for a fucking database that isn't organized the fucking way
> you need it to be. With lots of jargon.
Well, no. It isn't in any sense about compensating for bad data
organisation (as likely as not, it combines data from several different
organisations) but rather about coping with enormous data *quantity*.
It's about automating the search for unexpected patterns in very large
data sets, and the field combines statistics, high-performance
computing, artificial intelligence, visualisation, and database
technology.
An important word here is "unexpected". Traditional statistical methods
involve hypothesising a pattern and then checking it against the data,
but data mining can find patterns no-one ever thought of looking for.
The historic result that made data mining famous was the discovery that
beer and nappies appeared together in a disproportionate number of
shopping transactions, which no-one had guessed until the computer
said so. It's an active research area, and the complexity of the rules
that can be unearthed is increasing.
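To make the beer-and-nappies idea concrete, here is a rough sketch of how
such a pairing could be flagged without anyone hypothesising it first. It
scores every pair of items by "lift": how often the pair really occurs
together, divided by how often it would occur together if the two items
were bought independently. The function name pair_lift and the baskets
are invented for illustration, and real systems use far more scalable
algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations


def pair_lift(transactions):
    """Return {(a, b): lift} for every item pair that co-occurs at least once.

    lift = P(a and b) / (P(a) * P(b)); values well above 1 mean the pair
    turns up together more often than independence would predict.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # drop duplicates, fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Invented toy baskets, purely for illustration (not real sales data).
baskets = [
    ["beer", "nappies", "crisps"],
    ["beer", "nappies"],
    ["milk", "bread"],
    ["milk", "nappies", "beer"],
    ["bread", "crisps"],
    ["milk", "bread", "crisps"],
]

lifts = pair_lift(baskets)

# Show the three most over-represented pairs.
for pair, lift in sorted(lifts.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(lift, 2))
```

In this toy data the beer/nappies pair comes out on top with a lift of
2.0, i.e. it appears together twice as often as chance would suggest.
Nobody had to nominate that pair in advance; every pair was scored and
the odd one surfaced by itself.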
There are plenty of applications for linguists, although the most
exciting ones are awaiting improvements to multimedia technology and the
like. But you can imagine, for example, data mining the languages of the
world and finding that (oh let's go for a ridiculous example) people
living in mountainous terrain speak languages with a higher proportion
of rounded vowels.