
Data mining // was Swearing

From: Adrian Morgan <morg0072@...>
Date: Monday, July 1, 2002, 1:04
John Cowan wrote, quoting Frank George Valoczy and then me:

> > > I have a Data Mining And Knowledge Discovery exam on Tuesday.)
> >
> > What the bloody hell is that?
>
> It's how the fuck you find out what you need to know when those
> fucking idiots at the datacenter have stored in it some fucking
> excuse for a fucking database that isn't organized the fucking way
> you need it to be. With lots of jargon.

Well, no. It isn't in any sense about compensating for bad data organisation (as likely as not, it combines data from several different organisations) but rather for enormous data *quantity*. It's about automating the search for unexpected patterns in very large data sets, a field that combines statistics, high-performance computing, artificial intelligence, visualisation, and database technology.

An important word here is "unexpected". Traditional statistical methods involve hypothesising a pattern and then checking it against the data, but data mining can find patterns no-one ever thought of looking for. The historic result that made data mining famous was the discovery that beer and nappies appeared together in a disproportionate number of shopping transactions, a link no-one had guessed until the computer said so. It's an active research area, and the complexity of the rules that can be unearthed is increasing.

There are plenty of applications for linguists, although the most exciting ones await improvements to multimedia technology and the like. But you can imagine, for example, data mining the languages of the world and finding that (oh, let's go for a ridiculous example) people living in mountainous terrain have a higher ratio of rounded vowels.

Adrian.
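The beer-and-nappies result is an example of an association rule: two items turning up together more often than chance would predict. A minimal sketch of how such a pair can be scored follows, assuming a toy set of shopping baskets invented purely for illustration (not real transaction data). It counts every item pair and computes its *lift*, the ratio of how often the pair co-occurs to how often it would if the two items were independent. This is a much simplified stand-in for real association-rule algorithms such as Apriori, which search larger itemsets efficiently; the function name `pair_lift` and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets, min_support=0.2):
    """Score every item pair by how much more often it co-occurs than chance.

    lift(A, B) = P(A and B) / (P(A) * P(B)); a lift above 1 means the pair
    appears together more often than independence would predict.
    Pairs whose support P(A and B) is below min_support are skipped.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    scores = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # too rare to be worth reporting
        scores[(a, b)] = support / ((item_counts[a] / n) * (item_counts[b] / n))
    return scores


# Hypothetical toy baskets, made up for this example.
baskets = [
    ["beer", "nappies", "crisps"],
    ["beer", "nappies"],
    ["milk", "bread"],
    ["beer", "nappies", "milk"],
    ["bread", "crisps"],
    ["milk", "bread", "crisps"],
]

scores = pair_lift(baskets)
best = max(scores, key=scores.get)
print(best, scores[best])  # ('beer', 'nappies') 2.0
```

Nothing in the code singles out beer or nappies in advance; every pair is scored and the strongest association simply rises to the top, which is the sense in which such patterns are "unexpected".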