--Originally published at Enro Blog
What is data mining?
In data mining, the data is stored electronically and the search is automated, or at least augmented, by computer.
Data mining is defined as the process of discovering patterns in data. The process must be automatic or (more usually) semiautomatic. The patterns discovered must be meaningful in that they lead to some advantage, usually an economic advantage. The data is invariably present in substantial quantities.
How are the patterns expressed? Useful patterns allow us to make nontrivial predictions on new data. There are two extremes for the expression of a pattern:
- as a black box whose innards are effectively incomprehensible, and
- as a transparent box whose construction reveals the structure of the pattern.
We call patterns of the second kind structural because they capture the decision structure in an explicit way.
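As a rough illustration of a structural pattern, the sketch below writes a pattern as explicit if/then rules, so the decision structure can be read directly from the code. The function name `recommend_play`, the attributes, and the rules themselves are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
def recommend_play(outlook, humidity, windy):
    """Hypothetical transparent-box pattern: every decision is an explicit rule."""
    # Rule 1 (assumed for illustration): sunny and humid days are a "no".
    if outlook == "sunny" and humidity == "high":
        return "no"
    # Rule 2 (assumed for illustration): rainy and windy days are a "no".
    if outlook == "rainy" and windy:
        return "no"
    # Default rule: anything not matched above is a "yes".
    return "yes"


# Predictions on new, unseen cases follow directly from the readable rules.
decision_a = recommend_play("sunny", "high", windy=False)
decision_b = recommend_play("overcast", "high", windy=True)
```

A black-box model could make the same predictions, but it would not expose the rules behind them in this readable form.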
Machine learning
Things learn when they change their behavior in a way that makes them perform better in the future.
domain knowledge
Market basket analysis is the use of association techniques to find groups of items that tend to occur together in transactions, typically supermarket checkout data.
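A minimal sketch of the idea, assuming a tiny hand-made set of checkout transactions: it counts how often each pair of items appears in the same basket and keeps the pairs that reach a chosen threshold. The data, the function name `frequent_pairs`, and the `min_count` parameter are all illustrative assumptions. Real association-rule miners such as Apriori go further, handling larger item sets and deriving rules from them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout data: each transaction is the set of items in one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def frequent_pairs(baskets, min_count):
    """Return {pair: count} for item pairs found together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair has a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Keep only pairs that occur together in at least 3 of the 5 baskets.
pairs = frequent_pairs(transactions, min_count=3)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

Counting pairs by brute force like this is fine for a handful of baskets. It grows quickly with basket size, which is one reason practical tools prune infrequent items first.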
What’s the difference between machine learning and statistics? Cynics, looking wryly at the explosion of commercial interest (and hype) in this area, equate data mining to statistics plus marketing. In truth, you should not look for a dividing line between machine learning and statistics, because there is a continuum (and a multidimensional one at that) of data analysis techniques. Some derive from the skills taught in standard statistics courses, and others are more closely associated with the kind of machine learning that has arisen out of computer science. Historically, the two sides have had rather different traditions. If forced to point to a single difference of emphasis, it might be that statistics has been more concerned with testing hypotheses, whereas machine learning has been ...