“No data yet,” he answered. “It is a capital mistake to theorize before you have all the evidence. It biases the judgment.”
Sherlock Holmes - A Study in Scarlet

“I have no data yet. It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
Sherlock Holmes - A Scandal in Bohemia

"Data Mining", "fishing", "grubbing", "number crunching". These are the value-laden terms we use to disparage each other's empirical work. A less provocative description would be "specification-searching", and a catch-all definition is "the data-dependent process of selecting a statistical model". The fact that specification searching invalidates the traditional models of statistical inference is implicit in the pejorative content of the word "fishing", but the industrious implication of the word "mining" suggests that the activity may, in fact, be productive. Although "fishing" too may seem to be a productive activity, the term is usually used in a derogatory way to indicate both the fisherman's great uncertainty over the quantity and quality of fish that might appear in his net and his willingness to accept anything that shows up. Mining, in contrast, is an activity intending to bring to the surface a specific valuable commodity whose existence is likely to be relatively well-established before mining commences.

This book is about "data mining". It describes how specification searches can be legitimately used to bring to the surface the nuggets of truth that may be buried in a data set. The essential ingredients are judgment and purpose, which jointly determine where in a data set one ought to be digging and also which stones are gems and which are rocks. Without judgment and purpose, a specification search is merely a fishing expedition, and the product of the search will have a value that is difficult or impossible to assess.
Edward Leamer, Specification Searches, Wiley, 1978.

Spanos (2000) argues that the reason why certain types of data mining are undesirable is that they mean that theories do not undergo a sufficiently severe test. Data are performing a double duty, of leading the investigator to a claim and then providing evidence in favour of that claim. He uses the analogy of shooting at a blank wall and then drawing a bull's eye round the bullet hole. Such practice teaches us nothing about the skill of the person shooting at the wall, for the probability of the shot being in the bull's eye is one.
Introduction: is data mining a methodological problem? Roger E. Backhouse and Mary S. Morgan, Journal of Economic Methodology 7(2), pp. 171-181, 2000.
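
The "double duty" point can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not anything taken from the works quoted: the outcome and all candidate predictors are pure noise, the "specification search" simply keeps the predictor most correlated with the outcome, and significance is judged with an approximate Fisher z test at the 5% level. The names `correlation`, `is_significant` and `simulate`, and the sample sizes, are illustrative choices.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided 5% test of zero correlation (Fisher z)."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def simulate(trials=400, n=50, k=20, seed=0):
    """Return the share of 'significant' findings when the selected
    predictor is tested on the search data and on fresh data."""
    rng = random.Random(seed)
    same_hits = 0
    fresh_hits = 0
    for _ in range(trials):
        # Assumption: outcome and all k candidates are unrelated noise,
        # so every 'significant' result is a false positive.
        y = [rng.gauss(0, 1) for _ in range(2 * n)]
        xs = [[rng.gauss(0, 1) for _ in range(2 * n)] for _ in range(k)]
        # Specification search on the first n observations only.
        best = max(xs, key=lambda x: abs(correlation(x[:n], y[:n])))
        # Same data does double duty: it chose the claim and tests it.
        same_hits += is_significant(correlation(best[:n], y[:n]), n)
        # Fresh data: the last n observations played no part in the search.
        fresh_hits += is_significant(correlation(best[n:], y[n:]), n)
    return same_hits / trials, fresh_hits / trials


if __name__ == "__main__":
    same, fresh = simulate()
    print(f"false-positive rate, tested on search data: {same:.2f}")
    print(f"false-positive rate, tested on fresh data:  {fresh:.2f}")
```

With 20 independent candidates one would expect the rate on the search data to be roughly 1 - 0.95^20, about 0.64, while the rate on fresh data should stay near the nominal 0.05. In the terms of the analogy, the first number is the bull's eye drawn round the bullet hole, and the second is a shot at a target drawn beforehand.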