Knowledge Discovery
- u3037121
- May 2, 2017
Knowledge Discovery in Databases (KDD):
"Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Given a set of facts (data) F, a language L, and some measure of certainty C, we define a pattern as a statement S in L that describes relationships among a subset Fs of F with a certainty c, such that S is simpler (in some sense) than the enumeration of all facts in Fs. A pattern that is interesting (according to a user-imposed interest measure) and certain enough (again according to the user’s criteria) is called knowledge. The output of a program that monitors the set of facts in a database and produces patterns in this sense is discovered knowledge".
Confused? Me too.
I turned to Data Mining, a broad element of KDD, to try to simplify what this all meant.
After working through the material on it, I found myself agreeing with Fayyad, Piatetsky-Shapiro and Smyth (1996), who write that "KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data".
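To make the formal definition a little less intimidating, here is a minimal Python sketch of its last step: taking patterns that some data mining algorithm has already produced and keeping only those that are "interesting" and "certain enough". The `Pattern` class, the `discovered_knowledge` function, the thresholds and the three example patterns are all my own made-up illustrations, not something taken from either paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A statement S about a subset of the facts (hypothetical structure)."""
    statement: str    # S, expressed in some language L
    certainty: float  # c, here assumed to lie between 0 and 1
    interest: float   # score from a user-imposed interest measure


def discovered_knowledge(patterns, min_certainty, min_interest):
    """Keep only the patterns that pass both user-chosen thresholds."""
    return [
        p for p in patterns
        if p.certainty >= min_certainty and p.interest >= min_interest
    ]


# Made-up candidate patterns, purely for illustration.
candidates = [
    Pattern("pattern A", certainty=0.90, interest=0.80),
    Pattern("pattern B", certainty=0.95, interest=0.10),  # certain but dull
    Pattern("pattern C", certainty=0.40, interest=0.90),  # interesting but shaky
]

knowledge = discovered_knowledge(candidates, min_certainty=0.8, min_interest=0.5)
for p in knowledge:
    print(p.statement)
```

With these invented numbers only "pattern A" survives, which is the point of the definition: a pattern only counts as knowledge once it clears both of the user's own bars.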
Most of my battle with Theme 2 (Pattern Recognition) throughout this unit has been deciphering what the explanations of how data is retrieved really mean. The paper Advanced Scout: Data Mining and Knowledge Discovery in NBA Data was a great help, as it put the data in the context of a sport I enjoy. One snippet really resonated with me because it described how something goes from "an action" to "a measured outcome": "The raw data from NBA games is initially collected using a specialized system designed for logging basketball data. Data include who took a shot, the type of shot, the outcome, any rebounds, etc. Each action is associated with a time code. At the end of each game, the data are uploaded and stored on an electronic bulletin board. Any team can access and retrieve the data of any other team from this billboard. A copy of the data must be downloaded into AS for analysis."
The paper also provided context for what an elite sporting body might consider "relevant" when preparing to perform at the highest level.
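To picture how "an action" becomes "a measured outcome", I find it helps to sketch the logging idea in Python. This is only my own rough illustration of time-coded shot records being turned into a shooting percentage; it is not how Advanced Scout actually works. The `ShotEvent` fields, the `shooting_percentage` function, the player names and the logged shots are all assumptions I made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotEvent:
    """One logged action (hypothetical record layout)."""
    time_code: str  # when the action happened, e.g. "Q1 10:42"
    player: str     # who took the shot
    shot_type: str  # the type of shot
    made: bool      # the outcome


def shooting_percentage(events):
    """Return the fraction of made shots for each (player, shot_type) pair."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for event in events:
        key = (event.player, event.shot_type)
        attempts[key] += 1
        if event.made:
            makes[key] += 1
    return {key: makes[key] / attempts[key] for key in attempts}


# Made-up game log, purely for illustration.
game_log = [
    ShotEvent("Q1 10:42", "Player A", "jump shot", True),
    ShotEvent("Q1 09:15", "Player A", "jump shot", False),
    ShotEvent("Q1 08:03", "Player B", "layup", True),
]

percentages = shooting_percentage(game_log)
for (player, shot_type), pct in percentages.items():
    print(f"{player} {shot_type}: {pct:.0%}")
```

Each row is a single action with a time code, and the percentage at the end is the measured outcome a coach could actually act on.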





