Ontology-driven knowledge discovery

Furletti, Barbara (2009) Ontology-driven knowledge discovery. Advisor: Turini, Prof. Franco. pp. 183. [IMT PhD Thesis]

Furletti_phdthesis.pdf - Published Version
The problem of Knowledge Discovery has always attracted many researchers and remains of great relevance to the computer science community, particularly in the field of learning. This thesis aims to contribute to this topic by drawing on ideas from the Ontology and Data Mining fields. We investigate a method for extracting new implicit knowledge directly from an ontology using a combined inductive/deductive approach. By giving a kind of Bayesian interpretation to relationships that already exist in an ontology, we are able to return the extracted knowledge in the form of Influence Rules. The idea is to split the extraction process into two separate phases, exploiting the fact that an ontology keeps metadata (the schema) and data (the instances) separate. The deductive process draws inferences from the ontology structure, both concepts and properties, by applying link analysis techniques, and produces a kind of implication (rule schemas) in which only the most important concepts are involved. An inductive process, realized by a data mining algorithm, then explores the ontology instances to enrich the implications and build the final rules. A final rule has the form $$\langle premise \rightarrow^{w} consequence \rangle$$ where premise and consequence refer to class names and to values of their datatype properties, while w, the weight, measures the strength of the influence. An example of a final rule is: $$Manager.hasAge < 45 \rightarrow^{0.80} Project.hasDegreeOfSuccess = good$$. This can be read as: in 80% of the cases, whenever a manager of a company is less than 45 years old, the project managed by that manager has a good degree of success. Beyond the feasibility of the project, the approach allows us to extract “higher level” rules with respect to classical knowledge discovery techniques. In fact, ontology metadata gives a general view of the domain of interest and supplies information about all the elements, regardless of whether they appear as instances in the collected data.
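The "80% of the cases" reading of the example rule can be illustrated with a minimal sketch. It assumes that the weight is estimated as the fraction of instances satisfying the premise that also satisfy the consequence; this is an illustrative assumption, not necessarily the weighting scheme defined in the thesis. The function name `rule_weight` and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical instance: (Manager.hasAge, Project.hasDegreeOfSuccess)
# for one manager and the project that manager runs.
Instance = Tuple[int, str]


def rule_weight(
    instances: Iterable[Instance],
    premise: Callable[[Instance], bool],
    consequence: Callable[[Instance], bool],
) -> float:
    """Fraction of premise-matching instances that also match the consequence.

    Assumption: a simple conditional frequency stands in for the weight w.
    Returns 0.0 when no instance satisfies the premise.
    """
    matched = [x for x in instances if premise(x)]
    if not matched:
        return 0.0
    return sum(1 for x in matched if consequence(x)) / len(matched)


# Made-up data, for illustration only.
instances = [
    (38, "good"),
    (41, "good"),
    (29, "good"),
    (44, "good"),
    (35, "poor"),
    (52, "poor"),
    (60, "good"),
]

w = rule_weight(
    instances,
    premise=lambda x: x[0] < 45,          # Manager.hasAge < 45
    consequence=lambda x: x[1] == "good",  # Project.hasDegreeOfSuccess = good
)
print(f"Manager.hasAge < 45 ->[{w:.2f}] Project.hasDegreeOfSuccess = good")
```

In this made-up data set, five managers are younger than 45 and four of their projects are rated good, so the sketch yields a weight of 0.80.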
The technique is completely general and applicable to any domain. Since the output is a set of “standard” Influence Rules, it can be used to integrate existing knowledge or to support any other data mining process. The thesis includes the following chapters: Chapter 1 contains a brief introduction to the work, focusing on the main questions that have to be addressed. Chapter 2 offers an overview of the research context of which the thesis is part: data mining and ontologies. Chapter 3 explores the literature dealing with the open questions raised in Chapter 1. Chapter 4 is the core section; it discusses the proposed solutions and presents all the phases of the extraction process as well as the algorithms and the proofs. Chapter 5 describes an application of the methodology in the context of MUSING, a European project in the field of business intelligence. Chapter 6 concludes the thesis with final considerations and possible future work.