
Generalized discrimination discovery on semi-structured data supported by ontology

Luong Thanh, Binh (2011) Generalized discrimination discovery on semi-structured data supported by ontology. Advisor: Turini, Prof. Franco. Coadvisor: Ruggieri, Prof. Salvatore. pp. 195. [IMT PhD Thesis]

Luong_Thanh_phdthesis.pdf - Published Version
Available under License Creative Commons Attribution No Derivatives.


Abstract

Recently, data mining has been recognized as an effective means of disclosing evidence and hidden causes of discrimination. If data mining finds associations showing that discriminatory treatments are strongly related to sensitive attributes, the evidence of discrimination is hard to refute. In this thesis, I propose a modification of the traditional data mining process that unveils and represents discrimination in a “rich semantic” form for semi-structured business data with multiple-valued treatments, supported by an ontology. First, input data are preprocessed into a well-structured form with semantic relations, which considerably supports the later exploration of discrimination. The framework then seeks possibly discriminatory relations between unequal treatments and attributes protected by law, e.g., race, religion, and sex. These discriminatory relations are represented as association rules through the notion of matching pairs of itemsets that have different sensitive attributes and equal non-sensitive ones but are subject to different treatments. By combining data mining with a reasoning service over the ontology, the resulting rules are semantically enriched by object properties between classes (concepts). They are therefore more valuable and interesting than flat association rules. To address the drawback of local knowledge, the solution of “kNN as Situation Testing” is provided. In addition, a number of discrimination measures are provided to quantify the level of discrimination and to obtain a precise view of how different sensitive attributes negatively affect the decision and even one another. Experimental results confirm the potential and flexibility of the approach.
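The idea behind “kNN as Situation Testing” can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes one plausible reading of the technique: for a given individual, compare the rate of negative decisions among the k most similar members of the protected group with the rate among the k most similar members of the unprotected group. The record layout (`features`, `protected`, `denied`), the Euclidean distance, and the toy data are all hypothetical choices made for illustration.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance over non-sensitive numeric features (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_situation_test(target, records, k):
    """Difference in denial rates between the k nearest protected and the
    k nearest unprotected neighbours of `target`.

    Each record is a dict with hypothetical keys:
      'features'  : tuple of non-sensitive numeric attributes
      'protected' : True if the individual belongs to the protected group
      'denied'    : True if the decision was negative
    Both groups are assumed to be non-empty.
    """
    def denial_rate(group):
        nearest = sorted(
            group, key=lambda r: distance(r["features"], target["features"])
        )[:k]
        return sum(r["denied"] for r in nearest) / len(nearest)

    others = [r for r in records if r is not target]
    protected = [r for r in others if r["protected"]]
    unprotected = [r for r in others if not r["protected"]]
    return denial_rate(protected) - denial_rate(unprotected)

# Toy data, invented for illustration only.
target = {"features": (5.0, 5.0), "protected": True, "denied": True}
records = [
    target,
    {"features": (5.0, 6.0), "protected": True, "denied": True},
    {"features": (6.0, 5.0), "protected": True, "denied": True},
    {"features": (9.0, 9.0), "protected": True, "denied": False},
    {"features": (5.0, 4.0), "protected": False, "denied": False},
    {"features": (4.0, 5.0), "protected": False, "denied": False},
    {"features": (1.0, 1.0), "protected": False, "denied": True},
]

diff = knn_situation_test(target, records, k=2)
print(f"difference in denial rates: {diff:.2f}")
```

A difference close to zero suggests that similar individuals were treated alike regardless of group. A large positive value flags the case for closer inspection. Any decision threshold on this difference would be a further assumption.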

Item Type: IMT PhD Thesis
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
PhD Course: Computer Science and Engineering
Identification Number: 10.6092/imtlucca/e-theses/28
NBN Number: urn:nbn:it:imtlucca-27064
Date Deposited: 10 Jul 2012 13:14
URI: http://e-theses.imtlucca.it/id/eprint/28
