Logo eprints

Graph and network data: mining the temporal dimension

Berlingerio, Michele (2009) Graph and network data: mining the temporal dimension. Advisor: Giannotti, Dr. Fosca. Coadvisor: Bonchi, Dr. Francesco . pp. 157. [IMT PhD Thesis]

[img]
Preview
Text
Berlingerio_phdthesis.pdf - Published Version
Available under License Creative Commons Attribution No Derivatives.

Download (2MB) | Preview

Abstract

In the last years, there have been many studies on analyzing network and graph data. A wide range of problems, such as studying the global and local properties of a graph, finding interesting structures, modeling particular characteristics, assessing the properties of some particular networks such as the Web or a co-authorship networks, have increased the attention of the scientific community, involved in finding efficient and powerful techniques to enable the achievement of the desired results. For example, with the aim of finding interesting and frequent substructures in graphs, algorithms such as AGM, FSG, gSpan, Gaston, FFSMY, ADI-Mine, HSIGRAM and VSIGRAM have been presented for improving scalability on mining subgraphs one after one. However, only in the last few years the attention has moved to a particular aspect of graphs and networks: the temporal dimension. Thanks also to the larger availability of online social network services, the amount of data that allows for the analysis of the dynamics of complex networks has increased very fast in the last five years. This kind of data contains rich information about what happens to a network during time, and enables the analysts to model and discover interesting properties related to the temporal dimension, which are both meaningless and impossible in the static setting. The temporal dimension can play a double role for a network. First, the underlying structure, namely the graph, can evolve over time, showing new users joining a community, new connections created among users, change of properties of a particular group of people, and so on. Second, given an established network, users may perform actions during time, leading to flows of information circulating among the connections, sequences of tasks performed by a sequence of users, spread of influence among the network, and so on. Despite the clear richness of the above setting, the current graph mining techniques are somehow too generic, and they do not explicitly take into consideration the time during their stages. In order to overcome to this problem, in this thesis we study the current graph mining algorithms, we study the possibility of pushing constraints during the computation that would allow us to efficiently analyze the temporal dimension at mining stage, and we develop new techniques that can help in this kind of analysis. In order to prove the effectiveness of our approach, we apply a pre-existent graph miner, a modified version of it specialized to deal with the temporal dimension, and another pre-existent tool of analysis, namely the Temporally Annotated Sequences framework, to real data, to show how we can deal with the above setting, with particular focus on problems such as mining the information propagation in a network, mining graph evolution rules, and mining the temporal dimension of process logs to derive the actual workflow diagram in a process. Our results justify the need for this approach, and show that specialized techniques help in modeling and analyzing temporal graph and network data.

Item Type: IMT PhD Thesis
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
PhD Course: Computer Science and Engineering
Identification Number: 10.6092/imtlucca/e-theses/21
Date Deposited: 06 Jul 2012 09:35
URI: http://e-theses.imtlucca.it/id/eprint/21

Actions (login required, only for staff repository)

View Item View Item