
Entropy-based methods to tackle missing information in complex networks

Parisi, Federica (2019) Entropy-based methods to tackle missing information in complex networks. Advisor: Caldarelli, Prof. Guido. Coadvisor: Squartini, Prof. Tiziano. pp. 111. [IMT PhD Thesis]

Text (Doctoral thesis)
Parisi_phdthesis.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.



The work presented in this thesis focuses on an issue that very commonly arises when studying a network: missing information. Many phenomena can cause such a lack of knowledge, but before any attempt at studying the data it is desirable to have knowledge of the network at hand that is as complete as possible. Here I address two specific types of missing-information problems, namely network reconstruction and link prediction. In the former case, the network structure is hidden: the only information we have access to is the size of the network and some aggregate node-specific quantity. In the context of link prediction we face a different issue: there is a real underlying network that represents the phenomenon we want to study, of which we can only observe an incomplete version where some links are not present. Our goal is to identify the most likely candidates for the missing links and, for weighted networks, their intensity. Both problems are tackled using entropy-based methods, which guarantee that the results are unbiased. The thesis presents advancements on three major fronts. First, it generalizes the formalism for network reconstruction, proposing a flexible methodology that makes it possible to include any prior topological knowledge and to derive a compatible, unbiased weighted distribution. Second, it proposes a new approach to link prediction, whose key idea is to tune reconstruction models on the accessible portion of the network in order to infer the partially observed portion, i.e. the most likely missing links. Finally, in the case of weighted prediction, unlike the vast majority of alternative methods, it provides an explicit recipe to estimate the link weights, together with their confidence intervals.

Item Type: IMT PhD Thesis
Subjects: H Social Sciences > HB Economic Theory
PhD Course: Economics management and data science
Identification Number: 10.6092/imtlucca/e-theses/269
NBN Number: urn:nbn:it:imtlucca-27295
Date Deposited: 29 Jul 2019 12:09
URI: http://e-theses.imtlucca.it/id/eprint/269
