Community Detection via Random and Adaptive Sampling
Recorded 01 December 2014 in Lausanne, Vaud, Switzerland
Event: IC Colloquia - EPFL IC School Colloquia
Extracting structures or communities in networks is a central task in many disciplines including social sciences, biology, computer science, statistics, and physics. Applications are numerous. For instance, in social networks, one hopes that identifying clusters of users provides fundamental insights into the way users behave and interact, and in turn, helps the design of efficient recommender systems or the development of other marketing techniques. This talk surveys recent advances in community detection (or in graph partitioning) from sampled data. To extract the communities, the interaction between pairs of nodes may be sampled from a large available data set, allowing a given node pair to be sampled several times. Naturally, nodes within the same community are more likely to interact. We first present fundamental performance limits satisfied by any clustering algorithm using both non-adaptive and adaptive sampling strategies. We also devise simple spectral algorithms that achieve these limits. Finally, we investigate scenarios where the network size is so large that the matrix representing the sampled node interactions is hard to store and manipulate. This setting is well captured by the data stream model, in which interaction samples are revealed in small batches sequentially and then erased after having been processed. For this model, we provide algorithms that require a memory linearly or even sub-linearly increasing with the network size, and that nonetheless approach the fundamental performance limits identified for any memory-unconstrained algorithm.
Watched 5478 times.Watch