Networks are increasingly important in many aspects of our world: physical networks like transportation networks, utility networks and the Internet; online information networks like the WWW; online social networks like Facebook and Twitter; epidemiological networks for global disease transmission; genomic and protein networks in computational biology; and many more. How do we model and learn these networks? In contrast to conventional learning problems, where we have many independent samples, for these networks we can often obtain only a single sample. How do we use one snapshot today to learn a model for the network, and thereby predict a similar but larger network in the future? For relatively small or moderately sized networks, it is appropriate to model the network parametrically and attempt to learn those parameters. For massive networks, a non-parametric representation is more appropriate. Here I show how to use the theory of graph limits, developed over the last decade, to give consistent estimators for machine learning of massive sparse networks, and moreover how to do this in a way that protects the privacy of individuals on the network.
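To make the non-parametric idea concrete, the following is a minimal sketch of graphon estimation from a single observed graph, using the well-known "sort and smooth" (network histogram) heuristic rather than the specific estimators discussed in the talk. The graphon function, graph size, and block size here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative graphon: a smooth edge-probability function w(x, y) on [0,1]^2.
# (This particular w is an assumption, chosen only for the demo.)
def graphon(x, y):
    return 0.8 * np.exp(-(x + y))

# Sample ONE graph from the graphon: latent position u_i per node,
# edge (i, j) present with probability w(u_i, u_j).
n = 400
u = rng.uniform(size=n)
P = graphon(u[:, None], u[None, :])
A = (rng.uniform(size=(n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                      # symmetric simple graph, no self-loops

# "Sort and smooth": order nodes by degree (a proxy for latent position),
# then average the adjacency matrix over k x k blocks to obtain a
# piecewise-constant (histogram) estimate of the graphon.
order = np.argsort(-A.sum(axis=1))
A_sorted = A[np.ix_(order, order)]
k = 20                           # block size / bandwidth (an assumption)
m = n // k
W_hat = A_sorted[:m * k, :m * k].reshape(m, k, m, k).mean(axis=(1, 3))
```

`W_hat` is an m-by-m matrix of estimated edge probabilities; as n grows (with k chosen appropriately), estimators of this type converge to the underlying graphon, which is the sense in which a single snapshot suffices.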
Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. These two laboratories are widely renowned interdisciplinary centers, bringing together computer scientists, mathematicians, physicists, social scientists, and biologists, and helping to lay the foundations of data science.
Columbia University makes every effort to accommodate individuals with disabilities. If you require disability accommodations to attend an event at Columbia University, please contact Disability Services at 212-854-2388 at least 10 days in advance of the event.