
Probabilistic representation learning and scalable Bayesian inference

Stephan Mandt

Recorded 12 October 2017 in Lausanne, Vaud, Switzerland

Event: IC Colloquia - EPFL IC School Colloquia


Probabilistic modeling is a powerful paradigm that has seen dramatic innovations in recent years. Advances in approximate inference, driven mainly by automatic differentiation and stochastic optimization, have made probabilistic modeling scalable and applicable to a broad range of complex model classes. I start my talk by reviewing the dynamic skip-gram model (ICML 2017) as an example of this class. The model combines a probabilistic interpretation of word embeddings with latent diffusion priors, which allows us to study how word embeddings evolve in text data associated with different time stamps. Our Bayesian approach shares information across time and remains robust even when only a small amount of data is available at individual points in time. As a result, we can automatically detect words that change their meanings even in moderately sized corpora.

However, the model is non-conjugate, so we have to draw on modern variational inference methods to train it efficiently on large data. The second part of my talk is therefore devoted to advances in variational inference. Here, I will review our very recent perturbative black box variational inference algorithm (NIPS 2017), which uses variational perturbation theory from statistical physics to construct corrections to the standard variational lower bound.

Finally, I will demonstrate that simple stochastic gradient descent with a constant step size is a form of approximate Bayesian inference (JMLR and ICML 2016).
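The abstract does not spell out the diffusion prior. The following is a minimal sketch that assumes a Gaussian random walk, which is a discretized Brownian motion, over each word's embedding vector. Under this assumption, the variance of each increment grows with the time gap between consecutive stamps. The function and parameter names (`random_walk_log_prior`, `sigma`, `sigma0`) are hypothetical. Only the prior is shown. In the full model it would be combined with a skip-gram likelihood, and the model in the ICML 2017 paper may use a different parameterization.

```python
import math


def random_walk_log_prior(traj, taus, sigma, sigma0):
    """Log density of a Gaussian random-walk (discretized Brownian) prior.

    Hypothetical sketch, not the paper's exact prior.
    traj:   embedding vectors u_1..u_T of one word (each a list of floats)
    taus:   strictly increasing time stamps tau_1..tau_T (e.g. years)
    sigma:  diffusion constant; increment variance is sigma**2 * (tau_t - tau_{t-1})
    sigma0: prior standard deviation of the first embedding u_1
    """
    def log_normal_iso(x, mean, var):
        # log N(x; mean, var * I) for a d-dimensional vector x
        d = len(x)
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
        return -0.5 * d * math.log(2.0 * math.pi * var) - sq / (2.0 * var)

    # Anchor the first time step with a zero-mean isotropic Gaussian.
    lp = log_normal_iso(traj[0], [0.0] * len(traj[0]), sigma0 ** 2)
    for t in range(1, len(traj)):
        dt = taus[t] - taus[t - 1]
        if dt <= 0:
            raise ValueError("time stamps must be strictly increasing")
        # Each embedding diffuses away from its predecessor; longer gaps allow larger drift.
        lp += log_normal_iso(traj[t], traj[t - 1], sigma ** 2 * dt)
    return lp
```

Because neighboring time steps are tied together, a time slice with little data can borrow strength from its neighbors. The prior also treats a large jump in an embedding as more plausible across a long time gap than across a short one. For example, with `sigma=1`, moving a one-dimensional embedding by 3 over a gap of 10 has a higher log prior than making the same move over a gap of 1.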
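The derivation below is a short sketch of how expanding an exponential yields a family of lower bounds that contains the standard variational bound as its first-order case. It uses notation introduced here rather than taken from the abstract: the log weight w, the reference point V_0 and the odd order K. It is meant to convey the general idea behind perturbative corrections. The exact objective, the choice of V_0 and the gradient estimators in the NIPS 2017 algorithm may differ.

```latex
% Notation for this sketch: w(z) = \log p(x, z) - \log q(z), reference point V_0, odd order K.
p(x) = \mathbb{E}_{q(z)}\!\left[e^{w(z)}\right]
     = e^{V_0}\,\mathbb{E}_{q(z)}\!\left[e^{w(z) - V_0}\right]

% For odd K and every real y, the Lagrange remainder e^{\xi} y^{K+1}/(K+1)! is
% nonnegative because K+1 is even, hence
e^{y} \ge \sum_{k=0}^{K} \frac{y^k}{k!}

% Setting y = w(z) - V_0 inside the expectation gives a family of lower bounds:
p(x) \ge \mathcal{L}^{(K)}(q, V_0)
     := e^{V_0} \sum_{k=0}^{K} \frac{1}{k!}\,\mathbb{E}_{q(z)}\!\left[(w(z) - V_0)^k\right]

% K = 1: \mathcal{L}^{(1)} = e^{V_0}\,(1 + \mathbb{E}_q[w] - V_0) is maximized at V_0 = \mathbb{E}_q[w],
% where it equals \exp(\mathbb{E}_q[w]); taking logs gives \log p(x) \ge \mathbb{E}_q[w], the standard ELBO.
```

Higher odd orders such as K = 3 keep the inequality valid while adding terms in higher powers of w − V_0. These extra terms play the role of corrections to the first-order (standard) bound.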
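The claim about constant-step SGD rests on an observation that a toy problem can illustrate. With a constant step size, the iterates do not converge to the minimum. Instead they fluctuate around it with a stationary distribution whose width is set by the step size. The sketch below makes two assumptions of its own that are not from the papers: a one-dimensional quadratic loss with curvature `a`, and Gaussian gradient noise with standard deviation `noise_std` standing in for minibatch noise. All function names are hypothetical. The recursion is an AR(1) process with exact stationary variance ε s²/(a(2 − εa)). For small ε this approaches ε s²/(2a), which is the stationary variance of the Ornstein-Uhlenbeck process that approximates SGD in continuous time. Choosing the step size so that this distribution approximates a posterior is what the cited papers work out, and that step is not implemented here.

```python
import random


def constant_sgd_iterates(a, noise_std, step_size, n_steps, burn_in=1000, seed=0):
    """Constant-step SGD on the toy loss f(theta) = a * theta**2 / 2.

    Each step uses a noisy but unbiased gradient a*theta + xi with
    xi ~ N(0, noise_std**2), standing in for minibatch noise.
    Returns the iterates recorded after the burn-in period.
    """
    rng = random.Random(seed)
    theta = 0.0
    iterates = []
    for t in range(burn_in + n_steps):
        grad = a * theta + rng.gauss(0.0, noise_std)
        theta -= step_size * grad
        if t >= burn_in:
            iterates.append(theta)
    return iterates


def stationary_variance(a, noise_std, step_size):
    """Exact stationary variance of the AR(1) recursion
    theta_{t+1} = (1 - step_size*a) * theta_t - step_size * xi_t,
    valid for 0 < step_size * a < 2."""
    return step_size * noise_std ** 2 / (a * (2.0 - step_size * a))


def sample_variance(xs):
    """Unbiased sample variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    # Larger constant step sizes give wider stationary distributions.
    for eps in (0.02, 0.1, 0.2):
        xs = constant_sgd_iterates(a=1.0, noise_std=1.0, step_size=eps, n_steps=100_000)
        print(f"step={eps}: empirical var={sample_variance(xs):.4f}, "
              f"predicted={stationary_variance(1.0, 1.0, eps):.4f}")
```

The quadratic loss is chosen because its stationary variance has a closed form, which lets the simulation be checked against theory. For non-quadratic losses, the continuous-time picture holds only approximately near a minimum.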
