Deployable probabilistic programming

David Tolpin, PUB+


Probabilistic Programming

Programs as statistical models:

var breastCancer = flip(0.01)
var benignCyst = flip(0.2)
var positiveMammogram = (breastCancer && flip(0.8))
                      || (benignCyst && flip(0.5))
condition(positiveMammogram)
return breastCancer
The example computes the probability of breast cancer given a positive mammogram.

Probabilistic Programs

  • Compute posterior distributions.
  • May contain conditions, loops, recursions.
  • Are (mostly) deterministic!
  • Must be run 'backwards'.
  • (Approximate) inference algorithms are required.

Probabilistic programming languages

  • 45 PPLs listed on Wikipedia.
  • 18 PPLs participated in the developer meetup at PROBPROG 2018.
  • 8 of the latter are not on Wikipedia (yet).

Probabilistic programming

Two polar views:

  1. Framework for statistical modelling
  2. Inferentiable programming — programs with inferrable parameters

Challenges

  • Simulation vs. inference
  • Data
  • Deployment

Simulation vs. inference

A Stan model specifies the log density for inference; simulating data from the same model requires writing separate code.

data {
    int N;
    vector[N] y;
    vector[N] x;
}

parameters {
    real alpha; real beta;
    real sigma;
}
model {
    alpha ~ normal(0,10);    
    beta ~ normal(0,10);
    sigma ~ cauchy(0,5);

    for (n in 1:N)
        y[n] ~ normal(
          alpha + beta * x[n],
          sigma);
}

Data structures

I mean:

def update_beliefs(beliefs, i, j, bandwidth):
    beliefs[i][j] += 1
    evidence = sum(beliefs[i])
    if evidence > bandwidth:
        for j in range(len(beliefs[i])):
            beliefs[i][j] *= bandwidth / evidence
      
I write:

import torch

def update_beliefs(beliefs, i, j, bandwidth):
    update = torch.zeros(beliefs.shape)
    update[i, j] = 1
    beliefs = beliefs + update
    evidence = beliefs[i, :].sum()
    if evidence > bandwidth:
        scale = torch.ones(beliefs.shape)
        scale[i, :] = bandwidth / evidence
        beliefs = beliefs * scale
    return beliefs
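
For contrast, a sketch of the same update in plain Go, with ordinary loops over ordinary slices (illustrative only, not any specific Infergo API):

// updateBeliefs increments one belief count and rescales the row
// whenever its total evidence exceeds the bandwidth.
func updateBeliefs(beliefs [][]float64, i, j int, bandwidth float64) {
  beliefs[i][j]++
  evidence := 0.0
  for _, b := range beliefs[i] {
    evidence += b
  }
  if evidence > bandwidth {
    for k := range beliefs[i] {
      beliefs[i][k] *= bandwidth / evidence
    }
  }
}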
            

Deployment

PyStan (70MB): Stan, C++ compiler, Cython, OCaml?

Pyro (>600MB):
Successfully installed contextlib2-0.5.5
decorator-4.3.0 graphviz-0.10.1 networkx-2.2
opt-einsum-2.3.2 pyro-ppl-0.3.0 six-1.12.0
torch-1.0.0 tqdm-4.29.1
Turing.jl (50MB):
Installed NNlib ─────────── v0.4.3
Installed Colors ────────── v0.9.5
Installed Arpack ────────── v0.3.0
Installed BinaryProvider ── v0.5.3
Installed ForwardDiff ───── v0.10.1
Installed ColorTypes ────── v0.7.5
Installed ProgressMeter ─── v0.9.0
Installed ZipFile ───────── v0.8.0
Installed StaticArrays ──── v0.10.2
Installed Juno ──────────── v0.5.3
... (~40 packages)

Guidelines

  • One language
  • Common data structures
  • Inference code re-used in simulation

Probabilistic programming in Go

Go is

  • small,
  • expressive,
  • efficient,
  • popular for server-side programming.

'hello world': model


type Model struct {
  Data []float64
}

// x[0] is the mean, x[1] is the
// log stddev of the distribution
func (m *Model) Observe(x []float64) float64 {
  // Our prior is a unit normal ...
  ll := Normal.Logps(0, 1, x...)
  // ... but the posterior is based on data observations.
  ll += Normal.Logps(x[0], math.Exp(x[1]), m.Data...)
  return ll
}
            

'hello world': data

m := &Model{[]float64{
  -0.854, 1.067, -1.220, 0.818, -0.749,
  0.805, 1.443, 1.069, 1.426, 0.308}}
            

'hello world': optimization


x := []float64{0, 0}

opt := &infer.Momentum{
  Rate:  0.01,
  Decay: 0.998,
}
for iter := 0; iter != 1000; iter++ {
  opt.Step(m, x)
}
mean, logs := x[0], x[1]
            

'hello world': posterior


x := []float64{0, 0}

hmc := &infer.HMC{
  Eps: 0.1,
}
samples := make(chan []float64)
hmc.Sample(m, x, samples)
for i := 0; i != 1000; i++ {
  x = <-samples
}
hmc.Stop()
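
The loop above keeps only the last draw. A sketch (illustrative, not from the original slides) of averaging the draws into a posterior-mean estimate instead:

// Average the draws to estimate the posterior mean of each parameter.
n := 1000
mean := make([]float64, len(x))
for i := 0; i != n; i++ {
  x = <-samples
  for j := range x {
    mean[j] += x[j] / float64(n)
  }
}
hmc.Stop()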
            

'hello world': streaming


type Model struct {
  Data chan float64 // data is a channel
  N    int          // batch size
}

func (m *Model) Observe(x []float64) float64 {
  ll := Normal.Logps(0, 1, x...)
  // observe a batch of data from the channel
  for i := 0; i != m.N; i++ {
    ll += Normal.Logp(x[0], math.Exp(x[1]), <-m.Data)
  }
  return ll
}
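
A sketch of driving the streaming model: a producer goroutine feeds the channel while inference runs in the main goroutine (observations is an assumed data source; the optimizer is the Momentum shown earlier):

m := &Model{Data: make(chan float64), N: 10}

// Feed observations into the model concurrently,
// e.g. from a network stream.
go func() {
  for _, y := range observations { // observations: assumed []float64
    m.Data <- y
  }
}()

x := []float64{0, 0}
opt := &infer.Momentum{Rate: 0.01, Decay: 0.998}
// Each Step consumes m.N values from the channel.
for iter := 0; iter != 100; iter++ {
  opt.Step(m, x)
}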
            

Why Go?

  • Comes with a parser and type checker.
  • Compiles and runs fast.
  • Allows efficient parallel execution via goroutines.

Infergo

  • Models are written in Go.
  • Relies on automatic differentiation for inference.
  • Works anywhere Go does.
  • No external dependencies.
  • Licensed under the MIT license.

Model

Model interface:

type Model interface {
  Observe(parameters []float64) float64
}
            
A model (exponential distribution):

type Expon float64

func (m Expon) Observe(x []float64) float64 {
  return -x[0] * float64(m)
}
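
A usage sketch (values illustrative): the float64-based type satisfies the Model interface, so it can be passed to anything that accepts a Model:

m := Expon(2.0)                 // rate
ll := m.Observe([]float64{0.5}) // log density at 0.5, up to a constant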
            

Distributions

A distribution is a model:

var Normal normal

// Observe implements the Model interface.
func (dist normal) Observe(x []float64) float64 {
  mu, sigma, y := x[0], x[1], x[2:]
  return dist.Logps(mu, sigma, y...)
}
            

Distributions (cont.)


// Logp computes the log pdf of a single observation.
func (_ normal) Logp(mu, sigma float64, y float64) float64 {
  ...
}

// Logps computes the log pdf of a vector of observations.
func (_ normal) Logps(mu, sigma float64, y ...float64) float64 {
  ...
}
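
A usage sketch of the two forms (values illustrative):

ll := Normal.Logp(0, 1, 0.5)              // a single observation
lls := Normal.Logps(0, 1, 0.5, -0.3, 1.2) // several observations at once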
        

Differentiation

  • Model methods returning float64 or nothing are differentiated.
  • Within the methods, the following is differentiated:
    • assignments to float64;
    • returns of float64;
    • standalone calls to differentiated model methods.

Differentiation (cont.)

Reverse-mode via source code transformation:

func (m *Model) Observe(x []float64) float64 {
  var ll float64
  ad.Assignment(&ll, ad.Call(func(_ []float64) {
    Normal.Logps(0, 0, x...)
  }, 2, ad.Value(0), ad.Value(1)))
  ad.Assignment(&ll,
    ad.Arithmetic(ad.OpAdd, &ll,
      ad.Call(func(_ []float64) {
        Normal.Logps(0, 0, m.Data...)
      }, 2, &x[0], ad.Elemental(math.Exp, &x[1]))))
  return ad.Return(&ll)
}
            

Inference

  • Optimization via gradient ascent (Momentum, Adam); works with streamed and stochastic data.
  • Full posterior inference via HMC variants. Samples are produced concurrently and passed through a channel.
  • Third-party inference algorithms.

Third-party inference

  • Straightforward because Infergo uses built-in Go float64.
  • Gonum (http://gonum.org/) library for numerical computation:
    
    func FuncGrad(m Model) (
      Func func(x []float64) float64,
      Grad func(grad []float64, x []float64))
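
A sketch of the Gonum side, assuming m is a model and x an initial parameter vector (Gonum minimizes, so depending on sign conventions the objective is the negated log-likelihood):

// import "gonum.org/v1/gonum/optimize"
Func, Grad := infer.FuncGrad(m)
p := optimize.Problem{Func: Func, Grad: Grad}

// Minimize starting from x; result.X holds the optimum.
result, err := optimize.Minimize(p, x, nil, nil)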
                    

Examples

  • Simple models are still simple.
  • Complex models are less complex.

8 schools

Stan:

data {
    int<lower=0> J;
    vector[J] y;
    vector<lower=0>[J] sigma;
}

parameters {
    real mu;
    real<lower=0> tau;
    vector[J] eta;
}

transformed parameters {
    vector[J] theta;
    theta = mu + tau * eta;
}

model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}

            

Infergo:

type Model struct {
  J          int
  Y          []float64
  Sigma      []float64
}

func (m *Model) Observe(x []float64) float64 {
  mu := x[0]
  tau := math.Exp(x[1])
  eta := x[2:]

  ll := Normal.Logps(0, 1, eta...)
  for i, y := range m.Y {
    theta := mu + tau*eta[i]
    ll += Normal.Logp(theta, m.Sigma[i], y)
  }
  return ll
}
            

Linear regression


type Model struct {
  Data  [][]float64
}

func (m *Model) Observe(x []float64) float64 {
  ll := 0.

  alpha, beta := x[0], x[1]
  sigma := math.Exp(x[2])

  for i := range m.Data {
    ll += Normal.Logp(
      m.Simulate(m.Data[i][0], alpha, beta),
      sigma, m.Data[i][1])
  }
  return ll
}

// Simulate predicts y for x based on
// inferred parameters.
func (m *Model) Simulate(x, alpha, beta float64) float64 {
  y := alpha + beta*x
  return y
}
            

Latent Dirichlet allocation

Infergo (0.5s | 8.9s):

type LDAModel struct {
  K     int       // num topics
  V     int       // num words
  M     int       // num docs
  N     int       // total word instances
  Word  []int     // word n
  Doc   []int     // doc for word n
  Alpha []float64 // topic prior
  Beta  []float64 // word prior
}

func (m *LDAModel) Observe(x []float64) float64 {
  ll := Normal.Logps(0, 1, x...)
  theta := make([][]float64, m.M)
  D.Simplices(&x, m.K, theta)
  phi := make([][]float64, m.K)
  D.Simplices(&x, m.V, phi)
 
  // priors
  ll += Dirichlet{m.K}.Logps(m.Alpha, theta...)
  ll += Dirichlet{m.V}.Logps(m.Beta, phi...)
  
  gamma := make([]float64, m.K)
  for in := 0; in != m.N; in++ {
    for ik := 0; ik != m.K; ik++ {
      gamma[ik] = math.Log(theta[m.Doc[in]-1][ik]) +
        math.Log(phi[ik][m.Word[in]-1])
    }
    ll += D.LogSumExp(gamma)
  }
  return ll
}

            

Stan (54s | 3.7s):

data {
  int K;           // num topics
  int V;           // num words
  int M;           // num docs
  int N;           // total word instances
  int w[N];        // word n
  int doc[N];      // doc for word n
  vector[K] alpha; // topic prior
  vector[V] beta;  // word prior
}


parameters {
  simplex[K] theta[M]; // topic dist for doc m
  simplex[V] phi[K];   // word dist for topic k
}


model {
  for (m in 1:M)  
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)  
    phi[k] ~ dirichlet(beta);     // prior

  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K) 
      gamma[k] <- log(theta[doc[n],k]) 
        + log(phi[k,w[n]]);
    increment_log_prob(log_sum_exp(gamma));
  }
}
            

Acknowledgements

  • Frank Wood introduced me to probabilistic programming.
  • Jan-Willem van de Meent discussed with me motives, ideas, and implementation choices behind Infergo.
  • PUB+ supported me in the development of Infergo.

Thank you!

(image: Gehenna Gopher)