Deployable probabilistic programming

David Tolpin, PUB+

Probabilistic programming languages

  • 45 PPLs listed on Wikipedia.
  • 18 PPLs participated in the developer meetup at PROBPROG 2018.
  • 8 of the latter are not on Wikipedia (yet).

Probabilistic programming

Two polar views:

  1. Framework for statistical modelling
  2. Inferentiable programming: programs with inferrable parameters

Challenges

  • Simulation vs. inference
  • Data
  • Deployment

Simulation vs. inference


data {
    int N;
    vector[N] y;
    vector[N] x;
}

parameters {
    real alpha; real beta;
    real sigma;
}
model {
    alpha ~ normal(0,10);    
    beta ~ normal(0,10);
    sigma ~ cauchy(0,5);

    for (n in 1:N)
        y[n] ~ normal(
          alpha + beta * x[n],
          sigma);
}

Data structures

I mean:

def update_beliefs(beliefs, i, j, bandwidth):
    beliefs[i][j] += 1
    evidence = sum(beliefs[i])
    if evidence > bandwidth:
        for j in range(len(beliefs[i])):
            beliefs[i][j] *= bandwidth / evidence
      
I write:

def update_beliefs(beliefs, i, j, bandwidth):
    # Build the increment as a separate tensor instead of
    # mutating beliefs in place, so autograd can track it.
    update = torch.zeros(beliefs.shape)
    update[i, j] = 1
    beliefs = beliefs + update
    evidence = beliefs[i, :].sum()
    if evidence > bandwidth:
        # Rescale row i only, again without in-place mutation.
        scale = torch.ones(beliefs.shape)
        scale[i, :] = bandwidth / evidence
        beliefs = beliefs * scale
    return beliefs
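
For contrast, a sketch of the same update in plain Go: with reverse-mode AD by source code transformation (as in Infergo, introduced below), ordinary assignments, loops, and conditionals differentiate as written.

func updateBeliefs(beliefs [][]float64, i, j int, bandwidth float64) {
    // Plain imperative update; no masking tensors needed.
    beliefs[i][j] += 1
    evidence := 0.
    for _, b := range beliefs[i] {
        evidence += b
    }
    if evidence > bandwidth {
        for k := range beliefs[i] {
            beliefs[i][k] *= bandwidth / evidence
        }
    }
}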
            

Deployment

PyStan (70Mb): Stan, C++ compiler, Cython, OCaml?

Pyro (>600Mb):
Successfully installed contextlib2-0.5.5
decorator-4.3.0 graphviz-0.10.1 networkx-2.2
opt-einsum-2.3.2 pyro-ppl-0.3.0 six-1.12.0
torch-1.0.0 tqdm-4.29.1
Turing.jl (50Mb):
Installed NNlib ─────────── v0.4.3
Installed Colors ────────── v0.9.5
Installed Arpack ────────── v0.3.0
Installed BinaryProvider ── v0.5.3
Installed ForwardDiff ───── v0.10.1
Installed ColorTypes ────── v0.7.5
Installed ProgressMeter ─── v0.9.0
Installed ZipFile ───────── v0.8.0
Installed StaticArrays ──── v0.10.2
Installed Juno ──────────── v0.5.3
... (~40 packages)

Guidelines

  • One language
  • Common data structures
  • Inference code re-used in simulation

Probabilistic programming in Go

Go is

  • small,
  • expressive,
  • efficient,
  • popular for server-side programming.

'hello world': model


type Model struct {
    Data []float64
}

// x[0] is the mean, x[1] is the
// log stddev of the distribution
func (m *Model) Observe(x []float64) float64 {
    // Our prior is a unit normal ...
    ll := Normal.Logps(0, 1, x...)
    // ... but the posterior is based on data observations.
    ll += Normal.Logps(x[0], math.Exp(x[1]), m.Data...)
    return ll
}
            

'hello world': data

m := &Model{[]float64{
    -0.854, 1.067, -1.220, 0.818, -0.749,
    0.805, 1.443, 1.069, 1.426, 0.308}}
            

'hello world': optimization


x := []float64{0, 0}

opt := &infer.Momentum{
    Rate:  0.01,
    Decay: 0.998,
}
for iter := 0; iter != 1000; iter++ {
    opt.Step(m, x)
}
mean, logs := x[0], x[1]
            

'hello world': posterior


x := []float64{0, 0}

hmc := &infer.HMC{
    Eps: 0.1,
}
samples := make(chan []float64)
hmc.Sample(m, x, samples)
for i := 0; i != 1000; i++ {
    x = <-samples
}
hmc.Stop()
            

'hello world': streaming


type Model struct {
    Data chan float64 // data is a channel
    N    int          // batch size
}

func (m *Model) Observe(x []float64) float64 {
    ll := Normal.Logps(0, 1, x...)
    // observe a batch of data from the channel
    for i := 0; i != m.N; i++ {
        ll += Normal.Logp(x[0], math.Exp(x[1]), <-m.Data)
    }
    return ll
}
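
A hypothetical usage sketch: a producer goroutine feeds the channel while inference consumes it in batches of N (nextObservation stands in for any data source):

data := make(chan float64)
m := &Model{Data: data, N: 10}

// Stream observations concurrently with inference.
go func() {
    for {
        data <- nextObservation() // hypothetical data feed
    }
}()
// m can now be passed to an optimizer or sampler as before.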
            

Why Go?

  • Comes with parser and type checker.
  • Compiles and runs fast.
  • Allows efficient parallel execution, via goroutines.

Infergo

  • Models are written in Go.
  • Relies on automatic differentiation for inference.
  • Works anywhere where Go does.
  • No external dependencies.
  • Licensed under the MIT license.

Model

Model interface:

type Model interface {
    Observe(parameters []float64) float64
}
            
A model (exponential distribution):

type Expon float64

func (m Expon) Observe(x []float64) float64 {
    return -x[0] * float64(m)
}
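
For example (a minimal check, assuming the Expon model above):

m := Expon(2)
ll := m.Observe([]float64{1.5}) // -x[0]*m = -3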
            

Distributions

A distribution is a model:

var Normal normal

// Observe implements the Model interface.
func (dist normal) Observe(x []float64) float64 {
    mu, sigma, y := x[0], x[1], x[2:]
    return dist.Logps(mu, sigma, y...)
}
            

Distributions (cont.)


// Logp computes the log pdf of a single observation.
func (_ normal) Logp(mu, sigma float64, y float64) float64 {
    ...
}

// Logps computes the log pdf of a vector of observations.
func (_ normal) Logps(mu, sigma float64, y ...float64) float64 {
    ...
}
        

Differentiation

  • Model methods returning float64 or nothing are differentiated.
  • Within the methods, the following is differentiated:
    • assignments to float64;
    • returns of float64;
    • standalone calls to differentiated model methods.

Differentiation (cont.)

Reverse-mode, via source code transformation: the 'hello world' Observe above becomes:

func (m *Model) Observe(x []float64) float64 {
    var ll float64
    ad.Assignment(&ll, ad.Call(func(_ []float64) {
        Normal.Logps(0, 0, x...)
    }, 2, ad.Value(0), ad.Value(1)))
    ad.Assignment(&ll,
        ad.Arithmetic(ad.OpAdd, &ll,
            ad.Call(func(_ []float64) {
                Normal.Logps(0, 0, m.Data...)
            }, 2, &x[0], ad.Elemental(math.Exp, &x[1]))))
    return ad.Return(&ll)
}
            

Inference

  • Optimization via gradient ascent (Momentum, Adam); works with streamed and stochastic data (see the sketch after this list).
  • Full posterior inference via HMC variants. Samples are produced concurrently and passed through a channel.
  • Third-party inference algorithms.
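
For instance, Adam can be swapped in for Momentum in the 'hello world' optimization (a sketch, assuming infer.Adam takes a Rate field analogous to infer.Momentum's):

x := []float64{0, 0}

opt := &infer.Adam{Rate: 0.01} // Rate field assumed, by analogy with Momentum
for iter := 0; iter != 1000; iter++ {
    opt.Step(m, x)
}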

Third-party inference

  • Straightforward because Infergo uses built-in Go float64.
  • Gonum (http://gonum.org/) library for numerical computation:
    
    func FuncGrad(m Model) (
      Func func(x []float64) float64,
      Grad func(grad []float64, x []float64))
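
A sketch of plugging an Infergo model into a Gonum optimizer: Observe returns a log-likelihood to be maximized, while Gonum minimizes, so both the function and the gradient are negated (Model and FuncGrad as above; optimize.Problem, optimize.Minimize, and optimize.LBFGS are Gonum's):

import "gonum.org/v1/gonum/optimize"

func maximize(m Model, x []float64) ([]float64, error) {
    f, g := FuncGrad(m)
    problem := optimize.Problem{
        // Negate the log-likelihood and its gradient for minimization.
        Func: func(x []float64) float64 { return -f(x) },
        Grad: func(grad, x []float64) {
            g(grad, x)
            for i := range grad {
                grad[i] = -grad[i]
            }
        },
    }
    result, err := optimize.Minimize(problem, x, nil, &optimize.LBFGS{})
    if err != nil {
        return nil, err
    }
    return result.X, nil
}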
                    

Examples

  • Simple models are still simple.
  • Complex models are less complex.

8 schools

Stan:

data {
    int<lower=0> J;
    vector[J] y;
    vector<lower=0>[J] sigma;
}

parameters {
    real mu;
    real<lower=0> tau;
    vector[J] eta;
}

transformed parameters {
    vector[J] theta;
    theta = mu + tau * eta;
}

model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}

            

Infergo:

type Model struct {
  J          int
  Y          []float64
  Sigma      []float64
}

func (m *Model) Observe(x []float64) float64 {
  mu := x[0]
  tau := math.Exp(x[1])
  eta := x[2:]

  ll := Normal.Logps(0, 1, eta...)
  for i, y := range m.Y {
    theta := mu + tau*eta[i]
    ll += Normal.Logp(theta, m.Sigma[i], y)
  }
  return ll
}
            

Linear regression


type Model struct {
  Data  [][]float64
}

func (m *Model) Observe(x []float64)
    float64 {
  ll := 0.

  alpha, beta := x[0], x[1]
  sigma := math.Exp(x[2])

  for i := range m.Data {
    ll += Normal.Logp(
      m.Simulate(m.Data[i][0], alpha, beta),
      sigma, m.Data[i][1])
  }
...
            

...
  return ll
}

// Simulate predicts y for x based on
// inferred parameters.
func (m *Model) Simulate(x, alpha, beta float64)
    float64 {
  y := alpha + beta*x
  return y
}
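
This follows the guideline of re-using inference code in simulation: the same Simulate serves both Observe during inference and prediction afterwards (a sketch; alpha and beta as recovered by optimization):

// Predict y at a new input using the inferred coefficients.
yNew := m.Simulate(2.5, alpha, beta)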
            

Latent Dirichlet allocation

Infergo (0.5s | 8.9s) vs. Stan (54s | 3.7s)

Infergo:

type LDAModel struct {
  K     int       // num topics
  V     int       // num words
  M     int       // num docs
  N     int       // total word instances
  Word  []int     // word n
  Doc   []int     // doc for word n
  Alpha []float64 // topic prior
  Beta  []float64 // word prior
}

func (m *LDAModel) Observe(x []float64) float64 {
  ll := Normal.Logps(0, 1, x...)
  theta := make([][]float64, m.M)
  D.Simplices(&x, m.K, theta)
  phi := make([][]float64, m.K)
  D.Simplices(&x, m.V, phi)
 
  // priors
  ll += Dirichlet{m.K}.Logps(m.Alpha, theta...)
  ll += Dirichlet{m.V}.Logps(m.Beta, phi...)
  
  gamma := make([]float64, m.K)
  for in := 0; in != m.N; in++ {
    for ik := 0; ik != m.K; ik++ {
      gamma[ik] = math.Log(theta[m.Doc[in]-1][ik]) +
        math.Log(phi[ik][m.Word[in]-1])
    }
    ll += D.LogSumExp(gamma)
  }
  return ll
}

            

Stan:

data {
  int K;            // num topics
  int V;            // num words
  int M;            // num docs
  int N;            // total word instances
  int w[N];         // word n
  int doc[N];       // doc for word n
  vector[K] alpha;  // topic prior
  vector[V] beta;   // word prior
}


parameters {
  simplex[K] theta[M]; // topic dist for doc m
  simplex[V] phi[K];   // word dist for topic k
}


model {
  for (m in 1:M)  
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)  
    phi[k] ~ dirichlet(beta);     // prior

  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K) 
      gamma[k] <- log(theta[doc[n],k]) 
        + log(phi[k,w[n]]);
    increment_log_prob(log_sum_exp(gamma));
  }
}
            

Real story: traffic acquisition

  • Publisher (P) acquires traffic from a social platform (F).
  • P pays F per conversion.
  • P shares with F conversion events.
  • F adjusts traffic based on conversion events.
  • How does P choose the bid?

Generative model

$$\theta_{visitor} \sim Visitors$$

$$cost \sim Cost(\theta_{visitor})$$

$$\theta_{value} \sim Value(\theta_{visitor})$$

$$\frac{cost}{\Pr\left(v \ge threshold \mid v \sim Value(\theta_{value})\right)} < maxBid$$

Simulation


func (m *Model) Visit(
    vins <-chan *VIn,
    vouts chan<- *VOut,
    conv, maxBid float64) {
  // Sample
  vin := <-vins
  // Bid
  minBid := m.MinBid(vin, conv)
  if maxBid >= minBid { // Won
    vout := new(VOut)
    vout.Value = vin.Mean +
      vin.Std*rand.NormFloat64()
    vout.Cost = minBid
    vouts <- vout
  }
}
            

func (m *Model) MinBid(vin *VIn, conv float64) float64 {
  p_conv := 0.5 * math.Erfc((conv-vin.Mean)/
    (vin.Std*math.Sqrt2))
  return vin.Cost / p_conv
}
            

Inference


func (m *Model) Observe(x []float64) (ll float64) {
  // Prior
  ll = Normal.Logps(0, 10, x...)

  // Destructure
  lmu := x[0]
  lsigma := math.Exp(x[1])
  sigma := math.Exp(x[2])
  cost := math.Exp(x[3])

  vins := make(chan *VIn, 2)
  vouts := make(chan *VOut, 2)

  for _, obs := range m.Obs {
    // visits starts slightly above zero to avoid division by zero
    visits, spent, revenue := 0.1, 0., 0.
    for i := 0; i != m.Samples; i++ {
      // Simulate a potential visitor
      vin := new(VIn)
      vin.Mean = math.Exp(lmu +
        lsigma*rand.NormFloat64())
      vin.Std = sigma
      vin.Cost = cost
      vins <- vin
            

      // Filter
      m.Visit(vins, vouts, obs.Conv, obs.MaxBid)
      select {
      case vout := <-vouts:
        // Won
        visits++
        spent += vout.Cost
        revenue += vout.Value
      default:
      }
    }

    // Condition on observations
    dvalue := revenue/visits - obs.Revenue/obs.Visits
    dcost := spent/visits - obs.Spent/obs.Visits
    ll -= dvalue*dvalue + dcost*dcost
  }
  return ll / float64(len(m.Obs)*m.Samples)
}
            
$ cat -n observations.csv
     1  conv,maxbid,visits,revenue,spent
     2  0.1,0.15,476,158.65724793466998,61.591053850255186
     3  0.1,0.2,951,236.37374155139224,143.5770446642684
     4  0.1,0.3,1000,251.3839582603887,152.93037377837516
 ...
    25  0.45,0.3,234,98.91680448845665,52.30043312819199
    26  0.45,0.45,526,180.3501718415401,161.0863058071484
    27  0.45,0.7,847,245.76371979183693,349.3654385850295
$
              
$  ./infer -samples 100 -niter 1000 -rate 0.1 \
     < observations.csv
MLE (after 10000 iterations):
  Log-likelihood: -0.001436 => -7.753e-05
  lmean: -1.459
  lstd: 0.2952
  std: 0.09415
  cost: 0.1087
$

Truth: lmean=-1.5, lstd=0.3, std=0.1, cost=0.1

Acknowledgements

  • Frank Wood introduced me to probabilistic programming.
  • Jan-Willem van de Meent discussed with me the motives, ideas, and implementation choices behind Infergo.
  • PUB+ supported me in the development of Infergo.

Thank you!

Gehenna Gopher