for Probabilistic Programs

- A program with random computations.
- Distributions are conditioned by "observations".
- Values of certain expressions are "predicted": the output.

```
(let [;; Model
      dist (sample (categorical [[normal 1/2] [gamma 1/2]]))
      a (sample (gamma 1 1))
      b (sample (gamma 1 1))
      d (dist a b)]
  ;; Observations
  (observe d 1) (observe d 2)
  (observe d 4) (observe d 7)
  ;; Explanation
  (predict :d (type d))
  (predict :a a) (predict :b b))
```

- Suggest the **most probable explanation** (MPE): the most likely assignment to all non-evidence variables given the evidence.
- Approximately **compute an integral** of the form $\Phi=\int_{-\infty}^{\infty} \varphi(x)\,p(x)\,dx$.
- Continuously and **infinitely generate a sequence of samples**. ✓
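The integration task reduces to plain Monte Carlo: draw $x_i \sim p$ and average $\varphi(x_i)$. A minimal Python sketch (the `mc_estimate` helper and the toy integrand are illustrative, not part of any system discussed here):

```python
import random

def mc_estimate(phi, sample_p, n=100_000):
    """Monte Carlo estimate of Phi = integral phi(x) p(x) dx:
    draw x_i ~ p and average phi(x_i)."""
    return sum(phi(sample_p()) for _ in range(n)) / n

# Toy check: E[x^2] under a standard normal is 1.
random.seed(0)
est = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

The error of such an estimate shrinks as $O(1/\sqrt{n})$, which is why inference methods that generate samples more efficiently matter.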

- $\mathcal{P}$ — probabilistic program.
- $\pmb{x}$ — random variables.
- $\pmb{z}$ — output.

Run $\mathcal{P}$ once, remember $\pmb{x}, \pmb{z}$.
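A single run can be sketched in Python as a program executed against a `sample` callback that records every random choice (the `run_program` and `model` names are hypothetical; real implementations record much richer trace information):

```python
import random

def run_program(program, seed=None):
    """Run a probabilistic program once; return (x, z) where x is
    the list of random draws and z is the program output."""
    rng = random.Random(seed)
    trace = []                      # x: random variables drawn this run

    def sample(dist):               # record every random choice
        value = dist(rng)
        trace.append(value)
        return value

    z = program(sample)             # z: program output
    return trace, z

# A toy model: a ~ Normal(0, 1), b ~ Normal(a, 1); the output is b.
def model(sample):
    a = sample(lambda rng: rng.gauss(0.0, 1.0))
    b = sample(lambda rng: rng.gauss(a, 1.0))
    return b

x, z = run_program(model, seed=1)
```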

Can we do better?

- Christophe Andrieu and Johannes Thoms. A tutorial on adaptive MCMC. Statistics and Computing, 18(4):343–373, 2008.
- B. Paige, F. Wood, A. Doucet, and Y. W. Teh. Asynchronous anytime sequential Monte Carlo. In NIPS 2014.
- David Wingate, Andreas Stuhlmüller, and Noah D. Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In AISTATS 2011.
- Frank Wood, Jan Willem van de Meent, and Vikash Mansinghka. A new approach to probabilistic programming inference. In AISTATS 2014.

[Figures: results after 1,000, 10,000, and 100,000 samples.]

$\begin{align*} \vec{x}_t &\sim {\rm Norm}(\vec{A} \cdot \vec{x}_{t-1}, \vec{Q}) , & \vec{y}_t &\sim {\rm Norm}(\vec{C} \cdot \vec{x}_{t}, \vec{R}) . \end{align*}$

$\begin{align*} \vec{A} &= \left[ \begin{array}{cc} ~\cos \omega~ & ~-\sin \omega~ \\ ~\sin \omega~ & ~\cos \omega~ \end{array} \right] , & \vec{Q} &= \left[ \begin{array}{cc} ~q~ & 0 \\ 0 & ~q~ \end{array} \right] . \end{align*}$
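The state-space model above can be simulated directly. A Python sketch, with $\vec{A}$ a 2D rotation by $\omega$ and $\vec{Q} = q\vec{I}$ as defined; for simplicity $\vec{C}$ is taken as the identity here, so the simulated observations share the state's dimension rather than the 16 dimensions used in the experiment:

```python
import math, random

def simulate_ssm(T, omega, q, r, rng):
    """Simulate x_t ~ Norm(A x_{t-1}, q I), y_t ~ Norm(C x_t, r I)
    with A a 2D rotation by omega and C the identity (simplified)."""
    a11, a12 = math.cos(omega), -math.sin(omega)
    a21, a22 = math.sin(omega),  math.cos(omega)
    x = [1.0, 0.0]
    xs, ys = [], []
    for _ in range(T):
        x = [a11 * x[0] + a12 * x[1] + rng.gauss(0.0, math.sqrt(q)),
             a21 * x[0] + a22 * x[1] + rng.gauss(0.0, math.sqrt(q))]
        y = [xi + rng.gauss(0.0, math.sqrt(r)) for xi in x]
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate_ssm(T=100, omega=math.pi / 16, q=0.01, r=0.1,
                      rng=random.Random(0))
```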

100 16-dimensional observations; 500 samples collected after 10,000 samples of burn-in.

- Selects each $x_i$ with a different probability.
- Maintains a vector of weights $\pmb{W}$ over the random choices:

Initialize $\pmb{W}^0$ to a constant.
Run $\mathcal{P}$ once, remember the *program output*.
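The weights drive which random choice is proposed for modification: $x_i$ is selected with probability proportional to $W_i$. A Python sketch of the weighted selection (a hypothetical helper, assuming unnormalized non-negative weights):

```python
import random

def select_choice(W, rng):
    """Pick index i with probability W[i] / sum(W): the adaptive
    schedule over random choices."""
    total = sum(W)
    u = rng.random() * total
    acc = 0.0
    for i, w in enumerate(W):
        acc += w
        if u < acc:
            return i
    return len(W) - 1           # guard against float round-off

W = [1.0, 1.0, 1.0]            # W^0 initialized to a constant
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(3000):
    counts[select_choice(W, rng)] += 1
```

With a constant $\pmb{W}^0$ this reduces to the uniform selection of plain LMH; adaptation then skews the weights toward influential choices.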

- Objective: faster convergence of the *program output* $\pmb{z}$.
- Adaptation parameter: the probabilities of selecting random choices for modification.
- Optimization target: maximize the **change** in the program output: $$R^t = \frac{1}{|\pmb{z}^t|}\sum_{k=1}^{|\pmb{z}^t|} \pmb{1}(z_k^t \ne z_k^{t-1}).$$
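The objective $R^t$ is simply the fraction of predicted values that changed between consecutive samples; a direct Python transcription:

```python
def output_change(z_t, z_prev):
    """R^t: fraction of predicted values that differ between
    consecutive samples z^t and z^{t-1}."""
    assert len(z_t) == len(z_prev)
    return sum(a != b for a, b in zip(z_t, z_prev)) / len(z_t)

# Two of four predicted values changed, so R^t = 0.5.
r = output_change([1, 2, 3, 4], [1, 2, 5, 6])
```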

$W_i$ reflects the anticipated change in $\pmb{z}$ from modifying $x_i$.

Modifying `x2` affects the output ...

```
(let [x1 (sample (normal 1 10))
      x2 (sample (normal x1 1))]
  (observe (normal x2 1) 2)
  (predict x1))
```

... but only when `x1` is also modified.

- For each $x_i$, a reward $r_i$ and a count $c_i$ are kept.
- A *history* of modified random choices is attached to every $z_j$.

When a modification of $x_k$ is accepted:

Append $x_k$ to the history.
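A Python sketch of the bookkeeping: each output component $z_j$ carries a history of the choices whose accepted modifications produced it, and when a new sample arrives every choice in the history of a changed component is credited. This uses a simplified all-or-nothing credit rule for illustration, not necessarily the exact reward used by Adaptive LMH:

```python
def reward_choices(history, z_t, z_prev, r, c):
    """Credit random choices: every choice k in the history of output
    component z_j gets reward 1 if z_j changed, 0 otherwise, and its
    count is incremented (simplified all-or-nothing scheme)."""
    for j, (a, b) in enumerate(zip(z_t, z_prev)):
        changed = 1.0 if a != b else 0.0
        for k in history[j]:
            r[k] += changed
            c[k] += 1

r = [0.0, 0.0]
c = [0, 0]
history = {0: [0], 1: [1]}   # z_0's history holds x_0; z_1's holds x_1
reward_choices(history, z_t=[1, 5], z_prev=[1, 2], r=r, c=c)
# A weight estimate would then be W_i = r_i / c_i.
```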

**Convergence:**

For *any partitioning* of $\pmb{x}$, Adaptive LMH selects variables from each partition with **non-zero probability**.

- A scheme for rewarding random samples based on the *program output*.
- An approach to propagating sample rewards to MH proposal *scheduling* parameters.
- An application of this approach to LMH, where the probabilities of *selecting each variable* for modification are adjusted.

- Extending the adaptation approach to other **sampling methods**.
- A reward scheme that takes into account the **amount of difference** between samples.
- Acquisition of **dependencies** between predicted expressions and random variables.