University of Oxford
Oxford, UK

# Path Finding under Uncertainty through Probabilistic Inference

## Preliminaries

### Probabilistic Program

• A program with random computations.
• Distributions are conditioned by observations'.
• Values of certain expressions are predicted'.
(let [;; Model
dist (sample (categorical
[[normal 1/2]
[gamma 1/2]]))
a (sample (gamma 1 1))
b (sample (gamma 1 1))
d (dist a b)]
;; Observations
(observe d 1) (observe d 2)
(observe d 4) (observe d 7)
;; Explanation
(predict :d (type d))
(predict :a a) (predict :b b)))

### Inference Objectives

• Continuously and infinitely generate a sequence of samples.
• Approximately compute integral of the form $\Phi=\int_{-\infty}^{\infty} \varphi(x)p(x) dx$
• Suggest most probable explanation (MPE) - most likely assignment to random variables. ✓

CTP is a problem finding the shortest travel distance in a graph where some edges may be blocked.

Given

• Undirected weighted graph $G=(V,E)$.
• The initial and the final location nodes $s$ and $t$.
• Edge weights $w: E\rightarrow\mathcal{R}$.
• Traversability probabilities: $p_o: E\rightarrow(0,1]$.
find the shortest travel distance from $s$ to $t$ — the sum of weights of all traversed edges.

## References

1. Frank Wood, Jan Willem van de Meent, and Vikash Mansinghka. A new approach to probabilistic programming inference. In AISTATS-2014.
2. Bar-Noy, A., and Schieber, B. 1991. The Canadian trav- eller problem. In Proc. of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’91, 261–270.
3. Eyerich, P.; Keller, T.; and Helmert, M. 2010. High-quality policies for the Canadian traveler’s problem. In Proc. of AAAI-2010.
4. Shimony, S. E., and Charniak, E. 1991. A new algorithm for finding MAP assignments to belief networks. In Proc. of UAI ’90, 185–196.
5. Kappen H.J., Gómez V., Opper M. Optimal control as a graphical model inference problem. Machine Learning, vol. 87, no. 2, pp. 159-182, 2012

## Case Study: Canadian Traveller Problem

### Policy

Depth-first search based policy:

• the agent traverses $G$ in depth-first order.
• the policy specifies the probabilities of selecting each adjacent edge in every node.

### Probabilistic Program

require CTP$(G,s,t,w,p)$ for $v \in V$ $policy(v)$ $\gets$ Draw(Dirichlet($\pmb{1}^{\deg(v)}$)) end for repeat $instance$ $\gets$ Draw(CTP$(G,w,p)$) ($reached$, $distance$) $\gets$ StDFS($instance$, $policy$) until $reached$ Observe(1, Bernoulli$\left(e^{-distance}\right)$) Print($policy$)

## Acknowledgments

This material is based on research sponsored by DARPA through the U.S. Air Force Research Laboratory under Cooperative Agreement number FA8750-14-2-0004.

## Policy Learning through Probabilistic Inference

Policy is a probabilistic model:

require $agent, Instances, Policies.$ $instance$ $\gets$ Draw($Instances$) $policy$ $\gets$ Draw($Policies$) $cost$ $\gets$ Run($agent$, $instance$, $policy$) Observe(1, Bernoulli$(e^{-cost}))$ Print($policy$)

The log probability of the output policy is $$\log p_{\mathcal{P}}(policy) = -cost(policy) + \log p_{Policies}(policy) + C$$ When policies are drawn uniformly $$\log p_{\mathcal{P}}(policy) = - cost(policy) + C'$$

## Connection between MAP and Shortest Path

In a probabilistic program $\mathcal{P}$:

• $F_i$ is the prior distribution of random variable $x_i$.
• $G_j$ is the conditional distribution of observation $y_j$.

Maximizing the (logarithm of) trace probability $$\log p_{\mathcal{P}}(\pmb{x}) = \sum_{i=1}^{\left|\pmb{x}\right|} \log p_{F_i}(x_i) + \sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j}) + C$$ corresponds to finding the shortest path in a graph $G=(V,E)$:

• $V=\{(F_i, x_i)\} \cup \{(G_j, y_j)\}$.
• Edge costs are $-\log p_{F_i}(x_i)$ or $-\log p_{H_j} (y_j)$.

## Marginal MAP as Policy Learning

In Marginal MAP, assignment of a part of the trace $\pmb{x}^\theta$ is inferred.

Determining $\pmb{x}_{MAP}^\theta$ corresponds to learning a policy $\pmb{x}^\theta$ which minimizes the expected path length \begin{equation*} \pmb{x}^\theta=\arg \min_{\pmb{x}^\theta}\mathbb{E}_{\pmb{x}\setminus\pmb{x}^\theta}\left[-\sum_{i=1}^{\left|\pmb{x}^\theta\right|} \log p_{F_i^\theta}(x_i^\theta) -\sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j})\right] \end{equation*}

By constructing an appropriate probabilistic model, policies can be learned by probabilistic inference.

## Contributions

• Bilateral correspondence between probabilistic inference and policy learning for path finding.
• A new approach to policy learning.
• A realization of the approach for the Canadian traveller problem

## Future Work

• Online policy learning.
• Better (faster and more robust) inference algorithms for MMAP.
• Application to different planning domains.