David Tolpin, Jan Willem van de Meent,
Brooks Paige, Frank Wood
University of Oxford
Oxford, UK

Path Finding under Uncertainty through Probabilistic Inference


Preliminaries

Probabilistic Program

(let [;; Model
      dist (sample (categorical 
                    [[normal 1/2]
                     [gamma 1/2]]))
      a (sample (gamma 1 1))
      b (sample (gamma 1 1))
      d (dist a b)]
    ;; Observations
    (observe d 1) (observe d 2)
    (observe d 4) (observe d 7)
    ;; Explanation
    (predict :d (type d))
    (predict :a a) (predict :b b))

Inference Objectives

Canadian Traveller Problem

The Canadian traveller problem (CTP) is the problem of finding the shortest travel distance in a graph in which some edges may be blocked; whether an edge is blocked becomes known only when the traveller reaches one of its endpoints.

Given a graph $G=(V,E)$ with edge weights $w$ and edge blocking probabilities $p$, a source $s$, and a target $t$,

find the shortest travel distance from $s$ to $t$, that is, the sum of weights of all traversed edges.
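
For concreteness, a CTP input can be encoded as an edge list annotated with weights and blocking probabilities. The following Clojure sketch is illustrative only; the representation, names, and numbers are assumptions, not taken from the poster.

;; Illustrative CTP input: four vertices, source :a, target :d.
;; Each undirected edge has a travel weight and an independent
;; probability of being blocked.
(def ctp-input
  {:s :a
   :t :d
   :edges [{:u :a :v :b :weight 3.0 :p-block 0.2}
           {:u :a :v :c :weight 1.0 :p-block 0.5}
           {:u :b :v :d :weight 2.0 :p-block 0.1}
           {:u :c :v :d :weight 4.0 :p-block 0.3}]})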


Case Study: Canadian Traveller Problem

Policy

Depth-first-search-based policy: the policy assigns to every vertex $v$ a probability distribution over the edges incident on $v$; a stochastic depth-first search from $s$ to $t$ samples the next edge to traverse from this distribution, backtracking along already-traversed edges when it reaches a dead end.
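
The poster does not spell out StDFS; the following plain-Clojure sketch shows one way such a search could look. It assumes an instance is a map from each vertex to a vector of [neighbour weight] pairs for its open edges, and a policy is a map from each vertex to a map of neighbour weights; all names are illustrative, not the authors' code.

;; Stochastic depth-first search guided by per-vertex edge weights (sketch).
;; instance: vertex -> vector of [neighbour weight] pairs for open edges
;; policy:   vertex -> {neighbour weight}
;; Returns [reached? total-distance].

(defn sample-index
  "Sample an index with probability proportional to the given weights."
  [weights]
  (let [r (* (rand) (reduce + weights))]
    (loop [i 0 acc 0.0]
      (let [acc (+ acc (nth weights i))]
        (if (< r acc) i (recur (inc i) acc))))))

(defn stochastic-dfs
  [instance policy s t]
  (loop [v s, visited #{s}, stack [], distance 0.0]
    (if (= v t)
      [true distance]
      (let [frontier (vec (remove (fn [[u _]] (visited u)) (instance v)))]
        (cond
          ;; pick one of the open, unvisited edges according to the policy
          (seq frontier)
          (let [probs (map (fn [[u _]] (get-in policy [v u] 1.0)) frontier)
                [u w] (frontier (sample-index probs))]
            (recur u (conj visited u) (conj stack [v w]) (+ distance w)))

          ;; dead end: backtrack one edge, paying its weight again
          (seq stack)
          (let [[prev w] (peek stack)]
            (recur prev visited (pop stack) (+ distance w)))

          ;; nowhere left to go: the target is unreachable in this instance
          :else
          [false distance])))))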

Probabilistic Program

require CTP$(G,s,t,w,p)$
for $v \in V$
    $policy(v)$ $\gets$ Draw(Dirichlet($\pmb{1}^{\deg(v)}$))
end for
repeat
    $instance$ $\gets$ Draw(CTP$(G,w,p)$)
    ($reached$, $distance$) $\gets$ StDFS($instance$, $policy$)
until $reached$
Observe(1, Bernoulli$\left(e^{-distance}\right)$)
Print($policy$)
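
A rough plain-Clojure approximation of this program, reusing the ctp-input encoding and stochastic-dfs sketched above, is shown below. It replaces a general-purpose inference engine with simple likelihood weighting: draw a policy and an instance, run the search, and weight the policy by $e^{-distance}$. All names are illustrative assumptions, not the authors' implementation.

;; Likelihood-weighting sketch: weight sampled policies by exp(-distance)
;; and keep the best-weighted one.  Reuses ctp-input and stochastic-dfs
;; from the sketches above.

(defn sample-dirichlet-ones
  "Sample from Dirichlet(1,...,1) over n components (normalized Exp(1) draws)."
  [n]
  (let [g (repeatedly n #(- (Math/log (rand))))
        z (reduce + g)]
    (mapv #(/ % z) g)))

(defn sample-policy
  "Draw per-vertex edge weights: vertex -> {neighbour weight}."
  [edges]
  (let [nbrs (reduce (fn [m {:keys [u v]}]
                       (-> m
                           (update u (fnil conj []) v)
                           (update v (fnil conj []) u)))
                     {} edges)]
    (into {} (for [[v neighbours] nbrs]
               [v (zipmap neighbours
                          (sample-dirichlet-ones (count neighbours)))]))))

(defn sample-instance
  "Keep each edge open with probability (1 - p-block); return the adjacency
   map vertex -> [[neighbour weight] ...] over the open edges."
  [edges]
  (reduce (fn [m {:keys [u v weight p-block]}]
            (if (< (rand) p-block)
              m
              (-> m
                  (update u (fnil conj []) [v weight])
                  (update v (fnil conj []) [u weight]))))
          {} edges))

(defn learn-policy
  "Return the sampled policy with the largest importance weight."
  [{:keys [s t edges]} n-samples]
  (apply max-key :weight
         (repeatedly n-samples
                     (fn []
                       (let [policy (sample-policy edges)
                             distance (loop []   ; redraw instances until t is reached
                                        (let [[reached d] (stochastic-dfs
                                                           (sample-instance edges)
                                                           policy s t)]
                                          (if reached d (recur))))]
                         {:policy policy
                          :weight (Math/exp (- distance))})))))

;; Example: (:policy (learn-policy ctp-input 1000))

Keeping only the single best-weighted sample is a crude estimate; averaging weights over many instances per policy, or running a general-purpose inference algorithm as the poster does, yields better policies.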

Learned Policies

Acknowledgments

This material is based on research sponsored by DARPA through the U.S. Air Force Research Laboratory under Cooperative Agreement number FA8750-14-2-0004.

Policy Learning through Probabilistic Inference

The policy is a random variable in a probabilistic model:

require $agent$, $Instances$, $Policies$
$instance$ $\gets$ Draw($Instances$)
$policy$ $\gets$ Draw($Policies$)
$cost$ $\gets$ Run($agent$, $instance$, $policy$)
Observe(1, Bernoulli$(e^{-cost})$)
Print($policy$)

The log probability of the output policy is $$\log p_{\mathcal{P}}(policy) = -cost(policy) + \log p_{Policies}(policy) + C$$ When policies are drawn uniformly, $$\log p_{\mathcal{P}}(policy) = -cost(policy) + C'$$
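
One way to make the first equation explicit, writing $p_{Instances}$ for the instance prior by analogy with $p_{Policies}$: observing the value 1 from Bernoulli$(e^{-cost})$ contributes a likelihood factor of $e^{-cost}$, so for a single trace $$p_{\mathcal{P}}(instance, policy) = p_{Instances}(instance)\, p_{Policies}(policy)\, e^{-cost}$$ Taking logarithms and absorbing the instance term into $C$ (it does not depend on the policy) gives the expression above; maximizing the probability of the policy therefore minimizes $cost(policy) - \log p_{Policies}(policy)$, which reduces to plain cost minimization under a uniform prior.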

Connection between MAP and Shortest Path

In a probabilistic program $\mathcal{P}$:

Maximizing the (logarithm of) trace probability $$\log p_{\mathcal{P}}(\pmb{x}) = \sum_{i=1}^{\left|\pmb{x}\right|} \log p_{F_i}(x_i) + \sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j}) + C$$ corresponds to finding the shortest path in a graph $G=(V,E)$ whose edges are the random choices and observations of $\mathcal{P}$ and whose edge weights are the corresponding negative log probabilities $-\log p_{F_i}(x_i)$ and $-\log p_{G_j}(y_j)$.
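
Spelling out the correspondence: each trace $\pmb{x}$ is a path whose length is $-\log p_{\mathcal{P}}(\pmb{x}) + C$, so the MAP trace is exactly the shortest path, $$\pmb{x}_{MAP} = \arg\max_{\pmb{x}} \log p_{\mathcal{P}}(\pmb{x}) = \arg\min_{\pmb{x}} \left[\sum_{i=1}^{\left|\pmb{x}\right|} \bigl(-\log p_{F_i}(x_i)\bigr) + \sum_{j=1}^{\left|\pmb{y}\right|}\bigl(-\log p_{G_j}(y_{j})\bigr)\right]$$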

Marginal MAP as Policy Learning

In Marginal MAP, an assignment is inferred only for a part of the trace, $\pmb{x}^\theta$, while the remaining random choices are marginalized out.

Determining $\pmb{x}_{MAP}^\theta$ corresponds to learning a policy $\pmb{x}^\theta$ that minimizes the expected path length \begin{equation*} \pmb{x}_{MAP}^\theta=\arg \min_{\pmb{x}^\theta}\mathbb{E}_{\pmb{x}\setminus\pmb{x}^\theta}\left[-\sum_{i=1}^{\left|\pmb{x}^\theta\right|} \log p_{F_i^\theta}(x_i^\theta) -\sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j})\right] \end{equation*}

Hence, given an appropriately constructed probabilistic model, policies can be learned by probabilistic inference.

Contributions

Future Work

http://www.offtopia.net/nips-pp-ws-2014-ardb-poster/