IJCAI 2011

A constraint satisfaction problem (CSP) is defined by:

**variables** $\mathcal{X}=\{X_1, X_2, \ldots\}$, **constraints** $\mathcal{C}=\{C_1, C_2, \ldots\}$.

- Each *variable* $X_i$ has a non-empty domain $D_i$ of possible values.
- Each *constraint* $C_i$ involves some subset of the variables (the *scope* of the constraint) and specifies the allowable combinations of values for that subset.
- An *assignment* that does not violate any constraints is called *consistent*; a complete consistent assignment is a *solution*.
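As an illustration, a tiny all-different CSP can be encoded directly from this definition (a minimal sketch; the variable names and the pairwise constraint encoding are illustrative, not from the paper):

```python
# Variables, domains, and binary constraints given as (scope, allowed tuples).
variables = ["X1", "X2", "X3"]
domains = {"X1": {1, 2, 3}, "X2": {1, 2, 3}, "X3": {1, 2, 3}}
constraints = [
    (("X1", "X2"), {(a, b) for a in domains["X1"] for b in domains["X2"] if a != b}),
    (("X2", "X3"), {(a, b) for a in domains["X2"] for b in domains["X3"] if a != b}),
]

def consistent(assignment):
    """An assignment is consistent if it violates no constraint
    whose scope is fully assigned."""
    for scope, allowed in constraints:
        if all(v in assignment for v in scope):
            if tuple(assignment[v] for v in scope) not in allowed:
                return False
    return True

print(consistent({"X1": 1, "X2": 2}))  # True: no constraint violated
print(consistent({"X1": 1, "X2": 1}))  # False: violates the X1 != X2 constraint
```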

- A problem-solving agent can perform *base-level* actions from a known set $\{A_i\}$.
- Before committing to an action, the agent may perform a sequence of *meta-level* deliberation actions from a set $\{S_j\}$.
- At any given time there is a base-level action $A_\alpha$ that maximizes the agent's *expected utility*.

The **net VOI $V(S_j)$** of action $S_j$ is the intrinsic VOI $\Lambda(S_j)$ less the cost of $S_j$:
\[V(S_j)=\Lambda(S_j)-C(S_j)\]

The **intrinsic VOI $\Lambda(S_j)$**
is the expected difference between the intrinsic expected utilities of the new
and the old selected base-level action, *computed after the meta-level
action is taken:*
\[\Lambda(S_j)=\mathbb{E}\left(\mathbb{E}(U(A_\alpha^j))-\mathbb{E}(U(A_\alpha))\right)\]

- The action $S_{j_{\max}}$ that maximizes the net VOI, $j_{\max} = \arg\max_j V(S_j)$, is performed *if $V(S_{j_{\max}})>0$.*
- Otherwise, $A_\alpha$ is performed.
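The decision rule above can be sketched with illustrative numbers (the intrinsic VOI and cost values are assumed, not taken from the paper):

```python
# Perform the meta-level action with the highest positive net VOI;
# otherwise commit to the current best base-level action A_alpha.
intrinsic_voi = {"S1": 0.8, "S2": 1.5, "S3": 0.4}  # Lambda(S_j), assumed
cost = {"S1": 0.5, "S2": 0.9, "S3": 0.6}           # C(S_j), assumed

net_voi = {j: intrinsic_voi[j] - cost[j] for j in intrinsic_voi}
j_max = max(net_voi, key=net_voi.get)

if net_voi[j_max] > 0:
    action = j_max          # deliberate further
else:
    action = "A_alpha"      # act now

print(action)  # S2: V(S2) = 0.6 is the largest net VOI and is positive
```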

- IMG4 Consortium under the MAGNET program of the Israeli Ministry of Trade and Industry
- Israel Science Foundation grant 305/09
- Lynne and William Frankel Center for Computer Sciences
- Paul Ivanier Center for Robotics Research and Production Management

CSP benchmarks from the CSP Solver Competition 2005 were used. 14 benchmarks were solved for $\gamma=0$ and for the exponential range $\gamma \in \{10^{-7}, 10^{-6}, \ldots, 1\}$, as well as with the minimum-conflicts heuristic and the pAC heuristic.

The maximum improvement is
achieved when *the solution count is estimated only in a small fraction of
occasions* selected using rational metareasoning.

Based on the results on benchmarks, we chose $\gamma=10^{-3}$, and applied it to two sets of 100 problem instances. Exhaustive deployment, rational deployment, the minimum conflicts heuristic, and the pAC heuristic were compared.

*The value of $\gamma$ chosen based on a small set of hard instances
gave good results on a set of instances with different parameters and
of varying hardness.*

- Real-world problem instances often have much more structure than random instances generated according to Model RB.
- We repeated the experiments on randomly generated Generalized Sudoku instances, a highly structured domain. *Relative performance on Generalized Sudoku was similar to that on Model RB.*

- Heuristics are crucial tools in decreasing search effort in various areas of AI.
- A heuristic must provide useful information to the search algorithm, and be efficient to compute.
- Overhead of some well-known heuristics may outweigh the gain.
- Such heuristics should be deployed selectively, based on principles of
*rational metareasoning.*

- CSP backtracking search algorithms typically employ variable-ordering and value-ordering heuristics.
- Many value-ordering heuristics are computationally heavy, e.g. heuristics *based on solution count estimates.*
- Principles of rational metareasoning can be applied to decide when to deploy such heuristics.

Value ordering heuristics provide information about:

- $T_i$—the expected time to find a solution containing an assignment $X_k=y_{ki};$
- $p_i$—the probability that there is no solution consistent with $X_k=y_{ki}$.

The expected remaining search time in the subtree under $X_k$ for ordering $\omega$ is given by: \[T^{s|\omega}=T_{\omega(1)}+\sum_{i=2}^{|D_k|}T_{\omega(i)}\prod_{j=1}^{i-1}p_{\omega(j)}\]
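A direct implementation of this formula (the estimates $T_i$ and $p_i$ below are illustrative numbers, not from any benchmark):

```python
# Expected remaining search time T^{s|omega} under ordering omega:
# T_{omega(1)} + sum_{i>=2} T_{omega(i)} * prod_{j<i} p_{omega(j)}.
def expected_search_time(order, T, p):
    total, fail_prob = 0.0, 1.0
    for i in order:
        total += T[i] * fail_prob   # reached only if all earlier values failed
        fail_prob *= p[i]           # probability the search continues past i
    return total

T = {1: 3.0, 2: 1.0}   # T_i: expected time to find a solution with value i
p = {1: 0.5, 2: 0.9}   # p_i: probability no solution is consistent with value i
print(round(expected_search_time((1, 2), T, p), 3))  # 3.0 + 1.0*0.5 = 3.5
print(round(expected_search_time((2, 1), T, p), 3))  # 1.0 + 3.0*0.9 = 3.7
```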

- The current optimal base-level action is picking the ordering $\omega$ that minimizes $T^{s|\omega}$. $T^{s|\omega}$ is minimal when the values are sorted in increasing order of $\frac{T_i}{1-p_i}$.
- The intrinsic VOI $\Lambda_i$ of estimating $T_i, p_i$ for the $i$th assignment is the expected decrease in the expected search time: $\Lambda_i=\mathbb{E}\left[T^{s|\omega_-}-T^{s|\omega_{+i}}\right]$.
- Computing new estimates (with overhead $T^c$) for values $T_i, p_i$ is beneficial just when the net VOI is positive: $V_i=\Lambda_i-T^c$.
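The ordering claim can be checked by brute force on a small assumed instance: sorting by increasing $T_i/(1-p_i)$ matches the ordering found by exhaustive enumeration (a sketch with made-up estimates):

```python
from itertools import permutations

T = {0: 3.0, 1: 1.0, 2: 2.0}   # assumed expected solution times T_i
p = {0: 0.5, 1: 0.9, 2: 0.2}   # assumed failure probabilities p_i

def expected_search_time(order):
    total, fail_prob = 0.0, 1.0
    for i in order:
        total += T[i] * fail_prob
        fail_prob *= p[i]
    return total

best = min(permutations(T), key=expected_search_time)
ratio_order = tuple(sorted(T, key=lambda i: T[i] / (1 - p[i])))
print(best, ratio_order)  # both (2, 0, 1): the ratio rule is optimal here
```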

The intrinsic VOI $\Lambda_i$ of invoking the heuristic can be approximated as: \[\Lambda_i\approx \mathbb{E}\left[(T_1-T_i)|D_k|\,\Big|\; T_i < T_1 \right]\]
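This conditional expectation can be estimated by simple Monte Carlo sampling (a sketch under an assumed exponential belief over $T_i$; the values of $T_1$ and $|D_k|$ are illustrative):

```python
import random

random.seed(0)
T1 = 5.0          # time estimate T_1 of the currently best value (assumed)
domain_size = 4   # |D_k| (assumed)

# Sample T_i from an assumed belief distribution with mean 4.
samples = [random.expovariate(1 / 4.0) for _ in range(100_000)]
# Keep only the samples where invoking the heuristic changes the ordering.
gains = [(T1 - t) * domain_size for t in samples if t < T1]
# Conditional expectation E[(T1 - T_i) * |D_k| | T_i < T1]
voi_i = sum(gains) / len(gains)
print(round(voi_i, 1))
```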

- This work suggests a *model for adaptive deployment* of value-ordering heuristics in algorithms for constraint satisfaction problems.
- As a case study, the model was applied to a value-ordering heuristic based on solution count estimates, and a *steady improvement* was achieved compared to **always** computing the estimates.
- For many problem instances the optimum performance is achieved when solution *counts are estimated only in a small number of search states.*

- Generalization of the VOI to deploy different types of heuristics for CSP.
- Explicit evaluation of the quality of the distribution model, coupled with a better candidate model of the distribution.
- Application to search in other domains, especially to heuristics for planning; in particular, *examining whether the meta-reasoning scheme can improve reasoning over deployment based solely on learning.*