# Course:CPSC522/WeightedModelCounting

## Weighted Model Counting

An efficient approach to probabilistic inference is to reduce the problem to weighted model counting. The idea is to encode the probabilistic model, usually a Bayesian network, as a propositional knowledge base in conjunctive normal form (CNF), with weights assigned so that each model (assignment of variables) is weighted according to the network parameters. Note that the term model is used in two senses here: the probabilistic model being encoded, and a logical model of the CNF. Given this CNF, computing the probability of some evidence is equivalent to calculating the sum of the weights of all assignments to CNF variables compatible with the evidence.

Principal Author: Hooman Hasehmi
Secondary Author:

## Abstract

This page provides a summary of weighted model counting and different approaches to it. The first section provides background, one variation of the problem, and a simple example task. The next sections present different ways to encode the probabilistic model (here a Bayesian network), ways to count the models, and finally which local structures can be exploited (that is, which properties of the parameter values of a specific probabilistic model can be exploited). The article ends with another variation of the problem and a conclusion. This article is structured based on a survey [1] and tries to convey similar information.

### Builds on

Weighted model counting converts the model to conjunctive normal form (CNF). The probabilistic model here is a Bayesian network which is a subtype of graphical models.

### Related Pages

One of the counting methods relies on a d-DNNF knowledge compilation.

## Content

### Introduction and definition

#### Definitions

A Bayesian network represents a joint distribution as a pair ${\displaystyle (G,P)}$, where ${\displaystyle G}$ is an acyclic directed graph representing dependencies and independences and ${\displaystyle P}$ is a set of factors. For each variable ${\displaystyle X}$ with parents ${\displaystyle U}$ there is a factor ${\displaystyle f\in P}$ which is a function such that for each assignment of these variables ${\displaystyle f(u,x)=Pr(x|u)}$. By the chain rule and the independences encoded in the graph we have,

${\displaystyle P(x_{1},x_{2},...,x_{n})=\prod _{i=1}^{n}P(x_{i}|u_{i})}$

In many cases a factor is described by a conditional probability table (CPT).

#### Example

Figure 1. A simple Bayesian network with three variables.

A simple example Bayesian network is provided in Figure 1. The joint probability and the conditional probability tables are provided below [1].


| A | B | C | Pr |
|---|---|---|----|
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{1}}$ | ${\displaystyle c_{1}}$ | 0.001 |
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{1}}$ | ${\displaystyle c_{2}}$ | 0.002 |
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{1}}$ | ${\displaystyle c_{3}}$ | 0.007 |
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{2}}$ | ${\displaystyle c_{1}}$ | 0.009 |
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{2}}$ | ${\displaystyle c_{2}}$ | 0.018 |
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{2}}$ | ${\displaystyle c_{3}}$ | 0.063 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{1}}$ | ${\displaystyle c_{1}}$ | 0.0018 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{1}}$ | ${\displaystyle c_{2}}$ | 0.0162 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{1}}$ | ${\displaystyle c_{3}}$ | 0.162 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{2}}$ | ${\displaystyle c_{1}}$ | 0.0072 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{2}}$ | ${\displaystyle c_{2}}$ | 0.0648 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{2}}$ | ${\displaystyle c_{3}}$ | 0.648 |

| A | Pr |
|---|----|
| ${\displaystyle a_{1}}$ | 0.1 |
| ${\displaystyle a_{2}}$ | 0.9 |

| A | B | Pr |
|---|---|----|
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{1}}$ | 0.1 |
| ${\displaystyle a_{1}}$ | ${\displaystyle b_{2}}$ | 0.9 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{1}}$ | 0.2 |
| ${\displaystyle a_{2}}$ | ${\displaystyle b_{2}}$ | 0.8 |

| A | C | Pr |
|---|---|----|
| ${\displaystyle a_{1}}$ | ${\displaystyle c_{1}}$ | 0.1 |
| ${\displaystyle a_{1}}$ | ${\displaystyle c_{2}}$ | 0.2 |
| ${\displaystyle a_{1}}$ | ${\displaystyle c_{3}}$ | 0.7 |
| ${\displaystyle a_{2}}$ | ${\displaystyle c_{1}}$ | 0.01 |
| ${\displaystyle a_{2}}$ | ${\displaystyle c_{2}}$ | 0.09 |
| ${\displaystyle a_{2}}$ | ${\displaystyle c_{3}}$ | 0.9 |

Using this model, the probability ${\displaystyle Pr(a_{1},c_{2})}$ can be calculated by summing the consistent rows ${\displaystyle 2}$ and ${\displaystyle 5}$ of the joint table, which gives ${\displaystyle 0.02}$.
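As a small sketch (not part of the source), the chain rule and this marginal can be checked in a few lines of Python, with the example CPTs stored as dictionaries:

```python
# CPTs of the example network A -> B, A -> C as dictionaries.
P_A = {"a1": 0.1, "a2": 0.9}
P_B = {("a1", "b1"): 0.1, ("a1", "b2"): 0.9,
       ("a2", "b1"): 0.2, ("a2", "b2"): 0.8}
P_C = {("a1", "c1"): 0.1, ("a1", "c2"): 0.2, ("a1", "c3"): 0.7,
       ("a2", "c1"): 0.01, ("a2", "c2"): 0.09, ("a2", "c3"): 0.9}

def joint(a, b, c):
    """Chain rule with the network's independences: B and C depend only on A."""
    return P_A[a] * P_B[(a, b)] * P_C[(a, c)]

# Pr(a1, c2): sum the joint over the unobserved variable B (rows 2 and 5).
pr = sum(joint("a1", b, "c2") for b in ("b1", "b2"))
print(round(pr, 6))  # 0.02
```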

The following is an example encoding of this probabilistic model as a CNF.

${\displaystyle \forall X\in \{A,B,C\}.\bigvee \limits _{i}\lambda _{x_{i}}}$
${\displaystyle \forall X\in \{A,B,C\},i<j.\lnot \lambda _{x_{i}}\lor \lnot \lambda _{x_{j}}}$
${\displaystyle \forall X\in \{A,B,C\},U=Pa(X),i,j.\lambda _{x_{i}}\land \lambda _{u_{j}}\iff \theta _{x_{i}|u_{j}}.}$

Or equivalently,
${\displaystyle \lambda _{a_{1}}\lor \lambda _{a_{2}}\qquad \lambda _{b_{1}}\lor \lambda _{b_{2}}\qquad \lambda _{c_{1}}\lor \lambda _{c_{2}}\lor \lambda _{c_{3}}}$
${\displaystyle \lnot \lambda _{a_{1}}\lor \lnot \lambda _{a_{2}}}$${\displaystyle \qquad \lnot \lambda _{c_{1}}\lor \lnot \lambda _{c_{2}}}$${\displaystyle \qquad \lnot \lambda _{c_{1}}\lor \lnot \lambda _{c_{3}}}$ ...
${\displaystyle \lambda _{a_{1}}\iff \theta _{a_{1}}\qquad \lambda _{b_{1}}\land \lambda _{a_{2}}\iff \theta _{b_{1}|a_{2}}\qquad \lambda _{c_{1}}\land \lambda _{a_{1}}\iff \theta _{c_{1}|a_{1}}}$ ...
Here ${\displaystyle \lambda _{x_{i}}}$ is an indicator variable representing the assignment of a value to a network variable, and ${\displaystyle \theta _{x|u}}$ is a parameter variable indicating which conditional probability parameter applies. The weight of the positive literal ${\displaystyle \theta _{x|u}}$ is ${\displaystyle Pr({x|u})}$; all other literals have a weight of ${\displaystyle 1}$.

After assigning weights to the literals of the CNF obtained from the Bayesian network, the weight of a model (an assignment to these variables) is calculated by multiplying the individual literal weights. The weight of ${\displaystyle w=a_{1},b_{1},c_{2}}$ can be calculated as ${\displaystyle W(w)=Pr(a_{1})Pr(b_{1}|a_{1})Pr(c_{2}|a_{1})\times 1\times 1...\times 1=0.1\times 0.1\times 0.2=0.002}$, which is equal to the joint probability.[1]

Without any evidence, the total weight of all models adds up to one. To compute the probability ${\displaystyle Pr(a_{1},c_{2})}$ mentioned above, one computes the sum of the weights of all models of the CNF that are compatible with the statement ${\displaystyle a_{1}\land c_{2}}$, which gives ${\displaystyle 0.002+0.018=0.02}$.

The advantages of this method are discussed in the next sections.

#### Formulation

We denote the weight assigned to a literal ${\displaystyle l}$ by ${\displaystyle W(l)}$. As mentioned, for the positive literals ${\displaystyle \theta _{x|u}}$ we have ${\displaystyle W(\theta _{x|u})=Pr(x|u)}$, and all other literals have weight ${\displaystyle 1}$, that is ${\displaystyle W(\lnot \theta _{x|u})=W(\lambda _{x})=W(\lnot \lambda _{x})=1}$. The literal weights define a weight for each model ${\displaystyle w}$ as follows.

${\displaystyle W(w)=\prod _{w\models l}W(l)}$

For a logical theory ${\displaystyle \Delta }$, ${\displaystyle WMC(\Delta )}$ is the sum of the weights of all models of ${\displaystyle \Delta }$,

${\displaystyle WMC(\Delta )=\sum _{w\models \Delta }W(w)}$

Evidence ${\displaystyle e}$ is incorporated either by zeroing out weight of indicator variable literals that are inconsistent with ${\displaystyle e}$ or by computing ${\displaystyle WMC(\Delta \wedge \Delta _{e})}$, where ${\displaystyle \Delta _{e}}$ encodes evidence as a conjunction of indicator variables.[1]
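This formulation can be made concrete with a brute-force sketch (an assumption of this illustration: in the example encoding the parameter variables are functions of the indicators, so the satisfying models correspond one-to-one with assignments ${\displaystyle (a,b,c)}$). Evidence zeroes out the weights of inconsistent indicator literals, which here simply skips the corresponding assignments:

```python
import itertools

P_A = {"a1": 0.1, "a2": 0.9}
P_B = {("a1", "b1"): 0.1, ("a1", "b2"): 0.9,
       ("a2", "b1"): 0.2, ("a2", "b2"): 0.8}
P_C = {("a1", "c1"): 0.1, ("a1", "c2"): 0.2, ("a1", "c3"): 0.7,
       ("a2", "c1"): 0.01, ("a2", "c2"): 0.09, ("a2", "c3"): 0.9}

def wmc(evidence):
    """Sum of model weights; indicators inconsistent with evidence get weight 0."""
    total = 0.0
    for a, b, c in itertools.product(("a1", "a2"), ("b1", "b2"),
                                     ("c1", "c2", "c3")):
        # zero weight if the assignment contradicts any evidence atom
        if any(e not in (a, b, c) for e in evidence):
            continue
        total += P_A[a] * P_B[(a, b)] * P_C[(a, c)]  # product of theta weights
    return total

print(round(wmc(set()), 6))         # 1.0  (no evidence: weights sum to one)
print(round(wmc({"a1", "c2"}), 6))  # 0.02
```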

### Encoding

There are different ways to encode the model as a CNF; some take advantage of local structure and others reduce the number of variables of the simple encoding.

#### ENC1

This is a simple way to encode a Bayesian network, similar to the example above. First we describe how to produce the clauses.
1. Indicator clauses. For each network variable ${\displaystyle X}$ with domain ${\displaystyle x_{1},x_{2},...,x_{k}}$ we have clauses of the form,

${\displaystyle \lambda _{x_{1}}\lor \lambda _{x_{2}}\lor ...\lor \lambda _{x_{k}}}$
${\displaystyle \lnot \lambda _{x_{i}}\lor \lnot \lambda _{x_{j}},\quad {\text{for }}i<j.}$

These clauses ensure that exactly one indicator variable for ${\displaystyle X}$ is set to true.
2. ${\displaystyle ENC1}$ parameter clauses. For each parameter ${\displaystyle Pr(x_{i}|u_{1},u_{2},...,u_{n})}$, where ${\displaystyle u_{j}}$ is an assignment to the ${\displaystyle j}$th parent of ${\displaystyle X}$, we generate the following clauses,

${\displaystyle \lambda _{u_{1}}\land \lambda _{u_{2}}\land ...\land \lambda _{u_{n}}\land \lambda _{x_{i}}\iff \theta _{x_{i}|u_{1},u_{2},...,u_{n}}.}$

These clauses ensure that a parameter variable is set to true if and only if the corresponding indicator variables are all true. This encoding does not take advantage of parameter values (local structure), but it is easy to extend it to exploit determinism with a small modification, as follows.

Suppose the weight of one of the parameter variables is ${\displaystyle 0}$, say ${\displaystyle \theta _{c_{1}|a_{1}}}$ in the first example. By excluding models that set this parameter variable to true, the corresponding parameter clauses simplify to ${\displaystyle \lnot \lambda _{a_{1}}\lor \lnot \lambda _{c_{1}}}$, and the now redundant variable ${\displaystyle \theta _{c_{1}|a_{1}}}$ can be eliminated from the encoding.
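A small generator makes the two clause schemas concrete. This is a sketch with hypothetical helper names (not from the source); literals are strings and `~` marks negation, and each if-and-only-if is expanded into its CNF clauses:

```python
def enc1_clauses(var, values, parent_insts):
    """ENC1 clauses for one network variable.
    values: domain of `var`; parent_insts: list of parent-value tuples
    (a single empty tuple for a root variable)."""
    clauses = []
    # 1. indicator clauses: at least one value, and no two values at once
    clauses.append([f"L_{v}" for v in values])
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            clauses.append([f"~L_{values[i]}", f"~L_{values[j]}"])
    # 2. parameter clauses: CNF of  L_u1 & ... & L_un & L_x  <=>  T_x|u
    for v in values:
        for u in parent_insts:
            ind = [f"L_{p}" for p in u] + [f"L_{v}"]
            theta = f"T_{v}|{','.join(u)}" if u else f"T_{v}"
            clauses.append([f"~{l}" for l in ind] + [theta])  # "=>" direction
            clauses += [[f"~{theta}", l] for l in ind]        # "<=" direction
    return clauses

# The clauses for variable C of the example (parent A):
for clause in enc1_clauses("C", ["c1", "c2", "c3"], [("a1",), ("a2",)]):
    print(" v ".join(clause))
```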

#### ENC2

This encoding simplifies the previous one. For each network variable ${\displaystyle X}$ with values ${\displaystyle x_{1},x_{2},...,x_{k}}$, it requires an ordering over the values of the variable, written ${\displaystyle x'<x}$ if ${\displaystyle x'}$ comes before ${\displaystyle x}$. Assuming ${\displaystyle \rho _{x_{i}|u}}$ are the new parameter variables, the weight associated with such a variable when positive is the probability that ${\displaystyle X=x_{i}}$ given that ${\displaystyle X\neq x_{j}}$ for all ${\displaystyle x_{j}<x_{i}}$ and that ${\displaystyle u}$ holds, that is,

${\displaystyle W(\rho _{x_{i}|u})=Pr(x_{i}|u,\lnot x_{1},\lnot x_{2},...,\lnot x_{i-1})={\frac {Pr(x_{i}|u)}{1-Pr(x_{i-1}|u)-...-Pr(x_{2}|u)-Pr(x_{1}|u)}},}$

and ${\displaystyle W(\lnot \rho _{x_{i}|u})=1-W(\rho _{x_{i}|u})}$ when negative. Given this we have,

${\displaystyle W(\lnot \rho _{x_{1}|u})W(\lnot \rho _{x_{2}|u})...W(\lnot \rho _{x_{i-1}|u})W(\rho _{x_{i}|u})=Pr(x_{i}|u),}$

The encoding takes advantage of this and exploits the ability to render ${\displaystyle \rho _{x_{j}|u}}$ don't-care variables when ${\displaystyle x_{j}>x_{i}}$ or ${\displaystyle u}$ does not hold, so that their weights do not affect the model counting. The indicator clauses are the same as in ${\displaystyle ENC1}$; for a given parameter ${\displaystyle Pr(x_{i}|u_{1},...,u_{n})}$ the parameter clauses become the single clause,

${\displaystyle \lambda _{u_{1}}\land ...\land \lambda _{u_{n}}\land \lnot \rho _{x_{1}|u_{1},..,u_{n}}\land ...\land \lnot \rho _{x_{i-1}|u_{1},..,u_{n}}\land \rho _{x_{i}|u_{1},..,u_{n}}\implies \lambda _{x_{i}}.}$

Notice that ${\displaystyle \rho _{x_{k}|u}}$ for the last value is deterministic (its positive weight is always ${\displaystyle 1}$), so it can be eliminated and the clause becomes,

${\displaystyle \lambda _{u_{1}}\land ...\land \lambda _{u_{n}}\land \lnot \rho _{x_{1}|u_{1},..,u_{n}}\land ...\land \lnot \rho _{x_{k-1}|u_{1},..,u_{n}}\implies \lambda _{x_{k}}.}$

Furthermore, since there is no if-and-only-if relationship, more than one of the ${\displaystyle \rho _{x_{i}|u}}$ can be true in a consistent model; this is what makes the mentioned variables don't cares.

One difference is that ${\displaystyle ENC2}$ produces smaller CNFs, with fewer variables and clauses. Another difference is that for each instantiation ${\displaystyle w}$ of the network variables, the set of consistent models ${\displaystyle \Omega }$ can have more than one element, where the sum of their weights equals the joint probability of ${\displaystyle w}$.
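The weight definition can be checked numerically. The sketch below (using the column ${\displaystyle Pr(C|a_{1})=(0.1,0.2,0.7)}$ from the example, with ordering ${\displaystyle c_{1}<c_{2}<c_{3}}$) recovers each CPT entry from the ${\displaystyle \rho }$ weights:

```python
# rho weights for one CPT column, per the definition above
probs = [0.1, 0.2, 0.7]          # Pr(c1|a1), Pr(c2|a1), Pr(c3|a1)
rho, remaining = [], 1.0
for p in probs:
    rho.append(p / remaining)    # W(rho_{x_i|u}) = Pr(x_i|u) / (1 - earlier mass)
    remaining -= p

# W(~rho_{x_1})...W(~rho_{x_{i-1}}) * W(rho_{x_i}) should equal Pr(x_i|u)
for i, p in enumerate(probs):
    w = rho[i]
    for j in range(i):
        w *= 1.0 - rho[j]
    print(round(w, 10), "==", p)

print(round(rho[-1], 10))  # 1.0: the last rho is deterministic, so ENC2 drops it
```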

#### Similarity and differences

${\displaystyle ENC1}$ is similar to the example encoding; the differences of ${\displaystyle ENC2}$ were mentioned at the end of its section. In short, ${\displaystyle ENC2}$ orders the parameter variables, and only the first parameter variable that is set to true matters.

#### ENC3 and ENC4

${\displaystyle ENC3}$ takes advantage of equal weights of parameters to reduce the number of CNF variables. The method ${\displaystyle ENC4}$ tries to remove irrelevant variables to make finding decompositions easier. The encodings are provided in the local structure section.

### Model counting

There are different ways to compute the weighted sum. Here, model counting by search and by knowledge compilation is discussed.

#### Search

The search is based on a series of decomposition and splitting steps that break the CNF into smaller subproblems.

1. Decomposing. When the CNF can be decomposed into two sets of clauses that share no variables, one can solve each subproblem separately, because any pair of consistent models of the subproblems gives a consistent model of the original problem; multiplying the ${\displaystyle WMC}$ of the subproblems gives the weighted sum of the original problem.
2. Splitting on a CNF variable ${\displaystyle X}$. When there is no way to decompose the CNF, one can count the models in which an arbitrary variable ${\displaystyle X}$ is true or false separately. After fixing the value of ${\displaystyle X}$ and simplifying the clauses, two new problems with fewer variables can be solved. The weighted sum for the original problem is the sum of the ${\displaystyle WMC}$ of the two subproblems.[1]

An example of the search algorithm is provided in Figure 2.

Figure 2. An example of the search algorithm
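The decompose-and-split search can be sketched as a recursive procedure. This is an illustrative toy, not the survey's implementation: clauses are sets of signed integers, `w` maps each literal to its weight, and the literal weight is multiplied in at the split step:

```python
def condition(clauses, lit):
    """Simplify under `lit`: drop satisfied clauses, remove falsified literals."""
    return [c - {-lit} for c in clauses if lit not in c]

def components(clauses, variables):
    """Partition the variables into groups connected through shared clauses."""
    adj = {v: set() for v in variables}
    for c in clauses:
        vs = {abs(l) for l in c}
        for v in vs:
            adj[v] |= vs
    comps, seen = [], set()
    for v in variables:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(([c for c in clauses if any(abs(l) in comp for l in c)],
                      comp))
    return comps

def wmc(clauses, w, variables):
    if any(not c for c in clauses):
        return 0.0                     # empty clause: unsatisfiable branch
    if not variables:
        return 1.0
    comps = components(clauses, variables)
    if len(comps) > 1:                 # 1. decompose into disjoint subproblems
        result = 1.0
        for comp_clauses, comp_vars in comps:
            result *= wmc(comp_clauses, w, comp_vars)
        return result
    v = min(variables)                 # 2. split on an arbitrary variable
    return sum(w[lit] * wmc(condition(clauses, lit), w, variables - {v})
               for lit in (v, -v))

# Indicator clauses for A: (l_a1 v l_a2) & (~l_a1 v ~l_a2), with the
# parameter weights folded into the positive indicator literals.
weights = {1: 0.1, -1: 1.0, 2: 0.9, -2: 1.0}
print(wmc([{1, 2}, {-1, -2}], weights, {1, 2}))  # 1.0
```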

#### Knowledge Compilation

The method based on knowledge compilation compiles the CNF into a smooth d-DNNF. The traces generated by search algorithms like the one in the previous section can be interpreted as members of the d-DNNF language. The properties of d-DNNFs are briefly explained here.[1]

###### Smooth d-DNNF

A d-DNNF (deterministic Decomposable Negation Normal Form) is a rooted DAG with conjunctions and disjunctions as internal nodes and literals as leaves. A smooth d-DNNF has the following properties.

1. Decomposability. The conjuncts of a conjunction share no variables.
2. Determinism. The disjuncts of a disjunction are logically disjoint (no two can be true at the same time).
3. Smoothness. The disjuncts of a disjunction mention the same set of variables.
###### Model counting and arithmetic circuit

The approach further converts the d-DNNF into an arithmetic circuit(AC), leading to a WMC circuit as in Figure 3. The circuit explicitly expresses the ${\displaystyle WMC}$ as summations and multiplications of subproblems similar to the search algorithm.[1]

Figure 3. Converting a CNF to a d-DNNF and then to an AC. The arithmetic circuit is created by replacing conjunctions with multiplications and disjunctions with addition.
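A toy bottom-up evaluator shows the conversion's effect; the tuple-based circuit representation here is hypothetical, not the survey's data structure:

```python
def eval_ac(node):
    """Evaluate an AC node: ("lit", weight), ("*", children), or ("+", children)."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    vals = [eval_ac(child) for child in node[1]]
    if kind == "*":
        out = 1.0
        for v in vals:
            out *= v
        return out
    return sum(vals)   # "+"

# WMC circuit for variable A alone: (l_a1 * t_a1) + (l_a2 * t_a2),
# with indicator weights 1 and parameter weights 0.1 and 0.9.
circuit = ("+", [("*", [("lit", 1.0), ("lit", 0.1)]),
                 ("*", [("lit", 1.0), ("lit", 0.9)])])
print(eval_ac(circuit))  # 1.0
```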

### Local structures

In the previous sections, encoding determinism was discussed. In this section we discuss other methods of exploiting structure in ${\displaystyle ENC1}$: equal parameters, decomposability, and evidence.

#### Equal parameters

The first method is to merge parameter variables that have the same value within the same conditional probability table (the same factor), that is, to use a single Boolean variable ${\displaystyle \theta }$ for all of them in order to reduce the number of variables (e.g. one shared ${\displaystyle \theta }$ for ${\displaystyle \theta _{c2|a1,b1}}$ and ${\displaystyle \theta _{c3|a1,b2}}$ if both parameters equal, say, ${\displaystyle 0.5}$). However, this does not directly work in ${\displaystyle ENC1}$: for example, two if-and-only-if statements sharing ${\displaystyle \theta }$ together with two different assignments to the same network variable imply that ${\displaystyle \theta }$ cannot be true. The encoding ${\displaystyle ENC3}$ uses the following clauses to solve this problem.

${\displaystyle ENC3}$ parameter clauses,

${\displaystyle \lambda _{u_{1}}\land \lambda _{u_{2}}\land ...\land \lambda _{u_{n}}\land \lambda _{x_{i}}\implies \theta _{x_{i}|u_{1},u_{2},...,u_{n}}.}$

This ensures that at least one of the ${\displaystyle \theta _{x_{i}|u_{1},u_{2},...,u_{n}}}$ parameters is always true, but it does not enforce a unique assignment. As a result more models are included than in ${\displaystyle ENC1}$, and the ${\displaystyle WMC}$ becomes larger than its actual value. This problem is solved by a minimization operation: exclude any model that sets a ${\displaystyle \theta _{x_{i}|u_{1},u_{2},...,u_{n}}}$ to true unnecessarily, since the parameters implied by the indicators are always enough to satisfy the implications.[1]
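The overcounting and the minimization step can be demonstrated by brute force on a hypothetical tiny network (not the running example): ${\displaystyle Pr(a_{1})=0.4}$, ${\displaystyle Pr(a_{2})=0.6}$, ${\displaystyle Pr(c_{1}|a_{1})=Pr(c_{2}|a_{1})=0.5}$ (equal, so they share one variable `t_c`), ${\displaystyle Pr(c_{1}|a_{2})=0.2}$, ${\displaystyle Pr(c_{2}|a_{2})=0.8}$:

```python
import itertools

# weights of the positive parameter literals; t_c is shared by two parameters
THETA_W = {"t_a1": 0.4, "t_a2": 0.6, "t_c": 0.5, "t_c1a2": 0.2, "t_c2a2": 0.8}

def forced(theta, a, c):
    """Is theta's antecedent (its indicator conjunction) true under (a, c)?"""
    return {"t_a1": a == 1, "t_a2": a == 2,
            "t_c": a == 1,                      # shared by (c1|a1) and (c2|a1)
            "t_c1a2": a == 2 and c == 1,
            "t_c2a2": a == 2 and c == 2}[theta]

def wmc(minimize):
    total = 0.0
    for a, c in itertools.product((1, 2), (1, 2)):
        for bits in itertools.product((False, True), repeat=len(THETA_W)):
            model = dict(zip(THETA_W, bits))
            # implication clauses: a true antecedent forces the theta true
            if any(forced(t, a, c) and not model[t] for t in THETA_W):
                continue
            # minimization: drop models where a theta is true but not forced
            if minimize and any(model[t] and not forced(t, a, c)
                                for t in THETA_W):
                continue
            weight = 1.0
            for t, on in model.items():
                if on:
                    weight *= THETA_W[t]
            total += weight
    return total

print(round(wmc(minimize=False), 4))  # overcounts: larger than 1
print(round(wmc(minimize=True), 4))   # 1.0
```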

#### Decomposability

The encoding ${\displaystyle ENC4}$ tries to remove irrelevant variables and make clauses more decomposable. The irrelevant variables are those whose values do not affect the probability (that is, the factor value in the CPT is uniform over the values of that variable). Similar to conditional independence, the independence obtained by removing these variables allows decomposition.

To explain the modifications to ${\displaystyle ENC3}$, we first need to define some new concepts and an algorithm to simplify the encoding. Since we are working with a set of multi-valued variables ${\displaystyle X}$,

we need to utilize a logic over multi-valued variables. The generalization is straightforward: a world, which consists of one atom (value assignment) per variable, satisfies an atom if and only if it assigns the same value to that atom's variable. A term over ${\displaystyle X'}$ is a conjunction of atoms, one for each variable of ${\displaystyle X'}$. Let ${\displaystyle \Gamma }$ be a disjunction of terms over ${\displaystyle X}$. An implicant ${\displaystyle \gamma }$ of ${\displaystyle \Gamma }$ is a term over some ${\displaystyle X'}$ that implies ${\displaystyle \Gamma }$. A prime implicant is an implicant that is minimal, that is, removing any atom results in a term that is no longer an implicant.

The algorithm to simplify ${\displaystyle ENC3}$ works as follows. First, divide the CNF clauses into encoding groups that share the same parameter value (or are false). Then encode each encoding group ${\displaystyle \Gamma }$ separately: compute the prime implicants of the group, and for each prime implicant ${\displaystyle p}$ add a clause of the form ${\displaystyle I\implies \theta }$, where ${\displaystyle I}$ is the conjunction of the indicators in ${\displaystyle p}$ and ${\displaystyle \theta }$ is the consequent of the encoding group. The removal of literals allows more decomposition.
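A brute-force sketch over a toy group illustrates the idea (hypothetical representation, not from the source): the group containing the worlds ${\displaystyle a_{1}b_{1}}$ and ${\displaystyle a_{1}b_{2}}$ has the single prime implicant ${\displaystyle A=a_{1}}$, so the irrelevant variable ${\displaystyle B}$ is dropped from the clause:

```python
import itertools

domains = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
names = list(domains)                       # ["A", "B"]
worlds = set(itertools.product(*(domains[n] for n in names)))

# Gamma: the encoding group {a1b1, a1b2}, i.e. all worlds with A = a1.
gamma = {("a1", "b1"), ("a1", "b2")}

def satisfies(world, term):                 # term maps variable index -> value
    return all(world[i] == v for i, v in term.items())

def implies_gamma(term):                    # every satisfying world is in Gamma
    return all(w in gamma for w in worlds if satisfies(w, term))

# enumerate all terms over subsets of the variables, keep the implicants
implicants = []
for k in range(len(names) + 1):
    for idxs in itertools.combinations(range(len(names)), k):
        for vals in itertools.product(*(domains[names[i]] for i in idxs)):
            term = dict(zip(idxs, vals))
            if implies_gamma(term):
                implicants.append(term)

# prime implicants: no strictly smaller implicant is contained in them
primes = [t for t in implicants
          if not any(s != t and s.items() <= t.items() for s in implicants)]
print([{names[i]: v for i, v in t.items()} for t in primes])  # [{'A': 'a1'}]
```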

${\displaystyle ENC4}$ improves both the time and the size (number of edges) of the resulting compilation.[1]

#### Evidence

Evidence is another local structure that WMC takes advantage of; the reader is encouraged to refer to the original article [1] for the methods used to exploit evidence efficiently.

### Other problem variations

#### Continuous domain

For continuous domains it is possible to encode factors that involve at most one continuous variable, either as a parent or as a child. The CNF is built from atoms like "x < value", and inference tasks similar to exact inference are performed by approximating the integration over the real domain as a set of polynomial integrations. [2]
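A toy sketch of the integration step (a hypothetical model, not taken from [2]): one continuous variable ${\displaystyle x}$ with polynomial density ${\displaystyle 2x}$ on ${\displaystyle [0,1]}$ and evidence ${\displaystyle x<0.5}$. The weight of an interval is the integral of the polynomial over it, which can be done exactly term by term:

```python
def integrate_poly(coeffs, lo, hi):
    """Exactly integrate sum(c_k * x^k) from lo to hi, term by term."""
    return sum(c / (k + 1) * (hi ** (k + 1) - lo ** (k + 1))
               for k, c in enumerate(coeffs))

density = [0.0, 2.0]                       # the polynomial 2x on [0, 1]
print(integrate_poly(density, 0.0, 1.0))   # 1.0   (the density is normalized)
print(integrate_poly(density, 0.0, 0.5))   # 0.25  = Pr(x < 0.5)
```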

## Annotated Bibliography

1. M. Chavira and A. Darwiche, On probabilistic inference by weighted model counting. Artificial Intelligence, 2008.
2. V. Belle and A. Passerini and G. Van den Broeck, Probabilistic Inference in Hybrid Domains by Weighted Model Integration. AAAI, 2015.


### Local structures (cont.)

#### Evidence (continued)

Exploiting evidence can make inference in a Bayesian network more tractable. Two of the most common techniques are removing leaf nodes that are neither queried nor observed and removing edges outgoing from observed nodes. We call this preprocessing classical pruning; it can decrease the connectivity of the network. The work in [2] exploits evidence in a way that provides much greater benefits.

The second method conjoins unit clauses encoding the evidence prior to compiling, thereby eliminating the corresponding variables from the theory. This makes compilation more tractable and the result much smaller. The disadvantage is that only queries whose evidence is a superset of the current evidence can be answered.

Among the practical situations where this algorithm can be applied: first, the evidence may be fixed on only a subset of variables (as in MAP algorithms). Second, one may be interested in the values of the network parameters that maximize the probability of the given evidence (this happens, for example, in genetic linkage analysis). In this scenario one may want to use iterative algorithms like EM or gradient descent, which issue many network queries with the same evidence but different network parameter values. A similar application appears in sensitivity analysis, where the goal is to search for network parameters that satisfy a given constraint. The method itself is simple: for each value ${\displaystyle x}$ in the evidence, assert the unit clause ${\displaystyle \lambda _{x}}$ into the CNF. However, the effect of this seemingly innocent action belies its true power. Several detailed examples are provided in [2], showing how it can reduce the work of the compilation algorithm.

The example is from genetic linkage analysis, where it is a common occurrence. It involves four variables: a child C with parents A, B, and S.

The variable C is the genotype of a child, inherited from one of the parent genes A or B depending on the value of the selector S. We assume that all four variables are binary and that the portion of the CPT with S = s1 is as follows.


| S | A | B | C | Pr(C\|A,B) |
|----|----|----|----|-----------|
| s1 | a1 | b1 | c1 | 1.0 |
| s1 | a1 | b2 | c1 | 1.0 |
| s1 | a2 | b1 | c2 | 1.0 |
| s1 | a2 | b2 | c2 | 1.0 |

As described in the search section above, the algorithm on which the compilation is based works by repeatedly conditioning to decompose the CNF. Consider the case where we are given evidence {c1} and, during compilation, we condition on S = s1. Assuming a proper encoding of the network into CNF, combining the evidence with the value for S allows us to infer a1, which unit resolution can use to achieve further gains. Conditioning on S = s2 yields a similar conclusion for b1. In this case, the full power of conditioning on S is realized only when combined with evidence on C. This example shows how evidence can combine with the operations of the compilation algorithm to simplify the task.

Recall that classical pruning severs edges leaving evidence nodes and deletes certain leaf nodes. Injecting unit clauses is analogous to this severing of edges but is strictly more powerful for several reasons. First, this technique not only exploits the fact that a variable has been instantiated, but also exploits the specific value to which it has been instantiated. Second, rather than simply affecting the CPTs of children of evidence nodes, injecting unit clauses can affect many more parts of the network since unit clauses will often allow the WMC algorithm to infer additional unit clauses, and the effects can propagate to many ancestors and many descendants of evidence nodes. Third, rather than only realizing a limited number of gains during initialization, injecting unit clauses can continue to realize gains throughout the WMC algorithm.
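The effect of injecting evidence unit clauses can be seen with a small unit-propagation sketch (literals as signed integers; the encoding fragment below is a hand-made stand-in for the real one, not the paper's encoding):

```python
def unit_propagate(clauses):
    """Repeatedly assert unit clauses and simplify the CNF."""
    clauses = [set(c) for c in clauses]
    assigned = set()
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if len(c) == 1:
                lit = next(iter(c))
                assigned.add(lit)
                # satisfied clauses disappear; the negated literal is removed
                clauses = [d - {-lit} for d in clauses if lit not in d]
                changed = True
                break
    return clauses, assigned

# Stand-in fragment for the linkage example (1 = s1, 2 = c1, 3 = a1):
# the clause (~s1 v ~c1 v a1) says that s1 and c1 together imply a1.
cnf = [[-1, -2, 3], [1], [2]]   # conditioning on s1 plus evidence c1 as units
simplified, implied = unit_propagate(cnf)
print(sorted(implied))  # [1, 2, 3]: unit resolution also derives a1
```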

Several results are given in [2] demonstrating large gains when compiling with evidence. Algorithms that exploit only topological structure could not perform inference on many of the data sets, even after performing classical pruning, because of high treewidth. Furthermore, in the majority of cases, applying classical pruning and compiling the CNF without the introduction of the unit clauses based on the evidence also failed. However, with the introduction of the unit clauses, compilation became possible in many cases. Moreover, the paper showed that the performance of this general technique subsumed the performance of the specialized quickscore algorithm [13], which capitalizes on evidence in certain types of diagnostic networks. Finally, the paper showed that when combined with some aggressive preprocessing and applied to several difficult problems from genetic linkage analysis, the technique outperformed SUPERLINK 1.4, a state-of-the-art system for the task, on a number of challenging problems.

Table 16 lists a few of the results from the paper from the field of genetic linkage analysis and compares the performance to that of SUPERLINK. There are several observations. First, general-purpose algorithms that exploit only topological structure, such as jointree, could solve only one of the listed networks, because of high treewidth, even after applying classical pruning techniques. Second, only one of these networks could be compiled without the introduction of unit clauses to capture evidence. However, once the unit clauses were injected, all of the networks yielded to compilation in minutes. Finally, WMC compilation times are in most cases more efficient than SUPERLINK online times, and WMC online times are much more efficient still. Given that compilation occurs once while online inference is repeated many times, the effect of this improvement multiplies.

Table 16. Genetic linkage analysis networks: compilation with evidence vs. SUPERLINK.

| Net | Max clust. | Comp. time (s) | Comp. size | Online time (s) | SUPERLINK time (s) |
|------|------|--------|------------|------|---------|
| ee33 | 20.2 | 25.33 | 2,070,707 | 0.59 | 1046.72 |
| ee37 | 29.6 | 61.29 | 1,855,410 | 0.39 | 1381.61 |
| ee30 | 35.9 | 376.78 | 27,997,686 | 8.37 | 815.33 |
| ee23 | 38.0 | 89.47 | 3,986,816 | 1.08 | 502.02 |
| ee18 | 41.5 | 283.96 | 23,632,200 | 6.63 | 248.11 |

Table 17. ACE vs. Jointree when there is no local structure. Online time is averaged over sixteen evidence sets, where for each evidence set, the probability of evidence and a posterior marginal for every network variable are computed.

| Network | ACE offline time (s) | Jointree avg. online time (s) | ACE avg. online time (s) | Improv. |
|---------|------|-------|-------|------|
| alarm | 1 | 0.007 | 0.005 | 1.41 |
| bm-5-3 | 721 | 3.328 | 3.965 | 0.84 |
| diabetes | 1345 | 1.202 | 1.268 | 0.95 |
| hailfinder | 3 | 0.018 | 0.007 | 2.66 |
| mm-3-8-3 | 195 | 1.117 | 1.336 | 0.84 |
| munin2 | 284 | 0.764 | 0.596 | 1.28 |
| munin3 | 254 | 0.495 | 0.534 | 0.93 |
| munin4 | 1248 | 1.770 | 1.872 | 0.95 |
| pathfinder | 37 | 0.062 | 0.036 | 1.72 |
| pigs | 41 | 0.115 | 0.123 | 0.93 |
| students-3-2 | 241 | 0.961 | 1.806 | 0.53 |
| tcc4f.obfuscated | 3 | 0.022 | 0.007 | 3.17 |
| water | 340 | 0.659 | 0.591 | 1.12 |

### Other problem variations and conclusion


#### Conclusion

We conclude by noting that ENC2, ENC3, and ENC4 can effectively take advantage of evidence in the way described in this section, since they utilize indicator variables in the same way as ENC1. Furthermore, performing WMC by search can exploit evidence by examining the weights of variables and asserting a negative unit clause any time a weight is equal to 0. In the case of compilation, the disadvantage of incorporating evidence is that compilation may need to be performed again for some queries, which removes one of the chief advantages of compiling (although we have seen that in many practical cases this is not necessary). In the case of search, however, the algorithm is re-run for each new evidence anyway, so there is no disadvantage to incorporating evidence. Encoding evidence in the context of search algorithms was indeed applied effectively in [1].

## References

1. [1]  M. Chavira, A. Darwiche, Compiling Bayesian networks with local structure, in: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005, pp. 1306–1312.
2. [2]  M. Chavira, D. Allen, A. Darwiche, Exploiting evidence in probabilistic inference, in: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), 2005, pp. 112–119.
3. [3]  M. Chavira, A. Darwiche, Encoding cnfs to empower component analysis, in: Proceedings of the 9th International Conference on Theory and Applications of Satisfiability Testing (SAT), in: Lecture Notes in Computer Science, vol. 4121, Springer, Berlin, Heidelberg, 2006, pp. 61–74.
4. [4]  F.V. Jensen, S. Lauritzen, K. Olesen, Bayesian updating in recursive graphical models by local computation, Computational Statistics Quarterly 4 (1990) 269–282.
5. [5]  S.L. Lauritzen, D.J. Spiegelhalter, Local computations with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society, Series B 50 (2) (1988) 157–224.
6. [6]  N.L. Zhang, D. Poole, Exploiting causal independence in Bayesian network inference, Journal of Artificial Intelligence Research 5 (1996) 301–328.
7. [7]  R. Dechter, Bucket elimination: A unifying framework for probabilistic inference, in: Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI), 1996, pp. 211–219.
8. [8]  A. Darwiche, Recursive conditioning, Artificial Intelligence 126 (1–2) (2001) 5–41.
9. [9]  F. Jensen, S.K. Andersen, Approximations in Bayesian belief universes for knowledge based systems, in: Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI), Cambridge, MA, 1990, pp. 162–169.
10. [10]  C. Boutilier, N. Friedman, M. Goldszmidt, D. Koller, Context-specific independence in Bayesian networks, in: Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI), 1996, pp. 115–123.
11. [11]  D. Larkin, R. Dechter, Bayesian inference in the presence of determinism, in: C.M. Bishop, B.J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Jan 3–6, 2003, Key West, FL.
12. [12]  F. Bacchus, S. Dalmao, T. Pitassi, Value elimination: Bayesian inference via backtracking search, in: Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03), Morgan Kaufmann Publishers, San Francisco, CA, 2003, pp. 20–28.
13. [13]  D. Heckerman, A tractable inference algorithm for diagnosing multiple diseases, in: Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence, 1989, pp. 174–181.
14. [14]  D. Poole, N. Zhang, Exploiting contextual independence in probabilistic inference, Journal of Artificial Intelligence 18 (2003) 263–313.
15. [15]  A. Darwiche, A logical approach to factoring belief networks, in: Proceedings of KR, 2002, pp. 409–420.
16. [16]  T. Sang, P. Beame, H. Kautz, Solving Bayesian networks by weighted model counting, in: Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), vol. 1, AAAI Press, 2005, pp. 475–482.
17. [17]  M. Chavira, A. Darwiche, Compiling Bayesian networks using variable elimination, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007, pp. 2443–2449.
18. [18]  T. Sang, F. Bacchus, P. Beame, H.A. Kautz, T. Pitassi, Combining component caching and clause learning for effective model counting, in: SAT, 2004.
19. [19]  T. Sang, P. Beame, H.A. Kautz, Heuristics for fast exact model counting, in: SAT, 2005, pp. 226–240.
20. [20]  A. Darwiche, On the tractability of counting theory models and its application to belief revision and truth maintenance, Journal of Applied Non-Classical Logics 11 (1–2) (2001) 11–34.
21. [21]  J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.
22. [22]  M. Chavira, A. Darwiche, M. Jaeger, Compiling relational Bayesian networks for exact inference, International Journal of Approximate Reasoning 42 (1–2) (May 2006) 4–20.
23. [23]  D. Roth, On the hardness of approximate reasoning, Artificial Intelligence 82 (1–2) (1996) 273–302.
24. [24]  F. Bacchus, S. Dalmao, T. Pitassi, Algorithms and complexity results for #SAT and Bayesian inference, in: FOCS, 2003, pp. 340–351.
25. [25]  M. Davis, G. Logemann, D. Loveland, A machine program for theorem proving, CACM 5 (1962) 394–397.
26. [26]  A. Darwiche, New advances in compiling CNF to decomposable negational normal form, in: Proceedings of European Conference on Artificial Intelligence, 2004, pp. 328–332.
27. [27]  R. Dechter, R. Mateescu, AND/OR search spaces for graphical models, Artificial Intelligence 171 (2–3) (2007) 73–106.
28. [28]  W. Wei, B. Selman, A new approach to model counting, in: SAT, 2005, pp. 324–339.
29. [29]  W. Wei, J. Erenrich, B. Selman, Towards efficient sampling: Exploiting random walk strategies, in: AAAI, 2004, pp. 670–676.
30. [30]  R. Bayardo, J. Pehoushek, Counting models using connected components, in: AAAI, 2000, pp. 157–162.
31. [31]  A. Darwiche, A compiler for deterministic, decomposable negation normal form, in: Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), AAAI Press, Menlo Park, CA, 2002, pp. 627–634.
32. [32]  F. Bacchus, S. Dalmao, T. Pitassi, DPLL with caching: A new algorithm for #SAT and Bayesian inference, Electronic Colloquium on Computational Complexity (ECCC) 10 (003).
33. [33]  A. Darwiche, P. Marquis, A knowledge compilation map, Journal of Artificial Intelligence Research 17 (2002) 229–264.
34. [34]  J. Huang, A. Darwiche, DPLL with a trace: From SAT to knowledge compilation, in: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005, pp. 156–162.
35. [35]  M. Jaeger, Relational Bayesian networks, in: D. Geiger, P.P. Shenoy (Eds.), Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence (UAI-97), Morgan Kaufmann, Providence, USA, 1997, pp. 266–273.
36. [36]  A. Darwiche, A differential approach to inference in Bayesian networks, Journal of the ACM 50 (3) (2003) 280–305.
37. [37]  J.P. Hayes, Introduction to Digital Logic Design, Addison Wesley, 1993.
38. [38]  M.M. Mirsalehi, T.K. Gaylord, Logical minimization of multilevel coded functions, Applied Optics 25 (1986) 3078–3088.
39. [39]  R.D. Shachter, Evaluating influence diagrams, Operations Research 34 (6) (1986) 871–882.
40. [40]  R.D. Shachter, Evidence absorption and propagation through evidence reversals, in: Proceedings of the 5th Annual Conference on Uncertainty in Artificial Intelligence (UAI-90), Elsevier Science Publishing Company, Inc., New York, 1990.
41. [41]  J. Park, A. Darwiche, Approximating MAP using stochastic local search, in: Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI), Morgan Kaufmann Publishers, Inc., San Francisco, CA, 2001, pp. 403–410.
42. [42]  J. Park, A. Darwiche, Solving MAP exactly using systematic search, in: Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI), 2003, pp. 459–468.
43. [43]  C. Yuan, T.-C. Lu, M. Druzdzel, Annealed MAP, in: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI), 2004, pp. 628–635.
44. [44]  M. Fishelson, D. Geiger, Exact genetic linkage computations for general pedigrees, Bioinformatics 18 (1) (2002) 189–198.
45. [45]  M. Fishelson, N. Dovgolevsky, D. Geiger, Maximum likelihood haplotyping for general pedigrees, Tech. Rep. CS-2004-13, Technion, Haifa, Israel, 2004.
46. [46]  A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society 39 (1) (1977) 1–38.
47. [47]  H. Chan, A. Darwiche, Sensitivity analysis in Bayesian networks: From single to multiple parameters, in: Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI), AUAI Press, Arlington, Virginia, 2004, pp. 67–75.
48. [48]  J. Park, A. Darwiche, A differential semantics for jointree algorithms, Artificial Intelligence 156 (2004) 197–216.
