Exact Inference in Graphical Models
Exact inference is used to infer the posterior probability of a query given some evidence.
Authors: Obada Alhumsi
Abstract
Exact inference methods take advantage of the factorization of graphical models, factor operations and graph properties in order to infer the posterior probability of a query given some evidence in a time/memory effective manner. Two prominent methods are variable elimination and recursive conditioning.
Related Pages
Exact Inference is often used to make predictions in Graphical Models such as Bayesian networks and Markov random fields. This page will use conditional probability tables to simplify explanations of the material.
Content
Exact Inference Methods
Inference approaches in graphical models can be split into four main categories: exact inference, variational inference, stochastic simulation, and bounded approaches. The focus of this page is exact inference, where two prominent methods, variable elimination and recursive conditioning, will be discussed in depth. All examples and explanations will be based on Bayesian networks; however, these methods are equally applicable to Markov random fields and even constraint satisfaction problems[1][2]. This means that the following exact inference methods can be applied to directed, undirected, cyclic, and acyclic graphs.
Factors
Recall that, in essence, a Bayesian network represents a factorization of a joint probability distribution[3]. A factor is defined as a function applied to a tuple of variables to produce a number[3]. A factor on a variable $X$ can be written as $f(X)$; note that a factor can be applied to more than one variable. In exact inference methods for graphical models, we think of our graphical model as a product of factors[4]:

$$p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{i} f_i(\mathbf{x}_i),$$

where each factor $f_i$ is defined over a subset $\mathbf{x}_i$ of the variables and $Z$ is a normalizing constant. In Bayesian networks, these factors correspond to conditional probability distributions (and $Z = 1$), while in Markov random fields, the factors encode an unnormalized distribution[4]. A Bayesian network can then be written as the following:

$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{parents}(x_i)).$$

Given this Bayesian network, we could query the marginal probability of a variable such as $X_1$. The marginal probability can then be calculated using the following formula, which leverages the factorization of the probability distribution[4]:

$$p(x_1) = \sum_{x_2} \cdots \sum_{x_n} \prod_{i=1}^{n} p(x_i \mid \mathrm{parents}(x_i)).$$
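To make the product-of-factors view concrete, here is a minimal sketch in Python, assuming factors are stored as tables keyed by tuples of values; the tiny fire/smoke network and its numbers are illustrative assumptions, not taken from this page.

```python
# A minimal sketch of the "graphical model as a product of factors" view,
# assuming binary variables and factors stored as dicts keyed by value tuples.
# The tiny Fire -> Smoke network and its numbers are illustrative assumptions.

# P(Fire) and P(Smoke | Fire) written as factors.
p_fire = (["Fire"], {(True,): 0.01, (False,): 0.99})
p_smoke = (["Smoke", "Fire"], {(True, True): 0.9, (False, True): 0.1,
                               (True, False): 0.05, (False, False): 0.95})
factors = [p_fire, p_smoke]

def joint(assignment):
    """Product of all factors evaluated at a full assignment (dict: variable -> value)."""
    value = 1.0
    for variables, table in factors:
        value *= table[tuple(assignment[v] for v in variables)]
    return value

# Marginal P(Smoke = true): sum the product of factors over the other variable.
marginal = sum(joint({"Fire": fire, "Smoke": True}) for fire in (True, False))
print(marginal)  # 0.01*0.9 + 0.99*0.05 = 0.0585 (up to float rounding)
```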
Factor Operations
Given the equation above, we require a method to solve it in a time- and memory-efficient way (note that the variables take discrete values, e.g. true or false). The next sections will discuss such methods; however, we first need to cover three main factor operations used in these methods: variable assignment, factor multiplication, and factor marginalization. To explain these operations more effectively, we will use conditional probability tables, as in the tables below.
Assigning Variables
Let's first look at a factor on the discrete variables smoke, fire, and alarm.
Smoke | Fire | Alarm | Val |
---|---|---|---|
t | t | t | 0.8 |
t | f | t | 0.2 |
f | f | t | 0.1 |
f | f | f | 0.9 |
We can now assign a value to a variable and find the factor values under that assignment. In this instance, let's assign smoke to be true; we no longer have the rows where smoke is false in our table, and we also lose the smoke column, since we know its assignment.
Fire | Alarm | Val |
---|---|---|
t | t | 0.8 |
f | t | 0.2 |
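A minimal sketch of variable assignment in Python, assuming the table-based factor representation from the sketch above; it reproduces the smoke = true example from the tables in this section.

```python
# A minimal sketch of assigning (observing) a variable in a factor, assuming
# factors are (variables, table) pairs; reproduces the Smoke = true example.

def assign(factor, var, value):
    """Restrict a factor to var = value, dropping var from the result."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {key[:i] + key[i + 1:]: val
                 for key, val in table.items() if key[i] == value}
    return (new_vars, new_table)

f = (["Smoke", "Fire", "Alarm"],
     {("t", "t", "t"): 0.8, ("t", "f", "t"): 0.2,
      ("f", "f", "t"): 0.1, ("f", "f", "f"): 0.9})

print(assign(f, "Smoke", "t"))
# (['Fire', 'Alarm'], {('t', 't'): 0.8, ('f', 't'): 0.2})
```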
Factor Multiplication
In factor multiplication, two or more factors that share a common variable can be multiplied together, producing a new factor over the union of their variables. We can write this mathematically as:

$$(f_1 \times f_2)(X, Y, Z) = f_1(X, Y)\, f_2(Y, Z),$$

where $Y$ denotes the variables the two factors have in common. For example:
$f_1(\mathit{Fire}, \mathit{Alarm})$:

Fire | Alarm | Val |
---|---|---|
t | t | 0.9 |
t | f | 0.1 |

$f_2(\mathit{Smoke}, \mathit{Alarm})$:

Smoke | Alarm | Val |
---|---|---|
t | t | 0.9 |
f | f | 0.1 |

$f_1 \times f_2$:

Fire | Smoke | Alarm | Val |
---|---|---|---|
t | t | t | 0.9 * 0.9 = 0.81 |
t | f | f | 0.1 * 0.1 = 0.01 |
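A minimal sketch of factor multiplication under the same assumed representation; it reproduces the fire/alarm and smoke/alarm example above (rows missing from the partial example tables are simply skipped).

```python
# A minimal sketch of factor multiplication, assuming (variables, table) factors.
from itertools import product

def multiply(f1, f2, domain=("t", "f")):
    """Multiply two factors into a factor over the union of their variables."""
    vars1, table1 = f1
    vars2, table2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out_table = {}
    for values in product(domain, repeat=len(out_vars)):
        a = dict(zip(out_vars, values))
        key1 = tuple(a[v] for v in vars1)
        key2 = tuple(a[v] for v in vars2)
        if key1 in table1 and key2 in table2:  # skip rows absent from the partial tables
            out_table[values] = table1[key1] * table2[key2]
    return (out_vars, out_table)

f_fire_alarm = (["Fire", "Alarm"], {("t", "t"): 0.9, ("t", "f"): 0.1})
f_smoke_alarm = (["Smoke", "Alarm"], {("t", "t"): 0.9, ("f", "f"): 0.1})

print(multiply(f_fire_alarm, f_smoke_alarm))
# (['Fire', 'Alarm', 'Smoke'], {('t', 't', 't'): 0.81, ('t', 'f', 'f'): 0.01})
# (values up to floating-point rounding)
```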
Factor Marginalization
In factor marginalization, we sum out a variable from a factor to produce a new factor that does not contain that variable. A simple example can be seen in the tables below, where we sum out the alarm variable: for entries where the other two variables agree but alarm differs, we add the two values together and drop the alarm column.
Smoke | Fire | Alarm | Val |
---|---|---|---|
t | t | t | 0.9 |
t | t | f | 0.05 |
t | f | t | 0.9 |
t | f | f | 0.1 |

Summing out alarm gives:

Smoke | Fire | Val |
---|---|---|
t | t | 0.9 + 0.05 = 0.95 |
t | f | 0.9 + 0.1 = 1.0 |
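A minimal sketch of factor marginalization under the same assumed representation, reproducing the "sum out alarm" example above.

```python
# A minimal sketch of summing out (marginalizing) a variable from a factor,
# assuming (variables, table) factors; reproduces the "sum out Alarm" example.

def sum_out(factor, var):
    """Sum a variable out of a factor, producing a factor without it."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {}
    for key, val in table.items():
        reduced = key[:i] + key[i + 1:]
        new_table[reduced] = new_table.get(reduced, 0.0) + val
    return (new_vars, new_table)

f = (["Smoke", "Fire", "Alarm"],
     {("t", "t", "t"): 0.9, ("t", "t", "f"): 0.05,
      ("t", "f", "t"): 0.9, ("t", "f", "f"): 0.1})

print(sum_out(f, "Alarm"))
# (['Smoke', 'Fire'], {('t', 't'): 0.95, ('t', 'f'): 1.0})  (up to float rounding)
```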
Variable Elimination
Suppose we have a Bayesian network with variables $X_1, \dots, X_n$; we would now like to calculate the posterior probability of a query variable (or variables) $Q$ given some evidence $E = e$ observed on some of the variables (note that $Q$ and $E$ are variables of the Bayesian network)[3]. We are thus trying to compute $P(Q \mid E = e)$. Let's define the remaining variables, those that are neither query nor evidence, as $Z_1, \dots, Z_k$, where these elements are ordered according to an elimination ordering (finding an optimal elimination ordering is an NP-hard problem that won't be discussed on this page[4]). We can then define:

$$P(Q, E = e) = \sum_{Z_k} \cdots \sum_{Z_1} \prod_{i=1}^{n} f_i,$$

where the $f_i$ are the factors of the network with the evidence variables set to their observed values, and obtain the posterior by normalizing:

$$P(Q \mid E = e) = \frac{P(Q, E = e)}{\sum_{Q} P(Q, E = e)}.$$
Algorithm
It is now possible to define the variable elimination algorithm in a concise manner. In layman's terms, we loop over the variables according to the elimination ordering, eliminating them as we go. Formally, we can represent the algorithm as follows[3]:
- Create a factor for each conditional probability distribution in our network
- Set the value of evidence variables to their observed values
- Marginalize (sum out) the non-observed, non-query variables according to our elimination ordering (to sum out a variable, we partition the factors into those that contain the variable and those that don't; the factors that contain it are multiplied together, and the variable is then summed out of the resulting factor)
- Multiply the remaining factors and normalize (a minimal Python sketch of these steps is given below)
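The following is a minimal, self-contained sketch of the variable elimination algorithm in Python, assuming the table-based factors used in the earlier sketches; the toy fire/smoke/alarm network, its numbers, and the function names are illustrative assumptions rather than a reference implementation.

```python
# A minimal sketch of variable elimination for discrete networks, assuming
# factors are (variables, table) pairs. The toy network below is illustrative.
from itertools import product


def restrict(factor, var, value):
    """Step 2: set an evidence variable to its observed value."""
    vars_, table = factor
    if var not in vars_:
        return factor
    i = vars_.index(var)
    return (vars_[:i] + vars_[i + 1:],
            {k[:i] + k[i + 1:]: v for k, v in table.items() if k[i] == value})


def multiply(f1, f2, domains):
    """Multiply two factors into one over the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + [v for v in v2 if v not in v1]
    out = {}
    for values in product(*(domains[v] for v in out_vars)):
        a = dict(zip(out_vars, values))
        out[values] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return (out_vars, out)


def sum_out(factor, var):
    """Sum a variable out of a factor."""
    vars_, table = factor
    i = vars_.index(var)
    out = {}
    for k, v in table.items():
        out[k[:i] + k[i + 1:]] = out.get(k[:i] + k[i + 1:], 0.0) + v
    return (vars_[:i] + vars_[i + 1:], out)


def variable_elimination(factors, query, evidence, ordering, domains):
    """Return P(query | evidence) for a single query variable.

    `ordering` must list every non-query, non-evidence variable.
    """
    assert query not in ordering and query not in evidence
    # Step 1/2: one factor per CPD, with evidence variables set to observations.
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    # Step 3: sum out the remaining variables in the given elimination ordering.
    for var in ordering:
        relevant = [f for f in factors if var in f[0]]
        others = [f for f in factors if var not in f[0]]
        if not relevant:
            continue
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f, domains)
        factors = others + [sum_out(prod, var)]
    # Step 4: multiply the remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    _, table = result
    z = sum(table.values())
    return {k[0]: v / z for k, v in table.items()}


# Toy network: Fire -> Smoke, Fire -> Alarm (numbers are made up).
domains = {"Fire": (True, False), "Smoke": (True, False), "Alarm": (True, False)}
factors = [
    (["Fire"], {(True,): 0.01, (False,): 0.99}),
    (["Smoke", "Fire"], {(True, True): 0.9, (False, True): 0.1,
                         (True, False): 0.05, (False, False): 0.95}),
    (["Alarm", "Fire"], {(True, True): 0.95, (False, True): 0.05,
                         (True, False): 0.02, (False, False): 0.98}),
]
# P(Fire | Smoke = true), eliminating Alarm.
print(variable_elimination(factors, "Fire", {"Smoke": True}, ["Alarm"], domains))
# -> roughly {True: 0.154, False: 0.846}
```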
Two examples of this algorithm in action can be seen in the video below:
Time Complexity
The running time of variable elimination is heavily dependent on the size and structure of our Bayesian network. In variable elimination, we are required to multiply factors to form a bigger factor over many variables. This relationship is exponential: if we multiply two factors whose product is a factor over $M$ variables, each with $k$ possible values, the product has $k^M$ entries and takes $O(k^M)$ time to compute[4]. This clearly shows the importance of the elimination ordering, as it changes the number of variables in the factors we multiply. However, finding an optimal ordering is an NP-hard problem, so we settle for heuristic methods for now, such as min-neighbors, where we repeatedly eliminate the variable with the fewest dependent neighbours in the current graph (a small sketch of this heuristic is given below).
In total, the running time of variable elimination is $O(n \, k^{M+1})$, where $n$ is the number of variables, $M$ is the maximum size (number of variables) of any factor formed during the elimination process, and $k$ is the number of possible values each variable can take[4].
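A small sketch of the min-neighbors heuristic mentioned above, assuming the network is summarized as an undirected interaction graph mapping each variable to the variables it shares a factor with; the example graph and the extra "Leaving" variable are illustrative assumptions.

```python
# A minimal sketch of the min-neighbors ordering heuristic, assuming the
# network is given as an undirected interaction graph (dict: variable -> set
# of neighbouring variables). The example graph is an illustrative assumption.

def min_neighbors_ordering(graph):
    """Greedily eliminate the variable with the fewest current neighbors."""
    graph = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    ordering = []
    while graph:
        var = min(graph, key=lambda v: len(graph[v]))
        neighbors = graph.pop(var)
        # Eliminating var connects its neighbors (they end up in one factor).
        for u in neighbors:
            graph[u].discard(var)
            graph[u] |= (neighbors - {u})
        ordering.append(var)
    return ordering

graph = {"Fire": {"Smoke", "Alarm"}, "Smoke": {"Fire"},
         "Alarm": {"Fire", "Leaving"}, "Leaving": {"Alarm"}}
print(min_neighbors_ordering(graph))  # ['Smoke', 'Fire', 'Alarm', 'Leaving']
```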
Applications & Limitations
The main application of variable elimination in graphical models is to find the posterior probability distribution of some query variables given some evidence. There are several limitations to the basic form of the algorithm. Firstly, as we do not know the optimal way to reduce the number of variables in our factors during the elimination process, we cannot guarantee optimal performance of the algorithm. Secondly, the basic tabular algorithm cannot deal with continuous variables, as can be seen from the variable $k$ (the number of values per variable) in the running-time bound. Lastly, networks with a large number of interconnected variables will produce large factors, making the algorithm exponentially slower.
It's important to note that both variable elimination and recursive conditioning are methods that reduce the algorithmic complexity of exact inference in graphical models; thus, their real-world applications are the applications of graphical models themselves. Nonetheless, as variable elimination is an exponential algorithm, its use is restricted to graphical models of small size (in terms of treewidth and memory requirements). Examples include computing a medical prognosis given evidence from a few tests, or deciding whether to turn on a sprinkler given the weather forecast and other variables.
Recursive Conditioning
Recursive conditioning can be thought of as the search variant of variable elimination: both algorithms use the same factor operations, but recursive conditioning takes a divide-and-conquer approach[5]. If we have a factor graph with variables for nodes and factors for arcs between nodes, and an assignment to some variable disconnects the graph into separate components, then these components can be evaluated separately, as shown in the figure below.
Let's assume that our goal is to compute the probability of node E. If this computation is made under an assumption about B, we can reduce the network to a singly connected network, as seen in the middle layer of the figure above[6]. Furthermore, we also reduce the size of the conditional probability table, since that arc is cut. Each such assumption reduces the size of the network while remaining equivalent to the previous network for calculating the probability of node E[6]. This process is called recursive decomposition, and it reduces the computation of probabilities to computations on single-node networks.
Algorithm
The algorithm takes a context (assignments, including the evidence) and a set of factors (conditional probability distributions) as input, and outputs the value of summing out the other variables in the network[5]. In layman's terms, the algorithm splits the network wherever an assignment disconnects the graph; each separate sub-network is then evaluated as in variable elimination, the results of the sub-networks are multiplied together, and we sum over the possible values of the variables that were assigned to make the split. The actual algorithm is somewhat more involved because it uses a cache that saves already-computed values, trading memory for speed. A minimal sketch is given below.
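To make the divide-and-conquer idea concrete, here is a minimal sketch of recursive conditioning in Python, assuming the table-based factors from the earlier sketches; the caching is the simplest possible form (a real implementation controls what is remembered and forgotten to trade memory for time), and the toy network and numbers are illustrative assumptions rather than Darwiche's algorithm verbatim.

```python
# A minimal sketch of recursive conditioning, assuming (variables, table)
# factors. rc() returns the value of summing all unassigned variables out of
# the product of the factors, given a context (dict of assigned variables).

def rc(context, factors, domains, cache=None):
    """Sum the unassigned variables out of the product of factors, given context."""
    if cache is None:
        cache = {}
    key = (tuple(sorted(context.items())), tuple(id(f) for f in factors))
    if key in cache:
        return cache[key]
    if not factors:
        result = 1.0
    else:
        variables, table = factors[0]
        if all(v in context for v in variables):
            # First factor is fully assigned: evaluate it and recurse on the rest.
            value = table[tuple(context[v] for v in variables)]
            result = value * rc(context, factors[1:], domains, cache)
        else:
            components = split(factors, context)
            if len(components) > 1:
                # The assignment disconnects the graph: evaluate components separately.
                result = 1.0
                for component in components:
                    result *= rc(context, component, domains, cache)
            else:
                # Condition on an unassigned variable and sum over its values.
                var = next(v for v in variables if v not in context)
                result = sum(rc({**context, var: val}, factors, domains, cache)
                             for val in domains[var])
    cache[key] = result
    return result


def split(factors, context):
    """Partition factors into components connected through unassigned variables."""
    remaining = list(factors)
    components = []
    while remaining:
        component = [remaining.pop()]
        grew = True
        while grew:
            grew = False
            linked = {v for f in component for v in f[0] if v not in context}
            for f in list(remaining):
                if linked & {v for v in f[0] if v not in context}:
                    component.append(f)
                    remaining.remove(f)
                    grew = True
        components.append(component)
    return components


# Toy network (Fire -> Smoke, Fire -> Alarm); numbers are illustrative.
domains = {"Fire": (True, False), "Smoke": (True, False), "Alarm": (True, False)}
factors = [
    (["Fire"], {(True,): 0.01, (False,): 0.99}),
    (["Smoke", "Fire"], {(True, True): 0.9, (False, True): 0.1,
                         (True, False): 0.05, (False, False): 0.95}),
    (["Alarm", "Fire"], {(True, True): 0.95, (False, True): 0.05,
                         (True, False): 0.02, (False, False): 0.98}),
]
# P(Smoke = true): condition on Smoke and sum out Fire and Alarm.
print(rc({"Smoke": True}, factors, domains))  # roughly 0.0585
```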
Time Complexity
Comparing the two algorithms, recursive conditioning can take more or less time than variable elimination, depending on the properties and structure of the network (its time complexity can range from linear to exponential). However, recursive conditioning can take less space than variable elimination, as it caches, recalls, and forgets values it has computed. Recursive conditioning thus offers a smooth trade-off between computation time and memory used[6].
Applications and Limitations
The main application of recursive conditioning in graphical models is to find the posterior probability distribution of unassigned variables given a context and a set of factors. It is helpful in situations where memory could be an issue, as it can be used to trade off speed against memory. In some configurations, recursive conditioning offers linear time complexity, which is better than variable elimination. However, as with variable elimination, recursive conditioning can have exponential computational complexity and can even perform worse than variable elimination.
Keeping in mind that recursive conditioning is a complexity-reduction algorithm (as stated in the applications section of variable elimination), it can be used on graphical models of larger size, provided they satisfy a few conditions (these conditions won't be stated here, as recursive conditioning was not fully explained). In the paper by Darwiche, recursive conditioning was used on genetic linkage analysis datasets that are large (containing 9000 variables) and would require over 20 GB of memory using variable elimination; recursive conditioning was able to compute the probability of some evidence with respect to the network using only 112 MB of memory.
Annotated Bibliography
- ↑ "Variable elimination, Wikipedia Page".
- ↑ "Variable Elimination, Artificial Intelligence: Foundations of Computational Agents, Poole & Mackworth".
- ↑ 3.0 3.1 3.2 3.3 "Variable Elimination for Belief Networks".
- ↑ 4.0 4.1 4.2 4.3 4.4 4.5 "Variable Elimination CS228 class notes".
- ↑ 5.0 5.1 "Recursive Conditioning, David Poole Lectures" (PDF).
- ↑ 6.0 6.1 6.2 Darwiche, Adnan. "Recursive conditioning" (PDF). Artificial Intelligence.