Course:CPSC522/Markov Logic

Markov Logic

Markov logic integrates both first-order logic (FOL) and Markov networks, bringing together logical and statistical AI.

Principal Author: May Young
Collaborators: David Johnson

Abstract

This page describes Markov logic as a combination of first-order logic and Markov networks. It also describes the graphical structure and parameters, and defines a Markov logic network. Furthermore, this page looks into two types of inference: find the most likely state of the world given some evidence, and compute conditional probabilities.

Builds on

Markov logic is a language that combines first order logic (FOL) (i.e. predicate calculus) and Markov networks, which allows both efficient handling of uncertainty and a compact representation of knowledge.

More general than

Markov logic is more general than definite clause grammars (DCGs) and probabilistic context-free grammars (PCFGs). Also, Markov logic networks (MLNs) can act like templates for Markov networks as MLNs have a set of constants that can be used to ground a Markov network. Note that to ground means to substitute a variable with a constant. Different groundings with different sets of constants would produce different networks. Furthermore, first-order logic is a special case of Markov logic when all weights are equal and tend to infinity.

Related Pages

Markov logic combines first order logic and Markov networks.

Content

Markov Logic

A first-order knowledge base (KB) is a set of hard constraints on the set of possible worlds. This means that if a possible world violates at least one formula, it has zero probability. On the other hand, Markov logic turns them into soft constraints so that when a world violates a formula, it becomes less probable but not impossible. The fewer formulas a world violates, the more probable it is. Each formula has a weight, which indicates how strong of a constraint it is. Hence, a KB in Markov logic is a set of first-order formulas with weights.

Additionally, a set of constants represents objects in the domain. Given these constants, a KB in Markov logic defines a probability distribution over possible worlds and uses a log-linear model. Thus, if a possible world satisfies a formula, its probability increases proportionally to a normalized exponentiated weighted combination of the weights of formulas that the world satisfies. In other worlds, if a possible world satisfies a formula, its log probability increases proportionally to the total formula weight.

P({\text{world}})\propto \exp(\Sigma {\text{weights of formula it satisfies}})

Which is equivalent to:

\log(P({\text{world}}))\propto (\Sigma {\text{weights of formula it satisfies}})

Graphical Structure

Example of a Markov Logic Network

A Markov logic network (MLN) is a set of pairs $(F,w)$ where $F$ is a formula in first-order logic and $w$ is a real number. With a finite set of constants $C$ , the MLN defines a Markov network such that:

There is one binary node for each grounding of each predicate in the MLN. The value of the node is 1 if the ground predicate is true, and 0 otherwise.
There is one feature/factor for each grounding of each formula $F$ in the MLN with the corresponding weight $w$ . The value of the feature is 1 if the ground formula is true, and 0 otherwise.

There is an edge/arc between two nodes if and only if the corresponding ground predicates appear together in at least one grounding of one formula in the MLN. Each state of the MLN is a possible world, containing a set of objects, functions (mappings from tuples of objects to objects), and relations between objects.

Parameters

For each formula $i$ , there is a factor $\Phi _{i}(pw)$ , where $pw$ is a possible world and $w_{i}$ is the weight of the formula $i$ .

\Phi _{i}(pw)=\exp(w_{i}f_{i}(pw))

Hence,

f_{i}(pw)={\begin{cases}1&{\text{if formula is true in pw}}\\0&{\text{otherwise}}\end{cases}}

Therefore, the probability of a possible world, that is, a ground Markov network, is the following:

P(pw)={\frac {1}{Z}}\exp {\Big (}\Sigma _{i}w_{i}n_{i}(pw){\Big )}={\frac {1}{Z}}\Pi _{i}\Phi _{i}(pw)

Inference

Inference in Markov logic allows probabilistic reasoning about complex relationships. Since an MLN is a template for a Markov network, we can use standard Markov network inference methods to answer probabilistic queries. However, this is often infeasible due to the size and complexity of the resulting network. Therefore, we combine probabilistic methods with ideas from logical inference, including satisfiability and resolution, which can then take advantage of the logical structure, leading to efficient methods.

We will consider two types of inference: find the most likely state of the world given some evidence, and compute conditional probabilities (e.g. probability of a formula).

MAP Inference

As a side note, MAP inference is also known as MPE inference in Bayesian network literature^[1].

The problem is to find the most likely state of the world given some evidence. Assume that the possible world $pw$ contains the evidence. Then:

\arg \max _{pw}P(pw)=\arg \max _{pw}{\frac {1}{Z}}\exp {\Big (}\Sigma _{i}w_{i}n_{i}(pw){\Big )}

Observe that the normalization constant $Z$ can be removed since it does not affect the $\arg \max$ operation. The exponentiation can also be removed because the exponential function is monotonic. Hence, this means we only need to find the truth assignment that maximizes the sum of weights of satisfied formulas/clauses.

\arg \max _{pw}\Sigma _{i}w_{i}n_{i}(pw)

Although the problem is NP-hard, this can be solved by any weighted satisfiability solver, either exact and approximate. The most commonly used approximate solver is MaxWalkSAT [cited in ^[1]], which performs a stochastic local search where it either flips the truth value of a randomly chosen atom in a randomly chosen unsatisfied clause, or the truth value of the atom that would maximize the sum of weights of satisfied clauses in the resulting possible world. The pseudocode is as below^[1]:

// Inputs:
// L, a set of weighted clauses
// max_tries, max number of tries
// max_flips, max number of flips
// target, target solution cost
// p, probability of taking a random step
//
// Output: 
// soln, best variable assignment found
function MaxWalkSAT(L, max_tries, max_flips, target, p)
    vars = variables in L
    for i = 1 to max_tries
        soln = a random truth assignment to vars
        cost = sum of weights of unsatisfied clauses in soln
        for j = 1 to max_flips
            if cost <= target
                return "Success, solution is", soln
            c = a randomly chosen unsatisfied clause
            
            // Uniform() returns a uniform number between [0,1]
            // DeltaCost() returns the change in the sum of weights of 
            // unsatisfied clauses due to flipping variable v
            if Uniform(0,1) < p
                var_to_flip = a randomly chosen variable from c
            else
                for each variable v in c
                    compute DeltaCost(v)
                var_to_flip = v with lowest DeltaCost(v)

            soln = soln with var_to_flip flipped
            cost = cost + DeltaCost(var_to_flip)
    return "Failure, best assignment is", best soln found

Computing Conditional Probabilities

The problem is to answer arbitrary queries in the form “What is the probability that formula $F$ holds given a Markov network defined by an MLN and a set of constants $C$ ?”, i.e. $P(F|M_{L,C})=?$ , where $F$ is a formula.

An obvious approach is to use brute force, where we find the total probabilities of all possible worlds where the formula holds. Let $M_{L,C}$ be a Markov logic network, and $PW_{F}$ is the set of all possible worlds in which $F$ is true. Then,

P(F|M_{L,C})=\Sigma _{pw\in PW_{F}}P(pw,M_{L,C})

Another approach is to use the Markov Chain Monte Carlo (MCMC) algorithm, where we sample worlds and check if the formula holds in each world sampled. Let $S$ be the set of all samples, and $S_{F}$ be the set of all samples/possible worlds in which $F$ is true.

P(F|M_{L,C})={\frac {|S_{F}|}{|S|}}

The brute force method is generally intractable (unless the domain is very small), while, despite being an approximation, the MCMC algorithm may be too slow for arbitrary formulas.

Instead, let’s consider the simplest (and less general) case that is also the most frequent type of query in practice:

$P({\text{ground literal}}|{\text{conjunction of ground literals}},M_{L,C})$

The main idea is to only build the minimal subset of the ground network required to answer the query. In other words, it is not necessary to create (ground) the part of the Markov network from which the query is independent given the evidence. It is true that, in the worst case, there can be no savings in efficiency, but in practice, this can greatly reduce the time and memory required for inference.

Once the (sub)network has been constructed, any standard inference technique for Markov networks can be applied, such as Gibbs sampling.

Example

The example is adapted from both Domingo and Lowd's example^[1]. as well as Giuseppe Carenini's example in one of the slides from the UBC CPSC 422 Lecture 30 PowerPoint^[2].

Figure 2. Corresponding Ground Markov Network of Example (Adapted from Domingo, Lowd, and Carenini)

Weights	Formulas
1.7	$\forall x\quad Study(x)\Rightarrow GetGoodGrades(x)$
0.5	$\forall x,\forall y\quad Tutor(x,y)\wedge GetGoodGrades(x)\Rightarrow GetGoodGrades(y)$

Let the set of constants be {Alex, Bailey}, or {A,B} for simplicity. Then we can construct the corresponding ground Markov network shown in Figure 2 by applying the formulas to the constants.

Recall that, in the equation for a factor, $f_{i}(pw)=1$ when the formula is true in $pw$ , and 0 otherwise. Therefore,

f(Study(x),GetGoodGrades(x))={\begin{cases}1&{\text{if }}Study(x)\Rightarrow GetGoodGrades(x)\\0&{\text{otherwise}}\end{cases}}

f(Tutor(x,y),GetGoodGrades(x),GetGoodGrades(y))={\begin{cases}1&{\text{if }}Tutor(x,y)\wedge GetGoodGrades(x)\Rightarrow GetGoodGrades(y)\\0&{\text{otherwise}}\end{cases}}

Then, applying the equation discussed earlier to find the probability of a possible world $pw$ , we get:

P(pw)={\frac {1}{Z}}\Pi _{i}\Phi _{i}(pw)={\frac {\exp(1.7)*\exp(0.5)*\exp(0)*exp(0.5)*exp(0.5)*\exp(1.7)}{1+2\exp(1.7)+3\exp(0.5)}}

Note that since $Tutor(A,B)\wedge GetGoodGrades(A)\Rightarrow GetGoodGrades(B)$ is false, this becomes $\exp(0)$ in the equation.

Annotated Bibliography

P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, 2009.

M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62 (1), 107-136.

G. Carenini. Lectures 30-32 [PowerPoint slides]. Retrieved from http://www.cs.ubc.ca/~carenini/TEACHING/CPSC422-17/index.html

References

↑ ^{Jump up to: 1.0} ^1.1 ^1.2 ^1.3 P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, 2009.
↑ G. Carenini. Lectures 30-32 [PowerPoint slides]. Retrieved from http://www.cs.ubc.ca/~carenini/TEACHING/CPSC422-17/index.html

To Add

Put links and content here to be added. This does not need to be organized, and will not be graded as part of the page. If you find something that might be useful for a page, feel free to put it here.

Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0

[Domingos-1] {Jump up to: 1.0} ^1.1 ^1.2 ^1.3 P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, 2009.

[Carenini-2] G. Carenini. Lectures 30-32 [PowerPoint slides]. Retrieved from http://www.cs.ubc.ca/~carenini/TEACHING/CPSC422-17/index.html

[1]

[2]