Course:CPSC522/Adverserial Belief Propagation
This page models belief propagation in UGMs with complex factors which are adversarially learned from data.
Principal Author: Hooman Hashemi
Discriminators in GANs define distributions. In the first part of this article a concrete method is provided to explore this distribution as a single monolithic factor. In the second part, ways to combine the factors into more complex undirected graphical models is mentioned. While discriminator defines a complex factor table, generator helps to explore this factor as a proposal distribution. In this model, as the factor is explored proposal distribution becomes more similar to the Discriminator distribution. It is intended to use the latent state of the generator in converged state as messages between factors.
In undirected graphical models, exact local marginals (or beliefs) can be computed by eliminating (or marginalising) variables one by one. This operation forms a tree like structure of passed marginals here called messages and the operation is called belief propagation. The problem with exact inference is the increasing size of messages and cliques, after eliminating a variable all adjacent variables become dependent forming large cliques. Calculating messages is done by marginalising and iterating all possible assignments to these cliques. So the time complexity is exponentially growing with the clique sizes.
Loopy Belief Propagationn
Loopy BP uses the same updates, but in a graph with cycles instead of a clique-tree. It is no longer interpreted as marginalisation but instead an attempt to minimize an approximation to KL-Divergence of the distribution defined by messages and the distribution defined by the factors. Clique sizes can be smaller.
We first explain how to use a discriminator as a complex factor table, in a single factor setting via an adversarial search.
- Using discriminator feed-backs, converge the proposal distribution to discriminator's distribution (an alternative to finding best fitting marginals).
- In this implementation, runtime is similar to training of a GAN. In each iteration generator (proposal) is updated based on feedback of discriminator. The discriminator adopts to the new state without training(by just conditioning on the state).
( is the runtime state not the parameters of network)
( is maximized when , here for a given , apply this for and )
After adding to the , the model seemed unstable at first, but might not always be. (see stability)
: encoding of .
Instead of the ratio or loss themselves can be given, based on applicatoin.
Attention and associative memory can be added, It will help to avoid local minimas
Stability, convergence and updates
- First for a given , optimize toward , by repeatedly sampling a sequence of from using the recurrent model defined by and the information from iteration , then sampling from by using the model with the parameters , calculating the loss and back-propagating. Repeat enough, by assumption should produce a subset of the argmax of with a high probability.
- Take a small step from toward in a similar manner, should learn the ratio between and the actual . Repeat and . Note that from only the recurrent part is updated so far. The minimizer for should be close to for all high probably , and should be close to the for all such .
- Take a small step from toward minimizing the objective, this should minimize the expectation of loss over . Since for high probably the loss is close to the we have,
I'm assuming minimizing the expectation over the argmax should minimize the maximum which is the JSD,repeat , and .
For a given Discriminator, recurrent part () learns to produce converged states (close to peaks), then discriminator is updated to be close to or . Finally is updated to minimise the expected .
Single Factor Example
Every planar map has a Four Coloring. It is NP-complete to decide whether a planar map has a Three Coloring. Given image of a map , generator can produce random colouring's which is easily verifiable by discriminator if it is a three colouring. Exhaustive search is possible with adversarial search over latent features, here tile colors.
Many of NP-complete problems can be good examples for adversarial search. Another example is finding or highlighting a Hamiltonian cycle in a graph specified by image .
Many of real world problems have underlying NP-complete problems.
Advantages (Single factor)
The ultimate goal of this research is to be able to perform complex chained integrations similar to message passing in a graphical model.
The following advantages are the ultimate goal of this research, although not yet achieved I believe that they are realisable via this approach.
- Chained integration. Similar to finding a posterior in a hierarchical Bayes model but in an undirected manner. This is possible by only switching factors by conditioning on different questions. Models fitted in other questions or factors can be used as incoming messages to solve new questions or factors.
- Unsupervised learning of diverse factors. Although it is possible to perform hierarchical Bayes inferences, It is not straight forward to see or learn what are the factors. In this model adding diversity to the factors may be as easy as putting attention on different subset of data.(though both these advantages are problems need to be solved).
Further research and Problems
The following are problems that need to be solved.
- Factors. To design problems or factors unsupervisedly
- Multiple Factors. To design a protocol to send and receive messages. For example after deciding what model/message is the given information of the question/factor and what model/message is the answer, one can sample from question model/message and answer model/message separately and optimise the answer model/message.
- Speed up. This model is slow as it is. One possible speed up is to move the factors to higher layers of the network. (e.g. deciding fakeness from higher level features of generator as a subconscious shortcut[not a refined idea])
Multi Factor (Not refined)
It is not trivial how to reach multi factor belief propagation from the single factor algorithm, and needs more work. Here are a few suggestions or possible application.
- Conditioning as factors. By conditioning the GAN on different conditions as factors and asking to find a state satisfying all condition. This can be more elaborately made to be similar to existing message passing algorithms. The state can also be distributed between factors via associative memory.
- Sequential factors. Similar to FastSLAM, factors can be current Missing inputs, states in associative memory can act as map(distributed based on similarity in situation).
- Discriminator decides the factors. To solve a problem it is possible that discriminator puts attention on weaknesses of generator. The problem is solved when discriminator cannot find any unsatisfied condition(or factor). Or this can be done via active learning principles.
Other scenarios like manually factorised problems are also possible, for example while performing sum of two 10 digit numbers factors are visually local. Sum of local discriminator can be used as the final discriminator. Though all of these are imperfect and need more work.
Put your annotated bibliography here. Add links where appropriate.
Put links and content here to be added. This does not need to be organized, and will not be graded as part of the page. If you find something that might be useful for a page, feel free to put it here.