Feedback
It is also worth noting that the Dirichlet distributions are parameterized with the vector alpha prior to training. The MLP is therefore effectively learning how to interpret these probability distributions, which is why I sample multiple times from each distribution. I think the fact that the Dirichlet distributions are parameterized prior to training is why the classification accuracy is so high.
I don't know what "parameterized prior to training" means. In any case, please include a brief description of the results on the page (rather than replying here). My goal is to get a better page, not to have a long discussion.
Is it possible to report the accuracy of simply picking the mode (or whatever the optimal prediction is for your measure)? I thought that accuracy might be low because the other results are not that high, and 100% is a lot better than 80%!
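One way to get the "optimal prediction" reference number is to compute the accuracy of the Bayes-optimal classifier, which assigns each sample to the Dirichlet whose density is highest at that point. The sketch below is only an illustration under assumed conditions: each class has a known parameter vector alpha, classes are equally likely, each input is a single sampled probability vector, and every alpha component is at least 1, so sampled components stay away from zero. The function names (`sample_dirichlet`, `dirichlet_logpdf`, `bayes_optimal_accuracy`) and the example alphas are hypothetical and not taken from the page.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at the probability vector x."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Clamp components to avoid log(0) if a draw underflows (assumed rare for alpha >= 1).
    return log_norm + sum((a - 1.0) * math.log(max(xi, 1e-300)) for a, xi in zip(alpha, x))


def bayes_optimal_accuracy(alphas, n_per_class=2000, seed=0):
    """Monte Carlo accuracy of picking the class with the highest density (equal priors assumed)."""
    rng = random.Random(seed)
    correct = 0
    total = 0
    for true_k, alpha in enumerate(alphas):
        for _ in range(n_per_class):
            x = sample_dirichlet(alpha, rng)
            scores = [dirichlet_logpdf(x, a) for a in alphas]
            # max() returns the first index on ties.
            pred = max(range(len(alphas)), key=scores.__getitem__)
            correct += int(pred == true_k)
            total += 1
    return correct / total


if __name__ == "__main__":
    # Hypothetical class parameters, for illustration only.
    alphas = [[2.0, 2.0, 2.0], [5.0, 1.0, 1.0], [1.0, 5.0, 1.0]]
    print(bayes_optimal_accuracy(alphas))
```

Under these assumptions, no classifier that sees a single sample should beat this number in expectation. If the MLP's held-out accuracy is clearly higher, that would be worth investigating. If the MLP is given several samples per distribution, the reference classifier would need to sum the log-densities over those samples instead.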