Feedback
I'll add that to the abstract. I calculated accuracy the same way across all experiments. It's correct_predictions/total_predictions, where a prediction is correct if the argmax of the MLP/R-GCN output equals the label.
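A minimal sketch of that accuracy computation, assuming the model output is a (n_samples, n_classes) array of logits and the labels are integer class indices (the toy arrays below are made up for illustration):

```python
import numpy as np

def accuracy(logits, labels):
    """correct_predictions / total_predictions, with argmax as the prediction."""
    predictions = np.argmax(logits, axis=1)  # predicted class per sample
    return np.mean(predictions == labels)    # fraction matching the label

# Hypothetical outputs for three samples over three classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 1.5,  0.2],
                   [0.0, 0.2,  0.1]])
labels = np.array([0, 1, 2])
print(accuracy(logits, labels))  # 2 of 3 argmaxes match -> 0.666...
```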
Here's a link to the implementation: https://colab.research.google.com/drive/1eCbpgZTUvy5a6vzycbeSMllMoUGeI9ld?usp=sharing
This needs to be explained clearly. So the prediction of the model is the argmax (mode?) of the GNN's output. This is then evaluated using accuracy. Please justify why such a measure is appropriate. (It always seems strange to me that we evaluate on a different metric than we learn on. (But argmax isn't differentiable.)) Is it easy to test on a measure that rewards accurate probabilistic predictions?
I think the accuracy of 1 -- perfect prediction on the test cases -- is a bit suspicious. It needs to be explained.
The output of the model is the normalized logits from the softmax operation, and I used the cross-entropy loss to optimize the MLP parameters. For evaluation in two-class classification, a threshold is commonly used to assess accuracy in terms of correct_predictions/total_predictions, where a prediction is correct if the output of the sigmoid is above a specific threshold (say 0.7). In my case (multi-class classification), I used the argmax of the softmax output to pick the class. I agree that this is a bit strange; I looked through the R-GCN paper for what kind of accuracy they used, but they didn't specify it.
I think that classification accuracy can be misleading: if 90% of the dataset is of class A, the model can reach 90% accuracy just by always predicting A. Also reporting the cross-entropy loss from each experiment might clarify that issue and give a measure that rewards accurate probabilistic predictions.
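A toy illustration of that point, with made-up numbers: a "model" that always assigns high probability to the majority class scores 90% accuracy on a 90/10 split, while its cross-entropy still reflects how well the predicted probabilities fit the labels.

```python
import numpy as np

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = np.array([0] * 90 + [1] * 10)

# A constant predictor that ignores its input entirely.
probs = np.tile([0.9, 0.1], (100, 1))  # same probabilities for every sample

accuracy = np.mean(np.argmax(probs, axis=1) == labels)
# Mean negative log-probability assigned to the true label.
cross_entropy = -np.mean(np.log(probs[np.arange(100), labels]))

print(accuracy)        # 0.9, despite the model never looking at the data
print(cross_entropy)   # ~0.325, a nonzero loss that exposes the miscalibration
```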
Another thing that may be important to note is that the Dirichlet distributions are parameterized with fixed alpha vectors before training. The MLP is therefore effectively learning how to interpret those probability distributions (which is why I sample multiple times from each distribution). I think the fact that the Dirichlet distributions are fixed before training is why the classification accuracy is so high.
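To make that setup concrete, here is a minimal sketch under assumed parameters (the alpha vectors and sample counts below are hypothetical, not the ones from the notebook): each class gets a Dirichlet with a fixed alpha, and the MLP's inputs are repeated samples from those fixed distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-class Dirichlet parameters, fixed BEFORE any training.
alpha_per_class = {0: np.array([8.0, 1.0, 1.0]),
                   1: np.array([1.0, 8.0, 1.0])}

samples, labels = [], []
for label, alpha in alpha_per_class.items():
    for _ in range(5):  # sample multiple times from each fixed distribution
        samples.append(rng.dirichlet(alpha))
        labels.append(label)

X = np.stack(samples)   # MLP inputs: sampled probability vectors, rows sum to 1
y = np.array(labels)
print(X.shape, y.shape)  # (10, 3) (10,)
```

Because the distributions never change during training, the MLP only has to learn a static mapping from sampled probability vectors to class labels, which could explain the very high accuracy.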
I don't know what "parameterized prior to training" means. In any case, please include a brief description of the results in the page (rather than replying here). My goal is to get a better page rather than trying to have a long discussion.
Is it possible to get the accuracy of picking the mode? (Or whatever is the optimal prediction for your measure). The reason that I thought that might be low is that the other results are not that high. And 100% is a lot better than 80%!
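If "picking the mode" means predicting straight from the generating Dirichlet rather than from the trained MLP, that baseline is easy to compute: for all alpha_i > 1 the Dirichlet mode is (alpha_i - 1) / (sum(alpha) - K). A hedged sketch, using a hypothetical alpha:

```python
import numpy as np

def dirichlet_mode(alpha):
    """Mode of a Dirichlet(alpha); defined on the interior only for alpha_i > 1."""
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha <= 1):
        raise ValueError("mode requires every alpha_i > 1")
    return (alpha - 1) / (alpha.sum() - len(alpha))

mode = dirichlet_mode([8.0, 2.0, 2.0])  # hypothetical alpha vector
prediction = np.argmax(mode)            # class with the largest mode coordinate
print(mode, prediction)                 # [7/9, 1/9, 1/9] and class 0
```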