Feedback
Say what R-GCN means the first time it is used (in the abstract).
I am a bit suspicious of the results. I presume that accuracy means average |predicted - actual|. This has an optimum at the median of the values, so I am very suspicious of the claim that the Dirichlet MLP is optimal. If we are using a measure that is optimized at the probability, such as log-likelihood (for Boolean values) or sum-of-squares, then I would expect the Dirichlet MLP to work best. In any case, the 0.99 and 1.0 seem too good considering the predictions of the others.
I'll add that to the abstract. I calculated accuracy the same way across all experiments: it's correct_predictions/total_predictions, where a prediction is correct if the argmax of the MLP/R-GCN output is equal to the label.
Here's a link to the implementation: https://colab.research.google.com/drive/1eCbpgZTUvy5a6vzycbeSMllMoUGeI9ld?usp=sharing
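For concreteness, here is a minimal sketch of that accuracy computation, assuming PyTorch, logits of shape (N, C), and integer labels; the variable names are illustrative rather than taken from the notebook:

    import torch

    def accuracy(logits, labels):
        # A prediction is correct when the argmax of the model output
        # (MLP / R-GCN logits) equals the integer class label.
        preds = logits.argmax(dim=1)
        return (preds == labels).sum().item() / labels.numel()

    # Dummy usage: 4 examples, 3 classes.
    logits = torch.tensor([[2.0, 0.1, 0.3],
                           [0.2, 1.5, 0.1],
                           [0.1, 0.2, 3.0],
                           [1.0, 0.9, 0.8]])
    labels = torch.tensor([0, 1, 2, 1])
    print(accuracy(logits, labels))  # 0.75: three of the four argmaxes match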
This needs to be explained clearly. So the output of the model is the argmax (mode?) of the GNN, which is then evaluated using accuracy. Please justify why such a measure is appropriate. (It always seems strange to me that we evaluate on a different metric than the one we learn on, but argmax isn't differentiable.) Is it easy to test on a measure that rewards accurate probabilistic predictions?
I think the accuracy of 1 -- perfect prediction on the test cases -- is a bit suspicious. It needs to be explained.
The output of the model is the normalized logits from the softmax operation. I used the cross-entropy loss to optimize the MLP parameters. For evaluation in two-class classification, a threshold is commonly used to assess classification accuracy as correct_predictions/total_predictions, where a prediction is correct if the output from the sigmoid is above a specific threshold (say 0.7). In my case (multi-class classification), I used the argmax of the softmax output to pick the class. I agree that this is a bit strange; I looked through the R-GCN paper for what kind of accuracy they used, but they didn't specify it.
I think that using classification accuracy can be misleading: if 90% of the dataset is of class A, then it's easy for the model to reach 90% accuracy by always predicting A. Also showing the cross-entropy loss from each experiment might clarify that issue and give a measure that rewards accurate probabilistic predictions.
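To illustrate that point, here is a small PyTorch sketch with dummy data (not from my experiments) where a majority-class predictor gets 90% accuracy while its cross-entropy loss exposes the overconfident errors:

    import torch
    import torch.nn.functional as F

    # A majority-class baseline on a 90%/10% split: always predict class A
    # with high confidence. Accuracy looks strong; cross-entropy is dominated
    # by the ten confidently wrong predictions.
    labels = torch.tensor([0] * 90 + [1] * 10)   # 90% class A
    logits = torch.tensor([[4.0, -4.0]] * 100)   # always "class A"

    preds = logits.argmax(dim=1)
    acc = (preds == labels).float().mean().item()   # 0.90
    ce = F.cross_entropy(logits, labels).item()     # about 0.80, far from ideal

    print(f"accuracy: {acc:.2f}, cross-entropy: {ce:.3f}")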
Another thing that is important to note is that the Dirichlet distributions are parameterized with the vector alpha prior to training. Thus the MLP is effectively learning how to interpret the probability distributions (which is why I sample multiple times from each distribution). I think the fact that the Dirichlet distributions are parameterized prior to training is why the classification accuracy is so high.
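To make "parameterized prior to training" concrete, here is a rough sketch of the setup as described above; the alpha values, shapes, and sample counts are hypothetical, not the exact ones from the notebook:

    import torch

    # One Dirichlet per class, parameterized by a fixed alpha chosen before
    # training; training never updates alpha, the MLP only learns to map
    # samples from these distributions to class labels.
    alphas = torch.tensor([[10.0, 1.0, 1.0],    # class 0: mass on dim 0
                           [1.0, 10.0, 1.0],    # class 1: mass on dim 1
                           [1.0, 1.0, 10.0]])   # class 2: mass on dim 2

    samples_per_class = 100
    features, labels = [], []
    for c, alpha in enumerate(alphas):
        dist = torch.distributions.Dirichlet(alpha)
        features.append(dist.sample((samples_per_class,)))  # multiple draws per distribution
        labels.append(torch.full((samples_per_class,), c))
    X, y = torch.cat(features), torch.cat(labels)
    # X then feeds a standard MLP trained with cross-entropy against y.

With alphas this well separated, the class-conditional samples barely overlap, which would let even a small MLP reach near-perfect accuracy on held-out samples.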
I don't know what "parameterized prior to training" means. In any case, please include a brief description of the results in the page (rather than replying here). My goal is to get a better page rather than trying to have a long discussion.
Is it possible to get the accuracy of picking the mode (or whatever the optimal prediction for your measure is)? The reason I thought that might be low is that the other results are not that high, and 100% is a lot better than 80%!