Course talk:CPSC532:StaRAI2020:R-GCN Attribute Aggregation

From UBC Wiki

Contents

Thread title | Replies | Last modified
Feedback | 6 | 18:06, 27 April 2021
Feedback | 1 | 06:25, 26 April 2021
Feedback | 1 | 06:21, 26 April 2021

say what R-GCN means the first time it is used (in abstract)

I am a bit suspicious of the results. I presume that accuracy means average |predicted - actual|. This has an optimum at the median of the values, so I am very suspicious of the prediction that says Dirichlet MLP is optimal. If we are using a measure that is optimized at the probability, such as log-likelihood (for Boolean values) or sum-of-squares, then I would expect the Dirichlet MLP to work best. In any case, the 0.99 and 1.0 seem too good considering the predictions of the others.
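A quick numeric sanity check of the claim above (toy numbers, not the project's data): mean |prediction - actual| is minimized by predicting the median of the values, not the mean.

```python
import numpy as np

# Toy values with a skewed distribution: median is 0.0, mean is 2.2.
values = np.array([0.0, 0.0, 0.0, 1.0, 10.0])

# Evaluate mean absolute error for a grid of constant predictions.
candidates = np.linspace(0.0, 10.0, 1001)
mae = np.array([np.mean(np.abs(values - c)) for c in candidates])

# The minimizer coincides with the median, not the mean.
best = candidates[np.argmin(mae)]
```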

DavidPoole (talk)21:39, 25 April 2021

I'll add that to the abstract. I calculated accuracy the same way across all experiments: it's correct_predictions/total_predictions, where a prediction is correct if the argmax of the MLP/R-GCN output is equal to the label.
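As an illustrative sketch (not the actual experiment code), the accuracy described above amounts to:

```python
import numpy as np

def accuracy(outputs, labels):
    """Fraction of rows whose argmax matches the label."""
    predictions = np.argmax(outputs, axis=1)
    return np.sum(predictions == labels) / len(labels)

# Toy example: 3 examples, 2 classes.
outputs = np.array([[0.1, 0.9],   # predicts class 1
                    [0.8, 0.2],   # predicts class 0
                    [0.3, 0.7]])  # predicts class 1
labels = np.array([1, 0, 0])
print(accuracy(outputs, labels))  # 2 of 3 correct
```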

ObadaAlhumsi (talk)06:16, 26 April 2021

This needs to be explained clearly. So the output of the model is the argmax (mode?) of the GNN. This is then evaluated using accuracy. Please justify why such a measure is appropriate. (It always seems strange to me that we evaluate on a different metric than we learn on. But argmax isn't differentiable.) Is it easy to test on a measure that rewards accurate probabilistic predictions?

I think the accuracy of 1 -- perfect prediction on the test cases -- is a bit suspicious. It needs to be explained.

DavidPoole (talk)17:36, 26 April 2021

The output of the model is the normalized logits from the softmax operation. I used the cross-entropy loss to optimize the MLP parameters. For evaluation in two-class classification, a threshold is commonly used to assess the classification accuracy in terms of correct_predictions/total_predictions, where a prediction is correct if the output from the sigmoid is above a specific threshold (say 0.7). In my case (multi-class classification), I used the argmax from the softmax operation to assess which class to pick. I do agree that this is a bit strange; I looked through the R-GCN paper for what kind of accuracy they used, but they didn't specify it.

I think that using classification accuracy can be misleading: if 90% of the dataset is of class A, then it's easy for the model to produce 90% accuracy. Also showing the cross-entropy loss from each experiment might clarify that issue and give a measure that accounts for accurate probabilistic predictions.
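A toy illustration of this point (made-up numbers, not from the experiments): a model that always predicts the 90% majority class gets 0.9 accuracy, while the cross-entropy exposes the poor probability assigned to the minority class.

```python
import numpy as np

# Degenerate model: always outputs P(A)=0.9, P(B)=0.1.
probs = np.tile([0.9, 0.1], (10, 1))
labels = np.array([0] * 9 + [1])   # 9 examples of class A, 1 of class B

# Argmax accuracy looks good on the imbalanced data.
acc = np.mean(np.argmax(probs, axis=1) == labels)             # 0.9

# Cross-entropy: mean negative log-probability of the true class.
ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))  # ~0.325
```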

ObadaAlhumsi (talk)18:00, 26 April 2021

Another thing that may be important to note is that the Dirichlet distributions are parameterized with the vector alpha prior to training, so the MLP is effectively learning how to interpret the probability distributions (which is why I sample multiple times from each distribution). I think the fact that the Dirichlet distribution is parameterized prior to training is why the classification accuracy is so high.

ObadaAlhumsi (talk)18:14, 26 April 2021

I don't know what "parameterized prior to training" means. In any case, please include a brief description of the results in the page (rather than replying here). My goal is to get a better page rather than trying to have a long discussion.

Is it possible to get the accuracy of picking the mode? (Or whatever is the optimal prediction for your measure). The reason that I thought that might be low is that the other results are not that high. And 100% is a lot better than 80%!

DavidPoole (talk)18:06, 27 April 2021

Great article! I really liked how detailed you were about everything. As I asked in the presentation, do you happen to know the variance of the nodes' degrees? That could possibly help make sense of what's happening in the results.

Side-note: I'm slightly amused at the fact that I could possibly be User1.

LuccaSiaudzionis (talk)23:39, 24 April 2021

Thanks Lucca! I'll be adding a few lines in the results for the nodes degrees variance soon!

ObadaAlhumsi (talk)06:25, 26 April 2021

Hi Obada, great article. I just wanted to understand what you meant by "Using multiple samples of probabilities 1, 2, and 3, we can form a node feature for the source node."

MAULIKMAHESHBHAIPARMAR (talk)12:29, 25 April 2021

Thanks Maulik! The inputs to the MLP are the neighbouring node features (similar to R-GCN). The node feature in this case is not an embedding but multiple samples from the Dirichlet distribution.

For instance, sampling from the Brazil Dirichlet distribution once will yield (P(class=0), P(class=1), ..., P(class=k)). We can sample multiple times and concatenate these probabilities to make a "node feature" that is used as input to the MLP.
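The construction described above can be sketched as follows (the alpha values and sample count are illustrative, not the ones used in the experiments): the Dirichlet is parameterized before training, each draw is a probability vector over the k classes, and several draws are concatenated into one node feature.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([2.0, 5.0, 3.0])   # k = 3 classes (illustrative alpha)
n_samples = 4                        # number of draws per node (illustrative)

# Each row is one sample (P(class=0), ..., P(class=k-1)), summing to 1.
samples = rng.dirichlet(alpha, size=n_samples)   # shape (4, 3)

# Concatenate the samples into a single node-feature vector.
node_feature = samples.reshape(-1)               # length k * n_samples = 12
```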

ObadaAlhumsi (talk)06:21, 26 April 2021